Usage

This section gives examples of general usage. For a tutorial on setting up workflow tests, jump to the Tutorial section. The Module Index contains more code examples related to each module and function.

Next-generation sequencing fixtures

One of the main purposes of pytest_ngsfixtures is to provide functionality for setting up fixtures that can be used to test applications such as workflows. The predefined test fixtures consist of a test path (formally a py._path.local.LocalPath object) in which test files have been set up according to a given file organization, henceforth referred to as the test layout or simply layout. In short, a layout is a set of links to (or copies of) the test data files. Currently, there are predefined test fixtures for sequence data and reference data, intended mainly for testing analysis workflows from scratch.
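To make the notion of a layout concrete, the following minimal sketch materializes a layout dictionary as symbolic links (or copies) under a test directory. The helper name setup_layout and its copy flag are hypothetical and not part of the pytest_ngsfixtures API; the sketch only assumes that a layout maps destination paths, relative to the test directory, to source data files.

```python
import os
import shutil
import tempfile
from pathlib import Path


def setup_layout(layout, root, copy=False):
    """Hypothetical helper: materialize a layout under root.

    layout maps a destination path (relative to root) to a source file.
    Each destination becomes a symlink to the source, or a copy if copy=True.
    """
    root = Path(root)
    created = []
    for dest, src in layout.items():
        target = root / dest
        # Create intermediate directories such as root/data
        target.parent.mkdir(parents=True, exist_ok=True)
        if copy:
            shutil.copy(src, target)
        else:
            os.symlink(os.path.abspath(src), target)
        created.append(target)
    return created


# Usage: two layout entries that point at the same (empty) source file
workdir = Path(tempfile.mkdtemp())
src = workdir / "foo.fastq.gz"
src.write_bytes(b"")
paths = setup_layout({"data/s1.fastq.gz": src, "data/s2.fastq.gz": src}, workdir / "test")
print(sorted(p.name for p in paths))  # ['s1.fastq.gz', 's2.fastq.gz']
```

Linking rather than copying keeps test directories cheap to create, since several layout entries can share one underlying data file.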

Fixtures

There are three main fixtures, each of which can be configured with the pytest.mark helper. In general, the test data files are defined as a dictionary of key:value pairs that is passed via the data option (or a similar option) to the pytest.mark.testdata helper. The key corresponds to the test fixture file path relative to the pytest root directory, whereas the value is the path to the test data file. Some fixtures predefine output directories, which can be configured with the dirname option. In addition, the testunit option allows grouping fixtures in the same test directory.
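The way the data dictionary, dirname and testunit options combine into file locations can be sketched as a small path computation. This is a hypothetical helper, not the plugin's code; it assumes files end up in root/testunit/dirname/key, which is consistent with the testunit example in the Advanced usage section (sample data in context1/data).

```python
from pathlib import PurePosixPath


def resolve_fixture_paths(data, root, testunit=None, dirname=None):
    """Hypothetical helper: map each fixture file to its test data source.

    Keys of data are fixture file paths, values are test data file paths.
    Files are assumed to end up in root/testunit/dirname/key; unset parts
    are skipped.
    """
    base = PurePosixPath(root)
    for part in (testunit, dirname):
        if part:
            base = base / part
    return {str(base / key): value for key, value in data.items()}


paths = resolve_fixture_paths(
    {"s1.fastq.gz": "/path/to/foo.fastq.gz"}, "/tmp/test", testunit="context1", dirname="data"
)
print(paths)  # {'/tmp/test/context1/data/s1.fastq.gz': '/path/to/foo.fastq.gz'}
```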

Under the hood, the fixtures call the Fixture class to set up the fixture. See Creating fixtures with the Fixture class for more information.

pytest_ngsfixtures.plugin.testdata()

A generic fixture for setting up test data.

pytest_ngsfixtures.plugin.samples()

A fixture for setting up sequence read data. Data files are defined via the layout option and are placed in the data directory. The layout and dirname options can also be configured via pytest.mark.parametrize, which enables parametrization over different sample layouts:

import pytest

@pytest.mark.parametrize("layout", [{'s1.fastq.gz': '/path/to/foo.fastq.gz'},
                                    {'s2.fastq.gz': '/path/to/foo.fastq.gz'}])
def test_samples(samples, layout):
    # List the files set up in the samples test directory
    print(samples.listdir())

A number of predefined layouts are available in the pytest_ngsfixtures.config.layout dictionary.

pytest_ngsfixtures.plugin.ref()

A fixture for setting up reference data, by default in the data directory.

Files

Fixture files live in subdirectories of the pytest_ngsfixtures/data directory:

ref/

Reference data files which are used by default by the ref fixture.

seq/

Sequence files.

The sequence directory consists of the following files:

File name                   Sample ID         Type                Population
--------------------------  ------------      -----------------   ------------
CHS.HG00512_1.fastq.gz      CHS.HG00512       Individual          Han-Chinese
CHS.HG00513_1.fastq.gz      CHS.HG00513       Individual          Han-Chinese
CHS_1.fastq.gz              CHS               Pool                Han-Chinese
PUR.HG00731.A_1.fastq.gz    PUR.HG00731.A     Individual, run A   Puerto Rico
PUR.HG00731.B_1.fastq.gz    PUR.HG00731.B     Individual, run B   Puerto Rico
PUR.HG00733.A_1.fastq.gz    PUR.HG00733.A     Individual, run A   Puerto Rico
PUR.HG00733.B_1.fastq.gz    PUR.HG00733.B     Individual, run B   Puerto Rico
PUR_1.fastq.gz              PUR               Pool, run A         Puerto Rico
YRI.NA19238_1.fastq.gz      YRI.NA19238       Individual          Yoruban
YRI.NA19239_1.fastq.gz      YRI.NA19239       Individual          Yoruban
YRI_1.fastq.gz              YRI               Pool                Yoruban

and similarly for read 2. The sequence files have been generated from 1000 Genomes Project data, with two individuals from each of the populations CHS (Han-Chinese), PUR (Puerto Rico) and YRI (Yoruban). They were selected based on mappings to a variable region on chromosome 6, so that running variant callers on the different data sets generates differing variant call sets. The pools are concatenations of the individual files of each population; since each pool combines two diploid individuals, it has a ploidy of 4.
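The file names in the table follow a regular pattern that a test can exploit, for example to check which individuals make up a pool. The sketch below assumes every file is named `<sample ID>_<read>.fastq.gz` and that the population code is the part of the sample ID before the first dot; both rules are inferred from the table, and the function names are hypothetical rather than part of pytest_ngsfixtures.

```python
import re
from collections import defaultdict

# Pattern inferred from the table: <sample ID>_<read number>.fastq.gz
FASTQ_NAME = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def parse_fastq_name(name):
    """Return (sample ID, population, read number) for a sequence file name."""
    m = FASTQ_NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    sample = m.group("sample")
    population = sample.split(".")[0]  # CHS, PUR or YRI
    return sample, population, int(m.group("read"))


def individuals_by_population(names):
    """Group individual (non-pool) sample IDs by population code."""
    groups = defaultdict(set)
    for name in names:
        sample, population, _ = parse_fastq_name(name)
        if sample != population:  # pool IDs are just the population code
            groups[population].add(sample)
    return {pop: sorted(ids) for pop, ids in groups.items()}


print(parse_fastq_name("PUR.HG00731.A_1.fastq.gz"))  # ('PUR.HG00731.A', 'PUR', 1)
```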

Advanced usage

Parametrizing existing sample layouts

pytest supports parametrizing tests over fixtures. The following code example shows how to parametrize over the predefined layouts:

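import pytest

@pytest.fixture(scope="function", autouse=False)
def data(request):
    # request.param is the name of a predefined layout fixture
    return request.getfixturevalue(request.param)

def pytest_generate_tests(metafunc):
    # Parametrize "data" indirectly over the layouts given by the ngs_layout option
    if "data" in metafunc.fixturenames:
        metafunc.parametrize("data", metafunc.config.getoption("ngs_layout", ["sample"]), indirect=True)

def test_run(data):
    pass  # Do something with data

Here, we define an indirect fixture that calls one of the predefined layout fixtures by use of the request.getfixturevalue function. The pytest_generate_tests hook reads the ngs_layout option (defaulting to ["sample"]) and passes each layout name to the fixture as request.param. Recent pytest releases no longer provide request.getfuncargvalue or the global pytest.config, which is why the option is read through metafunc.config.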

Grouping fixtures in test directories

When parametrizing fixtures over several conditions, it can be useful to group fixtures in separate parametrized test directories. This can be achieved with the testunit fixture option, as the following example shows:

import pytest

@pytest.mark.parametrize("testunit", ["context1", "context2"])
def test_with_context(samples, ref, testunit):
    # Sample data will end up in context1/data, reference data in
    # context1/ref for context1, and so on
    pass  # Do something with data

Creating fixtures with the Fixture class

In addition to using and configuring the predefined fixtures, you can set up fixtures by calling the Fixture class directly. The path option overrides the call to tmpdir_factory that is otherwise made at fixture setup. This is primarily useful when a fixture has to take parametrized values into account, for instance by reusing the directory of another, parametrized fixture.

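import pytest
from pytest_ngsfixtures.plugin import Fixture

# Hypothetical layouts; replace with your own key:value mappings
layout1 = {'s1.fastq.gz': '/path/to/foo.fastq.gz'}
layout2 = {'s2.fastq.gz': '/path/to/foo.fastq.gz'}

@pytest.fixture
def metadata(request):
    # Reuse the (parametrized) samples directory instead of a new tmpdir_factory path
    p = Fixture(request, path=request.getfixturevalue("samples"))
    return p

@pytest.mark.parametrize("layout", [layout1, layout2])
def test_layout(samples, layout, metadata):
    pass  # Do something with data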

Plugin options

-nt, --ngs-threads

Set the number of threads to use in a given test.
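pytest's command-line parser is built on argparse, so the two spellings of this option map onto a single destination value. The sketch below uses plain argparse to illustrate that mapping; the integer type, the default of 1 and the destination name ngs_threads are assumptions, not taken from the plugin's source.

```python
import argparse

# Hypothetical declaration mirroring the documented flags; type, default and
# dest are assumptions chosen for illustration.
parser = argparse.ArgumentParser()
parser.add_argument("-nt", "--ngs-threads", dest="ngs_threads", type=int, default=1,
                    help="number of threads to use in a given test")

print(parser.parse_args(["-nt", "4"]).ngs_threads)            # 4
print(parser.parse_args(["--ngs-threads", "8"]).ngs_threads)  # 8
print(parser.parse_args([]).ngs_threads)                       # 1
```

Inside a test, the value would then be read with request.config.getoption("ngs_threads"), assuming the plugin keeps the default destination name derived from --ngs-threads.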