tseda in the notebook

Example of tseda in the notebook
Author

Per Unneberg

This notebook provides a minimal working example of how to load a tseda file into a notebook and how to use the application widgets as standalone objects. The example is based on the test file test.trees.tsdate.tseda, which can be found in the tests data folder.

Note that the API is still in development and that this notebook is meant to showcase basic functionality as currently implemented. Notably, it is currently not possible to modify the size of widgets, and accessing individual plots would benefit from an improved method naming system.

Module setup

First, load the relevant tseda modules and the panel tabulator extension, which is needed to render tables.

import panel as pn
pn.extension("tabulator")
from tseda import datastore, model, vpages

The datastore module defines the classes that are used by all application pages, namely

  • SampleSetsTable: manages and displays information about sample sets
  • IndividualsTable: handles individual data, such as population and sample set assignments
  • DataStore: provides access to the underlying TreeSequence data as well as the tables described above

model defines a model of the tree sequence data called TSModel. This is a modified version of tsbrowse.model.TSModel.

Finally, vpages holds the application pages and widgets, each of which is instantiated by calling its class constructor.

Setting up the datastore

We instantiate a model.TSModel by providing a path to a tseda file:

tsm = model.TSModel("../tests/data/test.trees.tsdate.tseda")
type(tsm)
tseda.model.TSModel

Note that the input tseda file must have been generated by running tseda preprocess on an input compressed tree sequence file. In order to instantiate a datastore.DataStore object, we first need to generate two tables.

We first make the SampleSetsTable.

sample_sets_table = datastore.make_sample_sets_table(tsm)

This table defines the sample sets used in the analyses. The starting table consists of the populations defined in the input tree sequence file, but custom sample sets can be added later on. Names and colors can be edited.

Next we load the IndividualsTable. We need to connect it to the current sample_sets_table and we set the page_size attribute to reduce the number of individuals shown by default.

individuals_table = datastore.make_individuals_table(tsm)
# NB: this is a bug; we need to set the sample_sets_table manually
individuals_table.sample_sets_table = sample_sets_table
individuals_table.page_size = 10
individuals_table

Briefly, this table displays the individual samples and corresponding metadata, such as population, name, longitude, and latitude. Note the distinction between population, which is immutable and corresponds to the original population assignment in the input tree sequence file, and sample_set_id, which is a placeholder for the current population assignment. By modifying this column, we can make arbitrary population (sample set) assignments to individual samples. Finally, the selected column lets us exclude samples from subsequent analyses.
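The distinction between population, sample_set_id, and selected can be illustrated with a toy model in plain Python. This is only a sketch: IndividualRow and group_selected are hypothetical names invented for illustration and are not part of the tseda API, which manages these columns inside the IndividualsTable.

```python
from dataclasses import dataclass


@dataclass
class IndividualRow:
    """Hypothetical stand-in for one row of the individuals table."""

    name: str
    population: int        # original assignment from the tree sequence
    sample_set_id: int     # current, editable assignment
    selected: bool = True  # False excludes the sample from analyses


rows = [
    IndividualRow("ind0", population=0, sample_set_id=0),
    IndividualRow("ind1", population=0, sample_set_id=0),
    IndividualRow("ind2", population=1, sample_set_id=1),
]

# Reassign ind1 to a custom sample set 2 and exclude ind2.
# The population field is left untouched in both cases.
rows[1].sample_set_id = 2
rows[2].selected = False


def group_selected(rows):
    """Group the names of selected rows by their current sample_set_id."""
    groups = {}
    for row in rows:
        if row.selected:
            groups.setdefault(row.sample_set_id, []).append(row.name)
    return groups


groups = group_selected(rows)
```

After the edits, ind1 is analyzed as part of sample set 2 although its population is still 0, and ind2 no longer contributes to any group.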

With these three data structures in place, we can now define the datastore.DataStore model:

ds = datastore.DataStore(tsm=tsm, individuals_table=individuals_table,
    sample_sets_table=sample_sets_table)

Accessing vpages

The vpages module has an attribute PAGES that lists the available application pages:

vpages.PAGES
[tseda.vpages.overview.OverviewPage,
 tseda.vpages.individuals.IndividualsPage,
 tseda.vpages.structure.StructurePage,
 tseda.vpages.ignn.IGNNPage,
 tseda.vpages.stats.StatsPage,
 tseda.vpages.trees.TreesPage]

Every page is instantiated by passing along the ds object, as we show in the subsequent sections.
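Indexing PAGES by position depends on the order of the list. A small helper that looks up a page class by its class name may be more readable. The sketch below is not part of tseda: page_by_name is a hypothetical helper, and the two stand-in classes exist only so that the example runs without tseda installed.

```python
def page_by_name(pages, name):
    """Return the first class in `pages` whose __name__ equals `name`."""
    for cls in pages:
        if cls.__name__ == name:
            return cls
    available = ", ".join(cls.__name__ for cls in pages)
    raise KeyError(f"no page named {name!r}; available: {available}")


# Stand-in classes for illustration only; with tseda loaded,
# pass vpages.PAGES instead of demo_pages.
class OverviewPage:
    pass


class StatsPage:
    pass


demo_pages = [OverviewPage, StatsPage]
stats_cls = page_by_name(demo_pages, "StatsPage")
```

With tseda loaded, page_by_name(vpages.PAGES, "StatsPage")(datastore=ds) should then be equivalent to vpages.PAGES[4](datastore=ds), given the class names listed above.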

Overview

The overview page summarizes the tree sequence object.

ov = vpages.PAGES[0](datastore=ds)
ov

Individuals page

The individuals page displays three widgets:

  1. a map with sampling locations
  2. the sample set table
  3. the individuals table

indp = vpages.PAGES[1](datastore=ds)
indp

Each widget can be accessed by calling the corresponding attributes, which here are geomap, sample_sets_table, and individuals_table.

The individuals page also has a sidebar function, which the application uses to modify tables, assign new sample sets, and so on. This functionality does not yet work in the notebook setting, so any modification of the datastore object must be done by editing the object manually.

Structure

The structure page summarizes population-wide genealogical nearest neighbor (GNN) values and \(F_{st}\).

struct = vpages.PAGES[2](datastore=ds)
struct

Individual GNN

The individual GNN page displays three widgets:

  1. a map with sampling locations
  2. a bar plot of individual-based GNN values, based on the current sample set definitions
  3. a widget, initially empty, for plotting chromosome-level GNN values

Instead of drawing the entire page, we here show how to access the initial widgets, starting with the bar plot of individual GNN values:

ignn = vpages.PAGES[3](datastore=ds)
ignn.vbar

Note that the plot is interactive: the bokeh tools displayed on the right can be used to interact with it.

As mentioned, the chromosome-level GNN widget is empty to begin with:

ignn.gnnhaplotype

However, we can set the individual_id attribute to actually plot the haplotypes. Note that these values are calculated on the fly and may be slow for large samples!

ignn.gnnhaplotype.individual_id = 12
ignn.gnnhaplotype
0it [00:00, ?it/s]199it [00:00, 6730.26it/s]
WARNING:param.main: sizing_mode option not found for area plot with bokeh; similar options include: []

Statistics

Tree sequence statistics come in two flavors: one-way statistics, which are defined over single sample sets, and multi-way statistics, which compare two or more sample sets. The one-way statistics are accessible via the oneway attribute:

stats = vpages.PAGES[4](datastore=ds)
stats.oneway

For multi-way statistics, we need to specify which sample sets to compare. We can either use the sidebar functionality or set the comparisons directly as strings formatted exactly as INDEX1 & INDEX2:

stats.multiway.comparisons.value = ['0 & 1', '0 & 2']
stats.multiway

Here, the indexes correspond to sample set ids.
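Since the comparison strings must match the INDEX1 & INDEX2 format exactly, it can be convenient to generate them rather than type them by hand. The following is a minimal sketch; comparison_labels is a hypothetical helper that is not part of tseda, and it assumes that every unordered pair of the given sample set ids is wanted.

```python
from itertools import combinations


def comparison_labels(sample_set_ids):
    """Build 'I & J' labels for every unordered pair of sample set ids."""
    return [f"{i} & {j}" for i, j in combinations(sample_set_ids, 2)]


labels = comparison_labels([0, 1, 2])
print(labels)  # ['0 & 1', '0 & 2', '1 & 2']
```

The resulting list can then be assigned to stats.multiway.comparisons.value in the same way as the hand-written list above.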

Trees

Finally, we can draw trees accessed by genomic position or index in the tree sequence:

trees = vpages.PAGES[5](datastore=ds)
trees.data.position = 10_000
trees
trees.sidebar()
WARNING:param.ParamMethod01936: The method supplied for Panel to display was declared with `watch=True`, which will cause the method to be called twice for any change in a dependent Parameter. `watch` should be False when Panel is responsible for displaying the result of the method call, while `watch=True` should be reserved for methods that work via side-effects, e.g. by modifying internal state of a class or global state in an application's namespace.

The slider can be used to modify the current position. The trees page actually renders the trees.data attribute, which is where we can also set tree attributes manually. For instance, to increase the number of shown trees to three, we can set trees.data.num_trees.value=3.

Conclusion

This notebook shows the basic functionality of tseda plotting widgets.