import panel as pn
from tseda import datastore, model, vpages
pn.extension("tabulator")
tseda in the notebook
This notebook provides a minimal working example of how to load a tseda file into a notebook and how to use the application widgets as standalone objects. The example is based on the test file test.trees.tsdate.tseda, which can be found in the tests data folder.
Note that the API is still in development and that this notebook is meant to showcase basic functionality as currently implemented. Notably, it is currently not possible to modify the size of widgets, and accessing individual plots would benefit from an improved method naming system.
Module setup
First load the relevant tseda modules and the panel extension for tables.
The datastore module defines the classes that are used by all application pages, namely
- SampleSetsTable: manages and displays information about sample sets
- IndividualsTable: handles individual data, such as population and sample set assignments
- DataStore: provides access to the underlying TreeSequence data as well as the tables described above
model defines a model of the tree sequence data called TSModel. This is a modified version of tsbrowse.model.TSModel.
Finally, vpages holds a mapping to the application pages and widgets, which can be instantiated by calling the relevant constructor method.
Setting up the datastore
We instantiate a model.TSModel by providing a path to a tseda file:
tsm = model.TSModel("../tests/data/test.trees.tsdate.tseda")
type(tsm)
tseda.model.TSModel
Note that the input tseda file must have been generated by running tseda preprocess on an input compressed tree sequence file. In order to instantiate a datastore.DataStore object, we first need to generate two tables.
We first make the SampleSetsTable.
sample_sets_table = datastore.make_sample_sets_table(tsm)
sample_sets_table
This table defines the sample sets used in the analyses. The starting table consists of the populations defined in the input tree sequence file, but custom sample sets can be added later on. Names and colors can be edited.
Next we load the IndividualsTable. We need to connect it to the current sample_sets_table, and we set the page_size attribute to reduce the number of individuals shown by default.
individuals_table = datastore.make_individuals_table(tsm)
# NB: this is a bug; we need to set the sample_sets_table manually
individuals_table.sample_sets_table = sample_sets_table
individuals_table.page_size = 10
individuals_table
Briefly, this table displays the individual samples and corresponding metadata, such as population, name, longitude, and latitude. Note the distinction between population, which is immutable and corresponds to the original population assignment in the input tree sequence file, and sample_set_id, which is a placeholder for the current population assignment. By modifying this column, we can make arbitrary population (sample set) assignments to individual samples. Finally, the selected column lets us exclude samples from subsequent analyses.
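The distinction between population, sample_set_id, and selected can be illustrated with a small sketch. It uses plain dictionaries as a stand-in for the table and is not how tseda stores the data; the rows and the helper members_by_sample_set are hypothetical, and only the column names follow the description above.

```python
# Toy rows invented for illustration; they are not taken from the test file.
individuals = [
    {"name": "ind0", "population": 0, "sample_set_id": 0, "selected": True},
    {"name": "ind1", "population": 0, "sample_set_id": 0, "selected": True},
    {"name": "ind2", "population": 1, "sample_set_id": 1, "selected": False},
]

# Reassign ind1 to a custom sample set; the original population stays untouched.
individuals[1]["sample_set_id"] = 2


def members_by_sample_set(rows):
    """Group the names of selected individuals by their current sample set id."""
    groups = {}
    for row in rows:
        if row["selected"]:  # deselected individuals are excluded
            groups.setdefault(row["sample_set_id"], []).append(row["name"])
    return groups


groups = members_by_sample_set(individuals)
print(groups)
```

In this toy setup, ind2 drops out because it is not selected, while ind1 is grouped under its new sample set even though its population value is unchanged.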
With these three data structures in place, we can now define the datastore.DataStore model:
ds = datastore.DataStore(tsm=tsm, individuals_table=individuals_table, sample_sets_table=sample_sets_table)
Accessing vpages
The vpages module has an attribute PAGES that lists the available application pages:
vpages.PAGES
[tseda.vpages.overview.OverviewPage,
tseda.vpages.individuals.IndividualsPage,
tseda.vpages.structure.StructurePage,
tseda.vpages.ignn.IGNNPage,
tseda.vpages.stats.StatsPage,
tseda.vpages.trees.TreesPage]
Every page is instantiated by passing along the ds object, as we show in the subsequent sections.
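Indexing PAGES by position depends on the order of the list. A lookup by class name is one possible alternative; the sketch below assumes that the entries of PAGES are classes exposing a __name__ attribute, and the helper page_by_name is hypothetical, not part of tseda. Stand-in classes are used so that the sketch runs without tseda installed.

```python
def page_by_name(pages, name):
    """Return the first class in `pages` whose __name__ equals `name`."""
    for page in pages:
        if page.__name__ == name:
            return page
    raise KeyError(f"no page named {name!r}")


# Hypothetical stand-ins for two of the page classes listed above;
# in the notebook one would pass vpages.PAGES instead.
class OverviewPage:
    pass


class TreesPage:
    pass


PAGES = [OverviewPage, TreesPage]
trees_cls = page_by_name(PAGES, "TreesPage")
print(trees_cls.__name__)
```

With the real module, the returned class would then be instantiated in the same way as the indexed entries, by passing datastore=ds.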
Overview
The overview page summarizes the tree sequence object.
ov = vpages.PAGES[0](datastore=ds)
ov
Individuals page
The individuals page displays three widgets:
- a map with sampling locations
- the sample set table
- the individuals table.
indp = vpages.PAGES[1](datastore=ds)
indp
Each widget can be accessed through the corresponding attribute; here these are geomap, sample_sets_table, and individuals_table.
The individuals page also has a sidebar function, which is used in the application to modify tables, assign new sample sets, and so on. However, this functionality does not yet work in the notebook setting, so any modification of the datastore object must be done by editing the object manually.
Structure
The structure page summarizes population-wide genealogical nearest neighbor (GNN) values and \(F_{st}\).
struct = vpages.PAGES[2](datastore=ds)
struct
Individual GNN
The individual GNN page displays three widgets:
- a map with sampling locations
- a bar plot of individual-based GNN values, based on the current sample set definitions
- a widget, initially empty, for plotting chromosome-level GNN values
Instead of drawing the entire page, we show here how to access the initial widgets, starting with the bar plot of individual GNN values:
ignn = vpages.PAGES[3](datastore=ds)
ignn.vbar
Note that the plot is interactive: the Bokeh tools displayed on the right let you interact with it.
As mentioned, the chromosome-level GNN widget is empty to begin with:
ignn.gnnhaplotype
However, we can set the individual_id attribute to actually plot the haplotypes. Note that these values are calculated on the fly and may be slow for large samples!
ignn.gnnhaplotype.individual_id = 12
ignn.gnnhaplotype
0it [00:00, ?it/s]199it [00:00, 6730.26it/s]
WARNING:param.main: sizing_mode option not found for area plot with bokeh; similar options include: []
Statistics
Tree sequence statistics come in two flavors: one-way statistics, which are defined over single sample sets, and multi-way statistics, which compare two or more sample sets. The one-way statistics are accessible via the oneway attribute:
stats = vpages.PAGES[4](datastore=ds)
stats.oneway
For multi-way statistics we need to set which sample sets to compare. Here, we can make use of the sidebar functionality, or set sample set groups, formatted (exactly) as INDEX1 & INDEX2:
stats.multiway.comparisons.value = ['0 & 1', '0 & 2']
stats.multiway
Here, the indexes correspond to sample set ids.
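Since the comparison strings must match the INDEX1 & INDEX2 format exactly, it can be convenient to generate them rather than type them by hand. The following is a minimal sketch using only the standard library; the helper format_comparisons is hypothetical and not part of tseda.

```python
from itertools import combinations


def format_comparisons(sample_set_ids):
    """Return every pairwise comparison as an 'INDEX1 & INDEX2' string."""
    return [f"{a} & {b}" for a, b in combinations(sample_set_ids, 2)]


comparisons = format_comparisons([0, 1, 2])
print(comparisons)  # ['0 & 1', '0 & 2', '1 & 2']
```

The resulting list has the same shape as the value assigned to stats.multiway.comparisons.value above, so it could be passed there directly.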
Trees
Finally, we can draw trees accessed by genomic position or index in the tree sequence:
trees = vpages.PAGES[5](datastore=ds)
trees.data.position = 10_000
trees
trees.sidebar()
WARNING:param.ParamMethod01936: The method supplied for Panel to display was declared with `watch=True`, which will cause the method to be called twice for any change in a dependent Parameter. `watch` should be False when Panel is responsible for displaying the result of the method call, while `watch=True` should be reserved for methods that work via side-effects, e.g. by modifying internal state of a class or global state in an application's namespace.
The slider can be used to modify the current position. The trees page actually renders the trees.data attribute, which is where we can also set tree attributes manually. For instance, to increase the number of shown trees to three, we can set trees.data.num_trees.value=3.
Conclusion
This notebook shows the basic functionality of the tseda plotting widgets.