BEELINE¶
Overview¶
BEELINE provides a set of tools for evaluating methods that infer gene regulatory networks (GRN) from single-cell gene expression data. The BEELINE framework is divided into the following modules:
- BLRun package : contains the BEELINE’s Runner module, a Python wrapper for 12 GRN inference algorithms with options to add new methods.
- BLEval package : contains the BEELINE’s Evaluation module that provides easy-to-use tools for evaluating GRN reconstructions.
- BLPlot package : contains the BEELINE’s plotting module for visualizing output from BLEval.
Getting Started¶
The BEELINE pipeline interfaces with the implementations of various algorithms through Docker containers. Please follow this tutorial on how to install docker on Ubuntu 18.0.
Tip
Setup docker to run docker without sudo using the following shell command
sudo usermod -aG docker $USER
See more details here.
Once docker has been set up correctly, the next step is to create the docker containers for each of the algorithms. The script initialize.sh builds these containers. Run the following in the terminal
. initialize.sh
Note
This step will take a while!
To compute ranked list of edges, run
python BLRun.py --config config-files/config.yaml
To compute areas under the ROC and PR curves using the BEELINE’s evaluation pipeline, run
python BLEvalAggregator.py --config config-files/config.yaml --auc
To display the complete list of evaluation options, run
python BLEvalAggregator.py --help
Tutorial¶
This tutorial will first explain the structure of the Beeline repository, with a walkthrough of the different components that the user can customize.
Project outline¶
The BEELINE repository is structured as follows:
Beeline
|-- inputs/
| `-- examples/
| `-- GSD/
| |--refNetwork.csv
| |--PseudoTime.csv
| `--ExpressionData.csv
|-- config-files/
| `-- config.yaml
|-- BLRun/
| |-- sinceritiesRunner.py
| `-- ...
|-- BLPlot/
| |-- NetworkMotifs.py
| `-- CuratedOverview.py
|-- BLEval/
| |-- parseTime.py
| `-- ...
`-- Algorithms/
`-- SINCERITIES/
Input Datasets¶
The sample input data set provided is generated by BoolODE using the Boolean model of Gonadal Sex Determination as input. Note that this dataset has been pre-processed to produce three files that are required in the BEELINE pipline.
- ExpressionData.csv contains the RNAseq data, with genes as rows and cell IDs as columns. This file is a required input to the pipline. Here is a sample ExpressionData.csv file
- PseudoTime.csv contains the pseudotime values for the cells in ExpressionData.csv. We recommend using the Slingshot method to obtain the pseudotime for a dataset. Many algorithms in the pipeline require a pseudotime file as input. Here is a sample PseudoTime file.
- refNetwork.csv contains the ground truth network underlying the interactions between genes in ExpressionData.csv. Typically this network is not available, and will have to be curated from various Transcription Factor databases. While this file is not a requirement to run the base pipeline, a reference network is required to run some of the performance evaluations in BLEval package. Here is a sample refNetwork.csv file
The figure below shows the t-SNE visualization of the expression data from the example dataset.
This dataset shows a bifurcating trajectory, as is evidenced by the part (a) of the figure, where each ‘cell’ is colored by the timepoint at which it was sampled in the simulation (the darker colors indicate earlier time points). Clustering the simulation confirms the two trajectories, indicated in red and blue respectively in part (b). Finally, running Slingshot on this dataset and specifying the cluster of cells corresponding to the early time points yields two pseudotime trajectories, shown in part (c). For details on the generation of this simulated dataset, see BoolODE.
Attention
Please ensure that any input dataset you create is comma separated, and contains the correct style of column names.
Config files¶
Beeline uses YAML files to allow users to flexibly specify inputs and algorithm run parameters. A sample config file is provided in here. A config file should contain at minimum
input_settings:
datasets:
- name: "Dataset name"
exprData: "Expression Data filename"
cellData: "PseudoTime filename"
trueEdges: "Ground truth network filename"
algorithms:
- name: "Algorithm name"
params:
# Any other parameters that can be passed to
# this algorithm
should_run: [True] # or False
Apart from indicating the path to the base directory and the specific folder containing the input, the config file indicates which algorithms should be run, along with the parameters to be passed to the algorithms, if any. For a list of parameters that the pipeline currently passes to the algorithms, see Details of supported algorithms . Finally, the YAML file also specifies the paths to the outputs.
Attention
Please ensure that the YAML file is correctly indented!
Running the pipeline¶
Once the input dataset has been generated and formatted as described
in Section Input Datasets , and the config file has been
created as described in Config files, the pipeline can be
executed by simply calling BLRun.py with the config file
specifying the inputs and the algorithms to run, passed using the
--config option which takes the path to the config file.
To run the pipeline, simply invoke
python BLRun.py --config PATH/TO/CONFIG/FILE
For details about the implementation of BLRun , see BLRun Details .
Running the evaluation scripts¶
Each algorithm outputs an inferred network in the form of a ranked edge list. BEELINE implements a consistent interface using the config file in order to retrieve the predictions of multiple algorithms and evaluate them using a variety of methods.
The evaluation of the inferred networks is done by calling the
BLEvalAggregator.py script. Like the BLRun.py script, the
EvalAggregator script takes the config file as input. Every subsequent
option passed to this script calls a different evaluation script. For instance,
in order to analyze the AUROC and AUPRC values, and also analyze network motifs,
use the following command
python BLEvalaggregator.py --config PATH/TO/CONFIG/FILE \
--auc \ # calls the computeAUC script
--motifs \ # calls the computeNetMotifs script
For a full list of available evaluation functions and their corresponding options to
be passed to BLEvalAggregator.py, see BLEval Details .
Details of supported algorithms¶
The following table lists the algorithms and the parameters they take as input, along with the default parameter values
| Algorithms | Input Parameters |
|---|---|
| SINCERITIES |
|
| SCODE |
|
| SCNS | None |
| SCINGE |
|
| PPCOR |
|
| PIDC | None |
| LEAP |
|
| SCRIBE |
|
| GRNVBEM | None |
| GRISLI |
|
| GENIE3 | None |
| GRNBOOST2 | None |
Developer Guide¶
BLRun Details¶
The BLRun.py creates a BLRun object for each algorithm
specified in the config file. Each BLRun object should contain three
modules
generateInputs(): This function reads the three input data files, and processes them into the format required by the given algorithmrun(): A function to construct a system command with the appropriate command line parameters to be passed to the docker container in order to run a given algorithmparseOutput(): A function to read the algorithm-specific output and reformat it into a standard format
The evaluation scripts in the final step of the pipeline expect the inferred networks from each algorithm to be a comma-separated file with the following format:
Gene1,Gene2,EdgeWeight
reg1,targ1,edgeweight
where the first line are the column names, and the subsequent lines contain the edges predicted by the network. The Gene1 column should contain regulators, the Gene2 column the targets, and EdgeWeight column the absolute value of the weight predicted for edge (regulator, target).
BLEval Details¶
This is the list of options of currently implemented evaluation functions
- Command line arguments:
-h, --help show this help message and exit -c CONFIG, --config CONFIG Configuration file containing list of datasets algorithms and output specifications. -a, --auc Compute median of areas under Precision-Recall and ROC curves. Calls BLEval.computeAUC.-j, --jaccard Compute median Jaccard index of predicted top-k networks for each algorithm for a given set of datasets generated from the same ground truth network. Calls BLEval.computeJaccard.-r, --spearman Compute median Spearman Corr. of predicted edges for each algorithm for a given set of datasets generated from the same ground truth network. Calls BLEval.computeSpearman.-t, --time Analyze time taken by each algorithm for a. Calls BLEval.parseTime.-e, --epr Compute median early precision. Calls BLEval.computeEarlyPrec.-s, --sepr Analyze median (signed) early precision for activation and inhibitory edges. BLEval.computeSignedEPrec.-m, --motifs Compute network motifs in the predicted top-k networks. Calls BLEval.computeNetMotifs.
API Reference¶
- BLRun package
- Submodules
- BLRun.genie3Runner module
- BLRun.grisliRunner module
- BLRun.grnboost2Runner module
- BLRun.grnvbemRunner module
- BLRun.jump3Runner module
- BLRun.leapRunner module
- BLRun.pidcRunner module
- BLRun.ppcorRunner module
- BLRun.runner module
- BLRun.scingeRunner module
- BLRun.scnsRunner module
- BLRun.scodeRunner module
- BLRun.scribeRunner module
- BLRun.sinceritiesRunner module
- BLEval package
- BLPlot package