License: MIT

Introduction

DOOMSAYER (Detection Of Outliers using Mutation Spectrum AnalYsis in Extremely Rare variants) is a utility for analyzing patterns of rare single-nucleotide variants (SNVs) in whole-genome sequencing (WGS) or whole-exome sequencing (WES) data.

The basic intuition behind Doomsayer is that the non-somatic mutation spectra of rare SNVs should have little inter-individual heterogeneity. If an individual's mutation spectrum differs drastically from the expected distribution, it is likely due to cryptic error biases or batch effects rather than genuine biological variation. Doomsayer uses a series of statistical analyses to identify these outlier samples, and provides a diagnostic report summarizing the observed error signatures, helping to ensure rigor and reproducibility in the analysis of WGS/WES data.
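The intuition above can be illustrated with a minimal sketch. This is not Doomsayer's actual statistical procedure: it assumes a hypothetical per-sample table of SNV subtype counts (`example_counts`), converts each row to proportions, and flags samples whose spectrum lies unusually far from the cohort median, using a simple median/MAD rule chosen only for illustration. The function names and the threshold of 3.0 are likewise assumptions.

```python
from math import sqrt
from statistics import median


def spectrum(counts):
    """Convert one sample's raw subtype counts into proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no variants")
    return [c / total for c in counts]


def flag_outliers(count_matrix, threshold=3.0):
    """Return sorted IDs of samples whose spectrum is far from the cohort median.

    count_matrix maps sample ID -> list of subtype counts (same order for all).
    Illustrative median/MAD rule only; not the method Doomsayer implements.
    """
    spectra = {s: spectrum(c) for s, c in count_matrix.items()}
    n_types = len(next(iter(spectra.values())))
    # Per-subtype median proportion across the cohort.
    center = [median(sp[i] for sp in spectra.values()) for i in range(n_types)]
    # Euclidean distance of each sample's spectrum from that center.
    dist = {
        s: sqrt(sum((p - m) ** 2 for p, m in zip(sp, center)))
        for s, sp in spectra.items()
    }
    med = median(dist.values())
    mad = median(abs(d - med) for d in dist.values())
    if mad == 0:
        return []  # no spread in distances, so nothing can be flagged
    return sorted(s for s, d in dist.items() if (d - med) / mad > threshold)


# Hypothetical counts for six subtypes in six samples (made-up numbers).
example_counts = {
    "s1": [100, 50, 40, 300, 60, 50],
    "s2": [102, 48, 41, 298, 61, 50],
    "s3": [98, 52, 39, 303, 58, 50],
    "s4": [101, 49, 42, 299, 59, 50],
    "s5": [99, 51, 40, 301, 60, 49],
    "s6": [100, 200, 40, 150, 60, 50],  # spectrum differs sharply from the rest
}

print(flag_outliers(example_counts))  # prints ['s6']
```

Working with proportions rather than raw counts keeps samples with different numbers of rare variants comparable.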

In addition to its purpose as a quality control program, Doomsayer can be applied more generally to study between-sample differences in somatic and germline mutation signatures.


Setup

The following commands will create a Conda environment named doomsayer and install all dependencies from env.yml. This environment includes the R binaries and most of the required R packages. Some packages are not available via the Conda channels, so they must be installed in the environment using the install.r script (note that this script will NOT install these packages outside of the doomsayer environment).

git clone https://github.com/carjed/doomsayer.git
cd doomsayer

conda env create -n doomsayer -f env.yml
source activate doomsayer

R --quiet -f install.r

Local install

Prerequisites for Doomsayer can also be installed using pip and the included install.r script:

git clone https://github.com/carjed/doomsayer.git
cd doomsayer

pip install -r pip_reqs.txt

R --quiet -f install.r

Note that this method assumes you already have pip and R (version 3.1 or higher) installed. You will also need pandoc (version 1.19) installed in order to render RMarkdown reports. If RStudio (or RStudio Server) is installed on your system, the necessary pandoc binaries should already be available, and no additional action is needed.

Debian/Ubuntu users may run the check_pandoc.sh script to confirm whether the pandoc binaries are installed. If they are not found, the script will attempt to download the binaries to doomsayer/pandoc/.

bash check_pandoc.sh

Mac users will need to either install RStudio or manually install the binaries to doomsayer/pandoc/ per the instructions described here.
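On any platform, a quick way to see whether a pandoc binary is visible on your PATH, and which version it reports, is a small check like the one below. This is a hedged sketch and not part of Doomsayer: the function names are hypothetical, and it only searches PATH, so it will not detect binaries placed in doomsayer/pandoc/ or bundled inside an RStudio installation unless those locations are on your PATH.

```python
import re
import shutil
import subprocess


def parse_pandoc_version(version_output):
    """Extract a version tuple such as (1, 19, 2, 1) from `pandoc --version` output."""
    match = re.search(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_pandoc_version():
    """Return the version of the pandoc found on PATH, or None if it is absent."""
    path = shutil.which("pandoc")
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return parse_pandoc_version(result.stdout)


if __name__ == "__main__":
    version = installed_pandoc_version()
    if version is None:
        print("pandoc not found on PATH")
    else:
        print("pandoc", ".".join(str(part) for part in version))
```

Returning the version as a tuple of integers lets you compare it against a required version with ordinary tuple comparison, for example `version >= (1, 19)`.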

Docker

For more flexible deployment options, Doomsayer is available as a Docker container. The following command will pull and run the preconfigured image from the Docker Hub:

# -v maps the local directory containing input data to /data
# -p exposes the Jupyter notebook server on port 8888
# --NotebookApp.token='' starts the server with token authentication disabled
docker run -d --name doomsayer \
  -v /path/to/local/data:/data \
  -p 8888:8888 \
  carjed/doomsayer \
  start-notebook.sh --NotebookApp.token=''

You may also clone this repository and build the image locally from the Dockerfile, using the following commands:

git clone https://github.com/carjed/doomsayer.git
cd doomsayer

docker build -t doomsayer --force-rm .

docker run -d --name doomsayer \
  -p 8888:8888 \
  doomsayer \
  start-notebook.sh --NotebookApp.token=''

In both cases, Doomsayer will be available as a Jupyter notebook server, accessible at http://[machine ip]:8888.

Binder

A prebuilt Doomsayer Docker image can be accessed via the cloud-based Binder platform.

When launched, this will spawn a Jupyter notebook with an interactive tutorial to guide new users through the various options and use cases for Doomsayer.

Due to the resource constraints of the public BinderHub server, this should not be used to run Doomsayer on large datasets. However, if you have generated the subtype count matrix locally, you can easily upload this file into a Binder instance and run Doomsayer.
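To make the idea of a subtype count matrix concrete, here is a minimal sketch of tallying SNV subtypes per sample. It is an illustration under stated assumptions, not Doomsayer's own code or file format: the input is a hypothetical list of (sample ID, reference allele, alternate allele) tuples, and variants are collapsed into six pyrimidine-reference classes. The matrix Doomsayer actually expects may use a finer classification (for example, one that includes flanking sequence context) and a layout that this section does not specify.

```python
from collections import Counter, defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Six strand-collapsed classes; this column order is an arbitrary choice here.
SUBTYPES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def subtype(ref, alt):
    """Collapse an SNV onto its pyrimidine-reference form, e.g. G>A becomes C>T."""
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a valid SNV: {ref}>{alt}")
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def count_matrix(variants):
    """Tally subtypes per sample from (sample_id, ref, alt) tuples.

    Returns a dict mapping sample ID -> list of counts ordered as SUBTYPES.
    """
    tallies = defaultdict(Counter)
    for sample, ref, alt in variants:
        tallies[sample][subtype(ref, alt)] += 1
    return {s: [c[t] for t in SUBTYPES] for s, c in sorted(tallies.items())}


# Hypothetical variants, made up for illustration.
example_variants = [
    ("s1", "C", "T"),
    ("s1", "G", "A"),  # complement of C>T
    ("s1", "A", "G"),  # complement of T>C
    ("s2", "T", "G"),
    ("s2", "G", "T"),  # complement of C>A
]

for sample, row in count_matrix(example_variants).items():
    print(sample, row)
```

A table like this is far smaller than the underlying variant calls, which is why it suits a resource-constrained environment such as a public Binder instance.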