DOOMSAYER (Detection Of Outliers using Mutation Spectrum AnalYsis in Extremely Rare variants) is a utility for analyzing patterns of rare single-nucleotide variants (SNVs) in whole-genome sequencing (WGS) or whole-exome sequencing (WES) data.
The basic intuition behind Doomsayer is that the non-somatic mutation spectra of rare SNVs should have little inter-individual heterogeneity. If an individual's mutation spectrum differs drastically from the expected distribution, it is likely due to cryptic error biases or batch effects rather than genuine biological variation. Doomsayer uses a series of statistical analyses to identify these outlier samples, and provides a diagnostic report summarizing the observed error signatures, helping to ensure rigor and reproducibility in the analysis of WGS/WES data.
In addition to its purpose as a quality control program, Doomsayer can be applied more generally to study between-sample differences in somatic and germline mutation signatures.
Using Conda (recommended)
The following commands will create a Conda environment named doomsayer and install all dependencies from env.yml. This environment will install the R binaries and most required R packages; however, some packages are not available via the Conda channels, so they must be installed in the environment using the install.r script (note that this script will NOT install these packages outside of the
    git clone https://github.com/carjed/doomsayer.git
    cd doomsayer
    conda env create -n doomsayer -f env.yml
    source activate doomsayer
    R --quiet -f install.r
Prerequisites for Doomsayer can also be installed using
pip and the included
    git clone https://github.com/carjed/doomsayer.git
    cd doomsayer
    pip install -r pip_reqs.txt
    R --quiet -f install.r
Note that this method assumes you have R (version 3.1 or higher) already installed. You will also need pandoc (version 1.19) installed in order to render RMarkdown reports. If RStudio (or RStudio Server) is installed on your system, the necessary pandoc binaries should already be available, and no additional action is needed.
Debian/Ubuntu users may run the check_pandoc.sh script to confirm whether the pandoc binaries are installed; if they are not found, the script will attempt to download the binaries to
Mac users will need to either install RStudio or manually install the binaries to doomsayer/pandoc/ per the instructions described here.
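The R and pandoc requirements above can be checked before running install.r. The following is a minimal sketch, not part of Doomsayer: the helper names version_ge and check_tool are hypothetical, it treats the stated pandoc version (1.19) as a minimum, which is an assumption, and it relies on a sort that supports -V (GNU coreutils and recent BSD/macOS). It only looks for binaries on your PATH, so it will not detect a pandoc copy placed in doomsayer/pandoc/.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the prerequisites; not shipped with Doomsayer.

# version_ge A B: succeed if dotted version A is greater than or equal to B.
# sort -V orders version strings, so A >= B exactly when B sorts first (or ties).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MIN: report whether NAME is on PATH and at least version MIN.
check_tool() {
    local name="$1" min="$2" found
    if ! command -v "$name" >/dev/null 2>&1; then
        echo "$name: not found on PATH"
        return 1
    fi
    # Take the first dotted number printed by `NAME --version`.
    found=$("$name" --version 2>&1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
    if [ -n "$found" ] && version_ge "$found" "$min"; then
        echo "$name: $found (ok, need >= $min)"
    else
        echo "$name: ${found:-unknown version} (need >= $min)"
        return 1
    fi
}

# Keep going after a failure so that both tools are reported.
check_tool R 3.1 || true
check_tool pandoc 1.19 || true   # assumption: 1.19 is treated as a minimum
```

If either line reports a missing or outdated tool, install or upgrade it before rendering RMarkdown reports.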
For more flexible deployment options, Doomsayer is available as a Docker container. The following command will pull and run the preconfigured image from the Docker Hub:
    # -v maps the directory containing input data
    # -p exposes the Jupyter notebook on port 8888
    # --NotebookApp.token='' starts the server with the token disabled
    docker run -d --name doomsayer \
      -v /path/to/local/data:/data \
      -p 8888:8888 \
      carjed/doomsayer \
      start-notebook.sh --NotebookApp.token=''
You may also clone this repository and build the image locally from the Dockerfile, using the following commands:
    git clone https://github.com/carjed/doomsayer.git
    cd doomsayer
    docker build -t doomsayer --force-rm .
    docker run -d --name doomsayer \
      -p 8888:8888 \
      doomsayer \
      start-notebook.sh --NotebookApp.token=''
In both cases, Doomsayer will be available as a Jupyter notebook server, accessible at http://[machine ip]:8888.
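Because the container starts in detached mode, the notebook server may take a few seconds to come up. A minimal sketch for waiting until it answers is shown below; the function names notebook_url and wait_for_notebook are hypothetical, and the script assumes curl is installed and that the server listens on the port mapped above (8888).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of Doomsayer or the Docker image.

# notebook_url HOST PORT: print the URL of the notebook server.
notebook_url() {
    printf 'http://%s:%s\n' "$1" "$2"
}

# wait_for_notebook URL [TRIES]: poll once per second until the server
# responds or the number of attempts (default 30) is exhausted.
wait_for_notebook() {
    local url="$1" tries="${2:-30}" i=0
    while [ "$i" -lt "$tries" ]; do
        # -s: silent, -f: treat HTTP errors as failure, -o /dev/null: drop the body
        if curl -sf -o /dev/null "$url"; then
            echo "notebook server is up at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "no response from $url after $tries attempts" >&2
    return 1
}

# Example usage (uncomment once the container is running):
# wait_for_notebook "$(notebook_url localhost 8888)"
```

If the wait times out, docker logs doomsayer shows the container's startup output.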
A prebuilt Doomsayer Docker image can be accessed via the cloud-based Binder platform.
When launched, this will spawn a Jupyter notebook with an interactive tutorial to guide new users through the various options and use cases for Doomsayer.
Due to the resource constraints of the public BinderHub server, this should not be used to run Doomsayer on large datasets. However, if you have generated the subtype count matrix locally, you can easily upload this file into a Binder instance and run Doomsayer.