The Data Science Garage

Using Data Science to Fight Cancer




GRaph Integration Platform


BioMedical Evidence Graph


Somatic Mutation Calling


RNA Fusion Calling and Isoform Quantification


Funnel Task Execution Server


Multi-Center Mutation Calling in Multiple Cancers

Selected Publications

The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects.
Cell Systems, 2018

Recent Publications

A Community Challenge for Inferring Genetic Predictors of Gene Essentialities through Analysis of a Functional Screen of Cancer Cell Lines. 2017.

TumorMap: Exploring the Molecular Similarities of Cancer Samples in an Interactive Portal. 2017.

Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. 2015.

Global optimization of somatic variant identification in cancer genomes with a global community challenge. 2014.

Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas. 2013.

The Cancer Genome Atlas Pan-Cancer analysis project. 2013.

The UCSC Cancer Genomics Browser: update 2013. 2013.

TOPSAN: a dynamic web database for structural genomics. 2011.