The Data Science Garage

Using Data Science to Fight Cancer

The Lab

Our lab works at the intersection of computer engineering, statistical analysis and biological science. Based out of the OHSU Knight Cancer Institute, we study systems biology, cancer, computational systems and integrative analysis. We work on projects funded by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).

Projects

GDAN

The NCI’s Genomic Data Analysis Network (GDAN) was launched in 2016 to continue the work started by The Cancer Genome Atlas (TCGA) in analyzing large cancer cohorts.

AnVIL

The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-Space (AnVIL) is a project to build a data commons that allows researchers to efficiently analyze and visualize genomics data in the cloud.

GA4GH G2P

SMC Het

Grip

GRaph Integration Platform

BMEG

BioMedical Evidence Graph

DREAM-SMC

Somatic Mutation Calling

DREAM-SMC-RNA

RNA Fusion Calling and Isoform Quantification

Funnel

Funnel Task Execution Server

MC3

Multi-Center Mutation Calling in Multiple Cancers

People

Researchers

Brian Walsh

Senior Research Software Engineer

Graph Databases, Machine Learning, Pan-Cancer, Cross-Project Data Harmonization

Jordan Lee

Computational Biologist

Machine Learning, Precision Oncology, Cancer Early Detection

Liam Beckman

Research Software Engineer

Software Development, Computational Biology

Matthew Peterkort

Research Software Engineer

Graph Databases, JSON Schema, Data Science

Nasim Sanati

Computational Biologist

Biomedical Knowledge Graphs, Data Science

Quinn Wai Wong

Research Software Engineer

Software Development, Systems Design, Computational Biology

Grad Students

Brian Karlberg

PhD Student

Computational Biology, Precision Oncology

Principal Investigators

Kyle Ellrott

Associate Professor

Computational Biology, Data Science, Machine Learning, Precision Medicine, Cancer Early Detection

Alumni

Malisa Smith

Student Intern

Jeena Lee

Student Intern

Ryan Spangler

Research Software Engineer

Allison Creason

Alex Buchanan

Adam Struck

Research Software Engineer

Graph Databases, Machine Learning, Data Science

Recent Publications

Kyle Ellrott, Christopher K. Wong, Christina Yau, Mauro A. A. Castro, Jordan A. Lee, Brian J. Karlberg, Jasleen K. Grewal, Vincenzo Lagani, Bahar Tercan, Verena Friedl, Toshinori Hinoue, Vladislav Uzunangelov, Lindsay Westlake, Xavier Loinaz, Ina Felau, Peggy I. Wang, Anab Kemal, Samantha J. Caesar-Johnson, Ilya Shmulevich, Alexander J. Lazar, Ioannis Tsamardinos, Katherine A. Hoadley, A. Gordon Robertson, Theo A. Knijnenburg, Christopher C. Benz, Joshua M. Stuart, Jean C. Zenklusen, Andrew D. Cherniack, Peter W. Laird

January 2025, Cancer Cell

Classification of non-TCGA cancer samples to TCGA molecular subtypes using compact feature sets

Summary

Molecular subtypes, such as those defined by The Cancer Genome Atlas (TCGA), delineate a cancer’s underlying biology, bringing hope to inform a patient’s prognosis and treatment plan. However, most approaches used in the discovery of subtypes are not suitable for assigning subtype labels to new cancer specimens from other studies or clinical trials. Here, we address this barrier by applying five different machine learning approaches to multi-omic data from 8,791 TCGA tumor samples comprising 106 subtypes from 26 different cancer cohorts to build models based upon small numbers of features that can classify new samples into previously defined TCGA molecular subtypes, a step toward molecular subtype application in the clinic. We validate select classifiers using external datasets. Predictive performance and classifier-selected features yield insight into the different machine learning approaches and genomic data platforms. For each cancer and data type we provide containerized versions of the top-performing models as a public resource.