The Data Science Garage

Using Data Science to Fight Cancer

The Lab

Our lab works at the intersection of computer engineering, statistical analysis, and biological science. Based at the OHSU Knight Cancer Institute, we study systems biology, cancer, computational systems, and integrative analysis. We work on projects funded by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI).

People

Researchers

Brian Walsh

Senior Research Software Engineer

Graph Databases, Machine Learning, Pan Cancer, Cross Project Data Harmonization

Jordan Lee

Computational Biologist

Machine Learning, Precision Oncology, Cancer Early Detection

Liam Beckman

Research Software Engineer

Software Development, Computational Biology

Matthew Peterkort

Research Software Engineer

Graph Databases, JSON Schema, Data Science

Nasim Sanati

Computational Biologist

Biomedical Knowledge Graphs, Data Science

Quinn Wai Wong

Research Software Engineer

Software Development, Systems Design, Computational Biology

Grad Students

Brian Karlberg

PhD student

Computational Biology, Precision Oncology

Principal Investigators

Kyle Ellrott

Associate Professor

Computational Biology, Data Science, Machine Learning, Precision Medicine, Cancer Early Detection

Alumni

Malisa Smith

Student Intern

Jeena Lee

Student Intern

Ryan Spangler

Research Software Engineer

Adam Struck

Research Software Engineer

Graph Databases, Machine Learning, Data Science

Recent Posts

Recent Publications

Classification of non-TCGA cancer samples to TCGA molecular subtypes using compact feature sets

Summary: Molecular subtypes, such as defined by The Cancer Genome Atlas (TCGA), delineate a cancer’s underlying biology, bringing hope to inform a patient’s prognosis and treatment plan. However, most approaches used in the discovery of subtypes are not suitable for assigning subtype labels to new cancer specimens from other studies or clinical trials. Here, we address this barrier by applying five different machine learning approaches to multi-omic data from 8,791 TCGA tumor samples comprising 106 subtypes from 26 different cancer cohorts to build models based upon small numbers of features that can classify new samples into previously defined TCGA molecular subtypes, a step toward molecular subtype application in the clinic. We validate select classifiers using external datasets. Predictive performance and classifier-selected features yield insight into the different machine-learning approaches and genomic data platforms. For each cancer and data type we provide containerized versions of the top-performing models as a public resource.