Cambridge Healthtech Institute’s Sixth Annual

Bioinformatics for Big Data

Creating Actionable Data

February 20-22, 2017 | Moscone North Convention Center | San Francisco, CA
Part of the 24th International Molecular Medicine Tri-Conference


Omics technologies and the pursuit of precision medicine continue to produce vast amounts of data. There is a growing need for specialists who can make this data meaningful and actionable for both clinicians and scientists. Computational biologists, more than ever, are making real contributions to the practice of healthcare and the development of personalized therapies. The Bioinformatics for Big Data Conference at the Molecular Medicine TriCon 2017 will showcase how a community from medical/academic centers and industry is developing tools and bioinformatics software for the identification, compilation, analysis, and visualization of huge amounts of biological, health, and personal data to meet this goal.

You may be interested in these related Tri-Conference short courses: 

SC3: Sequencing 101

SC21: Best Practices in Personalized and Translational Medicine

Monday, February 20

10:30 am Conference Program Registration Open


11:50 Chairperson’s Opening Remarks  

John Mattison, M.D., Assistant Medical Director, Chief Medical Information Officer, Kaiser Permanente, SCAL

12:00 pm Making Experimental Results Findable, Accessible, Interoperable and Reusable: The CEDAR Technology for Managing Online Biomedical Metadata

Mark Musen, M.D., Ph.D., Professor, Biomedical Informatics at Stanford University, Director of the Stanford Center for Biomedical Informatics Research

The Center for Expanded Data Annotation and Retrieval (CEDAR) develops information technology to ease the authoring and management of the metadata that investigators need to make sense of online datasets. The approach aids the verification of experimental results, the secondary analysis of biomedical data, and the integration of online datasets. Partnerships with large projects such as LINCS and HIPC are demonstrating the value of CEDAR technology to the annotation of online Big Data.

12:30 Conducting Cancer Research in a Distributed Cloud Environment

Anthony R. Kerlavage, Ph.D., Chief, Cancer Informatics Branch, National Cancer Institute, Center for Biomedical Informatics & Information Technology

The NCI has launched the Genomic Data Commons and Cancer Genomics Cloud Pilots as a secure ecosystem to provide a repository for the growing amount of cancer genomic and related clinical data, and an analytics platform to conduct research on large cancer datasets. Together, these form the foundational elements of a Cancer Knowledge System.

1:00 Enjoy Lunch on Your Own



2:30 Chairperson’s Opening Remarks

John C. Earls, MS, Graduate Research Assistant, Nathan Price Lab, Institute for Systems Biology

2:40 From Big Data to Actionability: Lessons from the Pioneer 100 Project and Beyond

John C. Earls, MS, Graduate Research Assistant, Nathan Price Lab, Institute for Systems Biology

Healthcare is becoming more proactive and data-rich than ever before, and will increasingly focus on maintaining and enhancing wellness rather than just reacting to disease. Lee Hood and I have recently launched a large-scale wellness project that integrates genomics, proteomics, transcriptomics, microbiomes, clinical chemistries and wearable devices of the quantified self to monitor wellness and disease, and which is now scaling to thousands of people. I will present results from our proof-of-concept pilot study in a set of 108 individuals (the Pioneer 100 study), showing how the interpretation of these data led to actionable findings for individuals to improve health and reduce risk drivers of disease.

3:10 Integrating Multi-Omics Data for Clinical Actions

Han Liang, Ph.D., Associate Professor and Deputy Chair, Department of Bioinformatics and Computational Biology, Associate Professor, Department of Systems Biology, The University of Texas MD Anderson Cancer Center

Cancer omics data have accumulated at a remarkable speed, and one key question is how to use these data to facilitate clinical decisions for precision cancer medicine. I will present a resource of cancer proteomics data based on reverse-phase protein arrays and discuss their utility in predicting patient survival and drug options. I will also discuss how to identify driver mutations by using high-throughput functional assays.

3:40 From Bits to Bedside: Translating Big Data into Precision Medicine and Digital Health

Dexter Hadley, M.D., Ph.D., Assistant Professor of Pediatrics, Institute for Computational Health Sciences, University of California, San Francisco

In this talk, I will use examples from my research using big data analytics to define ideals of precision medicine and digital health across a variety of diseases. Specifically, I will introduce the audience to my work in large-scale population-wide analysis with public and private data sources, and my work on using mobile technology and digital health to foster these two ideals and improve patient care.

4:10 Democratizing Cancer Data to Accelerate Discovery

Anurag Sethi, Ph.D., Bioinformatics Scientist, Seven Bridges

The Cancer Genomics Cloud (CGC) democratizes access to The Cancer Genome Atlas. The CGC removes the need to download data, enables easy querying, and much more. We discuss its design and the avenues of discovery it enables.

4:40 Refreshment Break and Transition to Plenary Session

5:00 Plenary Keynote Session

6:00 Grand Opening Reception in the Exhibit Hall with Poster Viewing

7:30 Close of Day

Tuesday, February 21

7:30 am Registration Open and Morning Coffee

8:00 Plenary Keynote Session

9:00 Refreshment Break in the Exhibit Hall with Poster Viewing


10:05 Chairperson’s Remarks

Anthony R. Kerlavage, Ph.D., Chief, Cancer Informatics Branch, Center for Biomedical Informatics & Information Technology, National Cancer Institute

10:15 Sharing Data for Genomic Medicine

David Haussler, Ph.D., Professor of Biomolecular Engineering, UC Santa Cruz & Scientific Director, UC Santa Cruz Genomics Institute

Every human disease is a rare disease at the molecular level. No single institute has enough patients to understand any particular molecular subtype. For genomics to benefit medicine and science, we must share data. I outline the data standards and Application Programming Interfaces developed by the Global Alliance for Genomics and Health (GA4GH) that are intended to address this issue, and highlight a few global genomics projects that use them. Currently the human reference genome GRCh38 captures only a tiny fraction of common human genetic variation in its chosen alternative haplotype regions, and these are seldom used. I describe ideas discussed by the GA4GH for future extensions of the reference genome into a fuller graph-like structure to more adequately capture human genetic variation, so that the reference itself becomes a source of such information. This will enable a better standardized and more accurate discourse about human genetic variation for science and medicine. The GA4GH is supporting driver projects for its technology and standards. These include the Beacon project to discover databases containing rare genomic variants, the Matchmaker project to find patients at different medical centers with similar conditions, the BRCA challenge to share all of the world's BRCA gene variants, and a new project, the Cancer Gene Trust to share somatic mutations observed in cancer tumors. I will give the latest updates on these projects.

10:45 Health Systems as Translational Research Partners

Gregory J. Tranah, Ph.D., Director of Precision Medicine & Senior Scientist, Sutter Health; Adjunct Professor, Epidemiology and Biostatistics, University of California, San Francisco

Precision medicine is empowered by access to diverse data, patients, and provider networks. This presentation will describe the role of health systems as translational research partners that can drive: 1) molecular discoveries; 2) provider and patient engagement through an interoperable platform; and 3) rapid translation of discoveries to clinical practice.

11:15 A Closed-Loop, Multilevel Modeling of Glucose Homeostasis

Corrado Priami, Professor & CEO, The Microsoft Research - University of Trento COSBI

Type 2 diabetes mellitus (T2DM) is one of the major diseases affecting Western society and is diagnosed at the phenotypic level by inspecting fasting glucose. An understanding of the links between phenotype and molecular signaling is fundamental to controlling disease progression. We exploit multi-level dynamical modeling to improve existing whole-body models of glucose metabolism and connect them with molecular insulin signaling in adipose cells.

11:45 Accelerating Big Data Research in the Cloud

Anand Basu, MS, MBA, Senior Vice President, ESAC Inc.

Sharing omics data with researchers around the world is a challenge given the growing size of datasets. Learn how ESAC leveraged Aspera high-speed transfer software to enable fast, secure online sharing of large proteomic data for a critical NCI initiative.

12:15 Enjoy Lunch on Your Own

1:25 Refreshment Break in the Exhibit Hall with Poster Viewing


2:00 Chairperson’s Remarks

Xianghong Jasmine Zhou, Ph.D., Professor, Pathology and Laboratory Medicine, University of California, Los Angeles

2:10 Structure-Function Mapping of 3D Human Genome

Xianghong Jasmine Zhou, Ph.D., Professor, Pathology and Laboratory Medicine, University of California, Los Angeles

Three-dimensional (3D) genome structures vary from cell to cell even in an isogenic sample. Unlike protein structures, genome structures are highly plastic, posing a significant challenge for structure-function mapping. Here we report an approach to comprehensively identify 3D chromatin clusters that each occur frequently across a population of genome structures, either deconvoluted from ensemble-averaged Hi-C data or from a collection of single-cell Hi-C data. Applying our method to a population of genome structures (at the macrodomain resolution) of lymphoblastoid cells, we identify an atlas of stable inter-chromosomal chromatin clusters. A large number of these clusters are enriched in binding of specific regulatory factors and are therefore defined as 'Regulatory Communities.' We reveal two major factors, centromere clustering and transcription factor binding, which significantly stabilize such communities. Finally, we show that the regulatory communities differ substantially from cell to cell, indicating that expression variability could be impacted by genome structures.

2:40 Multi-Omics Data Analysis Tools for Biologists and Clinicians

Bing Zhang, Ph.D., Professor, Department of Molecular and Human Genetics, Lester & Sue Smith Breast Center, Baylor College of Medicine

A major challenge in the multi-omics era is to enable biologists and clinicians to directly use the complex, interconnected, and high-dimensional data. This talk will introduce two web applications that attempt to address this challenge. NetGestalt provides a network-based framework for multi-omics data visualization and analysis. LinkedOmics enables the discovery of novel associations between genomic, proteomic, and clinical attributes.

3:10 3’-UTR Shortening Represses Tumor Suppressors in Trans by Disrupting ceRNA Crosstalk

Wei Li, Ph.D., Associate Professor, Division of Biostatistics, Dan L. Duncan Cancer Center, Baylor College of Medicine

Widespread mRNA 3’-UTR shortening promotes tumor growth in vivo, yet its underlying mechanism remains largely unknown. Here, our big data analysis followed by experimental validation suggests that the major role of 3’-UTR shortening in tumorigenesis is to direct the release of microRNAs to repress tumor suppressor competing-endogenous RNA (ceRNA) in trans, such as PTEN.

3:40 Next-Generation Image Mining and Data Analysis Provides Real-Time Decision on Patient Stratification

Ralf Huss, M.D., CMO, Definiens

3:55 Finding a Needle in a Haystack: New Approaches to Select Disease-Causing Mutations in Patients' Genomes

Yuval Itan, Ph.D., Research Assistant Professor, Human Genetics of Infectious Diseases, The Rockefeller University

4:10 Hollywood Oscar Dessert Reception in the Exhibit Hall with Poster Viewing

5:00 Breakout Discussions in the Exhibit Hall

These interactive discussion groups are open to all attendees, speakers, sponsors and exhibitors. Participants choose a specific breakout discussion group to join. Each group has a moderator to ensure focused discussions around key issues within the topic. This format allows participants to meet potential collaborators, share examples from their work, vet ideas with peers, and be part of a group problem-solving endeavor. The discussions provide an informal exchange of ideas and are not meant to be a corporate or specific product discussion. Pre-registration to sign up for one of the topics will occur a week or two prior to the event via the app.

Target Identification from Omic Data
Vinod Kumar, Ph.D., Senior Scientific Investigator, Computational Biology (US), Target Sciences, R&D, GlaxoSmithKline

  • Target identification and validation using genomics and genetics
  • Clinical transcriptomics-based generation of disease signatures and their application in drug discovery
  • Clinical trial-derived data for discovery

From Big Data to Clinical Actionability
Yuval Itan, Ph.D., Research Assistant Professor, Human Genetics of Infectious Diseases, The Rockefeller University

  • Big data to actionability: lessons learned
  • Integrating multi-omics data for clinical actions
  • Information technology: translating a trillion points of data into therapies and diagnostics

Personalized Medicine: Data Annotation, Retrieval, Accessibility and Security
Dexter Hadley, M.D., Ph.D., Assistant Professor, Pediatrics, Institute for Computational Health Sciences, University of California, San Francisco

  • Making experimental results findable, accessible, interoperable and reusable
  • Cloud as a secure ecosystem
  • Crowd sourcing
  • Digital health explicitly

6:00 Close of Day

Wednesday, February 22

7:00 am Registration Open

7:00 Breakfast Presentation (Sponsorship Opportunity Available) or Morning Coffee

8:00 Plenary Keynote Session

10:00 Refreshment Break and Poster Competition Winner Announced in the Exhibit Hall


10:50 Chairperson’s Remarks

Peng Yue, Associate Director, Research Bioinformatics, Gilead Sciences

11:00 Learning Real-World Evidence of Drug Efficacy and Safety from the EHR

Nigam Shah, Ph.D., Associate Professor of Medicine (Biomedical Informatics) at Stanford University, Assistant Director of the Center for Biomedical Informatics Research

With the widespread availability of Electronic Health Records (EHR), it is possible to examine the outcomes of decisions made by doctors during clinical practice to identify patterns of care, generating evidence from the collective experience of millions of patients. We will discuss methods that transform EHR data into real-world evidence for comparative effectiveness, drug safety and Phase IV surveillance studies for a learning health system.

11:30 Bioinformatics Approaches for Functional Interpretation of Genome Variation

Kai Wang, Ph.D., Associate Professor, Biomedical Informatics, Institute for Genomic Medicine, Columbia University

We developed Phenolyzer, which analyzes the clinical phenotypes of a given patient and predicts the most likely candidate genes responsible for those phenotypes by integrating multiple sources of gene-pathway-disease-phenotype information. Based on Phenolyzer, we also developed iCAGES (integrated CAncer GEnome Score), which is an effective tool for prioritizing cancer driver genes for a patient using genome sequencing data. We illustrate case studies where iCAGES can facilitate selection of optimal treatment strategies based on predicted personal driver genes.

12:00 Visualization, Characterization and Mining of Real-World Patient Data

Andreas Matern, Vice President, Partnerships & Innovation, BioReference Laboratories, GeneDX

Real World Patient Data (RWPD) is plagued by a lack of data management strategy. In this talk, I will discuss the construction of a data repository and visualization tools used to mine and characterize RWPD from clinical patient records. The discussion will include overcoming the complexities of RWPD, modeling the data for use in clinical and pharmaceutical research, and visualizations and data mining techniques used to allow end users to interrogate the data in ways never before possible.

12:30 Enjoy Lunch on Your Own

1:10 Refreshment Break in the Exhibit Hall and Last Chance for Poster Viewing


1:50 Chairperson’s Remarks

Farida Kopti, Ph.D., Director, Chemistry/Pharmacology/HTS Informatics IT, Merck Research Labs, Merck & Co.

2:00 Leveraging ‘Omics Data from Deeply Phenotyped Clinical Studies to Inform Target and Biomarker Validation

Janna Hutz, Ph.D., Senior Director, Head, Human Biology & Data Science Engine, Eisai AiM Institute, Eisai, Inc.

Beyond oncology, there have been few documented successes in using genome scale sequencing from clinical trials to inform design of subsequent trials. Rather, it is emerging that these datasets’ greatest value may lie in feeding back into earlier stages of drug discovery. I will share Eisai’s efforts to use NGS data from well-characterized clinical cohorts for target validation and biomarker identification.

2:30 Disease Signatures to Drug Discovery

Deepak K. Rajpal, Ph.D., Director, Computational Biology-Target Sciences, GSK

We present how we have used clinical transcriptomics-based generation of disease signatures and their application in drug discovery. We identified disease areas of interest and then clinical transcriptomics datasets from published literature associated with those diseases. We then generated disease signatures and, through integrative informatics approaches, applied them in our drug discovery efforts. We present here a case study in the dermatological disease area.

3:00 Target Identification and Validation Using Genomics and Genetics

Vinod Kumar, Ph.D., Senior Scientific Investigator, Computational Biology (US), Target Sciences, R&D, GSK

In practice, the identification of a novel disease target is an integrative step combining many lines of evidence, but may often be triggered by a key, highly publicized finding. However, relatively little attention has been paid to systematically evaluating the multiple lines of evidence that have proven effective in choosing a successful target for a disease. I will present how we use informatics approaches to leverage genetic, genomic and phenotypic data to prioritize targets and validate them experimentally.

3:30 Session Break


3:40 Chairperson’s Remarks

Andrew C. Fish, J.D., Executive Director, AdvaMedDx

3:45 Big Data – The Devil’s in the Details

Mike Barlow, Vice President, Operations, MolDX Executive Lead, Specialty Contracts, Palmetto GBA

Linking effective therapies and expanded trial designations are the expected benefits of the ever-expanding capabilities of genomic biomarker and gene expression identification. More and more data is being generated every day. Keeping that data ‘valuable’ will require that we maintain a critical focus on the quality and comparative values of the data, especially in the area of genomics and more specifically outcomes. Other questions will arise around where the data is collected, how it is curated, and who has access. As a Medicare payer, we support the concept of data collection/aggregation if that data can be effectively mined to create ever-improving treatment protocols and, more importantly, improved outcomes.

4:00 Efficiently Leveraging Commercial and Open Source Bioinformatics Tools for Clinical Interventions and Research Discoveries from Very Large Datasets

Ben Busby, Ph.D., Genomics Outreach Coordinator, NCBI, NLM/NIH

In precision medicine, it is often the case that efficacy does not depend on the appropriate computational intervention, but on the morphology of the data that informs the problem. For example, different strategies should be employed when calling short variants in stable versus unstable regions of the human genome, or when looking for pathogenic effectors in well-characterized versus newly discovered bacterial or viral pathogens. Pragmatic solutions from existing commercial and open source resources will be presented.

4:15 From Bits to Bedside: Developing a Learning Digital Health System to Evaluate Pigmented Skin Lesions

Dexter Hadley, M.D., Ph.D., Assistant Professor, Pediatrics, Institute for Computational Health Sciences, University of California, San Francisco

Melanoma accounts for less than one percent of skin cancer cases but the vast majority of skin cancer deaths. Early screening and diagnosis significantly improve patient outcomes, yet no systematic framework exists for clinical evaluation of common pigmented lesions for risk of melanoma. In this talk, I will describe our approach to develop a learning health system focused on precision screening and diagnosis of skin cancer. We first leverage medical students in the clinics to capture digital samples of pigmented skin lesions at scale with mobile health technology. We then leverage this clinical big data to train state-of-the-art deep learning image classification algorithms to better screen for cancer. Our work translates routinely documented electronic health data into an impactful digital health initiative to directly improve clinical outcomes and advance clinical knowledge.


5:15 Close of Conference Program

Stay on for a Tri-Conference Symposium, taking place February 23-24, 2017, at the Moscone South Convention Center

NGS Diagnostics: Knowledge Bases, Annotation and Interpretation
