Bioinformatics for Big Data

In the era of precision medicine, enormous amounts of data are being generated from disparate sources, including omics, imaging, sensing and beyond. Today, computational scientists need to develop better tools to manage, integrate and share data to make it clinically actionable. The Bioinformatics for Big Data conference at the Molecular Medicine Tri-Conference 2018 will showcase how medical centers and the pharma industry are developing such tools and software to meet this goal.

Who should attend: Directors, Managers, Researchers, and Scientists from Pharma, Biotechs, Academia, Government and Healthcare Organizations working in Research, Biomedical Informatics, Information Technology, Data Science, Modeling & Simulation, R&D Informatics, Software Engineering, Translational Genomics, Predictive Medicine, Biostatistics, Computational Biology, and Bioinformatics

Monday, February 12

10:30 am Conference Program Registration Open


11:50 Chairperson’s Opening Remarks

Elizabeth Worthey, Ph.D., Faculty Investigator, Clinical Informatics Director, and Adjunct Associate Professor, Software Development and Informatics, Pediatrics and Genetics, HudsonAlpha Institute for Biotechnology

12:00 pm How Data Commons Are Changing the Way That Large Biomedical Datasets are Analyzed and Shared

Robert Grossman, Ph.D., Frederick H. Rawson Professor, Professor of Medicine and Computer Science, Jim and Karen Frank Director, Center for Data Intensive Science (CDIS), Co-Chief, Section of Computational Biomedicine and Biomedical Data Science, Dept. of Medicine, University of Chicago

Data from large biomedical projects have grown too large for most research groups to host and analyze themselves. Data commons provide an alternative by co-locating data, storage and computing resources with commonly used software services, applications and tools for analyzing, harmonizing and sharing data to create an interoperable resource for the research community. We give an overview of data commons and describe some lessons learned from the NCI Genomic Data Commons, the BloodPAC Data Commons and the BRAIN Commons. We also give an overview of how an organization can set up a commons itself.

12:30 Molecular Diagnostics in the Era of Big Data and Precision Medicine

Elizabeth Worthey, Ph.D., Faculty Investigator, Clinical Informatics Director, and Adjunct Associate Professor, Software Development and Informatics, Pediatrics and Genetics, HudsonAlpha Institute for Biotechnology

Genome-wide sequencing is used as a standard molecular diagnostic test. The major bottleneck in identification of causal variants is not sequencing or initial analysis, but rather interpretation. Interpretation of genetic findings is scarcely a new challenge, but the task today can be more complex given the increase in dataset size and complexity. Commoditizing interpretation requires development and application of appropriately scaled tools and methods. I will discuss challenges that are faced during implementation as well as the solutions in place within our institution.

1:00 Session Break

1:10 Luncheon Presentation: Applications of AI in Drug Discovery

Alix Lacoste, Ph.D., Lead Technical Solution Specialist, IBM Watson Health Life Sciences

With millions of scientific research articles published each year, innovation in the life sciences suffers from knowledge waste and lack of knowledge integration. IBM Watson for Drug Discovery addresses this issue by mining large corpuses of literature and data to help scientists accelerate biomedical research. Using advanced analytics and machine learning, the platform can also predict novel relationships, as demonstrated through our recent work with Barrow Neurological in ALS disease, and Pfizer in immuno-oncology, among many projects.

1:40 Session Break


2:30 Chairperson’s Remarks

Nathan D. Price, Ph.D., Professor & Associate Director, Institute for Systems Biology

2:40 Mining Personal, Dense, Dynamic, Data Clouds for Health and Disease Insights

Nathan D. Price, Ph.D., Professor & Associate Director, Institute for Systems Biology

We have generated personal, dense, dynamic, data clouds (PD3) for thousands of people (and growing), consisting of genomics, proteomics, transcriptomics, microbiomes, clinical chemistries and quantified-self data from wearable devices to monitor wellness and disease. I will present results from our proof-of-concept pilot study in a set of 108 individuals (Price et al., Nature Biotechnology 2017) as well as from the next thousand individuals. I will show how the interpretation of these data leads to actionable findings for individuals to improve health and reduce risk drivers of disease.

3:10 Systematic Functional Annotation of Somatic Mutations in Clinically Actionable Genes

Han Liang, Ph.D., Associate Professor and Deputy Chair, Department of Bioinformatics and Computational Biology, Associate Professor, Department of Systems Biology, The University of Texas MD Anderson Cancer Center

Understanding the functional effects of somatic mutations in cancer cells is a fundamental issue in cancer research, since mutated proteins have been widely used as biomarkers and therapeutic targets. We developed a systems-biology approach that integrates high-throughput mutant ORF construction, high-throughput sensitive cell viability assays, high-throughput functional proteomics, and drug sensitivity screens, and applied it to >1,000 mutations in clinically actionable genes. Our study provides a valuable resource for identifying clinically actionable mutations for precision cancer medicine.

3:40 LinkedOmics: Analyzing Multi-Omics Data within and across 32 Cancer Types

Bing Zhang, Ph.D., Professor, Department of Molecular and Human Genetics, Lester & Sue Smith Breast Center, Baylor College of Medicine

LinkedOmics is a web platform to explore associations between different types of molecular and clinical attributes, to compare associations discovered from different omics platforms or sample cohorts, and to interpret identified associations in the context of biological pathways and molecular networks. The current version of LinkedOmics includes all cancer genomic and proteomic data from TCGA and CPTAC, and it can be easily extended to support other cohort-based multi-omics studies.

4:10 Selected Poster Presentation: Novel Computational Method Integrating Disparate Data Types for Drug Candidate MoA Profiling

Timothy J. Cardozo, MD, PhD, Associate Professor, Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine

4:40 Refreshment Break and Transition to Plenary Session

5:00 Plenary Keynote Session



6:00 Grand Opening Reception in the Exhibit Hall with Poster Viewing

7:30 Close of Day

Tuesday, February 13

7:30 am Registration Open and Morning Coffee

8:00 Plenary Keynote Session

9:00 Refreshment Break in the Exhibit Hall with Poster Viewing


10:05 Chairperson’s Remarks

Hongzhe Li, Ph.D., Professor of Biostatistics and Statistics, Chair, Graduate Program in Biostatistics, Director, Center for Statistics in Big Data (CSBD), University of Pennsylvania

10:15 Novel Feature Selection Strategies for Enhanced Predictive Modeling and Deep Learning in the Biosciences

Tom Chittenden, Ph.D., D.Phil., Lecturer and Senior Biostatistics and Mathematical Biology Consultant, Harvard Medical School

Artificial Intelligence (AI) is the single most transformative technology in history. Advancements in medicine depend upon furthering our understanding of how genetic variation and somatic mutation regulate aberrant gene activity and subsequent disease biology. Our advanced deepCODE feature selection strategies quantitatively integrate multiple types of high-throughput omics data. These approaches improve performance of classification methods and the subsequent identification of genes and molecular pathways more highly predictive of disease etiology.

10:45 CancerLocator: Non-Invasive Cancer Diagnosis and Tissue-of-Origin Prediction Using Methylation Profiles of Cell-Free DNA

Xianghong Jasmine Zhou, Ph.D., Professor, Pathology and Laboratory Medicine, University of California, Los Angeles

We propose a probabilistic method, CancerLocator, which exploits the diagnostic potential of cell-free DNA by determining not only the presence but also the location of tumors. CancerLocator simultaneously infers the proportions and the tissue-of-origin of tumor-derived cell-free DNA in a blood sample using genome-wide DNA methylation data. CancerLocator outperforms two established multi-class classification methods on simulations and real data, even in scenarios where the proportion of tumor-derived DNA in the cell-free DNA is low. CancerLocator also achieves promising results on patient plasma samples.

11:05 Vetting Integrated ‘Big Data’ Approaches to Precision Health Care

Nicholas J. Schork, Ph.D., Professor, Quantitative Medicine, The Translational Genomics Research Institute

Vetting big data and machine learning techniques meant to enable precision medicine is not trivial. However, there are a few emerging strategies for proving the utility of integrated, data-intensive approaches for advancing precision health care. These include aggregating N-of-1 trials, pursuing drug matching trials, and developing clinical learning systems. In addition, recent trends in regulatory oversight may admit novel strategies like those discussed.

11:25 Analyzing Genomic Data at Scale with Google Cloud

Jonathan Sheffi, Product Manager, Genomics & Life Sciences, Google Cloud

Google Cloud enables scientists to change the way they perform research and collaborate with one another. This presentation will highlight how Google Cloud is accelerating life sciences research and finding new ways to innovate.

11:55 Observational Data for Biomedical Discovery

Nicholas Tatonetti, Ph.D., Herbert Irving Assistant Professor of Biomedical Informatics, Director of Clinical Informatics, Herbert Irving Comprehensive Cancer Center, Columbia University

Observation is the starting point of discovery. Based on observations, scientists form hypotheses that are then tested. In the information age, trillions of observations are being made and recorded every day, from online social interactions to emergency room visits. With so much data available, generating hypotheses using a single scientist’s mind is no longer sufficient. Data mining is about training algorithms to recognize patterns in enormous sets of data and automatically identify new hypotheses. In this talk, I will discuss how we use data mining algorithms to identify unexpected effects of drugs used singly and in combination with other drugs. Using integrative informatics methods, we are able to discover drug-drug interactions that no one considered possible before. Finally, I will demonstrate how to use simple and efficient laboratory experiments to validate these hypotheses. In many cases these experiments can be executed in high throughput by robotic systems, with the ultimate goal of automating the scientific method.

12:15 pm Session Break

12:25 Enjoy Lunch on Your Own

1:25 Refreshment Break in the Exhibit Hall with Poster Viewing


2:00 Chairperson’s Remarks

Matthew Trunnell, Vice President and CIO, Fred Hutchinson Cancer Center

2:10 The NCI Cancer Research Data Commons: Integrating Heterogeneous Data for Knowledge Discovery

Anthony R. Kerlavage, Ph.D., Chief, Cancer Informatics Branch, National Cancer Institute, Center for Biomedical Informatics & Information Technology

Precision medicine requires identifying the molecular basis for disease and matching targeted therapies to each patient’s unique biology. Cancer researchers need to access, integrate, and analyze data from genomics, metabolomics, proteomics, microbiomics, imaging, clinical research and outcomes, population-based data, and data collected by health care providers and patients themselves. Building upon current systems, we are defining an integrated, cloud-based Cancer Research Data Commons necessary to fully leverage these data.

2:40 Converged IT and Data Commons

Simon Twigger, Ph.D., Senior Scientific Consultant, BioTeam Inc.

Data management is an ongoing and growing challenge in Life Sciences. The Data Commons approach aims to streamline accessibility to the right data and right types of analytics tools and resources by creating a converged platform from the foundational infrastructure to the user interface. This talk will cover the industry trends for developing a strategy around and implementing Data Commons solutions and what role converged IT plays in the process.

3:10 PANEL DISCUSSION: Data Commons

Moderator: Matthew Trunnell, Vice President, CIO, Fred Hutchinson Cancer Center


Lucila Ohno-Machado, M.D., Ph.D., Associate Dean, Informatics and Technology, University of California, San Diego Health

Lara Mangravite, Ph.D., President, Sage Bionetworks

Simon Twigger, Ph.D., Senior Scientific Consultant, BioTeam Inc.

Robert Grossman, Ph.D., Frederick H. Rawson Professor, Professor of Medicine and Computer Science, Jim and Karen Frank Director, Center for Data Intensive Science (CDIS), Co-Chief, Section of Computational Biomedicine and Biomedical Data Science, Dept. of Medicine, University of Chicago

  • What is a data commons?
  • Challenges in data commons
  • Data commons and open science
  • Technology innovations

4:10 Valentine’s Day Celebration in the Exhibit Hall with Poster Viewing

5:00 Breakout Discussions in the Exhibit Hall

These interactive discussion groups are open to all attendees, speakers, sponsors, & exhibitors. Participants choose a specific breakout discussion group to join. Each group has a moderator to ensure focused discussions around key issues within the topic. This format allows participants to meet potential collaborators, share examples from their work, vet ideas with peers, and be part of a group problem-solving endeavor. The discussions provide an informal exchange of ideas and are not meant to be a corporate or specific product discussion.

Creating FAIR (Findable, Accessible, Interoperable, Reusable) Data

Lara Mangravite, Ph.D., President, Sage Bionetworks

  • Importance of FAIR data in biomedical research
  • How to minimize the effort required as a data generator to ensure that data is FAIR
  • Standards and systems for implementing FAIR data practices

Machine Learning Techniques and Big Data to Enable Precision Medicine

Nicholas J. Schork, Ph.D., Professor, Quantitative Medicine, The Translational Genomics Research Institute

  • How can machine learning be leveraged in very early, pre-clinical drug development initiatives, e.g., in drug screening studies, that might enable precision medicine?
  • What changes to current clinical trials infrastructure would need to be made to accommodate emerging big data and machine learning techniques?
  • What machine learning and big data-oriented strategies might complement, or even replace, traditional late phase (e.g., phase IV) clinical trials infrastructure?

6:00 Close of Day

Wednesday, February 14

7:30 am Registration Open and Morning Coffee

8:00 Plenary Keynote Session

10:00 Refreshment Break and Poster Competition Winner Announced in the Exhibit Hall


10:50 Chairperson’s Remarks

Ajay Shah, Director, Research Informatics, Office of the Chief Informatics Officer, Beckman Research Institute and City of Hope National Medical Center

11:00 Using Human Genetics to Drive Drug Discovery: The Industry Perspective

Anna Podgornaia, Ph.D., Associate Principal Scientist, Genetics and Pharmacogenomics, Translational Medicine, Merck

The Merck Genetics and Pharmacogenomics (GpGx) group uses human genetics and genomics across the entire drug development pipeline to make decisions anchored in human genetics. During the presentation, I will provide 3 vignettes about how we use human genetics during the drug discovery process, including 1) Using human genetics to get inspiration for novel drug programs; 2) Using human genetics to gain insight into potential safety issues; 3) Pharmacogenomics. I will close with a section on challenges and opportunities in using human genetics to drive drug discovery.

11:30 Immune-Mediated Dermatological Conditions: Target Identification

Deepak K. Rajpal, Senior Scientific Director, Computational Biology, Target Sciences, GSK

We share a framework for developing new therapeutic intervention strategies for such indications by utilizing publicly available clinical transcriptomics data sets. We propose a strategy based on developing disease signatures and using them to identify potential drug repurposing opportunities, and we present novel target identification approaches. We anticipate that the conceptual methodology shared here, or similar approaches, will further support not only biomarker discovery efforts but also the development of new drugs.

12:00 pm bStyle: A Graphical, Integrated and Modular Systems Biology Platform

Corrado Priami, Ph.D., President & CEO, COSBI

bStyle is a graphical platform for running systems biology analyses in the field of systems pharmacology. It handles multi-omics data to detect active networks and ultimately perform in silico experiments for drug design and development. All the mathematical technicalities are hidden behind the graphics, so it is easy to use even for non-experts in modeling and data analysis.

12:30 Session Break

12:40 Enjoy Lunch on Your Own

1:10 Dessert Break in the Exhibit Hall and Last Chance for Poster Viewing


1:50 Chairperson’s Remarks

Michael N. Liebman, Ph.D., Managing Director, IPQ Analytics, LLC; Professor, Drexel College of Medicine; Professor, Wenzhou First University Medical School

2:00 Pharma and Physician Perspective - The Future of Drug Development and Health Care

Charles E. Barr, M.D., MPH, Group Medical Director & Head, RWE Strategy & External Relationships, US Medical Affairs, Genentech

Science advances the knowledge of disease mechanisms, enabling the creation of transformative therapies. However, both health care and drug development face serious challenges including unsustainable growth in costs. Feasible solutions will require new ways for patients, physicians and researchers to leverage advanced technologies to accelerate both research and health care cost-effectively.

2:30 Healthcare Perspective – Limitations of Big Data Approaches and Clinical Needs

Hal Wolf, Director and Practice Leader for Information and Digital Health Strategy, The Chartis Group

Genomics has quickly become a broad topic spanning both the academic and consumer medical/health models. But access to meaningful big data sets that can be turned into useful knowledge, and the lack of clear medical needs, have left many approaches at a crossroads on how to proceed. Where will genomics set its path, and what are the dependencies to support its useful integration into the healthcare ecosystem?


Moderator: Michael N. Liebman, Ph.D., Managing Director, IPQ Analytics, LLC; Professor, Drexel College of Medicine; Professor, Wenzhou First University Medical School

  • Complexity of disease(s): Disease stratification; limitations in diagnosis
  • Complexity of patients: Clinical history; co-morbidities; genomics
  • Clinical guidelines: Quality of guidelines; compliance
  • Trial populations vs. real world patients
  • Translation of clinical trial results into clinical practice
  • Unmet vs. unstated unmet clinical needs

3:30 Session Break


3:40 Chairperson’s Remarks

Lara Mangravite, Ph.D., President, Sage Bionetworks

3:45 Collaborative Ecosystems in Data-Intensive Science for Precision Medicine

Lara Mangravite, Ph.D., President, Sage Bionetworks

An advanced understanding of the dynamic nature of disease is necessary to meaningfully implement precision medicine, but several barriers exist. In particular, approaches to understanding dynamic fluctuations in disease are highly data-intensive and require bioinformatic inquiry for which standard methodologies do not exist. These issues can be systematically addressed by combining resources, benchmarking methods, and establishing community consensus around well-supported research findings.

4:15 Novel Approaches to Participant Engagement in Genetic Research and Translating Big Data into Action

David Verbel, MPH, Director, Translational Data Science, Human Biology and Data Science, Eisai, Inc.

To identify the right medicines and the patients to receive them, Eisai is exploring ways to identify individuals who carry genetic variants of interest. In two such studies, biological samples from genetically and clinically selected individuals will be characterized to learn more about the cellular and molecular consequences of changes in the function of particular genes. The first involves utilizing a novel research platform; the second, working with a leading academic center.

4:45 Scientific Informatics for Translational Oncology

Ronghua Chen, Director, Scientific Informatics, Global Research IT, R&D IT, Merck

The application of molecular profiling technologies, including next-generation sequencing, in translational oncology offers unprecedented opportunities to discover new drug targets and biomarkers as well as to understand tumor biology. This presentation will elaborate on the complexities of oncology data sets and highlight an integrated scientific informatics approach to analyzing data and supporting translational research.

5:15 Close of Conference Program
