Cambridge Healthtech Institute’s Third Annual

Genomics & Sequencing Data Integration, Analysis and Visualization

Deriving Insights and Relationships from Big Data Sets to Advance Research and Patient Health

March 10-11, 2016 | Hilton San Francisco Union Square | San Francisco, CA
Part of the 23rd International Molecular Medicine Tri-Conference


The eruption of high-throughput genomic and proteomic technologies over the last two decades has motivated the development of tools and methodologies to transfer and integrate data into large-scale bioinformatics database platforms and repositories. The surge of biological data being collected has increased the need for standardized workflows, integrated solutions, economies of scale in the cloud, cloud security and compliance (especially as genomics becomes more integrated with precision medicine initiatives), and tools to visualize and analyze the data. The third annual Genomics & Sequencing Data Integration, Analysis and Visualization Symposium will present concrete use cases in life sciences where analysis and visualization of big data have made a difference in scientific decisions. Thought leaders will discuss trends in genomic data, big data analytics, and translational informatics, and how dealing with data complexities has advanced research and patient health.

Final Agenda


Thursday, March 10

7:30 am Registration and Morning Coffee


9:00 Chairperson’s Opening Remarks

Willy Valdivia-Granda, CEO, Orion Integrated Biosciences, Inc.

9:10 Power to the People: Annotation, Analysis and Visualization for Systems Biology and Precision Medicine Using Crosstalker™

Mark Chance, Ph.D., Vice Dean for Research; Director, Center for Proteomics and Bioinformatics; Charles W. and Iona A. Mathias Professor of Cancer Research, School of Medicine, Case Western Reserve University and Neo Proteomics, Inc.

Integration and visualization of diverse sets of molecular data is one of the most challenging yet important approaches for identifying dysregulated molecular targets in complex disease. Conceptions of disease states as high-level descriptors are rapidly evolving into molecular descriptions of disease sub-types characterized at a systems level (e.g., networks and pathways), where the inclusion of specific patients into sub-types will eventually drive their individualized treatment protocols. While genomic and pathway characterization of cancer sub-types driving individual therapeutic decisions is now field standard, and provides an important proof of principle for precision medicine, the ability to integrate a wide range of gene-, protein-, or metabolite-level data to permit the development of precision medicine across a wide range of diseases is in its infancy. Systems biology software solutions are essential to progress in this area. However, in current commercial software solutions, both the analytical algorithms and the databases lack transparency (black box); this limits the ability of users to understand how and why their results came about, which is essential to mechanistic understanding. In addition, the reliance on proprietary databases ignores the increasing value and reliability of public (open-source) data. This lack of transparency will be increasingly untenable in light of the international movements towards reproducibility and accuracy of results demanded by sponsors and the public. On the other hand, freeware, which offers maximum flexibility and agility, is typically limited to the “power-user” or professional bioinformatics community and historically lacks the levels of ongoing support and sustainability that are standard for commercial software products intended for broad user adoption.
To overcome these challenges, we have developed an integrated set of commercial tools that includes Crosstalker™, a user-friendly and transparent (i.e., white-box) analytical engine for molecular data analysis and integration, including but not limited to individual or simultaneous integration of mutations, SNPs, CNVs, array, RNA-seq, and proteomics data, based on PageRank-type approaches. This analytic framework is coupled to a molecular network generation engine called Disease Path Finder™, where the user can document the details of the networks and pathways being annotated and scored, and compare and contrast Crosstalker™ integrations across different network and pathway frameworks to investigate and understand the molecular mechanisms underlying the disease and developmental phenotypes under study. Together, these tools provide a systems biology workflow that includes the high quality of visualization and annotation expected in a commercial product while retaining the ability to integrate user molecular data with a wide range of public and private pathway and network representations, such that reproducibility and transparency of results are assured, paving the way for the adoption of precision medicine.
Authors: Mark R. Chance, Gurkan Bebek, Mehmet Koyuturk, and Sean Maxwell

9:40 Benchtop Sequence Analysis: Empowering Bench Scientists to Analyze Big Data through Web Interfaces

Dave Barkan, Ph.D., Investigator, Infectious Diseases, Novartis Institutes for BioMedical Research

As Next-Generation Sequencing library construction and data analysis have become refined and standardized, scientific focus has shifted from method development to results visualization and interpretation. For some basic sequence analysis tasks, a bioinformaticist's role may be simply to launch an established software pipeline on the command line with default parameters and send the generated results back to the bench scientists. In NIBR, the Bioinformatics and IT groups are working together to eliminate this intermediate step by building web front-ends that launch in-house bioinformatics pipelines and return the results and visualizations directly to the end-users in their browser.


10:10 Metabolomics, the Microbiome and Understanding Complex Diseases

Andreas Kogelnik, M.D., Ph.D., Founder and Director, Open Medicine Institute

Omics technologies such as gene expression profiling, metabolomics, and gut microbiome analysis are advancing rapidly in their use on the research front and are showing promise in clinical applications. Up until now, little attention has been paid to how these types of data relate to one another or to their real-world impact on disease modulation. This talk will discuss two projects focused on integrating blood metabolomic and gut microbiomic data with direct clinical application. We will discuss how these technologies are being used to improve diagnostic rigor and to point the way to therapeutic targets, in particular for complex diseases and chronic disease management. Current integrative -omics appears on course to reshape precision diagnostics and therapies.

10:40 Coffee Break with Exhibit and Poster Viewing

11:15 Next-Generation Sequence Analysis System: Discovering the Unknown in Complex Genomic and Metagenomic Datasets

Willy Valdivia-Granda, CEO, Orion Integrated Biosciences, Inc.

Next-generation sequencing (NGS) offers the possibility to sequence the DNA of known and unknown organisms. Despite the exponential accumulation of microbial genomic information, there is no reference database where researchers can retrieve curated sequences specific to a given taxonomic group. This situation continues to hinder the rapid development of standardized diagnostic reagents, prophylactics and therapeutics. This talk discusses methods and technology to exploit genomic and metagenomic information to discover and prioritize targets to counter the impact of infectious agents.

11:45 Identification and Relevance of Fusion Transcripts in a Novel in vitro Progression Model of High-Grade Serous Ovarian Cancer

Sharmila Bapat, Ph.D., FNASc, FASc, Independent Project Investigator and Group Head, National Centre for Cell Science (NCCS), Pune, India

High-grade serous ovarian adenocarcinoma (HGSC) is recognized to progress rapidly from asymptomatic, silent onset to aggressive metastatic disease with a dismal prognosis. The lack of early diagnosis has led to the view that better disease management through detailed molecular and biological understanding of tumors could pave the way for development of targeted ‘personalized’ therapeutic strategies and improve patient prognosis.

12:15 pm Sponsored Presentation (Opportunity Available)

12:30 Session Break

12:40 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

1:15 Session Break


1:50 Chairperson’s Remarks

Willy Valdivia-Granda, CEO, Orion Integrated Biosciences, Inc.

2:00 Software and Computational Platforms to Integrate Diverse Genomics and Epigenomic Datasets

Duygu Ucar, Ph.D., Assistant Professor, Genomic Medicine, Jackson Laboratory

We are building software and computational algorithms to integrate epigenetic datasets with other data sources, including chromatin interaction maps, public data repositories (SNPs, gene sets, immune modules), and transcriptome data. Attendees will learn about the methods and software we are developing in my lab, as well as the research directions of the Jackson Laboratory for Genomic Medicine, a brand-new genomics institute dedicated to understanding human diseases, including cancer, immune diseases, and diabetes.

2:30 Mitochondrial Function, Mutation, and Diseases

Zhenglong Gu, Ph.D., Associate Professor, Division of Nutritional Sciences, Cornell Center for Comparative and Population Genomics, Cornell University

A majority of the mitochondrial DNA (mtDNA) mutations reported to be implicated in diseases are heteroplasmic, a state in which mtDNA variants coexist in a single cell. Quantifying the prevalence of mitochondrial heteroplasmy and its pathogenic effect in healthy individuals could further our understanding of its possible roles in various diseases. In this talk I will discuss our results regarding this issue.

3:00 Refreshment Break with Exhibit and Poster Viewing

3:30 New Gene-Level Approaches to Identify Disease-Causing Mutations in Next-Generation Sequencing Data of Patients

Yuval Itan, Ph.D., MRes, Postdoctoral Associate, Human Genetics of Infectious Diseases, The Rockefeller University

Ascertaining whether a gene that harbors a variation may be relevant to the disease being studied is key to testing as few potential candidate mutant alleles as possible while not excluding the disease-causing allele(s). We developed two novel gene-level approaches to estimate the relevance of a specific gene to a disease. We first describe the gene damage index (GDI), a genome-wide, gene-level estimate of accumulated mutational damage for human protein-coding genes: genes that are frequently mutated in healthy individuals are unlikely to cause rare diseases, and yet they contribute to a large proportion of the next-generation sequencing data generated for any patient. We then present the mutation significance cutoff (MSC): a gene-specific threshold (rather than the fixed threshold applied to all human genes in current methods) to differentiate between benign and damaging variants. We demonstrate that the combination of the GDI and MSC approaches significantly increases the discovery rate of new disease-causing mutations in next-generation sequencing data of patients.

4:00 Clinical Transcriptomic Profiles – Providing Clues for Novel Therapeutic Development Strategies: A Case Study on Psoriasis

Deepak K. Rajpal, D.V.M., Ph.D., Director, Computational Biology, Target Sciences, GlaxoSmithKline

Psoriasis is a chronic inflammatory skin disease with complex pathological features. By mining publicly available clinical transcriptomic profile data, we present a framework for developing new therapeutic intervention strategies. We propose a psoriasis disease signature; the reversal of this signature upon therapeutic intervention points to approaches for drug repurposing and novel target selection. These approaches could potentially support biomarker and drug discovery strategies for psoriasis.

4:30 Ten Things You Probably Don’t Know About GenBank

Ben Busby, Ph.D., Genomics Outreach Coordinator, NCBI, NIH

5:00 Reception with Exhibit and Poster Viewing

6:00 Close of Day


Friday, March 11

8:00 am Morning Coffee


8:25 Chairperson’s Remarks

Martin Gollery, CEO, Tahoe Informatics

8:30 Role of Hadoop and Data Analysis to Move Genomics from Research to Personalized Medicine

Martin Gollery, CEO, Tahoe Informatics

9:00 Evolution of a Genomics Data Ecosystem: Efficient NGS Data Tracking, Processing, Integration and Results Sharing

Lihua Yu, Ph.D., Vice President, Data Science and Information Technology, H3 Biomedicine, Inc.

H3 Biomedicine is an oncology drug discovery company that leverages cancer genomics data generated externally and internally throughout our target validation and drug discovery efforts. Our goals are to use genomics data to inform our drug discovery efforts and, most importantly, to allow data exploration by all scientists. Toward these goals, we have built a genomic data ecosystem whose components range from data storage, NGS data analysis with pipelines and workflow management tools, genomic data management/warehousing using AWS Redshift, and a genomic data integration system that allows data exploration by both computational biologists and other scientists, to results and knowledge sharing in a company-wide collaboration platform. Very importantly, we developed a genomics experiment and sample tracking system; the common IDs created in this system serve as unique identifiers that tie all components of the ecosystem together to allow efficient data flow and data/information retrieval. This presentation discusses the importance of having such an ecosystem, which also provides tractability, visibility, and reusability of both the data and the scientific insights from genomics studies.

9:30 Speeding Up Drug Research with MongoDB: Introducing MongoDB into an RDBMS Environment

Doug Garrett, Research Leader, NGS Pipeline Development Group, Roche Sequencing

Genetic testing of animal models has been critical to Genentech Research in understanding the underlying cause of many diseases and in developing drugs to address those diseases. This importance has driven an increase in both the number and complexity of genetic testing requirements for the transgenic Genetic Analysis Lab. At the same time, improvements in genetic testing technologies have driven down the cost and increased the throughput of commercially available instruments. However, integrating these new instruments into our existing system proved time-consuming and resource-intensive, with some new instruments requiring six months or more to integrate. To increase the flexibility of the system, we embarked on a major redesign, which included the use of MongoDB, a NoSQL document database with a flexible schema. The new system has reduced the time needed to integrate new equipment from months to only weeks.

10:00 Sponsored Presentation (Opportunity Available)

10:30 Coffee Break with Exhibit and Poster Viewing


11:00 Patient Mediated Data Collection, Use and Donation without Violating HIPAA

Anil Sethi, CEO, Founder, Gliimpse

In modern times, sensors and algorithms, along with self-reported health measures combined with genomic markers from NGS, all collocated in our personal health records, illuminate the hidden stories of our health. Through the “Rise of the Consumer”, this model of modern health can be curated, shared, and even donated to help researchers find cures faster. This presentation discusses in detail how to integrate this type of data using a novel patient-mediated collection methodology.

11:30 Securing Sensitive Workloads in the Cloud: Best Practices and Procedures for Securing Your Data on Amazon Web Services

Brad Dispensa, Senior Solutions Architect, Amazon Web Services

Data security, access controls, and monitoring are common areas of confusion for researchers interested in moving to the cloud. In this presentation I will cover how to configure your research to run securely using Amazon Web Services. We will review Amazon’s shared security model, encryption techniques, automation of security controls and resource provisioning, and HIPAA workload design patterns on AWS.

12:00 pm Lessons Learned Scaling Up Analysis for Thousands of Samples Using Amazon Web Services

Ravi Madduri, Fellow, Computation Institute, University of Chicago; Project Manager, Math and Computer Science Division, Argonne National Lab

Globus Genomics is a cloud-based, large-scale genomics analysis service that is used by research consortia and healthcare providers to analyze thousands of raw genomics datasets. In order to deliver the results of these analyses on tight deadlines, we created cost-aware resource scheduling on AWS that leverages the computational profiles we created for various tools to schedule cost/performance-optimized execution. In this talk, we will present some of the use cases and success stories from our work.

12:30 Close of Symposium



Premier Sponsors:

Bina Technologies
Jackson Laboratory
Precision for Medicine
Silicon Biosystems
Thomson Reuters