Cambridge Healthtech Institute’s Fourth Annual

Bioinformatics for Big Data

Converting Data into Information and Knowledge

February 16-18, 2015 | Moscone North Convention Center | San Francisco, CA
Part of the 22nd Annual Molecular Medicine Tri-Conference


About this Conference:

CHI's Fourth Annual Bioinformatics for Big Data conference will assemble thought leaders who will discuss the latest developments and applications of bioinformatics to big data in scientific discovery and biomedical research that are contributing to solving real clinical problems and unmet needs in the healthcare and life sciences environment. Themes include modeling of systems and networks, scalable analysis, big data and computational drug design and repositioning, and translating data to patient care. With the ever-increasing volume of information generated for curing or treating diseases and cancers, bioinformatics technologies, tools and techniques play a critical role in turning data into meaningful biological applications and knowledge.


Monday, February 16

10:30 am Conference Program Registration


11:50 Chairperson’s Opening Remarks

Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC

12:00 pm Featured Presentation: Insights into Multiscale Mechanisms of Biological Functions and Polypharmacological Intervention Strategies using Methods of Computational Biology

Ivet Bahar, PhD, Distinguished Professor and John K Vries Chair, Department of Computational & Systems Biology, School of Medicine, University of Pittsburgh

Recent years have seen an explosion in the number of computational studies performed at multiple scales to gain deeper insights into biomolecular systems dynamics and quantitative systems pharmacology. We recently launched a new center, MMBioS, for multiscale modeling of neurobiological events. The newly developed computational methods open the way to examining complex neurobiological interactions, such as excitatory signaling, from a systems perspective and to identifying new targets and polypharmacological intervention methods.

12:30 Multi-Scale Modeling in Breast Cancer: Personalized Medicine in Population-Based Healthcare

Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC

Sabrina Molinaro, Ph.D., Head, Epidemiology, Institute of Clinical Physiology, National Research Council - CNR Italy

Breast cancer continues to be a focus of big data research to enhance risk assessment, early detection, accurate diagnosis and optimal treatment. Clinical practice must address these issues with each patient at a personal level, yet the tools currently available are derived from population-based analyses. This presentation will describe a specific instance of moving from population-based risk analysis to personalized risk assessment based on the patient’s unique physiologic development and lifestyle.

1:00 Session Break

1:15 Luncheon Presentation: Text Mining Full Text for Molecular Targets

George Jiang, Ph.D., Product Manager, Text Mining, Biology Products, Corporate Markets, ELSEVIER

Text mining is the process of deriving high-quality structured information from unstructured text, and it can be very beneficial in semi-automating the rapid discovery of facts and relationships. Scientific abstracts are invaluable high-quality summaries for researchers. However, many facts and observations are excluded from abstracts and appear only within the body of the full-text article. By combining text mining with full-text corpora, researchers can find richer sets of results, particularly for types of information that may be underrepresented in abstracts, such as specific molecular targets.

1:45 Session Break

2:30 Chairperson’s Remarks

Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC

2:40 Predicting Drug Pharmacology Networks and Mechanistic Targets

Michael Keiser, Ph.D., Assistant Professor, Institute for Neurodegenerative Diseases, University of California San Francisco, School of Medicine

Many drugs modulate more than one molecular target. We used the Similarity Ensemble Approach (SEA) to predict “liability target” profiles for hundreds of drugs, asking which account for their adverse reactions. Likewise, one may interrogate phenotypic mechanisms of action, using target profiles to guide chemical-genetic testing. In C. elegans, this revealed novel conserved pathways by which compounds up-regulate feeding. Applied at scale, this may automate determination of mechanistic targets underlying drug effects, desired and otherwise.

3:10 Ranking omics Data to Discover Diagnostic Biomarkers

Corrado Priami, Ph.D., Computer Science, The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI)

An approach based on ranking of measurements is presented to identify patient signatures. The signature can be used to define biomarkers with respect to diseases, stratification of patients with respect to interventions or even toxicology of drugs with respect to doses and number of deliveries. This talk presents the approach and application examples.

3:40 Steroid Resistance in Childhood Nephrotic Syndrome: Transcriptome-Wide Sequence Analysis Identifies SULF2 and Other Marker Genes

Saras Saraswathi, Ph.D., Informatics Specialist – Data Analytics, DHI, Sidra Medical and Research Center, Qatar Foundation, Doha, Qatar

Glucocorticoids induce remission of nephrotic syndrome (NS) in most children, although ∼20% present with or develop glucocorticoid resistance. Unfortunately, no biomarkers are available that can reliably distinguish steroid-resistant (SRNS) from steroid-sensitive (SSNS) forms of NS. We thus sought to identify a gene panel able to distinguish patients with SRNS from those with SSNS, using quantitative transcriptome-wide mRNA sequence analysis of circulating leukocytes, collected both at presentation and after ~8 weeks of glucocorticoid therapy from children with either SSNS or SRNS. The data sets obtained were processed using statistical methods, a “Binary-Coded Genetic Algorithm”, and a neural network-based “Extreme Learning Machine” algorithm, resulting in the identification of twelve candidate genes able to differentiate between SSNS and SRNS patients. Among them is SULF2, which encodes an endoglucosamine-6-sulfatase that is known to be crucial in the physiology of renal podocytes. Subsequent biochemical analyses revealed that SULF2 plasma sulfatase activity ratios (post-treatment/pre-treatment with glucocorticoids) were greater in children with SSNS vs. SRNS, supporting SULF2 as a factor contributing to steroid sensitivity. In summary, differential expression of a small set of genes can differentiate steroid responsiveness in childhood NS, and may be valuable both in improving our understanding of NS pathophysiology and in biomarker development.

4:10 Enabling Patient Centric NGS Workflows through a Hadoop Optimized Compute Platform and Graph Analytics

David Anstey, MBA, Global Head, Life Sciences, Cray, Inc.

The drive to precision medicine is continually taxing compute, storage, archive and analytics capabilities of current NGS pipelines. Using specific use cases, attendees will gain an understanding of how NGS pipelines can be optimized to eliminate data movement and compute bottlenecks. Attendees will also learn how the data generated by the NGS workflows can be rapidly integrated with patient data to support cohort selection and precision medicine initiatives.

4:40 Break and Transition to Plenary Session

5:00 Plenary Session Panel 

6:00 Grand Opening Reception in the Exhibit Hall with Poster Viewing

7:30 Close of Day


Tuesday, February 17

7:00 am Registration and Morning Coffee

8:00 Plenary Session Panel 

9:00 Refreshment Break in the Exhibit Hall with Poster Viewing


10:05 PANEL DISCUSSION: Rescuing and Repurposing of Drugs for Cancer

Drug repurposing is becoming an attractive business strategy. It reduces risks and costs and creates opportunities to fill pipelines with new products that have a higher level of success, an accelerated development timeframe, and a quicker FDA approval process. Repurposing a drug from outside a therapeutic area and combining the drug with a blockbuster offers the potential to add significant and proprietary clinical value beyond that provided by the blockbuster drug alone. This session discusses examples and benefits of synergistic, repurposed-drug combinations; methods for identifying and developing such product candidates; and market trends to discover and develop novel, high-impact drugs for critical unmet clinical needs in the cancer arena.

Chairperson and Moderator: H. Kim Lyerly, M.D., George Barth Geller Professor of Cancer Research; Professor, Surgery; Associate Professor, Pathology; Assistant Professor, Immunology, Duke University


Devdatt Dubhashi, Ph.D., Professor, Department of Computer Science and Engineering, Chalmers University of Technology

Sandra Gesing, Ph.D., Research Assistant Professor, Center for Research Computing, University of Notre Dame

Prahalad Achutharao, Founder & CEO, InterpretOmics India Pvt. Ltd.

Anil Srivastava, President, Open Health Systems Laboratory

Wenjin Zhou, Ph.D., Assistant Professor, Computer Science and Engineering, Oakland University

12:15 pm Session Break

12:25 Luncheon Presentation: Accelerating Genomics Research using an Integrated High Performance Computing Solution

Yinhe Cheng, Ph.D., Senior Technical Consultant, IBM Life Sciences, Next Generation Sequencing, Software Defined Infrastructure, IBM

Jane Yu, Ph.D., Team Lead, Translational Medicine, Software Defined Infrastructure, IBM

Advancements in genomics research pose challenges for IT leaders, researchers and developers to analyze, share and store large scale data. HPC best practices are required to process data efficiently, including workload management software to optimize the genomics pipeline. Infrequently used data must be archived but still be easily accessible.  We will discuss your compute and data challenges, the latest architecture for a high performing genomics platform, and real-world strategies adopted by leading genomic research institutions.

1:25 Refreshment Break in the Exhibit Hall with Poster Viewing


2:00 Chairperson’s Remarks
Anil Srivastava, President, Open Health Systems Laboratory 

2:10 Systematic Drug Repositioning: Analytics for Clinically Viable Novel Indications

Pankaj Agarwal, Ph.D., Director, Systematic Drug Repositioning, Computational Biology, GlaxoSmithKline

2:40 Talk Title to be Announced

Devdatt Dubhashi, Ph.D., Professor, Department of Computer Science and Engineering, Chalmers University of Technology

3:10 Finding New Uses for Existing Drugs Using Public Drug Data Resources

Atul Butte, M.D., Ph.D., Division Chief and Associate Professor, Stanford University School of Medicine; Director, Center for Pediatric Bioinformatics, Lucile Packard Children’s Hospital; Co-Founder, Personalis and NuMedii

Dr. Butte’s lab at Stanford builds and applies tools that convert more than a trillion points of molecular, clinical, and epidemiological data into diagnostics, therapeutics, and new insights into disease. Dr. Butte, a bioinformatician and pediatric endocrinologist, will highlight his lab’s work on using publicly available molecular measurements to find new uses for drugs, including drug repositioning, and to discover new treatable inflammatory mechanisms of disease in Type 2 diabetes.

3:40 Breaking the Classical Barriers to Collaboration and Scientific Discovery - Distance and Data Size 

Michelle Munson, President and Co-Founder, Aspera, an IBM company

Life sciences organizations need to dramatically reduce analytics time and speed up clinical interventions, but most still rely on shipping physical disks due to inherent problems with existing networks and transfer protocol inefficiencies. Spending days to transport data is not a viable option. This session will explore technology infrastructure for file transfer that will catalyze the transition from 1GbE to 10GbE and beyond.

4:10 Mardi Gras Celebration in the Exhibit Hall with Poster Viewing

5:00 Breakout Discussions in the Exhibit Hall

This interactive session provides attendees an opportunity to choose a specific discussion group to join. Each group has a moderator to ensure focused discussions around key issues within the topic. This format allows participants to meet potential collaborators, share examples from their work, vet ideas with peers, and be part of a group problem-solving endeavor. The discussions provide an informal exchange of ideas and are not meant to be a corporate or specific product discussion.

Big Data Bioinformatics: How to Surf the Tidal Wave

Martin Gollery, CEO, Tahoe Informatics

How to handle all that data using:

  • GPGPUs
  • FPGAs
  • Clouds
  • Phi
  • Hadoop

Bioinformatics, Big Data and Real World Patients; Do They Converge?

Michael Liebman, Ph.D., Managing Director, IPQ Analytics, LLC

Sabrina Molinaro, Ph.D., Head, Epidemiology, Institute of Clinical Physiology, National Research Council - CNR Italy

  • Big Data is bottom-up and Real World Patients are top-down: do they meet?
  • Data integration and data mining vs data modeling: opportunities vs challenges
  • Will technology alone be enough to solve real world problems?

6:00 Close of Day


Wednesday, February 18

7:00 am Breakfast Presentation (Sponsorship Opportunity Available) or Morning Coffee

8:00 Plenary Session Panel 

9:45 Refreshment Break and Poster Competition Winner Announced in the Exhibit Hall


10:35 Chairperson’s Remarks

Martin Gollery, CEO, Tahoe Informatics

10:45 Using Amazon Web Services for Large Scale Genomics Analysis

Angel Pizarro, Senior Solutions Architect, Amazon Web Services

In this presentation, we will discuss how customers are using a variety of AWS services to create scalable, high-performance, and cost-effective solutions for genomics and health big-data workflows. Come learn from technical experts demonstrating best practices for combining S3, Glacier, EC2, EMR, RDS, and Redshift to produce a robust platform for genomics.

11:15 Designing New Algorithms for Emerging Data-Intensive Computing Architectures to Improve the Speed and Accuracy of Shotgun Metagenomic Analysis

Jonathan Allen, Ph.D., Bioinformatics Scientist, Global Security Computing Applications, Lawrence Livermore National Laboratory

Results from analyzing the complete collection of 1000 human genomes data and the complete collection of Human Microbiome Project data on a new data-intensive compute cluster will be used to show how new large-memory computing architectures enable more accurate taxonomic analysis of metagenomic samples that scales with increasing sequencer use.

11:45 File Transfer Capabilities with Globus Online

Ravi Madduri, Fellow, Computation Institute, University of Chicago and Argonne National Lab

12:15 pm Enjoy Lunch on Your Own

1:00 Refreshment Break in the Exhibit Hall and Last Chance for Poster Viewing


1:40 Chairperson’s Remarks

Ajay Shah, Ph.D., MBA, PMP, Director, Research Informatics & Systems, City of Hope National Medical Center

1:50 Finding Cohorts for Clinical Trials – An Integrated Informatics Approach

Ajay Shah, Ph.D., MBA, PMP, Director, Research Informatics & Systems, City of Hope National Medical Center

Integrating discovery, clinical and translational research informatics systems and data can help solve one of the key challenges in clinical trials: finding cohorts. SPIRIT (Software Platform for Integrated Research Information and Transformation) is used to encode computable eligibility criteria, identify cohorts from the EMR system via i2b2, and perform cohort analytics and visualization.

2:20 Building the High Performance Genomics Big Data Platform to Support Drug Discovery & Translational Medicine

Monica Wang, Ph.D., Lead Software Engineer, Project and Program Manager, Research Systems, Takeda

With the rapid advance of sequencing technology, vast amounts of genomics data are being generated every day. The ultimate challenge will be to make the best use of the data and extract knowledge from it to advance science and medicine. We will share our experience building the enterprise Genomics Big Data Platform, with a focus on high performance, to support our internal research effort for drug discovery and translational medicine.

2:50 Hackathons: Feed Innovation, Creativity, and Promote Thinking Outside of the Box

Kristen Cleveland, PMP, Senior IT Project Manager, R&D IT, Biogen Idec

Explore what a Hackathon is, how to plan it, and how to get the best out of the event for your organization.

3:20 Drug Repositioning in the Era of Precision Medicine

Chris Willis, Ph.D., Manager, Discovery Solution Scientists, IP & Science, Thomson Reuters

Craig Webb, Ph.D., CSO, NuMedii, Inc.

3:50 Refreshment Break

4:00 Chairperson’s Remarks

Chris Willis, Ph.D., Manager, Discovery Solutions, IP & Science, Thomson Reuters

4:10 KEYNOTE PRESENTATION: Global Exchange of Human Genetic Data for Medicine and Research

David Haussler, Ph.D., Distinguished Professor and Scientific Director, UC Santa Cruz Genomics Institute, University of California Santa Cruz

Every human disease is a rare disease at the molecular level. No single institute has enough patients to understand any particular molecular subtype. For genomics to benefit medicine and science, we must share data. This presentation outlines the data standards and Application Programming Interfaces developed by the Global Alliance for Genomics and Health that are intended to address this issue, and highlights a few global genomics projects that use them.

4:40 Data Linking and Warehousing to Support Evaluation of Pathogenicity of Genes and Genetic Variants by the Clinical Genome Resource Project

Xin Feng, Ph.D., Assistant Professor, Bioinformatics Research Lab and Department of Molecular and Human Genetics, Baylor College of Medicine

The Clinical Genome Resource (ClinGen) is an NIH-funded program dedicated to creating a database of clinically relevant genomic variants to inform genome interpretation in a variety of clinical contexts. A core component of ClinGen is ClinGenDB, an integration point for data about variants that supports their computational and manual evaluation by experts. The variant data is integrated from clinical and research databases, including several genomics initiatives. Data Warehousing is the traditional approach to data integration that brings all the relevant data physically together. Data Linking is a new approach to data integration that uses new web standards such as JSON-LD, RDF, and Linked Data Platform 1.0 to integrate data across distinct physical locations. In this presentation, we compare the two approaches by going through a number of use cases of data integration in ClinGenDB for the purpose of evaluating pathogenicity of genetic variants.

5:10 XPRIZE: Transforming Science Fiction into Science Reality through Incentivized Competition

Grant Campany, Senior Director, XPRIZE

Imagine a portable, wireless device in the palm of your hand that monitors and diagnoses your health conditions. That’s the technology envisioned by the $10 million Qualcomm Tricorder XPRIZE competition, and it will allow unprecedented access to personal health metrics. The end result: Radical innovation in healthcare that will give individuals far greater choices in when, where, and how they receive care.

5:40 Close of Conference Program


Premier Sponsors:

  • Jackson Laboratory
  • Precision for Medicine
  • Silicon Biosystems
  • Thomson Reuters