2014 Archived Content

Cambridge Healthtech Institute’s Inaugural
Genomics & Sequencing Data Integration, Analysis and Visualization
Converging Cloud Computing and Big Data to Support Life Sciences Research
Part of the 21st Annual Molecular Medicine Tri-Conference

February 13-14, 2014 | Westin St. Francis | San Francisco, CA

As data shifts between research, sequencing labs, and the clinic, there is an ever-increasing volume of information generated for curing or treating diseases and cancers. Bioinformatics technologies, tools and techniques play a critical role in not only storing this mountain of information, but turning it into meaningful biological applications and knowledge. Many life science organizations and research labs use internal and external informatics resources to store sequencing data. This Symposium will cover real-world use cases across many areas, including drug discovery and design, R&D, molecular modeling, next-generation sequencing, and bioinformatics. Thought leaders will discuss the convergence of cloud computing and big data to support life sciences research.



Thursday, February 13

7:30 am Registration and Morning Coffee


9:00 Chairperson’s Opening Remarks

Scott Kahn, Ph.D., CIO and Vice President, Informatics, Illumina


Genomic Big Data: Benefits and Challenges of Large Scale Information Aggregation

Scott Kahn, Ph.D., CIO and Vice President, Informatics, Illumina

The recent trend toward the large-scale aggregation of genomic and phenotypic information offers unique benefits for advancing many areas of the life sciences, but those benefits belie the challenges facing the practitioner. Such “genomic Big Data” can be viewed from a variety of overlapping perspectives, all of which must converge for a practical solution to emerge. This presentation will introduce a framework for dissecting these challenges and will discuss progress to date on achieving practical solutions for the scientist as well as the informatician.

9:35 High-Performance Access to Large, Diverse Genomics Data Sets

Carl Meinhof, Ph.D., Manager, Research Informatics, IT, Ceres, Inc.

Scientists expect to navigate genomic data with the same ease and speed with which they navigate geographic data. We have developed a genome browser that uses algorithms from game development to provide high-performance visualization of genomics data. Data from multiple sources can be integrated in a relational database backend, but users can also visualize data from files. The database can be hosted in the cloud to facilitate sharing of data. Due to its high speed and ease of use, the browser enables playful exploration of data. This presentation demonstrates live examples of how the application can be used and how it performs.

10:05 Integrative Analyses on Clinical Transcriptomics for Drug Discovery Programs

Deepak K. Rajpal, D.V.M., Ph.D., Director, Computational Biology, GlaxoSmithKline

Integrative analyses offer the power to bring together data from multiple sources. We will present a brief overview of the studies we have conducted for drug discovery programs.

10:35 Coffee Break with Exhibit and Poster Viewing


11:05 Bioinformatics in the Amazon Cloud 

Angel Pizarro, Senior Solutions Architect, Amazon Web Services

Learn how health care and life sciences organizations are leveraging the integration between Amazon DynamoDB, Amazon Elastic MapReduce, and Amazon Redshift to manage and compute their data at high scale for the entire data lifecycle: from creation to analysis. In this session, we will provide an introduction to Amazon Web Services, plus we will describe 21st century architecture design patterns leveraging cloud computing, and finally we will highlight a couple of customer success stories in the biomedical and life sciences industries. Using existing SQL-based tools and business intelligence systems in the Amazon cloud, you will learn how to gain deeper insight from your data at lower cost and without the traditional headaches of managing your own infrastructure.  

11:35 Scaling Systems for Research Computing

Adam Kraut, Scientific Consultant, BioTeam

12:05 pm Machine Learning Approaches to Predicting 7-day Hospital Readmission in Children
Saras Saraswathi, Ph.D., Clinical Instructor, Pediatrics, Ohio State University; Postdoctoral Research Associate, Battelle Center for Mathematical Medicine, Research Institute, Nationwide Children’s Hospital 

12:35 Luncheon Presentation (Sponsorship Opportunity Available) or Lunch on Your Own

1:05 Session Break


1:50 Chairperson’s Remarks
Ted Slater, Senior Solutions Architect, Life Sciences, YarcData, a Cray Company 

1:55 OpenBEL: Data Standards and Knowledge Engineering for the Life Sciences

Ted Slater, Senior Solutions Architect, Life Sciences, YarcData, a Cray Company

The recent emphasis on big data and cloud computing has brought with it a sharper focus on data-centricity and infrastructure convergence. While these are excellent goals in principle, they are very difficult to achieve, in large part because of legacy knowledge representation and architecture choices that work primarily to create data silos. Data silos, in turn, are brittle, non-interoperable solutions that can severely hinder modern data infrastructure efforts. OpenBEL is an open source knowledge representation standard, together with a set of software tools, that can help eliminate data silos and fully enable knowledge-based life sciences research.

2:25 Mining the Human Immune System with NGS, AbGenesis & the Cloud

Giles Day, CEO, Distributed Bio

Using NGS, it is now possible to gain insights into how antibody repertoires respond and adapt during treatments such as vaccination, immunomodulation and tumor suppression. The millions of sequence reads and the complexity of the data require the adoption of powerful algorithms, which in turn require enormous compute resources. This talk will give examples of how simple tools can now be used by bench scientists to mine the immune system.

2:55 Refreshment Break with Exhibit and Poster Viewing

3:25 Integrated Research Data Management and Analysis in NGS Using Globus Genomics

Ravi Madduri, Fellow, Computation Institute, University of Chicago; Project Manager, Math and Computer Science Division, Argonne National Lab

In this talk we will present Globus Genomics, a robust, scalable, on-demand solution that provides end-to-end research data management for Next-Gen Sequencing analysis using Galaxy, Globus Online and Amazon Web Services. The emphasis is on providing the researcher with a high degree of flexibility to inspect, customize, and configure NGS analysis tools and workflows, and to share findings with collaborators.

3:55 A Multi-Center Biomarkers Knowledge Environment for NCI’s EDRN Early Detection Cancer Research Program

Daniel Crichton, Informatics PI, NASA’s Jet Propulsion Laboratory

NASA’s Jet Propulsion Laboratory and the National Cancer Institute have developed a comprehensive knowledge environment to support the capture, processing, management, analysis and distribution of results from biomarker research generated by the Early Detection Research Network (EDRN). The knowledge environment leverages a distributed, open source infrastructure, originally developed at NASA’s Jet Propulsion Laboratory, to support scientific data management, archiving and distribution for NASA’s planetary and Earth robotic missions. It also leverages modern informatics technologies to bring the multi-center EDRN into a distributed, virtual enterprise. This talk will introduce the project, describe the transfer of technologies between space and cancer research, and share lessons learned in building a national enterprise.

4:25 Breakout Discussions
These interactive discussion groups are open to all attendees, speakers, sponsors, & exhibitors. Participants choose a specific breakout discussion group to join. Each group has a moderator to ensure focused discussions around key issues within the topic. This format allows participants to meet potential collaborators, share examples from their work, vet ideas with peers, and be part of a group problem-solving endeavor. The discussions provide an informal exchange of ideas and are not meant to be a corporate or specific product discussion. 

Platform Independence, Good Idea or Bad Idea

Angel Pizarro, Senior Solutions Architect, Amazon Web Services

Research Data Management in Genomics

Ravi Madduri, Fellow, Computation Institute, University of Chicago; Project Manager, Math and Computer Science Division, Argonne National Lab

  • Where is the data generated?
  • Where does it need to go?
  • What are the various analyses that need to be done on the data?
  • What are the different types of data?
  • Typical volumes of data
  • Data life cycle


Michael Shoffner, Senior Research Software Architect, Renaissance Computing Institute (RENCI)/University of North Carolina Chapel Hill (UNC); Adjunct Instructor, School of Information and Library Science (SILS), UN

  • Threat landscape & trends
  • Business & regulatory issues
  • Research directions

5:25 Close of Day 
