Cambridge Healthtech Institute’s Inaugural
Genomics & Sequencing Data Integration, Analysis and Visualization
Converging Cloud Computing and Big Data to Support Life Sciences Research
Part of the 21st Annual Molecular Medicine Tri-Conference
February 13-14, 2014 | Westin St. Francis | San Francisco, CA
As data shifts between research, sequencing labs, and the clinic, there is an ever-increasing volume of information generated for curing or treating diseases and cancers. Bioinformatics technologies, tools and techniques play a critical role in not only storing this mountain of information, but turning it into meaningful biological applications and knowledge. Many life science organizations and research labs use internal and external informatics resources to store sequencing data. This Symposium will cover real-world use cases across many areas, including drug discovery and design, R&D, molecular modeling, next-generation sequencing, and bioinformatics. Thought leaders will discuss the convergence of cloud computing and big data to support life sciences research.
Thursday, February 13
7:30 am Registration and Morning Coffee
9:00 Chairperson’s Opening Remarks
9:05 KEYNOTE PRESENTATION:
Genomic Big Data: Benefits and Challenges of Large Scale Information Aggregation
Scott Kahn, Ph.D., CIO and Vice President, Informatics, Illumina
The recent trend towards the large-scale aggregation of genomic and phenotypic information creates several unique benefits for advancing many areas of the life sciences, benefits that belie the challenges to the practitioner. Such “genomic Big Data” can be viewed from a variety of overlapping perspectives that all must converge for a practical solution to emerge. This presentation will introduce a framework for dissecting these challenges and will discuss progress to date on achieving practical solutions for the scientist as well as the informatician.
9:35 High-Performance Access to Large, Diverse Genomics Data Sets
Carl Meinhof, Ph.D., Manager, Research Informatics, IT, Ceres, Inc.
Scientists expect to navigate genomic data with the same ease and speed with which they navigate geographic data. We have developed a genome browser that uses algorithms from game development to provide high-performance visualization of genomics data. Data from multiple sources can be integrated in a relational database backend, but users can also visualize data from files. The database can be hosted in the cloud to facilitate sharing of data. Due to its high speed and ease of use, the browser enables playful exploration of data. This presentation demonstrates live examples of how the application can be used and how it performs.
10:05 Integrative Analyses on Clinical Transcriptomics for Drug Discovery Programs
Deepak K. Rajpal, D.V.M., Ph.D., Director, Computational Biology, GlaxoSmithKline
Integrative analyses offer the power to bring together data from multiple sources. We will present a brief overview of the studies we have conducted for drug discovery programs.
10:35 Coffee Break with Exhibit and Poster Viewing
Chairperson: Harpreet Singh, Ph.D., Scientist – D, Indian Council of Medical Research
11:05 Bioinformatics in the Amazon Cloud
Ben Butler, Senior Manager, Big Data, Amazon Web Services
Learn how health care and life sciences organizations are leveraging the integration between Amazon DynamoDB, Amazon Elastic MapReduce, and Amazon Redshift to manage and compute their data at high scale for the entire data lifecycle, from creation to analysis. In this session, we will provide an introduction to Amazon Web Services, describe 21st-century architecture design patterns leveraging cloud computing, and highlight a couple of customer success stories in the biomedical and life sciences industries. Using existing SQL-based tools and business intelligence systems in the Amazon cloud, you will learn how to gain deeper insight from your data at lower cost and without the traditional headaches of managing your own infrastructure.
11:35 Scaling Systems for Research Computing
Adam Kraut, Scientific Consultant, BioTeam
12:05 pm Sponsored Presentations (Opportunities Available)
12:35 Luncheon Presentation (Sponsorship Opportunity Available) or Lunch on Your Own
1:05 Session Break
1:50 Chairperson’s Remarks
1:55 OpenBEL: Data Standards and Knowledge Engineering for the Life Sciences
Ted Slater, Senior Solutions Architect, Life Sciences, YarcData, a Cray Company
The recent emphasis on big data and cloud computing has brought with it a sharper focus on data-centricity and infrastructure convergence. While these are excellent goals in principle, they are very difficult to achieve, in large part because of legacy knowledge representation and architecture choices that work primarily to create data silos. Data silos, in turn, are brittle, non-interoperable solutions that can severely hinder modern data infrastructure efforts. OpenBEL is an open source knowledge representation standard, together with a set of software tools, that can help eliminate data silos and fully enable knowledge-based life sciences research.
2:25 Mining the Human Immune System with NGS, AbGenesis & the Cloud
Giles Day, CEO, Distributed Bio
Using NGS, it is now possible to gain insights into how antibody repertoires respond and adapt during treatments such as vaccination, immunomodulation and tumor suppression. The millions of sequence reads and the complexity of the data require the adoption of powerful algorithms, which in turn require enormous compute resources. This talk will give examples of how simple tools can now be used by bench scientists to mine the immune system.
2:55 Refreshment Break with Exhibit and Poster Viewing
3:25 Integrated Research Data Management and Analysis in NGS Using Globus Genomics
Ravi Madduri, Fellow, Computation Institute, University of Chicago; Project Manager, Math and Computer Science Division, Argonne National Lab
In this talk we will present Globus Genomics, a robust, scalable, on-demand solution that provides end-to-end research data management for next-generation sequencing (NGS) analysis using Galaxy, Globus Online and Amazon Web Services. The emphasis is on providing the researcher with a high degree of flexibility to inspect, customize, and configure NGS analysis tools and workflows, and to share findings with collaborators.
3:55 A Multi-Center Biomarkers Knowledge Environment for NCI’s EDRN Early Detection Cancer Research Program
Daniel Crichton, Informatics PI, NASA’s Jet Propulsion Laboratory
NASA Jet Propulsion Laboratory and the National Cancer Institute have developed a comprehensive knowledge environment to support the capture, processing, management, analysis and distribution of results from biomarker research generated from the Early Detection Research Network (EDRN). The knowledge environment leverages a distributed, open source infrastructure, originally developed at NASA’s Jet Propulsion Laboratory, to support scientific data management, archiving and distribution for NASA’s planetary and Earth robotic missions. The knowledge environment leverages modern informatics technologies for bringing the multi-center EDRN into a distributed, virtual enterprise. This talk will introduce the project, describe the transfer of technologies between space and cancer research, and lessons learned in building a national enterprise.
4:25 Breakout Discussions
5:25 Close of Day