Monday, March 2 - Wednesday, March 4
Day One: 12:00 pm - 4:15 pm | Day Two: 8:00 am - 5:00 pm | Day Three: 8:00 am - 1:30 pm
Molecular Medicine Tri-Conference & Bio-IT World WEST’s inaugural Bioinformatics Pipelines for Preclinical Drug Discovery Hackathon will bring together stakeholders from across pharma R&D to tackle datasets and projects
with maximum impact potential. The Tri-Con is proud to bring together innovative data scientists and developers from across the industry to solve real-world data
The Bioinformatics Pipelines for Preclinical Drug Discovery Hackathon is taking place March 2-4, 2020 at the Molecular Medicine Tri-Conference & Bio-IT World Conference & Expo WEST.
Confirmed Projects Include:
Dashboard for Comparisons of RNA-Seq Bioinformatics Approaches for Preclinical Analyses
Differences in how RNA-seq methodology is reported limit the reproducibility of RNA-seq-based results. For clinically applicable RNA-seq read counts, as well as robust preclinical data, comparisons across standard RNA-seq data processing pipelines (RSEM, Kallisto, etc.) on non-simulated data would be useful to end-user researchers. Project participants will work on a web application that enables users to visualize real genome-wide expression data from NCBI’s SRA and GEO databases using plotly with Python.
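As a rough illustration of the kind of comparison such a dashboard might surface, the sketch below contrasts per-gene abundance estimates from two pipelines. It is a minimal example, not the project's implementation: the gene names and TPM values are made up, the function names are hypothetical, and it uses only the Python standard library (a plotly scatter of the two log-transformed vectors would be a natural next step).

```python
import math

def log2_tpm(values, pseudocount=1.0):
    """Log-transform abundance estimates; the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def compare_pipelines(tpm_a, tpm_b):
    """Compare two {gene: TPM} dicts over the genes they share."""
    genes = sorted(set(tpm_a) & set(tpm_b))
    a = log2_tpm([tpm_a[g] for g in genes])
    b = log2_tpm([tpm_b[g] for g in genes])
    # Per-gene difference in log2 space (pipeline B minus pipeline A).
    log2_diff = {g: bi - ai for g, ai, bi in zip(genes, a, b)}
    return {"genes": genes, "pearson_r": pearson(a, b), "log2_diff": log2_diff}

# Made-up values for illustration only; not real RSEM or Kallisto output.
pipeline_a = {"GENE_A": 10.0, "GENE_B": 0.0, "GENE_C": 250.0, "GENE_D": 3.0}
pipeline_b = {"GENE_A": 12.0, "GENE_B": 0.5, "GENE_C": 240.0, "GENE_D": 7.0}

result = compare_pipelines(pipeline_a, pipeline_b)
# Genes with the largest absolute log2 difference are candidates for closer inspection.
most_discordant = max(result["log2_diff"], key=lambda g: abs(result["log2_diff"][g]))
```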
Identifying Differentially Expressed Tumor-Specific Genes in Kidney Tissue Samples Using Deep Learning
Clemson University, Research to the People, and RareKidneyCancer.org
This study takes two approaches to identifying a list of genes whose expression levels in tumor samples differ with statistical significance from those in normal tissue samples.
The first approach combined Bill Paseman’s RNA-seq vector with the hundreds of correlated TCGA samples. From there, we used the tool TSPG (transcriptome state perturbation generator) to compare the single vector of KIRP expression data with the larger collections of GTEx and TCGA data. TSPG allows statistical analysis of the vector, even though the sample size is n=1, by leveraging the publicly available data that the tumor has been associated with. The hope for the future is that we can take this list of genes that are both significantly up- or down-regulated and statistically important, and search for drugs that will alter the transcriptome state of the cells to match the findings of our perturbation generator.
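To make the n=1 idea concrete, here is a deliberately simple stand-in: scoring a single sample's expression against a reference cohort with per-gene z-scores. This is not how TSPG works and is not the team's method; it only illustrates how public cohort data can give a lone sample a statistical context. The gene names, values, threshold, and function names are all assumptions.

```python
import statistics

def zscores_against_reference(sample, reference):
    """Score one sample ({gene: value}) against a cohort ({gene: [values]})."""
    scores = {}
    for gene, value in sample.items():
        cohort = reference.get(gene)
        if not cohort or len(cohort) < 2:
            continue  # need at least two reference values for a standard deviation
        mean = statistics.mean(cohort)
        sd = statistics.stdev(cohort)
        if sd == 0:
            continue  # no variation in the cohort, z-score undefined
        scores[gene] = (value - mean) / sd
    return scores

def flag_genes(scores, threshold=2.0):
    """Label genes whose |z| meets the (arbitrary) threshold as up or down."""
    return {
        gene: ("up" if z > 0 else "down")
        for gene, z in scores.items()
        if abs(z) >= threshold
    }

# Made-up log-scale expression values for illustration only.
reference_cohort = {
    "GENE_A": [4.0, 5.0, 6.0],
    "GENE_B": [10.0, 12.0, 14.0],
    "GENE_C": [7.0, 8.0, 9.0],
}
single_sample = {"GENE_A": 9.0, "GENE_B": 11.0, "GENE_C": 5.0}

scores = zscores_against_reference(single_sample, reference_cohort)
flagged = flag_genes(scores)
```

A real analysis would also need normalization across data sources and correction for multiple testing, both of which this sketch omits.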
The second approach focuses on extracting novel gene-gene relationships based on differential RNA-seq expression levels between GTEx and TCGA kidney samples. We develop two algorithms, one using blob detection and the other using a deep learning architecture on a compressed data representation of the original gene expression matrix, to construct a differentially expressed gene correlation network (GCN). We hypothesize that this GCN captures genetic relationships that are specific to kidney cancer.
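For readers new to differential correlation networks, the sketch below shows the general idea in its simplest form: compute gene-gene correlations separately in normal and tumor samples, and keep the pairs whose correlation changes sharply. It is neither the blob detection algorithm nor the deep learning architecture described above; the data, the threshold, and the function names are assumptions for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def differential_edges(normal, tumor, min_delta=1.0):
    """Return (gene1, gene2, r_normal, r_tumor) for pairs whose correlation
    differs between conditions by at least min_delta."""
    genes = sorted(set(normal) & set(tumor))
    edges = []
    for g1, g2 in combinations(genes, 2):
        r_normal = pearson(normal[g1], normal[g2])
        r_tumor = pearson(tumor[g1], tumor[g2])
        if abs(r_tumor - r_normal) >= min_delta:
            edges.append((g1, g2, r_normal, r_tumor))
    return edges

# Made-up expression values (one list entry per sample), illustration only.
normal_expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],   # rises with G1 in normal tissue
    "G3": [5.0, 3.0, 6.0, 4.0],
}
tumor_expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [8.0, 6.0, 4.0, 2.0],   # falls as G1 rises in tumor tissue
    "G3": [5.0, 3.0, 6.0, 4.0],
}

edges = differential_edges(normal_expr, tumor_expr)
```

In this toy data only the G1-G2 pair flips from positively to negatively correlated, so it is the only edge retained.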
This project will encompass techniques to combine both approaches, with the goal of isolating genes and relationships that demonstrate differential expression in kidney tumor samples.
We are looking for team leads for the following projects! If you are interested, please reach out to Kaitlyn Barago.
Haplotypes for Drug Discovery and Efficacy
Combinations of mutations within recombination sites are not random but inherited. Moreover, they often affect the way particular proteins interact with drugs. We will develop a standard pipeline to screen for haplotypes that interact with drugs in particular
Pipelines for Initial Analysis of Specific, Personalized Therapeutics
This project will not only compare an individualized dataset to community genomes; it will also build a pipeline for such community genomic characterization, to inform treatment of not only an individual's tumor or disease type but also the tumor or disease types of people with similar germline mutations.
Building Reproducible Pipelines for Clustering Variants in Complex Disease
Polygenic risk scores (PRS) are a blunt tool unpopular with the clinical genetics community, and a consensus biolean vocabulary has been frustrated by community unfamiliarity with compound variants in cis or trans. This team will not only develop a more refined way to present complex variants than PRS, but also help to develop a vocabulary to communicate such clusters to the FDA and other federal agencies.
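One way to see why an additive PRS is a blunt instrument for compound variants: in its common form it is a weighted sum of per-variant allele dosages, so it cannot tell whether two variants sit on the same chromosome copy (cis) or on opposite copies (trans). The sketch below illustrates this with made-up variant names and weights; the function names are hypothetical.

```python
def dosages_from_haplotypes(hap1, hap2):
    """Collapse two phased haplotypes ({variant: 0 or 1}) into allele dosages."""
    variants = set(hap1) | set(hap2)
    return {v: hap1.get(v, 0) + hap2.get(v, 0) for v in variants}

def polygenic_risk_score(dosages, weights):
    """Additive PRS: sum of effect weight times allele dosage."""
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)

# Made-up effect weights for two variants in the same gene.
weights = {"var1": 0.5, "var2": 0.25}

# Cis: both alternate alleles on the same chromosome copy.
cis = dosages_from_haplotypes({"var1": 1, "var2": 1}, {"var1": 0, "var2": 0})
# Trans: one alternate allele on each chromosome copy.
trans = dosages_from_haplotypes({"var1": 1, "var2": 0}, {"var1": 0, "var2": 1})

prs_cis = polygenic_risk_score(cis, weights)
prs_trans = polygenic_risk_score(trans, weights)
# Both configurations yield the same score, although in trans neither
# chromosome copy is left without a variant.
```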
Register for the 2020 Bioinformatics for Preclinical Drug Discovery Hackathon by clicking HERE!
Learn more about data hackathons:
Bio-IT FAIR Data Hackathon ‘Pushes the Needle’ in Science
NLM in Focus: What the Hack?
NCBI Biohackathons GitHub