Speakers
George Church
Professor of Genetics at Harvard Medical School
Afternoon Keynote: Near Future Applications of Petabyte Workflows
Speaker Bio: George Church is Professor of Genetics at Harvard Medical School and Director of PersonalGenomes.org, which provides the world's only open-access information on human Genomic, Environmental & Trait data (GET). His 1984 Harvard PhD included the first methods for direct genome sequencing, molecular multiplexing & barcoding. These led to the first genome sequence (pathogen, Helicobacter pylori) in 1994. His innovations have contributed to nearly all "next generation" DNA sequencing methods and companies (CGI-BGI, Life, Illumina, Nanopore). This plus his lab's work on chip-DNA-synthesis, gene editing and stem cell engineering resulted in founding additional application-based companies spanning fields of medical diagnostics (Knome/PierianDx, Alacris, AbVitro/Juno, Genos, Veritas Genetics) & synthetic biology / therapeutics (Joule, Gen9, Editas, Egenesis, enEvolv, WarpDrive). He has also pioneered new privacy, biosafety, ELSI, environmental & biosecurity policies. He is director of an IARPA BRAIN Project and NIH Center for Excellence in Genomic Science. His honors include election to NAS & NAE & Franklin Bower Laureate for Achievement in Science. He has coauthored 590 papers, 155 patent publications & a book (Regenesis).
Chris Dwan
Data and Technology Strategies for Genomics
Morning Keynote: Towards Whole Genome Sequencing as Standard of Care
Panel Moderator: Future of Biomedical Infrastructure
Speaker Bio: Chris Dwan has been building systems and teams in support of genomics and precision medicine for more than 20 years. He was the first employee of the noted consulting company BioTeam, helped build the NY Genome Center, led research computing and IT at the Broad, and most recently served as SVP of Production Bioinformatics at Sema4. He has advised dozens of organizations, including universities, biotechs, and pharmaceutical companies on data and computing strategies for genomics.
Geraldine Van der Auwera
Director of Outreach and Communications at Broad Institute of MIT and Harvard
Panelist: Future of Biomedical Infrastructure
Speaker Bio: Dr. Geraldine Van der Auwera directs outreach and communication efforts for the Data Sciences Platform at the Broad Institute. As part of that role, she serves as an educator and advocate for researchers who use DSP software and services including GATK, the Broad's industry-leading toolkit for variant discovery analysis; the Cromwell/WDL workflow management system; and Terra.bio, a cloud-based analysis platform that integrates computational resources, methods repository and data management in a user-friendly environment. She is the co-lead of the Large-Scale Genomics workstream of the Global Alliance for Genomics and Health, and co-author of Genomics in the Cloud, a book published by O'Reilly Media and available at https://oreil.ly/genomics-cloud.
Dr. Van der Auwera received her Ph.D. in Biological Engineering from the Université catholique de Louvain (UCL) in Louvain-la-Neuve, Belgium in 2007, and trained as a postdoctoral fellow in the Kolter lab at Harvard Medical School, Department of Microbiology.
Asya Shklyar
Global HPC Strategy Engineer
Panelist: Future of Biomedical Infrastructure
Speaker Bio: Asya Shklyar has been in IT since 1995, installing and configuring research infrastructures that work reliably for the people using them. She is currently applying what she learned from years of working at life sciences and pharma companies, with a brief departure to aerospace, in manufacturing settings. The industry may be different, but the data wrangling issues are the same everywhere. Her newest hobby is building a car, which she plans to race next year.
Jan Kuentzer
Senior Principal Scientist at Roche Diagnostics GmbH
Panelist: Future of Biomedical Infrastructure
Brad Chapman
Principal Data Science Architect at Ginkgo Bioworks, Inc.
Using Arvados Keep for Versioned Organization of Synthetic Biology Reference Genomes
Talk Abstract: At Ginkgo Bioworks we engineer a wide range of organisms, which requires organizing diverse reference genomes across the bacterial, fungal, animal and plant kingdoms. Annotations, sequences and associated data files continually improve as we sequence and further characterize the genomes. Arvados Keep serves as a centralized location to organize and standardize these versioned updates. It allows us to provide easy access to flat files, searching with standardized metadata, and automation of processes that ensure consistent representation and use of genomes. We'll discuss how we use the API and UI to make genomes available to researchers.
Moritz Gilsdorf
Principal Scientist at Roche Pharma Early Research and Development
Using Arvados as Foundation of the Roche Digital Pathology Platform
Talk Abstract: In this talk we will present how we at Roche developed a platform for Digital Pathology. Arvados is used as the central component for managing large volumes of raw imaging data, metadata, annotations and analysis results. We will share the architecture, the integration approaches, the challenges and lessons learned, and how we plan to evolve the platform in the future.
Shanon Seger
Scientific Area Lead Digital Pathology at Roche
Using Arvados as Foundation of the Roche Digital Pathology Platform
Talk Abstract: In this talk we will present how we at Roche developed a platform for Digital Pathology. Arvados is used as the central component for managing large volumes of raw imaging data, metadata, annotations and analysis results. We will share the architecture, the integration approaches, the challenges and lessons learned, and how we plan to evolve the platform in the future.
Monika Krzyżanowska
Bioinformatician at Roche Pharma Early Research and Development
Arvados As Efficient Tool for Large Scale Genetics Analysis
Talk Abstract: Next Generation Sequencing data processing requires a lot of computational power, especially with a large number of samples. To perform large scale genetics analysis, it is important to use the proper tools so that it can be done efficiently and in reasonable time. Arvados is a tool that makes development and data processing easy. We would like to present our approach to large scale genetic analysis and how Arvados features like the workflow engine, the Keep file system and Workbench helped us achieve it. We developed and used a CWL workflow that can process at least 10,000 Whole Genome Sequencing samples in a single run.
Carlos Fenoy
Senior HPC and Linux Engineer at Roche
Roche Arvados Federated Setup
Talk Abstract: In this presentation we will show the Roche Arvados federated setup, describe some of the challenges we have had since adopting Arvados, and discuss future plans for our setup.
Peter Amstutz
Chief Technology Officer at Curii
End to End Data Processing with Arvados
Panelist: Future of Biomedical Infrastructure
Talk Abstract: By combining robust data management with robust compute management capabilities, Arvados is uniquely able to capture the entire process in a way that single-purpose workflow management or data management platforms cannot. This talk will address best practices for creating end-to-end processing workflows in Arvados. This will include recommendations for the ingestion, organization, processing and publishing of data as well as the efficient orchestration of these crucial data processing steps.
Sarah Wait Zaranek
CEO at Curii
Machine Learning on Genomes at Scale with Arvados Lightning
Talk Abstract: Artificial Intelligence for Alzheimer’s Disease (AI4AD) is a coordinated national initiative to develop transformative AI approaches for high throughput analysis of next generation sequencing and related AD biomarker and cognitive data. One of the project’s initial aims is to discover new genetic signatures that can be used in conjunction with biomarker and cognitive data. My talk will focus on the use of Arvados data “tiling” to build ML models on ~15,000 WGS genomes (100+ TB) from the Alzheimer's Disease Sequencing Project (ADSP) including ADNI. Tiling efficiently represents genomic sequences as small segments. This lossless representation is particularly suitable for machine learning (ML). We use ML classification methods to predict phenotypes, as well as to identify and prioritize tile variants/genetic variants that are possible genetic signatures of AD. Preliminary results are promising, indicating 20-40 tile variants of interest which correspond to genetic variants and genes of interest. Many of the associated genes were found in previous studies to be associated with increased AD risk, PHF-tau measurements, neurofibrillary tangle measurements, hippocampal atrophy, cortical thickness, and neuritic plaques. In addition to these known genes/variants, novel genes and variants are identified for further investigation.