Picture of George, a man with short grey/white hair and a long beard.

George Church

Professor of Genetics at Harvard Medical School

Afternoon Keynote: Near Future Applications of Petabyte Workflows

Speaker Bio: George Church is Professor of Genetics at Harvard Medical School and Director of PersonalGenomes.org, which provides the world's only open-access information on human Genomic, Environmental & Trait data (GET). His 1984 Harvard PhD included the first methods for direct genome sequencing, molecular multiplexing & barcoding. These led to the first genome sequence (pathogen, Helicobacter pylori) in 1994. His innovations have contributed to nearly all "next generation" DNA sequencing methods and companies (CGI-BGI, Life, Illumina, Nanopore). This plus his lab's work on chip-DNA-synthesis, gene editing and stem cell engineering resulted in founding additional application-based companies spanning fields of medical diagnostics (Knome/PierianDx, Alacris, AbVitro/Juno, Genos, Veritas Genetics) & synthetic biology / therapeutics (Joule, Gen9, Editas, Egenesis, enEvolv, WarpDrive). He has also pioneered new privacy, biosafety, ELSI, environmental & biosecurity policies. He is director of an IARPA BRAIN Project and NIH Center for Excellence in Genomic Science. His honors include election to NAS & NAE & Franklin Bower Laureate for Achievement in Science. He has coauthored 590 papers, 155 patent publications & a book (Regenesis).

Picture of Chris, a man with short brown hair, a blue dress shirt and dark sports coat

Chris Dwan

Data and Technology Strategies for Genomics

Morning Keynote: Towards Whole Genome Sequencing as Standard of Care
Panel Moderator: Future of Biomedical Infrastructure

Speaker Bio: Chris Dwan has been building systems and teams in support of genomics and precision medicine for more than 20 years. He was the first employee of the noted consulting company BioTeam, helped build the NY Genome Center, led research computing and IT at the Broad, and most recently served as SVP of Production Bioinformatics at Sema4. He has advised dozens of organizations, including universities, biotechs, and pharmaceutical companies on data and computing strategies for genomics.

Photo of Geraldine, a woman with short brown hair and a red top

Geraldine Van der Auwera

Director of Outreach and Communications at Broad Institute of MIT and Harvard

Panelist: Future of Biomedical Infrastructure

Speaker Bio: Dr. Geraldine Van der Auwera directs outreach and communication efforts for the Data Sciences Platform at the Broad Institute. As part of that role, she serves as an educator and advocate for researchers who use DSP software and services including GATK, the Broad's industry-leading toolkit for variant discovery analysis; the Cromwell/WDL workflow management system; and Terra.bio, a cloud-based analysis platform that integrates computational resources, methods repository and data management in a user-friendly environment. She is the co-lead of the Large-Scale Genomics workstream of the Global Alliance for Genomics and Health, and co-author of Genomics in the Cloud, a book published by O'Reilly Media and available at https://oreil.ly/genomics-cloud.
Dr. Van der Auwera received her Ph.D. in Biological Engineering from the Université catholique de Louvain (UCL) in Louvain-la-Neuve, Belgium in 2007, and trained as a postdoctoral fellow in the Kolter lab at Harvard Medical School, Department of Microbiology.

Picture of Asya, a person with a hat sitting in a fancy race car.

Asya Shklyar

Global HPC Strategy Engineer

Panelist: Future of Biomedical Infrastructure

Speaker Bio: Asya Shklyar has been in IT since 1995, installing and configuring research infrastructure that works reliably for the people using it. She is currently applying what she learned from years of working at life sciences and pharma companies, with a brief departure to aerospace, in manufacturing settings. The industry may be different, but the data wrangling issues are the same everywhere. Her newest hobby is building a car and racing it next year.

Picture of default grey figure for speakers without photos

Jan Kuentzer

Senior Principal Scientist at Roche Diagnostics GmbH

Panelist: Future of Biomedical Infrastructure

Photo of Brad, a male with short hair wearing rectangular glasses with dark frames and a blue shirt.

Brad Chapman

Principal Data Science Architect at Ginkgo Bioworks, Inc.

Using Arvados Keep for Versioned Organization of Synthetic Biology Reference Genomes

Talk Abstract: At Ginkgo Bioworks we engineer a wide range of organisms, which requires organizing diverse reference genomes across the bacterial, fungal, animal and plant kingdoms. Annotations, sequences and associated data files continually improve as we sequence and further characterize the genomes. Arvados Keep serves as a centralized location to organize and standardize these versioned updates. It allows us to provide easy access to flat files, searching with standardized metadata, and automation of processes that ensure consistent representation and use of genomes. We'll discuss how we use the API and UI to make genomes available to researchers.

Photo of Moritz, a man with a brown beard and short hair wearing a blue polo shirt

Moritz Gilsdorf

Principal Scientist at Roche Pharma Early Research and Development

Using Arvados as Foundation of the Roche Digital Pathology Platform

Talk Abstract: In this talk we will present how we at Roche developed a platform for Digital Pathology. Arvados is used as the central component for managing large volumes of raw imaging data, metadata, annotations, and analysis results. We will share the architecture, the integration approaches, the challenges and lessons learned, and how we plan to evolve the platform in the future.

Photo of Shanon, a woman with brown hair wearing a green top

Shanon Seger

Scientific Area Lead Digital Pathology at Roche

Using Arvados as Foundation of the Roche Digital Pathology Platform

Talk Abstract: In this talk we will present how we at Roche developed a platform for Digital Pathology. Arvados is used as the central component for managing large volumes of raw imaging data, metadata, annotations, and analysis results. We will share the architecture, the integration approaches, the challenges and lessons learned, and how we plan to evolve the platform in the future.

Photo of Monika, a woman with brown hair and a red blouse

Monika Krzyżanowska

Bioinformatician at Roche Pharma Early Research and Development

Arvados As Efficient Tool for Large Scale Genetics Analysis

Talk Abstract: Next Generation Sequencing data processing requires a lot of computational power, especially with a large number of samples. To perform large-scale genetic analysis efficiently and in reasonable time, it is important to use the proper tools. Arvados is a tool that makes both workflow development and data processing easy. We will present our approach to large-scale genetic analysis and how Arvados features such as the workflow engine, the Keep file system, and Workbench helped us achieve it. We developed and used a CWL workflow capable of processing at least 10,000 Whole Genome Sequencing samples in a single run.

Photo of Carlos, a man with tied back brown curly hair and a beard

Carlos Fenoy

Senior HPC and Linux Engineer at Roche

Roche Arvados Federated Setup

Talk Abstract: In this presentation we will show the Roche Arvados federated setup, describe some of the challenges we have had since adopting Arvados, and discuss future plans for our setup.

Picture of Peter, a man with brown curly hair and glasses

Peter Amstutz

Chief Technology Officer at Curii

End to End Data Processing with Arvados
Panelist: Future of Biomedical Infrastructure

Talk Abstract: By combining robust data management with robust compute management capabilities, Arvados is uniquely able to capture the entire process in a way that single-purpose workflow management or data management platforms cannot. This talk will address best practices for creating end-to-end processing workflows in Arvados. This will include recommendations for the ingestion, organization, processing and publishing of data, as well as the efficient orchestration of these crucial data processing steps.

Picture of Sarah, a woman with brown curly hair

Sarah Wait Zaranek

CEO at Curii

Machine Learning on Genomes at Scale with Arvados Lightning

Talk Abstract: Artificial Intelligence for Alzheimer’s Disease (AI4AD) is a coordinated national initiative to develop transformative AI approaches for high throughput analysis of next generation sequencing and related AD biomarker and cognitive data. One of the project’s initial aims is to discover new genetic signatures that can be used in conjunction with biomarker and cognitive data. My talk will focus on the use of Arvados data “tiling” to build ML models on ~15,000 WGS genomes (100+ TB) from the Alzheimer's Disease Sequencing Project (ADSP), including ADNI. Tiling efficiently represents genomic sequences as small segments. This lossless representation is particularly suitable for machine learning (ML). We use ML classification methods to predict phenotypes, as well as to identify and prioritize tile variants/genetic variants that are possible genetic signatures of AD. Preliminary results are promising, indicating 20-40 tile variants of interest that correspond to genetic variants and genes of interest. Many of the associated genes were found in previous studies to be associated with increased AD risk, PHF-tau measurements, neurofibrillary tangle measurements, hippocampal atrophy, cortical thickness, and neuritic plaques. In addition to these known genes and variants, novel genes and variants are identified for further investigation.