Data Modernization: Improving the usefulness of genomic data


Kelsey Florek, PhD, MPH
Senior Genomics and Data Scientist
Wisconsin State Laboratory of Hygiene
May 21, 2024

Slides live at:
www.k-florek.net/talks

Supported By

AWS Diagnostic Development Initiative (DDI)

  1. Necessities of Next Generation Sequencing Capacity Building
  2. Blueprints for an NGS Data Solution
  3. Simplifying Genomics for Public Health Partners

Expanding Genomic Sequencing Capacity

Pre SARS-CoV-2 Pandemic

  • 4x Illumina MiSeq
  • 1x ONT MinION

Post SARS-CoV-2 Pandemic

  • 4x Illumina MiSeq
  • 2x NextSeq 2000
  • 1x ONT GridION
  • 1x Eppendorf epMotion
  • 1x Tecan Fluent 780 NGS Dream Prep

Over 900% increase in sequencing data generation capacity

NGS Data Storage

Improvements in Analytical Approaches

Old Approach

  • Entirely Python Based
  • Limited logging and fault tolerance
  • Required installing complicated and often conflicting dependencies

New Approach

  • Nextflow and nf-core based
  • Containerized Steps
  • Detailed Logging
  • Compatible with a variety of Cloud and HPC environments
  • Supports a high degree of job parallelization and horizontal scalability
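
The properties listed above (detailed logging, fault tolerance, parallel execution) can be illustrated with a small sketch. This is plain Python, not Nextflow or nf-core code, and it is not the WSLH pipeline; the names `toy_step`, `run_with_retries`, and `run_all` are hypothetical and exist only for illustration.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline_sketch")


def toy_step(sample_id):
    """Placeholder for a real analysis step (hypothetical)."""
    return f"{sample_id}.done"


def run_with_retries(step, sample_id, max_attempts=3):
    """Run one step for one sample, logging every attempt and retrying on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = step(sample_id)
            log.info("sample=%s attempt=%d status=ok", sample_id, attempt)
            return result
        except Exception as exc:
            log.warning("sample=%s attempt=%d status=failed error=%r", sample_id, attempt, exc)
    raise RuntimeError(f"{sample_id} failed after {max_attempts} attempts")


def run_all(step, sample_ids, max_workers=4):
    """Run the step for every sample in parallel; one failed sample does not stop the rest."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_with_retries, step, s): s for s in sample_ids}
        for future in as_completed(futures):
            sample_id = futures[future]
            try:
                results[sample_id] = future.result()
            except RuntimeError as exc:
                failures[sample_id] = str(exc)
    return results, failures


results, failures = run_all(toy_step, ["sample_1", "sample_2", "sample_3"])
```

A workflow manager provides these behaviors per step without hand-written retry or threading code, which is one reason to prefer it over a single Python script.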

  1. Necessities of Next Generation Sequencing Capacity Building
  2. Blueprints for an NGS Data Solution
  3. Simplifying Genomics for Public Health Partners

Bioinformatics analytical infrastructure

  • Highly scalable and capable of managing burst data
  • Highly reliable and fault tolerant
  • Cost effective
  • Adaptable to changing needs
  • Detailed logging and traceability

WSLH Bioinformatics Analytical Infrastructure

AWS Batch

AWS Batch automatically provisions compute resources and optimizes the workload distribution based on the quantity and scale of the workloads.
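
As a rough illustration of how work reaches AWS Batch, the sketch below builds one job request per sample in the shape accepted by boto3's `submit_job` call. The queue name, job definition name, command, and resource values are placeholders assumed for this example, not WSLH's configuration. The block only builds the requests, so it runs without AWS credentials.

```python
import re


def build_batch_job_request(sample_id, job_queue, job_definition, command,
                            vcpus=2, memory_mib=4096):
    """Build keyword arguments for boto3's Batch `submit_job` call."""
    # Batch job names may contain only letters, numbers, hyphens, and underscores.
    job_name = re.sub(r"[^A-Za-z0-9_-]", "-", f"analysis-{sample_id}")[:128]
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "command": list(command),
            # Resource values are passed as strings; memory is in MiB.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


# Hypothetical queue, job definition, and command names.
job_requests = [
    build_batch_job_request(s, "example-queue", "example-job-def", ["run_analysis.sh", s])
    for s in ["sample_001", "sample 002"]
]

# With AWS credentials configured, each request could be submitted with:
#   import boto3
#   batch = boto3.client("batch")
#   for request in job_requests:
#       batch.submit_job(**request)
```

Once jobs are queued, Batch decides how much compute to provision for them, so the submitting code does not manage instances directly.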

Nextflow Tower

Nextflow Tower is an intuitive centralized command post that enables data analysis at scale. With Tower, users can easily launch, manage, and monitor scalable Nextflow data analysis pipelines and compute environments on-premises or across the cloud providers of their choice.

Nextflow Tower - Dashboard

Nextflow Tower - Monitor

Amazon Athena

Connecting Data Across Siloed Systems
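
One way to picture connecting siloed systems is a SQL join on a shared sample identifier. The sketch below uses Python's built-in `sqlite3` as a local stand-in for a query service such as Athena. The table names, column names, and rows are invented for illustration and do not reflect the actual WSLH schema or data.

```python
import sqlite3

# In-memory database standing in for two separate source systems.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lims_samples (sample_id TEXT PRIMARY KEY, collection_date TEXT);
CREATE TABLE sequencing_results (sample_id TEXT, lineage TEXT, mean_depth REAL);
INSERT INTO lims_samples VALUES
    ('S1', '2024-01-02'), ('S2', '2024-01-03'), ('S3', '2024-01-04');
INSERT INTO sequencing_results VALUES
    ('S1', 'lineage_a', 250.0), ('S2', 'lineage_b', 180.5);
""")

# A LEFT JOIN keeps samples that have not been sequenced yet,
# with NULL (None in Python) in the sequencing columns.
rows = conn.execute("""
SELECT l.sample_id, l.collection_date, r.lineage, r.mean_depth
FROM lims_samples AS l
LEFT JOIN sequencing_results AS r ON r.sample_id = l.sample_id
ORDER BY l.sample_id
""").fetchall()

for row in rows:
    print(row)
```

The join only works if both systems record the same sample identifier, so a consistent identifier is the main prerequisite for this kind of linkage.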

  1. Necessities of Next Generation Sequencing Capacity Building
  2. Blueprints for an NGS Data Solution
  3. Simplifying Genomics for Public Health Partners

Need for a centralized resource

AMD Bioinformatics Regional Resource - Midwest Region



Ad-hoc Analytical Support

Provision of Computational Resources

Need for a centralized resource

SARS-CoV-2 Genomic Surveillance

COVID-19 Genomics UK (COG-UK) CLIMB-COVID

Easy Genomics Partnership

Easy Genomics - Minimum Viable Product

  • Simplify the process of launching and monitoring workflows
  • Provide the ability for users to upload sequence data through the web browser
  • Allow users to download analysis results through the web browser
  • Separate data and access by user and lab
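
The separation requirement in the list above can be sketched as a storage layout in which every uploaded file lives under its lab's own prefix, with access checked against that prefix. This is a hypothetical illustration, not the Easy Genomics implementation; the key layout and the function names `lab_object_key` and `can_access` are assumptions.

```python
import re
from pathlib import PurePosixPath

_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def lab_object_key(lab_id, run_id, filename):
    """Build a storage key that keeps each lab's files under its own prefix."""
    for value in (lab_id, run_id):
        if not _ID_PATTERN.fullmatch(value):
            raise ValueError(f"invalid identifier: {value!r}")
    # Drop any directory components from the uploaded file name.
    name = PurePosixPath(filename).name
    if not name or name == "..":
        raise ValueError("invalid filename")
    return f"labs/{lab_id}/runs/{run_id}/{name}"


def can_access(user_lab_id, object_key):
    """Allow access only to keys under the user's own lab prefix."""
    return object_key.startswith(f"labs/{user_lab_id}/")


key = lab_object_key("lab_a", "run1", "../x/reads_R1.fastq.gz")
```

Including the trailing slash in the prefix check matters: without it, a user from `lab` would match keys that belong to `lab_a`.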

Easy Genomics - Sequence Data Upload

Easy Genomics - Launch

Easy Genomics - Monitor

Easy Genomics - Roadmap

  • 2024 Spring - Deploy Easy Genomics for internal use
  • 2024 Early Summer - Open Access to SARS-CoV-2 Sequencing Laboratories
  • 2024 Mid Summer - Easy Genomics MVP Update

Acknowledgments

Abigail Shockey, PhD
Christopher Jossart, MPH
Dustin Lyfoung, MS
Thomas Blader
Eva Gunawan, MS

Special Thanks