Data Modernization: Improving the usefulness of genomic data
Kelsey Florek, PhD, MPH
Senior Genomics and Data Scientist
Wisconsin State Laboratory of Hygiene
May 21, 2024
Supported By
Outline
- Necessities of Next Generation Sequencing Capacity Building
- Blueprints for an NGS Data Solution
- Simplifying Genomics for Public Health Partners
Expanding Genomic Sequencing Capacity
Pre SARS-CoV-2 Pandemic
- 4x Illumina MiSeq
- 1x ONT MinION
Post SARS-CoV-2 Pandemic
- 4x Illumina MiSeq
- 2x NextSeq 2000
- 1x ONT GridION
- 1x Eppendorf epMotion
- 1x Tecan Fluent 780 NGS Dream Prep
Over 900% increase in sequencing data generation capacity
NGS Data Storage
Improvements in Analytical Approaches
Old Approach
- Entirely Python-based
- Limited logging and fault tolerance
- Required installing complicated and often conflicting dependencies
New Approach
- Nextflow and nf-core based
- Containerized Steps
- Detailed Logging
- Compatible with a variety of Cloud and HPC environments
- Supports a high degree of job parallelization and horizontal scalability
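The contrast between the two approaches can be illustrated with a minimal sketch. This is not Nextflow and not the WSLH pipeline; it is a plain Python illustration, under assumed names (`run_with_retry`, `run_samples`), of three behaviors the new approach lists: per-task logging, retry on failure, and independent parallel processing of samples.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline_sketch")


def run_with_retry(step, sample_id, max_attempts=3):
    """Run one step for one sample, logging every attempt and retrying on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = step(sample_id)
            log.info("sample=%s attempt=%d status=ok", sample_id, attempt)
            return result
        except Exception as exc:
            log.warning("sample=%s attempt=%d status=failed error=%s",
                        sample_id, attempt, exc)
    log.error("sample=%s gave up after %d attempts", sample_id, max_attempts)
    return None  # a failed sample is reported, not fatal to the run


def run_samples(step, sample_ids, max_workers=4):
    """Process samples in parallel; one failing sample does not stop the others."""
    sample_ids = list(sample_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda s: run_with_retry(step, s), sample_ids)
        return dict(zip(sample_ids, results))
```

A workflow manager such as Nextflow provides these behaviors per process, along with containerized execution, so they do not have to be hand-written as above.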
Outline
- Necessities of Next Generation Sequencing Capacity Building
- Blueprints for an NGS Data Solution
- Simplifying Genomics for Public Health Partners
Bioinformatics analytical infrastructure
- Highly scalable and capable of managing burst data
- Highly reliable and fault tolerant
- Cost-effective
- Adaptable to changing needs
- Detailed logging and traceability
WSLH Bioinformatics Analytical Infrastructure
AWS Batch
AWS Batch automatically provisions compute resources and optimizes the workload distribution based on the quantity and scale of the workloads.
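As a rough illustration of what a unit of work looks like to AWS Batch, the sketch below builds one job request per sample. The queue name, job definition, script name, and sample IDs are hypothetical placeholders, not values from this presentation. Only the request payload is constructed here; the fields follow the `submit_job` request shape of the AWS Batch API.

```python
def build_batch_job(sample_id, cpus, memory_mib):
    """Build the keyword arguments for one AWS Batch job submission."""
    return {
        "jobName": f"assembly-{sample_id}",
        "jobQueue": "example-queue",             # hypothetical queue name
        "jobDefinition": "example-assembly:1",   # hypothetical job definition
        "containerOverrides": {
            "command": ["run_assembly.sh", sample_id],  # hypothetical script
            "resourceRequirements": [
                # AWS Batch expects resource values as strings
                {"type": "VCPU", "value": str(cpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


# One independent job per sample; Batch decides how much compute to provision.
jobs = [build_batch_job(s, cpus=4, memory_mib=8192)
        for s in ["sampleA", "sampleB"]]

# With boto3 installed and AWS credentials configured, each request
# could be submitted as:
#   import boto3
#   batch = boto3.client("batch")
#   for job in jobs:
#       batch.submit_job(**job)
```

Because each sample is its own job with declared CPU and memory needs, the service can size the compute environment to the number and scale of queued jobs. In a Nextflow-based setup these submissions are typically generated by the workflow manager rather than written by hand.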
Nextflow Tower
Nextflow Tower is an intuitive centralized command post that enables data analysis at scale. With Tower, users can easily launch, manage, and monitor scalable Nextflow data analysis pipelines and compute environments on-premises or across the cloud providers of their choice.
Nextflow Tower - Dashboard
Nextflow Tower - Monitor
AWS Athena
Connecting Data Across Siloed Systems
Outline
- Necessities of Next Generation Sequencing Capacity Building
- Blueprints for an NGS Data Solution
- Simplifying Genomics for Public Health Partners
Need for a centralized resource
AMD Bioinformatics Regional Resource - Midwest Region
Ad-hoc Analytical Support
Provision of Computational Resources
Need for a centralized resource
SARS-CoV-2 Genomic Surveillance
COVID-19 Genomics UK (COG-UK) CLIMB-COVID
Easy Genomics Partnership
Easy Genomics - Minimum Viable Product
- Simplify the process of launching and monitoring workflows
- Provide the ability for users to upload sequence data through the web browser
- Allow users to download analysis results through the web browser
- User/Lab separation
Easy Genomics - Sequence Data Upload
Easy Genomics - Launch
Easy Genomics - Monitor
Easy Genomics - Roadmap
- 2024 Spring - Deploy Easy Genomics for internal use
- 2024 Early Summer - Open Access to SARS-CoV-2 Sequencing Laboratories
- 2024 Mid Summer - Easy Genomics MVP Update
Acknowledgments
Abigail Shockey, PhD
Christopher Jossart, MPH
Dustin Lyfoung, MS
Thomas Blader
Eva Gunawan, MS
Special Thanks