Data Modernization: Improving the usefulness of genomic data
          
          
            Kelsey Florek, PhD, MPH
            Senior Genomics and Data Scientist
            Wisconsin State Laboratory of Hygiene
            May 21, 2024
          
          
          
          
          
        
        
          Supported By
          
        
        
          
            - Necessities of Next Generation Sequencing Capacity Building
 
            - Blueprints for an NGS Data Solution
 
            - Simplifying Genomics for Public Health Partners
 
          
        
        
          Expanding Genomic Sequencing Capacity
          
            Pre-SARS-CoV-2 Pandemic
            
              - 4x Illumina MiSeq
 
              - 1x ONT MinION
 
            
           
          
            Post-SARS-CoV-2 Pandemic
            
              - 4x Illumina MiSeq
 
              - 2x Illumina NextSeq 2000
 
              - 1x ONT GridION
 
              - 1x Eppendorf epMotion
 
              - 1x Tecan Fluent 780 NGS Dream Prep
 
            
           
          
            
Over 900% increase in sequencing data generation capacity
          
          
          
        
        
          NGS Data Storage
          
        
        
          Improvements in Analytical Approaches
          
            Old Approach
            
              - Entirely Python-based
 
              - Limited logging and fault tolerance
 
              - Required installing complicated and often conflicting dependencies
 
            
           
          
            New Approach
            
              - Nextflow and nf-core based
 
              - Containerized Steps
 
              - Detailed Logging
 
              - Compatible with a variety of Cloud and HPC environments
 
              - Supports a high degree of job parallelization and horizontal scalability
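
The contrast between the two approaches can be illustrated with a minimal sketch in plain Python. It is not the WSLH pipeline and not Nextflow code; it only mimics, under simplifying assumptions, three properties the new approach provides: per-sample logging, retry on failure, and parallel execution. The names `run_with_retry`, `run_all`, and `count_bases` are hypothetical.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline_sketch")


def run_with_retry(step, name, data, max_attempts=3):
    """Run one step for one sample, logging every attempt and retrying on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = step(data)
            log.info("sample=%s attempt=%d status=ok", name, attempt)
            return result
        except Exception as exc:
            log.warning("sample=%s attempt=%d status=failed error=%s", name, attempt, exc)
    log.error("sample=%s giving up after %d attempts", name, max_attempts)
    return None  # a failed sample is recorded, not fatal to the whole run


def run_all(step, samples, max_workers=4):
    """Run the step for every sample in parallel and collect results by sample name."""
    names = list(samples)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda n: run_with_retry(step, n, samples[n]), names))
    return dict(zip(names, results))


def count_bases(sequence):
    """Hypothetical stand-in for a real analysis step."""
    if not sequence:
        raise ValueError("empty sequence")
    return len(sequence)


if __name__ == "__main__":
    print(run_all(count_bases, {"sample_1": "ACGT", "sample_2": ""}))
```

A workflow manager such as Nextflow provides these behaviors per process, and additionally runs each step in its own container, so none of this bookkeeping has to be written by hand.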
 
            
           
        
        
          
            - Necessities of Next Generation Sequencing Capacity Building
            - Blueprints for an NGS Data Solution
 
            - Simplifying Genomics for Public Health Partners
 
          
        
        
          Bioinformatics analytical infrastructure
          
            - Highly scalable and capable of handling bursts of data
 
            - Highly reliable and fault tolerant
 
            - Cost effective
 
            - Adaptable to changing needs
 
            - Detailed logging and traceability
 
          
        
        
          WSLH Bioinformatics Analytical Infrastructure
          
        
        
          
 AWS Batch
          AWS Batch automatically provisions compute resources and optimizes the workload distribution based on the quantity and scale of the workloads.
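
As a rough illustration of what a unit of work looks like to AWS Batch, the sketch below builds the request for a single job: a queue, a job definition, a command, and the vCPU and memory the job needs. Batch uses these declared requirements to decide what compute to provision. The helper name `build_batch_job_request` and all queue, definition, and command values are hypothetical; the dictionary keys follow the boto3 Batch `submit_job` parameters.

```python
def build_batch_job_request(job_name, job_queue, job_definition, command,
                            vcpus=2, memory_mib=4096):
    """Build the keyword arguments for a single AWS Batch job submission."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "command": list(command),
            # Batch expects resource values as strings.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }


if __name__ == "__main__":
    # All names below are placeholders, not WSLH resources.
    request = build_batch_job_request(
        job_name="assemble-sample-1",
        job_queue="example-queue",
        job_definition="example-assembler",
        command=["assemble", "--input", "sample_1.fastq.gz"],
        vcpus=4,
        memory_mib=8192,
    )
    print(request)
    # With AWS credentials configured, the request could be submitted with:
    #   import boto3
    #   boto3.client("batch").submit_job(**request)
```

In a Nextflow-based setup the workflow engine would typically submit such jobs on the user's behalf, one per pipeline task, so this request is rarely written by hand.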
        
        
          
          
        
        
          
          Nextflow Tower is an intuitive centralized command post that enables data analysis at scale. With Tower, users can easily launch, manage, and monitor scalable Nextflow data analysis pipelines and compute environments on-premises or across the cloud providers of their choice.
          
        
        
          Nextflow Tower - Dashboard
          
          
        
        
        
          Nextflow Tower - Monitor
          
          
        
        
          
 AWS Athena
          
        
        
          Connecting Data Across Siloed Systems
          
        
        
          
            - Necessities of Next Generation Sequencing Capacity Building
 
            - Blueprints for an NGS Data Solution
 
            - Simplifying Genomics for Public Health Partners
 
          
        
        
          Need for a centralized resource
          AMD Bioinformatics Regional Resource - Midwest Region
          
          
            
            
            
            - Ad-hoc Analytical Support
 
            - Provision of Computational Resources
          
          
        
        
          Need for a centralized resource
          SARS-CoV-2 Genomic Surveillance
          
          
        
        
          COVID-19 Genomics UK (COG-UK) CLIMB-COVID
          
          
        
        
          
          
        
        
          Easy Genomics Partnership
          
        
        
          Easy Genomics - Minimum Viable Product
          
            - Simplify the process of launching and monitoring workflows
 
            - Provide the ability for users to upload sequence data through the web browser
 
            - Allow users to download analysis results through the web browser
 
            - User/Lab separation
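
One common way to achieve user/lab separation for uploaded files is to give each laboratory its own storage prefix and check every access against it. The sketch below illustrates that idea only; it is an assumption for illustration, not the Easy Genomics implementation, and the key layout `labs/<lab_id>/uploads/` and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def lab_scoped_key(lab_id, filename):
    """Build a storage key under the lab's own prefix.

    Only the final path component of the filename is kept, so a name such as
    "../../other_lab/run.fastq.gz" cannot escape the lab's prefix.
    """
    name = PurePosixPath(filename).name
    if not lab_id.isalnum() or name in ("", ".", ".."):
        raise ValueError("invalid lab id or filename")
    return f"labs/{lab_id}/uploads/{name}"


def can_access(lab_id, key):
    """Return True only if the key lives under the given lab's prefix."""
    return key.startswith(f"labs/{lab_id}/")


if __name__ == "__main__":
    key = lab_scoped_key("lab1", "run1_R1.fastq.gz")
    print(key)
    print(can_access("lab1", key), can_access("lab10", key))
```

The trailing slash in the prefix check matters: without it, a lab with the ID `lab1` would match keys belonging to `lab10`.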
 
          
        
        
          Easy Genomics - Sequence Data Upload
          
        
        
          Easy Genomics - Launch
          
        
        
          
        
        
          Easy Genomics - Monitor
          
        
        
          
        
        
          Easy Genomics - Roadmap
          
            - 2024 Spring - Deploy Easy Genomics for internal use
 
            - 2024 Early Summer - Open Access to SARS-CoV-2 Sequencing Laboratories
 
            - 2024 Mid Summer - Easy Genomics MVP Update
 
          
        
        
          Acknowledgments
          
            
              
              Abigail Shockey, PhD
            
            
              
              Christopher Jossart, MPH
            
            
              
              Dustin Lyfoung, MS
            
           
          
            
              
              Thomas Blader
            
            
              
              Eva Gunawan, MS
            
            Special Thanks