Preparing for a future of genomic data-driven infectious disease prevention


Kelsey Florek, PhD, MPH
Senior Genomics and Data Scientist
Wisconsin State Laboratory of Hygiene
May 16, 2022

Slides live at:
www.k-florek.net/talks

@kelsey_florek

Preventing the spread of Antimicrobial Resistance mechanisms requires a data-driven approach that applies prevention strategies with precision.

Applying Genomic data to combat Antimicrobial Resistance:

  1. Outbreak Investigations: whole-genome sequencing (WGS) can establish the directionality of transmission, guiding prevention strategies
  2. Contact Tracing: WGS can identify secondary cases and support efforts to prevent further spread
  3. Risk Assessment: WGS and long-read plasmid characterization can help assess the risk posed by a resistance mechanism
  4. Surveillance: Clinical and Environmental samples can reveal the presence of resistance mechanisms and make it possible to track patients or facilities over time

Genomic data alone is ineffective

Precision Public Health:
"delivering the right intervention to the right population at the right time"

- precision approaches for infectious diseases require integrating genomic, epidemiological, clinical, environmental, and sample data
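
A minimal sketch of what connecting two of these data types can look like, assuming genomic and epidemiological records share a sample identifier. The field names (sample_id, cluster, facility) and the records are hypothetical and only illustrate the idea; they are not the WSLH schema.

```python
from collections import defaultdict

# Hypothetical records; field names and values are assumptions for illustration.
genomic = [
    {"sample_id": "S1", "cluster": "A"},
    {"sample_id": "S2", "cluster": "A"},
    {"sample_id": "S3", "cluster": "B"},
]
epi = [
    {"sample_id": "S1", "facility": "F1"},
    {"sample_id": "S2", "facility": "F2"},
    {"sample_id": "S3", "facility": "F1"},
]


def link_records(genomic_rows, epi_rows):
    """Inner-join genomic and epidemiological rows on sample_id."""
    epi_by_id = {row["sample_id"]: row for row in epi_rows}
    return [
        {**g, **epi_by_id[g["sample_id"]]}
        for g in genomic_rows
        if g["sample_id"] in epi_by_id
    ]


def facilities_by_cluster(linked_rows):
    """Group facilities by genomic cluster to flag possible cross-facility spread."""
    grouped = defaultdict(set)
    for row in linked_rows:
        grouped[row["cluster"]].add(row["facility"])
    return dict(grouped)


linked = link_records(genomic, epi)
clusters = facilities_by_cluster(linked)
```

A genomic cluster that spans more than one facility is the kind of signal neither data set shows on its own.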

Public Health is a series of Data Silos

Data integration is a Public Health Issue

  • CDC Data Modernization Initiative - "Move from siloed and brittle public health data systems to connected, resilient, adaptable, and sustainable ‘response-ready’ systems"
  • AMD Platform - national bioinformatics platform aiming to build capacity and integrate genomic analytical workflows

Anything we build @WSLH needs to be flexible and accommodate an uncertain future.

Build an infrastructure to support a genomic data ecosystem that provides actionable public health data for targeted interventions.

Application of Cloud Technologies towards a genomic data ecosystem



  • Advantages to Cloud:
    • On-demand Pay for Usage Model
    • Extreme Scalability
    • High Availability
    • Cost Efficient Storage and Compute
    • Serverless approaches to data infrastructure

What is a data lake?

A centralized repository designed to store, process, and secure large amounts of structured and unstructured data at any scale



  • Data Lakes retain all data
  • Support all data types
  • Adapt easily to changes
  • Require data scientists to navigate
  • Require proper management to prevent a data swamp
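
The points above can be sketched with a toy in-memory model: objects of any type are stored as raw bytes, and a small metadata catalog is what keeps the lake navigable. ToyDataLake and its methods are hypothetical names for illustration, not a cloud provider's API.

```python
class ToyDataLake:
    """Toy model: raw objects plus a metadata catalog (illustration only)."""

    def __init__(self):
        self._objects = {}  # key -> raw bytes, any data type
        self._catalog = {}  # key -> descriptive metadata

    def put(self, key, data, *, data_type, source):
        """Store an object; refusing uncataloged data is the 'management' step."""
        if not data_type or not source:
            raise ValueError("every object needs a data_type and a source")
        self._objects[key] = data
        self._catalog[key] = {
            "data_type": data_type,
            "source": source,
            "size": len(data),
        }

    def get(self, key):
        """Return the raw bytes stored under key."""
        return self._objects[key]

    def find(self, **filters):
        """Return the keys whose catalog metadata matches every filter."""
        return sorted(
            key
            for key, meta in self._catalog.items()
            if all(meta.get(field) == value for field, value in filters.items())
        )


lake = ToyDataLake()
lake.put("runs/r1.fastq", b"@read1\nACGT\n+\nFFFF\n", data_type="sequence", source="lab")
lake.put("cases/c1.csv", b"sample_id,facility\nS1,F1\n", data_type="table", source="epi")
sequence_keys = lake.find(data_type="sequence")
```

Without the catalog, the same store still retains everything but gives no way to find it, which is the swamp the last bullet warns about.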

Data Lake Architecture

Key Features of the cloud-based architecture

  • flexibility to accommodate an uncertain future
    • can support a variety of APIs to connect with future CDC platforms
    • greater functionality if CDC follows industry standards for APIs
  • high degree of scalability to meet demand
  • combines data with bioinformatic analytics
  • access to new machine learning and predictive technologies
  • cost efficient data storage and analytics

Navigating Challenges

Project Scope


  • Agile Development process: iterative development focused on an early delivery of an adaptive resource
  • Adaptive Strategy: cloud-based approach that is highly flexible to meet the needs of a changing landscape
  • Initial Focus: platform designed to meet needs of SARS-CoV-2 while laying foundation for other activities

Security, Compliance, and Data Governance




  • Data Security: all data encrypted at rest and in transit; user access can be controlled down to the row, column, or cell level
  • Compliance: UW-Madison Cybersecurity Risk Assessment
  • Data Ownership: all partners submitting data will be informed and consulted on data usage
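
Conceptually, row- and column-level access control amounts to filtering what each user can see before the data reaches them. The sketch below assumes a hypothetical policy format (allowed columns plus a row predicate) and hypothetical records; it is not the interface of any particular cloud service, and it does not cover cell-level rules.

```python
# Hypothetical records; field names and values are assumptions for illustration.
records = [
    {"sample_id": "S1", "county": "Dane", "lineage": "BA.2", "patient_dob": "1980-01-01"},
    {"sample_id": "S2", "county": "Brown", "lineage": "BA.1", "patient_dob": "1975-06-30"},
]

# Hypothetical policy: this user sees only Dane County rows and no birth dates.
dane_epi_policy = {
    "columns": ["sample_id", "county", "lineage"],
    "row_filter": lambda row: row["county"] == "Dane",
}


def apply_policy(rows, policy):
    """Return only the rows and columns the policy allows."""
    allowed = policy["columns"]
    keep_row = policy["row_filter"]
    return [
        {column: row[column] for column in allowed if column in row}
        for row in rows
        if keep_row(row)
    ]


visible = apply_policy(records, dane_epi_policy)
```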

Workforce Development

  • Cloud Technology Training: AWS training through University, knowledge sharing, self-taught
  • Database Activities: SQL training, hiring expertise, in-house experience
  • Visualization and Reports: self-taught, knowledge sharing
  • Project Management Resources: virtual training sessions

Summary

  • There is a massive need to update the technologies and approaches applied to data in public health.
  • The national strategy for genomic data and public health is murky.
  • Cloud technologies provide an avenue for developing flexible data platforms for managing data and developing new data strategies.
  • Public Health Laboratories are facing a workforce challenge.

Acknowledgments

Abigail Shockey, PhD
Christopher Jossart, MPH


  • Special Thanks
    • Nikki Marchan