AI/ML In Public Health


Kelsey Florek, PhD, MPH
Senior Genomics and Data Scientist
Wisconsin State Laboratory of Hygiene
June 26, 2025

Slides live at:
www.k-florek.net/talks

Objectives

  • Explore AI, ML, and LLMs
  • Identify examples of AI/ML used in research and public health
  • Describe the issues and challenges with Generative AI
  • Evaluate the application of Generative AI in bioinformatics

  1. Have you ever used AI/ML/Generative AI? If so, how?
  2. What concerns do you have about the usage of AI, in general or in public health?
  3. What benefits do you think AI provides or could provide?

What is AI?

Artificial Intelligence

Software that allows machines or computer systems to perceive their environment and use learning and intelligence to achieve a defined goal.

Machine Learning

An area in artificial intelligence with a focus on statistical algorithms that can learn from data and generalize to unseen data.

Deep Learning

A subset of machine learning methods that are based on neural networks, with deep implying multiple layers.

Generative AI

  • a subset of AI and a type of deep learning model
  • designed to create new and "original" content
  • trained on massive datasets of existing content

Primer - AI/ML models and approaches

Model training paradigms

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning
  • Online Learning
  • Batch Learning
  • Meta-learning
  • Semi-supervised Learning
  • Self-supervised Learning
  • Curriculum Learning
  • Rule-based Learning
  • Quantum Machine Learning

Supervised Learning

A supervised machine learning approach requires labelled input and output data, allowing human oversight of the model's classification.

  • Regression (prediction of a continuous variable):
    • Linear Regression
    • Polynomial Regression
  • Classification (prediction of a categorical variable):
    • Decision Trees
    • Random Forest
    • Logistic Regression
    • K-Nearest Neighbors
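A minimal sketch of the two supervised tasks above, using only the Python standard library. The toy data, `fit_line`, and `knn_predict` are made up for illustration; real work would normally use a library such as scikit-learn.

```python
import math
from collections import Counter

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Regression: labelled (x, y) pairs that follow y = 2x + 1 exactly (toy data).
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])

# Classification: labelled 2D points from two made-up groups, "A" and "B".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]
predicted = knn_predict(points, labels, (1.8, 1.6), k=3)
```

In both cases the labels (the `ys` values and the "A"/"B" tags) are what make the approach supervised: the model is fitted to known answers and then asked about unseen inputs.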

Unsupervised Learning

An approach that can be used to group data when no labels are present. Typically applied to cases where the model is representative of the data to ask

  • Clustering:
    • K-Means
    • DBSCAN
    • Hierarchical Clustering
  • Dimensionality Reduction:
    • Principal Component Analysis (PCA)
    • Singular Value Decomposition (SVD)
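As a rough illustration of clustering without labels, here is a small K-Means sketch in plain Python. The points and starting centroids are invented, and the fixed starting centroids are an assumption made to keep the example deterministic (library implementations usually choose them randomly).

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if not members:
                new_centroids.append(old)  # leave an empty cluster where it is
                continue
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assignments

# Unlabelled toy points; the algorithm is never told which group each belongs to.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
centroids, assignments = kmeans(points, centroids=[points[0], points[-1]])
```

The output is a grouping only; deciding what each cluster means is left to the analyst.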

Neural Networks

A computational model inspired by biological neural networks and the behavior of neurons.

Can be supervised, semi-supervised, self-supervised, or unsupervised.
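To make "layers of neurons" concrete, the sketch below runs a forward pass through a tiny network with one hidden layer. The weights are hand-picked for illustration (an assumption, not learned by training) so that the network computes XOR, a function that a single layer cannot represent.

```python
def relu(value):
    """Rectified linear unit: pass positives through, clamp negatives to zero."""
    return max(0.0, value)

def identity(value):
    return value

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weighted sum + bias) per neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

# Hand-picked parameters (illustrative only; training would normally find these).
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]   # two hidden neurons, two inputs each
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]              # one output neuron reading both hidden neurons
OUTPUT_B = [0.0]

def xor_network(x1, x2):
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B, identity)[0]

results = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
```

Even at this size, the weights alone do not make the behavior obvious, which is the interpretability problem in miniature.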

Can you figure out how it works?

https://adamharley.com/nn_vis/mlp/2d.html

  • How well did it seem to work?
  • Did you find any patterns between the input/hidden layers/output?
  • If you can't understand how it works, can you trust or validate its answers?

Natural Language Processing (NLP) and Deep Learning

Deep learning using neural networks has become the dominant method in NLP, using massive volumes of text and voice data to reach an unprecedented level of accuracy.

Transformers: Combining the position of words and subwords (tokenization) with the dependencies and relationships between words (self-attention) allows different parts of a passage to be processed together rather than one word at a time.
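A stripped-down sketch of scaled dot-product self-attention, the core calculation inside a transformer. It assumes made-up 2D token vectors and uses them directly as queries, keys, and values; real models add learned projection matrices, positional information, and many attention heads.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Scaled dot-product attention with queries = keys = values = embeddings."""
    dim = len(embeddings[0])
    outputs, all_weights = [], []
    for query in embeddings:
        # Score every token against the current one.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)
        # New representation: weighted mix of all token vectors.
        mixed = [
            sum(w * value[d] for w, value in zip(weights, embeddings))
            for d in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

tokens = ["flu", "cases", "rose"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # invented vectors
outputs, weights = self_attention(embeddings)
```

Each row of `weights` shows how strongly one token attends to every other token; here "flu" weights "cases" more heavily than "rose" because their vectors are more similar.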

A question of experience - How much training do models need?

  • Type of problem - supervised vs unsupervised; image recognition or NLP
  • Model Complexity - more layers or nodes = more training data needed
  • Data Quality and Accuracy - noisy data will require more training data

Training LLMs

Parameters

Cost

Enhancing Accuracy and Reliability with RAG

Retrieval-augmented generation (RAG) - enhances accuracy and reliability of generative AI models by linking AI services to external resources.
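The retrieval half of RAG can be sketched in a few lines. This example assumes a simple bag-of-words cosine similarity in place of the embedding models real systems use, and the documents are placeholders invented for the sketch. No model is called; the output is the prompt that would be sent to one.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase word counts; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(
        documents, key=lambda doc: cosine(q, bag_of_words(doc)), reverse=True
    )
    return ranked[:top_k]

def build_prompt(question, documents):
    """Prepend retrieved context so the model answers from it."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Placeholder documents invented for this sketch; not real guidance.
DOCUMENTS = [
    "Specimen shipping: send isolates on cold packs within 24 hours.",
    "Sequencing submissions require a completed metadata sheet.",
    "Office hours are Monday through Friday.",
]
prompt = build_prompt("How should isolates be shipped?", DOCUMENTS)
```

Because the answer is grounded in retrieved text, the external resource can be updated or audited without retraining the model.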

AI/ML applications in Public Health

Applying AI in public health

  • Disease Forecasting
  • Risk Prediction
  • Health Diagnosis
  • Spatial Modeling
  • Surveillance
  • Modeling

SARS-CoV-2 Lineage - Pangolin

  • Multinomial Logistic Regression (pangolin 2.0)
  • Decision Trees (pangolin 2.0 and 3.0)
  • Random Forests (pangolin 4.0)

Prediction of echinocandin resistance in Candida auris

  • 2,853 Candida auris isolates (AST breakpoints and FKS1 mutation data)
  • Models Tested: Gradient Boosting, Random Forest, SVM, and XGBoost
  • 80/20 train-test split
  • Gradient Boosting frequently provided the best balance between performance metrics
  • Ser639Phe is highly associated with resistance, demonstrating the potential of machine learning for genomic resistance prediction
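The 80/20 train-test workflow can be sketched as below. This is not the study's pipeline: the records are synthetic, and a one-split "decision stump" stands in for the Gradient Boosting, Random Forest, SVM, and XGBoost models that were actually compared. The function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle with a fixed seed, then hold out `test_fraction` for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_stump(train):
    """Learn the majority label for each feature value (a one-split tree)."""
    counts = defaultdict(Counter)
    for has_mutation, resistant in train:
        counts[has_mutation][resistant] += 1
    fallback = Counter(label for _, label in train).most_common(1)[0][0]
    rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    return lambda has_mutation: rule.get(has_mutation, fallback)

def evaluate(model, test):
    """Confusion-matrix metrics computed on held-out records only."""
    tp = sum(1 for x, y in test if model(x) and y)
    tn = sum(1 for x, y in test if not model(x) and not y)
    fp = sum(1 for x, y in test if model(x) and not y)
    fn = sum(1 for x, y in test if not model(x) and y)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "accuracy": (tp + tn) / len(test),
    }

# Synthetic (has_mutation, resistant) records; invented for illustration only.
records = ([(True, True)] * 20 + [(True, False)] * 2
           + [(False, True)] * 3 + [(False, False)] * 75)
train, test = train_test_split(records)
metrics = evaluate(fit_stump(train), test)
```

Holding out the test set before fitting is what lets the reported metrics estimate performance on isolates the model has not seen.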

Enhanced Detection System for Healthcare-Associated Transmission (EDS-HAT)

Combination of WGS surveillance and ML of electronic health records to identify outbreaks and transmission routes.

"EDS-HAT could have prevented 25 (lower bound) to 63 (upper bound) transmissions. Moreover, 3.1–8.0 fewer 30-day attributable readmissions and 1.6–3.3 fewer deaths would have occurred had EDS-HAT been running in real time."

Generative AI in healthcare

  • cross-sectional study of 195 randomly drawn patient questions from Reddit’s r/AskDocs
  • compared physician and chatbot responses to the same patient questions
  • chatbot responses were preferred over physician responses and rated significantly higher for both quality and empathy
  • NYUTron - an LLM trained on clinical language and fine-tuned across a wide range of clinical and operational predictive tasks
    • 30-day all-cause readmission prediction
    • in-hospital mortality prediction
    • comorbidity index prediction
    • length of stay prediction
    • insurance denial prediction

Challenges of Generative AI

Reaching the limit - AI hallucinations

AI hallucination - a phenomenon where a large language model perceives a pattern that is nonexistent to human observers, resulting in outputs that are nonsensical or inaccurate.

  • LLM Hallucinations
    • False Facts - confidently state incorrect information
    • Imaginary Scenarios - entirely fabricated stories or events
    • Nonsense/Incoherence - output that doesn't follow any logical flow or grammatical rules

Ethical Considerations

  • AI systems should be under human oversight.
  • They need a fallback plan if something goes wrong, and they must be accurate, reliable, and reproducible.
  • They must ensure full respect for privacy and data protection.
  • They must be transparent and offer traceability.
  • They must avoid unfair bias.
  • They must benefit all human beings.
  • They must ensure responsibility and accountability.

AI SLOP

Carbon Emissions

Is AI worth the cost?

Gartner hype cycle

AWS $50 Million Investment

"Bio-Rad Laboratories, a global leader in life sciences research and clinical diagnostics serving over 29,000 labs worldwide, developed a multilingual chatbot using AWS's Retrieval Augmented Generation technology with Amazon Bedrock. Bio-Rad indicated that this solution has reduced customer support calls by 20 percent while providing laboratories with near-instant access to critical product information, allowing them to focus on patient safety and regulatory compliance."

"When I asked her how she did on the assignment, she said she got a good grade. “I really like writing,” she said, sounding strangely nostalgic for her high-school English class — the last time she wrote an essay unassisted. “Honestly,” she continued, “I think there is beauty in trying to plan your essay. You learn a lot. You have to think, Oh, what can I write in this paragraph? Or What should my thesis be?” But she’d rather get good grades."

AI as a development tool

"Kids today don't just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They've got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged."

"“I'm sipping rocket fuel right now,” a friend tells me. “The folks on my team who aren't embracing AI? It's like they're standing still.”"

Generative AI: a stone soup

Applying Generative AI to Bioinformatics and Public Health

Generative AI to support NCBI Uploads

What questions do you have?