John Schrom, MPH, FAMIA

I'm an epidemiologist turned informaticist turned data scientist turned machine learning engineer, generally working in the biomedical domain. My work has included direct service as a health educator; providing analytic and data science support to clinics and health systems; building data-forward health technology applications; and working on policy at the municipal, county, and state levels. In my free time, I enjoy volunteering on marine mammal rescues and rehabilitation.

Below you'll find examples of my publicly available work, with additional information on Google Scholar. If you're interested in connecting, you can find me on LinkedIn or we can connect via email at

Population Health

The utility of population health depends on its ability to improve health outcomes without increasing costs or provider burnout. This group of projects has involved finding ways to do that through technology and automation, through machine learning and statistics, and by identifying and directly targeting the underlying factors that influence health outcomes.

Effective Data Sharing with Clinicians

Clinicians and clinic administrators need data to run quality improvement initiatives and manage the health of their panels. Many health systems build large dashboards to "slice and dice" data and find areas for improvement. This seems intuitive, but is it the most effective approach?

As I was building out the internal reporting system at One Medical, I designed an experiment to test aspects of this hypothesis. Specifically, is it more effective to provide detailed rate data about a population ("your screening rate is 20%, and trending up") or to provide data in a directive manner tied to a specific call to action ("Screen 70 patients for depression this week")? The directive approach was more effective: patients who saw a provider who received directive data reports had approximately 20% higher odds of being screened for depression1.
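For readers less familiar with odds ratios: the result above compares the odds of being screened, not the probability. Writing p_d and p_r for the probability that a patient is screened under directive and rate-based reports, an odds ratio of about 1.2 works out as below. The baseline of 0.50 in the worked example is hypothetical, not a study result, and the study's actual model may have included adjustments not shown here.

```latex
\mathrm{OR} = \frac{p_{d}/(1-p_{d})}{p_{r}/(1-p_{r})} \approx 1.2,
\qquad
\text{e.g. } p_{r} = 0.50 \;\Rightarrow\; \frac{p_{d}}{1-p_{d}} = 1.2 \cdot \frac{0.50}{0.50} = 1.2
\;\Rightarrow\; p_{d} = \frac{1.2}{2.2} \approx 0.545
```

At that hypothetical baseline, 20% higher odds corresponds to roughly a 4.5 percentage-point increase in the probability of screening, not a 20% increase.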

  1. Schrom JR, Slam A, Liu TK, Bouey C, Berk L, Gilmore A, Lesser L, Behal R. The Impact of Data Communication Style in Quality Reports on Depression Screening in Primary Care. Podium presentation at: American Medical Informatics Association Virtual Summit; March 2020; Virtual. [abstract, slides, video]

Social Determinants of Health

For many patients, illness is driven not only by physical factors but also by their social situation: their housing, relationship status, or the neighborhood they live in. Addressing these social determinants of health is increasingly recognized as critical to improving the health of a population.

While working at an HIV clinic in Minnesota, I led the analysis of a study into the drivers of care coordination time. Minnesota's patient-centered medical home program reimbursed providers based on clinical complexity, but many clinics felt that clinical complexity alone did not predict how much care coordination patients would use. They were right; I found that the largest drivers of utilization included being homeless or institutionally housed, requiring an interpreter, or having literacy issues1.

  1. Schrom JR, Shimotsu S, Poplau S, Larsen K. Improving patient center medical home care coordination in a safety net healthcare system among adults living with HIV. Podium presentation at: American Public Health Association Meeting; November 2013; Boston. [abstract, slides]

Personalized Medicine & Phenotyping

Perhaps the greatest promise of machine learning in healthcare is the ability to personalize care so that patients receive the most therapeutic benefit with the fewest side effects. This group of projects has involved evaluating the effectiveness of interventions for particular conditions; identifying heterogeneity in risk, presentation, or treatment response within a condition; and collecting novel data to support this goal.

Acute Care

When patients are treated for acute concerns, they are often lost to follow-up: either the treatment worked and they didn't inform their provider (why would they?), or it didn't work and they delay or avoid additional follow-up.

One Medical developed an automated check-in within its personal health record specifically for following up with patients seen for acute issues. I performed the analysis and write-up of how patients and providers used this feature. Providers had a 43% opt-out rate, and patients had a 46% response rate. Among patients who responded, most indicated they were getting better, while few (2%) said they were getting worse. Physician and patient responses were driven in part by patient and provider characteristics, as well as by specifics of the acute condition (additional details in the citation). This novel data source opens the door to more sophisticated clinical phenotyping of acute conditions to drive personalized clinical decision support1.

  1. Joshi V, Schrom JR, Munkittrick K, Stearns C, Ivanova S, Hoang D, Lesser L, Fakhouri T, Diamond A. Design and Implementation of an Electronic Survey for Follow-up of Acute Conditions in Primary Care. Podium presentation at: American Medical Informatics Association Meeting; November 2019; Washington DC. [abstract]

Mental Health

Mental health concerns are among the most prevalent sources of morbidity in modern society: tens of millions of patients will have an episode of anxiety or depression each year. Patients struggling with mental health issues can have worse clinical outcomes for other conditions, as well as higher overall utilization. So, what is the best way to treat these patients, and is it possible to do it without prescribing more benzos?

One Medical launched a group visits program consisting of four 90-minute sessions covering a variety of mindfulness techniques. I evaluated this program and found statistically and clinically significant decreases in both symptoms (measured by the GAD-7) and utilization (measured by visit rate) in the six months following enrollment. An additional finding was that the largest improvement came from the patients with the highest utilization: "outlier" patients moved back to an average visit rate after finishing the group visits program1.

  1. Schrom JR, Patterson K, Gilmore A, Cohen P, Lesser L. Group Visits Improve Symptoms and Lower Utilization in Primary Care Patients with Anxiety. Poster presented at: AcademyHealth Annual Research Meeting; June 2018; Seattle. [abstract, poster]


Diabetes

Diabetes affects over 10% of the US population, and as much as 30% of the population is pre-diabetic. But what if diabetes and pre-diabetes are not single, homogeneous diseases? Perhaps some of the differences and ambiguity in risk and treatment response are driven by distinct phenotypes with similar clinical presentations.

Using association rule mining (sometimes called "market basket analysis"), we identified pre-diabetic phenotypes that, after propensity score matching to adjust for confounding in treatment assignment, showed different risks of disease progression when treated with certain medications (statins, in the cited study)1.
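The core quantities in association rule mining are support (how often an itemset occurs), confidence (how often the consequent appears given the antecedent), and lift (confidence relative to the consequent's base rate). The sketch below computes them with a brute-force search on hypothetical toy data; the item names and patients are invented, it is not the implementation used in the cited work, and it omits the propensity score matching step entirely.

```python
from itertools import combinations

# Hypothetical toy data: each set is the findings recorded for one patient.
# Item names are invented for illustration and are not from the cited study.
patients = [
    {"high_bmi", "hypertension", "high_tg"},
    {"high_bmi", "hypertension"},
    {"high_bmi", "high_tg"},
    {"hypertension"},
    {"high_bmi", "hypertension", "high_tg"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force search for itemsets with support >= min_support.

    Fine for toy data; Apriori or FP-growth prune this search on real EHR data.
    """
    items = sorted(set().union(*transactions))
    found = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


def rule_metrics(antecedent, consequent, transactions):
    """Confidence and lift for the rule antecedent -> consequent."""
    joint = support(antecedent | consequent, transactions)
    confidence = joint / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return confidence, lift


for itemset, s in frequent_itemsets(patients, min_support=0.5).items():
    print(sorted(itemset), s)

# Rule: high triglycerides -> high BMI
print(rule_metrics(frozenset({"high_tg"}), frozenset({"high_bmi"}), patients))  # (1.0, 1.25)
```

A lift above 1 means the items co-occur more often than their individual frequencies would predict. In the cited work, phenotypes found this way were then compared on outcomes after propensity score matching, which this sketch does not attempt.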

When association rule mining is extended to EHR-based clinical phenotyping, a common issue is that it cannot model survival while also handling confounding and dosage effects. To address this, the association rule mining algorithm can be modified to incorporate survival models; this combination is called survival association rules (SAR)2.

  1. Schrom JR, Caraballo PJ, Castro MR, Simon GJ. Quantifying the Effect of Statin Use in Pre-Diabetic Phenotypes Discovered Through Association Rule Mining. AMIA Annual Symposium Proceedings. 2013;2013:1249-1257. [paper]

  2. Simon GJ, Schrom JR, Castro MR, Li PW, Caraballo PJ. Survival Association Rule Mining Towards Type 2 Diabetes Risk Assessment. AMIA Annual Symposium Proceedings. 2013;2013:1293-1302. [paper]

Infectious Diseases

Infectious diseases are, by their very nature, a relatively heterogeneous group of conditions. This presents an interesting opportunity to better dissect how this diverse group of etiologic agents affects the body's physiology.

We can use gene expression data from patients infected with a variety of pathogens to identify which genes are over- or under-expressed in which diseases. Using secondary bioinformatics data sources presents unique challenges, especially in avoiding modeling artifacts carried over from the original studies. However, by applying statistical corrections and machine learning approaches, I was able to identify specific genes predictive of infection with each infectious disease in this study1.

One of the major concerns in treating infectious diseases is the emergence of drug resistance, particularly in the case of HIV infection. Using techniques similar to those above, I analyzed and modeled HIV viral genomes to identify genetic drift and patterns of emerging resistance. I found mutations occurring at a rate similar to that previously reported in the literature, and identified specific regions of the genome that appear to be associated with treatment resistance2.

  1. Schrom JR. Classifying Disease from Host Gene Expression Patterns [unpublished]. [poster, blog]

  2. Schrom JR. Understanding the impact of treatment on the HIV-1 genome [unpublished]. [poster, blog]

Decreasing EHR Burden

The proliferation of electronic health records (EHRs) has led to an increase in physician burnout, largely attributed to the administrative burden of documenting in EHRs. Machine learning presents a unique opportunity to improve the provider user experience and ease this burden.

EHR Usability

Part of the usability problem with EHRs is that data is hidden throughout the chart, documenting and searching require many clicks, and many tools designed to help (e.g., clinical decision support) actually just create additional hoops to jump through.

In a deceptively simple experiment, we showed that we could dramatically improve our generic prescribing rates simply by changing the order in which medications appear in the search results. Rather than, say, adding a decision support alert encouraging a generic medication when a provider starts prescribing a branded one, we just showed the generic higher in the search results, and providers overwhelmingly selected it1.
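To make the idea concrete, here is a minimal sketch of a search-ranking rule that lists generic products ahead of branded ones for the same query. The `Medication` type, its fields, the matching rule (substring match on product name or ingredient), and the alphabetical tie-breaking are all hypothetical; this is not the EHR's actual search logic.

```python
from dataclasses import dataclass


@dataclass
class Medication:
    name: str         # product name as displayed in search results
    ingredient: str   # active ingredient (hypothetical field)
    is_generic: bool


def search(query: str, catalog: list[Medication], generics_first: bool = True) -> list[Medication]:
    """Return catalog entries whose name or ingredient contains the query.

    If generics_first is True, generic products are listed before branded
    ones; ties within each group are broken alphabetically by name.
    """
    q = query.lower()
    matches = [m for m in catalog if q in m.name.lower() or q in m.ingredient.lower()]
    if generics_first:
        # False sorts before True, so `not m.is_generic` puts generics first.
        return sorted(matches, key=lambda m: (not m.is_generic, m.name.lower()))
    return sorted(matches, key=lambda m: m.name.lower())


catalog = [
    Medication("Ambien 10 mg tablet", "zolpidem", False),
    Medication("Ambien CR 12.5 mg tablet", "zolpidem", False),
    Medication("zolpidem 10 mg tablet", "zolpidem", True),
]

print([m.name for m in search("zolpidem", catalog, generics_first=False)])
# ['Ambien 10 mg tablet', 'Ambien CR 12.5 mg tablet', 'zolpidem 10 mg tablet']
print([m.name for m in search("zolpidem", catalog)])
# ['zolpidem 10 mg tablet', 'Ambien 10 mg tablet', 'Ambien CR 12.5 mg tablet']
```

With a plain alphabetical sort, the branded products appear first; changing only the sort key moves the generic to the top without adding any alert or extra click.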

Taking that idea a step further, it may be possible to predict which medication a provider is going to prescribe before they even see a patient. I developed a machine learning approach toward that goal: based on patient characteristics and the patient's reason for visit, it frequently identified nearly every type of data produced from that visit. Where performance lagged, it was often due to genuine variation in clinical decision making (e.g., "insomnia" can be treated with a number of different medications, depending on patient and provider preferences). This suggests that documentation burden could be reduced further by cutting the number of clicks needed to complete a task2.

  1. Schrom JR, Cohen P, Krisch S, Fakhouri T. Modifying the order of medication search results in an electronic health record to increase physician generic prescribing behavior. Poster presented at: American Medical Informatics Association Informatics Summit; March 2019; San Francisco. [abstract, poster]

  2. Schrom JR, Joshi V, Bouey C, Gilmore A. Associating Chief Complaints with Electronic Health Record Activity to Decrease Provider Administrative Burden. Poster presented at: American Medical Informatics Association Meeting; November 2019; Washington DC. [abstract]

Message Routing

One unfortunate side effect of patients having digital access to their providers is a dramatic increase in messages and tasks generated outside of regular appointments. Many of these may not even require a clinician's input, especially if the concern is about scheduling or insurance. Anything that reduces this workload makes providers' lives easier.

One Medical built a machine learning model to identify who was most likely to resolve an incoming message: an administrative or a clinical staff member. I was brought in to evaluate the model's performance, make it interpretable, assess alternative approaches, and write up the results. As defined, the problem appeared to be nearly linearly separable, with multiple models achieving nearly equivalent performance. The model was deployed in production to sort incoming messages and, at the time of publication, had rerouted 9% of messages out of clinical queues1.
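The cited poster's title names Naive Bayes as the classifier. As an illustration of the technique only, here is a minimal from-scratch multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. The training messages, the "admin"/"clinical" labels, and the tokenizer are invented for this sketch and do not reflect the production model or its features.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def log_scores(self, text):
        """Unnormalized log posterior for each class."""
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n / self.n_docs)  # log prior P(class)
            for w in tokenize(text):
                # Smoothing keeps words unseen in this class from zeroing the score.
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical training messages; labels say who would resolve each one.
texts = [
    "Can I reschedule my appointment?",
    "Question about my insurance bill",
    "Need to cancel my appointment tomorrow",
    "My rash is getting worse",
    "Side effects from my new medication",
    "Fever and cough for three days",
]
labels = ["admin", "admin", "admin", "clinical", "clinical", "clinical"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("How do I reschedule?"))  # admin
print(model.predict("My cough is worse"))     # clinical
```

Multinomial Naive Bayes is a linear classifier in log space, which is consistent with the observation that the problem looked nearly linearly separable, and its per-word log-probabilities are easy to inspect for interpretability. In practice a library implementation such as scikit-learn's `MultinomialNB` would typically be used instead of hand-rolled code.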

  1. Ingram P, Srinivasan R, Grennan P, Schrom JR. Identifying Non-Clinical Patient Messages Using Naive Bayes. Poster presented at: American Medical Informatics Association Meeting; November 2018; San Francisco. [abstract, poster]