The Infectious Respiratory Symptoms Study

This study uses NLP to identify symptoms of respiratory infections in clinical notes. Read the published results in JMIR.

It allows for running different NLP strategies and comparing them:

  1. cTAKES and the negation cNLP transformer
  2. cTAKES and the termexists cNLP transformer
  3. ChatGPT 3.5
  4. ChatGPT 4

Each can be run separately, and may require different preparation. Read more below about the main approaches (cTAKES and ChatGPT).

This study started with an investigation of COVID-19 symptoms, so the ETL study identifier is covid_symptom.

cTAKES Preparation

First, you’ll want to register for a UMLS API key.

Next, because cTAKES and the cNLP transformers run as services separate from the ETL, make sure both are up before you start the ETL.

From your working directory with the Cumulus ETL’s compose.yaml, you can run the following to start those services:

export UMLS_API_KEY=your-umls-api-key  # don't forget to set this - cTAKES needs it
docker compose --profile covid-symptom-gpu up --wait

You’ll notice the -gpu suffix there. Running the transformers is much, much faster with access to a GPU, so we strongly recommend you run this on GPU-enabled hardware.

Because the GPU profile is running, launch the ETL through the cumulus-etl-gpu service rather than the default, CPU-only cumulus-etl service:

docker compose run cumulus-etl-gpu nlp …

If you can’t use a GPU, or you just want to test things out, use --profile covid-symptom above and the normal cumulus-etl run line to stay on the CPU.
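The choice between the GPU and CPU setups can be scripted. The following is a minimal sketch, not part of Cumulus ETL: the choose_mode helper and the MODE variable are hypothetical, and it assumes that a working nvidia-smi indicates a usable NVIDIA GPU. It only prints the two commands so you can review them before running anything.

```shell
#!/bin/sh
# Sketch: choose the compose profile and ETL service for GPU or CPU runs.
# Assumption: a working `nvidia-smi` means a usable NVIDIA GPU. This probe is
# illustrative only and is not something Cumulus ETL does for you.

choose_mode() {
  # $1 may be "gpu", "cpu", or "auto" (the default: probe for a GPU).
  mode="${1:-auto}"
  if [ "$mode" = "auto" ]; then
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
      mode="gpu"
    else
      mode="cpu"
    fi
  fi
  if [ "$mode" = "gpu" ]; then
    PROFILE="covid-symptom-gpu"
    SERVICE="cumulus-etl-gpu"
  else
    PROFILE="covid-symptom"
    SERVICE="cumulus-etl"
  fi
}

# MODE is a hypothetical override; leave it unset to auto-detect.
choose_mode "${MODE:-auto}"

# Print the commands instead of running them.
echo "docker compose --profile $PROFILE up --wait"
echo "docker compose run $SERVICE nlp …"
```

Setting MODE=cpu forces the CPU profile even on GPU hardware, which can be handy for a quick test run.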

ChatGPT Preparation

  1. Make sure you have an Azure ChatGPT account set up.
  2. Set the following environment variables:
    • AZURE_OPENAI_API_KEY
    • AZURE_OPENAI_ENDPOINT
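
As a rough sketch, the two variables can be exported in your shell and checked before launching the ETL. The values below are placeholders, and missing_vars is a hypothetical helper written for this example, not something Cumulus ETL provides.

```shell
#!/bin/sh
# Placeholder values: substitute the key and endpoint of your own Azure resource.
export AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource-name.openai.azure.com/"

# Hypothetical helper: print the name of every unset or empty variable
# and return non-zero if at least one is missing.
missing_vars() {
  status=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return $status
}

if missing_vars AZURE_OPENAI_API_KEY AZURE_OPENAI_ENDPOINT; then
  echo "Azure OpenAI variables are set"
else
  echo "Set the variables listed above before running the ChatGPT tasks" >&2
fi
```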

Running the Tasks

To run one of these tasks, pass its task name to the nlp subcommand:

  • cTAKES + negation: covid_symptom__nlp_results
  • cTAKES + termexists: covid_symptom__nlp_results_term_exists
  • ChatGPT 3.5: covid_symptom__nlp_results_gpt35
  • ChatGPT 4: covid_symptom__nlp_results_gpt4

For example, your Cumulus ETL command might look like:

cumulus-etl nlp … --task=covid_symptom__nlp_results
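
To compare all four strategies, one option is to loop over the task names. The sketch below only prints one command per task. The task_command helper is hypothetical, and the "…" still stands for your own ETL arguments, which are deliberately left unfilled.

```shell
#!/bin/sh
# Task names are the four listed above, one per line.
TASKS="covid_symptom__nlp_results
covid_symptom__nlp_results_term_exists
covid_symptom__nlp_results_gpt35
covid_symptom__nlp_results_gpt4"

# Hypothetical helper: print the command line for a single task.
# The "…" is a stand-in for your usual ETL arguments.
task_command() {
  echo "cumulus-etl nlp … --task=$1"
}

# Relies on word splitting of $TASKS, so run this with sh or bash.
for task in $TASKS; do
  task_command "$task"
done
```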

Clinical Notes

All of these tasks need access to clinical notes, which are pulled fresh from your EHR unless you have already inlined them. You will therefore likely need to provide FHIR authentication arguments such as --smart-client-id and --fhir-url.

See --help for more authentication options.
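
As an illustration of where those two arguments go, the sketch below assembles them and prints the resulting command. The URL and client ID are made-up placeholders, auth_args is a hypothetical helper, and your EHR may require further options from --help.

```shell
#!/bin/sh
# Hypothetical values: replace with your EHR's FHIR base URL and the
# client ID registered for the ETL.
FHIR_URL="https://fhir.example.org/R4"
SMART_CLIENT_ID="example-client-id"

# Hypothetical helper: assemble the two authentication flags named above.
auth_args() {
  echo "--fhir-url=$1 --smart-client-id=$2"
}

# Print the command for review; the "…" stands for your other ETL arguments.
echo "cumulus-etl nlp … --task=covid_symptom__nlp_results $(auth_args "$FHIR_URL" "$SMART_CLIENT_ID")"
```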

Evaluating the Results

See the Cumulus Library study repository for more information about processing the raw NLP results that the ETL generates.

Those instructions will help you set up Label Studio so that you can compare the different NLP strategies against human reviewers.