The Covid Symptom Study

This study uses NLP to identify symptoms of COVID-19 in clinical notes.

It allows for running different NLP strategies and comparing them:

- cTAKES and the negation cNLP transformer
- cTAKES and the termexists cNLP transformer
- ChatGPT 3.5
- ChatGPT 4

Each can be run separately, and may require different preparation. Read more below about the main approaches (cTAKES and ChatGPT).

cTAKES Preparation

First, you’ll want to register for a UMLS API key.

Then, because cTAKES and the cNLP transformers both run as services separate from the ETL, make sure they are up and ready before you start the ETL.

From your working directory with the Cumulus ETL’s compose.yaml, you can run the following to start those services:

export UMLS_API_KEY=your-umls-api-key  # don't forget to set this - cTAKES needs it
docker compose --profile covid-symptom-gpu up --wait

Note the -gpu suffix on the profile name. The transformers run much faster with access to a GPU, so we strongly recommend running this on GPU-enabled hardware.

Because the services were started with the GPU profile, run the ETL through the cumulus-etl-gpu service instead of the default CPU-based cumulus-etl service:

docker compose run cumulus-etl-gpu …

If you can’t use a GPU, or you just want to test things out, use --profile covid-symptom in the command above and the normal cumulus-etl run line to run on the CPU instead.
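
The GPU and CPU variants differ only in the profile name and the ETL service name, and the two need to match. As a minimal sketch, a small shell helper can keep them in sync. The function name select_mode, its gpu/cpu argument, and the PROFILE and ETL_SERVICE variables are hypothetical conveniences for illustration, not part of Cumulus ETL.

```shell
# Hypothetical helper (not part of Cumulus ETL): sets PROFILE and ETL_SERVICE
# to the matching pair of names described above. Defaults to CPU mode.
select_mode() {
  case "${1:-cpu}" in
    gpu)
      PROFILE="covid-symptom-gpu"
      ETL_SERVICE="cumulus-etl-gpu"
      ;;
    cpu)
      PROFILE="covid-symptom"
      ETL_SERVICE="cumulus-etl"
      ;;
    *)
      echo "usage: select_mode gpu|cpu" >&2
      return 1
      ;;
  esac
}

# Example use (commented out so that sourcing this file starts nothing):
#   select_mode gpu
#   docker compose --profile "$PROFILE" up --wait
#   docker compose run "$ETL_SERVICE" …
```

Calling select_mode with no argument falls back to the CPU names, which mirrors the default described above.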

ChatGPT Preparation

Make sure you have an Azure ChatGPT account set up.
Set the following environment variables:
- AZURE_OPENAI_API_KEY
- AZURE_OPENAI_ENDPOINT
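
Before launching a ChatGPT task, it can help to confirm that both variables are actually set in your shell. The following is a rough sketch of such a check; the function name check_azure_env is hypothetical and the check only tests that the values are non-empty, not that they are valid credentials.

```shell
# Hypothetical pre-flight check (not part of Cumulus ETL): returns non-zero
# and names each variable that is unset or empty.
check_azure_env() {
  missing=0
  for name in AZURE_OPENAI_API_KEY AZURE_OPENAI_ENDPOINT; do
    # Read the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use (commented out):
#   check_azure_env && echo "Azure settings look present"
```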

Running the Tasks

To run any of these individual tasks, use the following names:

- cTAKES + negation: covid_symptom__nlp_results
- cTAKES + termexists: covid_symptom__nlp_results_term_exists
- ChatGPT 3.5: covid_symptom__nlp_results_gpt35
- ChatGPT 4: covid_symptom__nlp_results_gpt4

For example, your Cumulus ETL command might look like:

cumulus-etl … --task=covid_symptom__nlp_results
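
If you switch between strategies often, the task names listed above can be wrapped in a small lookup. This is only a sketch: the function name task_for_strategy and the short strategy keys (ctakes-negation, ctakes-termexists, gpt35, gpt4) are made up for illustration, while the task names on the right are the ones listed above.

```shell
# Hypothetical lookup (not part of Cumulus ETL): maps a short, made-up
# strategy key to the corresponding task name.
task_for_strategy() {
  case "$1" in
    ctakes-negation)   echo "covid_symptom__nlp_results" ;;
    ctakes-termexists) echo "covid_symptom__nlp_results_term_exists" ;;
    gpt35)             echo "covid_symptom__nlp_results_gpt35" ;;
    gpt4)              echo "covid_symptom__nlp_results_gpt4" ;;
    *)
      echo "Unknown strategy: $1" >&2
      return 1
      ;;
  esac
}

# Example use (commented out; the elided arguments are left as-is):
#   cumulus-etl … --task="$(task_for_strategy gpt4)"
```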

Clinical Notes

All of these tasks need access to clinical notes, which are pulled fresh from your EHR (unless you inlined your notes). This means you will likely have to provide FHIR authentication arguments such as --smart-client-id and --fhir-url.
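
As a rough illustration of passing those two arguments, the sketch below assembles them from values you supply. The function name fhir_auth_args is hypothetical, and the URL and client ID in the example are placeholders; your EHR may require additional options beyond these two.

```shell
# Hypothetical helper (not part of Cumulus ETL): prints the two FHIR
# arguments named above, one per line.
#   $1: base URL of your FHIR server (placeholder you supply)
#   $2: SMART client ID (placeholder you supply)
fhir_auth_args() {
  if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
    echo "usage: fhir_auth_args FHIR_URL SMART_CLIENT_ID" >&2
    return 1
  fi
  printf '%s\n' "--fhir-url=$1" "--smart-client-id=$2"
}

# Example use (commented out; values are placeholders and the elided
# arguments are left as-is):
#   cumulus-etl … --task=covid_symptom__nlp_results \
#     $(fhir_auth_args https://fhir.example.invalid/R4 my-client-id)
```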

See --help for more authentication options.

Evaluating the Results

See the Cumulus Library Covid study repository for more information about processing the raw NLP results that the ETL generates.

Those instructions will help you set up Label Studio so that you can compare the different NLP strategies against human reviewers.