The Covid Symptom Study
This study uses NLP to identify symptoms of COVID-19 in clinical notes.
It allows for running different NLP strategies and comparing them:
- cTAKES and the negation cNLP transformer
- cTAKES and the termexists cNLP transformer
- ChatGPT 3.5
- ChatGPT 4
Each can be run separately, and may require different preparation. Read more below about the main approaches (cTAKES and ChatGPT).
cTAKES Preparation
Because cTAKES and the cNLP transformers are both services separate from the ETL, you will want to make sure they are running first. From your git clone of the cumulus-etl repo, you can start those services with:
export UMLS_API_KEY=your-umls-api-key # don't forget to set this - cTAKES needs it
docker compose --profile covid-symptom-gpu up -d
You’ll notice the -gpu suffix there. Running the transformers is much, much faster with access to a GPU, so we strongly recommend you run this on GPU-enabled hardware.
Because this starts the GPU profile, when you run the ETL you’ll want to launch it through the cumulus-etl-gpu service instead of the default cumulus-etl (CPU) service:
docker compose run cumulus-etl-gpu …
But if you can’t use a GPU or you just want to test things out, you can use --profile covid-symptom above and the normal cumulus-etl run line to use the CPU.
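The choice between the two profiles can be scripted. The following is a minimal sketch, not part of cumulus-etl: the helper name pick_covid_profile is hypothetical, and it assumes that a working nvidia-smi on your PATH means a usable GPU, which is only a heuristic. It prints the compose command rather than running it.

```shell
# Hypothetical helper (not part of cumulus-etl): print the compose profile to use.
# Assumption: a working nvidia-smi on PATH implies a usable GPU.
pick_covid_profile() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "covid-symptom-gpu"
    else
        echo "covid-symptom"
    fi
}

# Show the command that would start the NLP services for this machine.
profile="$(pick_covid_profile)"
echo "docker compose --profile $profile up -d"
```

If the GPU profile is selected, remember to run the ETL through cumulus-etl-gpu as described above.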
ChatGPT Preparation
- Make sure you have an Azure ChatGPT account set up.
- Set the following environment variables:
  - AZURE_OPENAI_API_KEY
  - AZURE_OPENAI_ENDPOINT
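Before launching a ChatGPT task, it can help to confirm that both variables are actually set in your shell. This is a minimal sketch; the function name check_azure_env is a hypothetical helper, not something the ETL provides.

```shell
# Hypothetical helper (not part of cumulus-etl): verify the Azure variables are set.
check_azure_env() {
    missing=0
    for name in AZURE_OPENAI_API_KEY AZURE_OPENAI_ENDPOINT; do
        # Look up the variable whose name is stored in $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing environment variable: $name" >&2
            missing=1
        fi
    done
    # Returns 0 when both are set, 1 otherwise.
    return "$missing"
}
```

For example, `check_azure_env && echo "ready"` prints "ready" only when both variables are non-empty.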
Running the Tasks
To run any of these individual tasks, use the following names:
- cTAKES + negation: covid_symptom__nlp_results
- cTAKES + termexists: covid_symptom__nlp_results_term_exists
- ChatGPT 3.5: covid_symptom__nlp_results_gpt35
- ChatGPT 4: covid_symptom__nlp_results_gpt4
For example, your Cumulus ETL command might look like:
cumulus-etl … --task=covid_symptom__nlp_results
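If you plan to compare all four strategies, you will end up running one command per task name. The sketch below only prints those command lines so you can review them; print_task_commands is a hypothetical helper, and the "…" stands for your own ETL arguments, exactly as in the example above.

```shell
# Hypothetical helper: print one ETL command line per NLP task.
# The "…" is a placeholder for your own arguments; nothing is executed.
print_task_commands() {
    for task in \
        covid_symptom__nlp_results \
        covid_symptom__nlp_results_term_exists \
        covid_symptom__nlp_results_gpt35 \
        covid_symptom__nlp_results_gpt4
    do
        echo "cumulus-etl … --task=$task"
    done
}

print_task_commands
```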
Clinical Notes
All these tasks will need access to clinical notes, which are pulled fresh from your EHR (since the ETL doesn’t store clinical notes). This means you will likely have to provide some other FHIR authentication arguments like --smart-client-id and --fhir-url. See --help for more authentication options.
Evaluating the Results
See the Cumulus Library Covid study repository for more information about processing the raw NLP results that the ETL generates.
Those instructions will help you set up Label Studio so that you can compare the different NLP strategies against human reviewers.