The Infectious Respiratory Symptoms Study
This study uses NLP to identify symptoms of respiratory infections in clinical notes. Read the published results in JMIR.
The study lets you run several NLP strategies and compare them:
- cTAKES and the negation cNLP transformer
- cTAKES and the termexists cNLP transformer
- ChatGPT 3.5
- ChatGPT 4
Each strategy can be run separately and may need different preparation. Read more below about the two main approaches (cTAKES and ChatGPT).
This study started as an investigation of COVID-19 symptoms, so its ETL study identifier is covid_symptom.
cTAKES Preparation
First, you’ll want to register for a UMLS API key.
Then, because cTAKES and the cNLP transformers run as services separate from the ETL, make sure they are up and running.
From the working directory that contains Cumulus ETL’s compose.yaml, run the following to start those services:
export UMLS_API_KEY=your-umls-api-key # don't forget to set this - cTAKES needs it
docker compose --profile covid-symptom-gpu up --wait
You’ll notice the -gpu suffix there. Running the transformers is much, much faster with access to a GPU, so we strongly recommend running them on GPU-enabled hardware.
Since you started the GPU profile, run the ETL with the GPU service (cumulus-etl-gpu) instead of the default CPU service (cumulus-etl):
docker compose run cumulus-etl-gpu nlp …
But if you can’t use a GPU, or you just want to test things out, use --profile covid-symptom in the command above and the normal cumulus-etl service in the run line to use the CPU instead.
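The GPU-versus-CPU choice above comes down to pairing a compose profile with a matching ETL service. The shell sketch below captures that pairing. The helper names (detect_mode, compose_profile, etl_service, require_umls_key) are hypothetical, and treating the presence of nvidia-smi as a sign of a usable GPU is an assumption that may not hold on your hardware.

```shell
# Hypothetical helpers, not part of Cumulus ETL.

# Assumption: a usable GPU is present when nvidia-smi is on the PATH.
detect_mode() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo gpu
  else
    echo cpu
  fi
}

# Compose profile for a mode ("gpu" or "cpu"); defaults to the detected mode.
compose_profile() {
  case "${1:-$(detect_mode)}" in
    gpu) echo covid-symptom-gpu ;;
    *) echo covid-symptom ;;
  esac
}

# ETL service that matches the profile for the same mode.
etl_service() {
  case "${1:-$(detect_mode)}" in
    gpu) echo cumulus-etl-gpu ;;
    *) echo cumulus-etl ;;
  esac
}

# cTAKES needs a UMLS API key, so fail early if it is unset or empty.
require_umls_key() {
  if [ -z "${UMLS_API_KEY:-}" ]; then
    echo "UMLS_API_KEY is not set; cTAKES needs it" >&2
    return 1
  fi
}

# Example use (commented out so nothing starts by accident):
# require_umls_key && docker compose --profile "$(compose_profile)" up --wait
# docker compose run "$(etl_service)" nlp …
```

Passing gpu or cpu explicitly to compose_profile and etl_service overrides the detection, which keeps the profile and the run line from drifting apart.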
ChatGPT Preparation
- Make sure you have an Azure OpenAI account set up with access to the ChatGPT models.
- Set the following environment variables:
  - AZURE_OPENAI_API_KEY
  - AZURE_OPENAI_ENDPOINT
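A minimal sketch of that setup in a POSIX shell follows. The values are placeholders: the endpoint shape shown is the usual one for an Azure OpenAI resource, but copy the real one from your Azure portal. check_azure_env is a hypothetical helper that only confirms both variables are non-empty. It does not validate the key.

```shell
# Placeholder values: substitute your own key and endpoint.
export AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource-name.openai.azure.com/"

# Hypothetical helper: report any required variable that is unset or empty.
check_azure_env() {
  missing=0
  for var in AZURE_OPENAI_API_KEY AZURE_OPENAI_ENDPOINT; do
    # Indirect lookup of the variable named by $var (POSIX-compatible).
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_azure_env && echo "Azure OpenAI variables are set"
```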
Running the Tasks
To run any of these strategies on its own, pass the matching task name to the nlp subcommand:
- cTAKES + negation: covid_symptom__nlp_results
- cTAKES + termexists: covid_symptom__nlp_results_term_exists
- ChatGPT 3.5: covid_symptom__nlp_results_gpt35
- ChatGPT 4: covid_symptom__nlp_results_gpt4
For example, your Cumulus ETL command might look like:
cumulus-etl nlp … --task=covid_symptom__nlp_results
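Because each strategy is a separate task, comparing all four means one run per task name. The dry-run sketch below prints those commands rather than executing them. task_name is a hypothetical helper, and the … stands in for the same elided arguments as in the example command.

```shell
# Hypothetical helper: map a strategy nickname to its ETL task name.
task_name() {
  case "$1" in
    negation)   echo covid_symptom__nlp_results ;;
    termexists) echo covid_symptom__nlp_results_term_exists ;;
    gpt35)      echo covid_symptom__nlp_results_gpt35 ;;
    gpt4)       echo covid_symptom__nlp_results_gpt4 ;;
    *)          echo "unknown strategy: $1" >&2; return 1 ;;
  esac
}

# Dry run: print one command per strategy instead of running it.
# The "…" is left unfilled; it stands for your usual ETL arguments.
for strategy in negation termexists gpt35 gpt4; do
  echo "cumulus-etl nlp … --task=$(task_name "$strategy")"
done
```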
Clinical Notes
All of these tasks need access to clinical notes, which are pulled fresh from your EHR (unless you inlined your notes). This means you will likely have to provide additional FHIR authentication arguments, such as --smart-client-id and --fhir-url.
See --help for more authentication options.
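One way to keep those flags together is a small shell helper. This is a sketch under assumptions: FHIR_URL, SMART_CLIENT_ID, and fhir_auth_args are hypothetical names, the values are placeholders, and your EHR may need further options from --help that are not shown here.

```shell
# Placeholder values: replace with your EHR's FHIR base URL and SMART client ID.
FHIR_URL="https://ehr.example.com/fhir"
SMART_CLIENT_ID="my-client-id"

# Hypothetical helper: print the two authentication flags named above,
# or fail if either value is missing.
fhir_auth_args() {
  if [ -z "$FHIR_URL" ] || [ -z "$SMART_CLIENT_ID" ]; then
    echo "set FHIR_URL and SMART_CLIENT_ID first" >&2
    return 1
  fi
  printf '%s\n' "--fhir-url=$FHIR_URL --smart-client-id=$SMART_CLIENT_ID"
}

# Dry run: show the full command; "…" still stands for your other arguments.
echo "cumulus-etl nlp … $(fhir_auth_args) --task=covid_symptom__nlp_results"
```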
Evaluating the Results
See the Cumulus Library study repository for more information about processing the raw NLP results that the ETL generates.
Those instructions will help you set up Label Studio so that you can compare the different NLP strategies against human reviewers.