First Time Setup
Prerequisites
- Python 3.10 or later.
- Access to an AWS cloud services account. See the AWS setup guide for more information on this.
- An Athena database populated by Cumulus ETL version 1.0 or later.
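If you want to confirm the first prerequisite from within Python, a minimal sketch like the one below compares an interpreter version against the 3.10 minimum. The helper name `meets_minimum` is hypothetical and is not part of cumulus-library.

```python
import sys

# Minimum version taken from the prerequisites list above.
MINIMUM_PYTHON = (3, 10)


def meets_minimum(version=None, minimum=MINIMUM_PYTHON):
    """Return True if `version` (major, minor, ...) is at least `minimum`.

    Illustrative helper only; not a cumulus-library function.
    """
    if version is None:
        version = sys.version_info
    # Compare only the leading components, so (3, 10, 4) satisfies (3, 10).
    return tuple(version[: len(minimum)]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

Run it with the same interpreter you plan to install into, since different environments on one machine may use different Python versions.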
Installation
You can install directly from PyPI by running:
pip install cumulus-library
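After installing, one way to check that the cumulus-library command ended up on your PATH is a small standard-library lookup. This is only a sketch; `find_command` is a hypothetical helper name, and it only locates the executable without running it.

```python
import shutil


def find_command(name="cumulus-library"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_command()
    if path is None:
        print("cumulus-library was not found on PATH; check your pip install.")
    else:
        print(f"cumulus-library found at {path}")
```

If the command is not found, the install may have gone into a different environment than the one your shell is using.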
Command line usage
Installing adds a cumulus-library command for interacting with Athena. It provides several actions for users:
- build will create new study tables, replacing previously created versions (more information on this in Creating studies).
- clean will remove studies from Athena, in case you no longer need them.
- export will output the data in the tables to both a .csv and a .parquet file. The former is intended for human review, while the latter is more compressed and should be preferred (if supported) when loading data into analytics packages.
- import will re-insert a previously exported study into the database.
- upload will send data you exported to the Cumulus Aggregator.
- generate-sql and generate-md both create documentation artifacts, for users authoring studies.
By default, all available studies will be used by build and export, but you can use --target to specify a single study to run. You can use it multiple times to run several studies in order.
Several pip-installable studies will automatically be added to the list of available studies to run. See study list for more details.
There are several other options - use --help to get a detailed list of commands.
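If you drive the command from a script, repeating --target for several studies can be sketched as below. The helper `build_argv` is hypothetical; it only assembles an argument list from the actions and the --target option described above, and it does not run anything.

```python
import shlex


def build_argv(action, targets=(), extra=()):
    """Assemble an argument list for the cumulus-library command.

    Illustrative helper only; it is not part of cumulus-library.
    """
    argv = ["cumulus-library", action]
    for target in targets:
        # --target may be repeated; studies are configured in the order given.
        argv += ["--target", target]
    # Any trailing arguments, such as an export directory, go last.
    argv += list(extra)
    return argv


if __name__ == "__main__":
    argv = build_argv("build", targets=["core", "template"])
    # Prints: cumulus-library build --target core --target template
    print(shlex.join(argv))
    # To actually run it, pass argv to subprocess.run(argv, check=True).
```

Building a list rather than a single string avoids shell quoting problems when a path contains spaces.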
Example usage: building and exporting the template study
Let’s walk through configuring and creating a template study in Athena. With this completed, you’ll be ready to move on to Creating studies.
- First, follow the instructions in the readme of the Sample Database, if you haven’t already. The following steps assume you use the default names and deploy in Amazon’s US-East zone.
- Configure your system to talk to AWS as mentioned in the AWS setup guide.
- Now we’ll build the tables we’ll need to run the template study. The core study creates tables for commonly used base FHIR resources like Patient and Observation. To do this, run the following command:
cumulus-library build --target core
This usually takes around five minutes, but once it’s done, you won’t need to build core again unless the data changes. You should see some progress bars like this while the tables are being created:
Creating core study in db... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
- Now, we’ll build the built-in example template study. Run a very similar command, but targeting template this time:
cumulus-library build --target template
This should be much faster - these tables will be created in around 15 seconds.
- You can use the AWS Athena console to view these tables directly, but you can also download designated study artifacts. To do the latter, run the following command:
cumulus-library export --target template ./path/to/my/data/dir/
This will download some example count aggregates to the data_export directory inside of this repository. There are only a few tables, but this will give you an idea of what kind of output to expect. Here are the first few lines:
cnt,influenza_lab_code,influenza_result_display,influenza_test_month
102,,,
70,,NEGATIVE (QUALIFIER VALUE),
70,"{code=92142-9, display=Influenza virus A RNA [Presence] in Respiratory specimen by NAA with probe detection, system=http://loinc.org}",,
70,"{code=92141-1, display=Influenza virus B RNA [Presence] in Respiratory specimen by NAA with probe detection, system=http://loinc.org}",,
69,"{code=92141-1, display=Influenza virus B RNA [Presence] in Respiratory specimen by NAA with probe detection, system=http://loinc.org}",NEGATIVE (QUALIFIER VALUE),
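To give a feel for working with the exported .csv, here is a minimal sketch that parses the first few sample lines with Python's standard csv module. The `read_counts` helper is hypothetical, and the sample text is copied from the output shown above rather than read from a real export.

```python
import csv
import io

# First lines of the exported table, copied from the sample output above.
SAMPLE = '''cnt,influenza_lab_code,influenza_result_display,influenza_test_month
102,,,
70,,NEGATIVE (QUALIFIER VALUE),
70,"{code=92142-9, display=Influenza virus A RNA [Presence] in Respiratory specimen by NAA with probe detection, system=http://loinc.org}",,
'''


def read_counts(handle):
    """Parse an exported count table into a list of dicts with integer counts.

    Illustrative helper only; not part of cumulus-library.
    """
    rows = []
    for row in csv.DictReader(handle):
        # The cnt column holds the aggregate count for the row.
        row["cnt"] = int(row["cnt"])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in read_counts(io.StringIO(SAMPLE)):
        print(row["cnt"], repr(row["influenza_result_display"]))
```

The quoted lab code field contains commas, which is why a CSV parser is safer here than splitting each line on commas. For a real export, pass a file opened with `open(path, newline="")` instead of the in-memory sample, where `path` is whichever .csv file appears in your export directory.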
Next steps
Now that you are all set up, you can learn how to create studies of your own!