First Time Setup
Prerequisites
- Python 3.10 or later.
- Access to an AWS cloud services account. See the AWS setup guide for more information on this.
- An Athena database populated by Cumulus ETL version 1.0 or later
Installation
You can install directly from pypi by running:
pip install cumulus-library
Command line usage
Installing adds a cumulus-library
command for interacting with Athena. It provides several actions for users:
build
will create new study tables, replacing previously created versions (more information on this in Creating studies).clean
will remove studies from Athena, in case you no longer need themexport
will output the data in the tables to both a.csv
and.parquet
file. The former is intended for human review, while the latter is more compressed and should be preferred (if supported) for use when loading data into analytics packages.import
will re-insert a previously exported study into the databaseupload
will send data you exported to the Cumulus Aggregatorgenerate-sql
andgenerate-md
both create documentation artifacts, for users authoring studiesversion
will provide the installed version ofcumulus-library
and all present studies
You can use --target
to specify a specific study to be run. You can use it multiple times to configure several studies in order. You can use --study-dir
with most arguments to target a directory where you are working on studies/working with studies that aren’t available to install with pip
Several pip
installable studies will automatically be added to the list of available studies to run. See study list for more details.
There are several other options - use --help
to get a detailed list of commands.
Example usage: building and exporting the core study
Let’s walk through configuring and creating the core study in Athena. With this completed, you’ll be ready to move on to Creating studies).
- First, follow the instructions in the readme of the Sample Database, if you haven’t already. Our following steps assume you use the default names and deploy in Amazon’s US-East zone.
- Configure your system to talk to AWS as mentioned in the AWS setup guide
- Now we’ll build the tables we’ll need to run the core study. The
core
study creates tables for commonly used base FHIR resources likePatient
andObservation
. To do this, run the following command:cumulus-library build --target core
This usually takes around five minutes, but once it’s done, you won’t need to build
core
again unless the data changes. You should see some progress bars like this while the tables are being created:Creating core study in db... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
- You can use the AWS Athena console to view these tables directly, but you can also download designated study artifacts. To do the latter, run the following command:
cumulus-library export --target core ./path/to/my/data/dir/
And this will download some example count aggregates to the
data_export
directory inside of this repository. There’s only a few tables, but this will give you an idea of what kind of output to expect.
Next steps
Now that you are all set up, you can learn how to create studies of your own!