First Time Setup

Prerequisites

  1. Python 3.10 or later.
  2. Access to an AWS cloud services account. See the AWS setup guide for more information on this.
  3. An Athena database populated by Cumulus ETL version 1.0 or later.
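
To confirm the first prerequisite before installing, a check along these lines may help. It is a minimal sketch: the helper name python_is_supported is made up for illustration and is not part of cumulus-library.

```python
import sys

# Minimum interpreter version listed in the prerequisites above.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the given version meets the 3.10 minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```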

Installation

You can install directly from PyPI by running:

pip install cumulus-library

Command line usage

Installing adds a cumulus-library command for interacting with Athena. It provides several actions for users:

  • create will create a manifest file for you so you can start working on authoring queries (more information on this in Creating studies).
  • build will create new study tables, replacing previously created versions (more information on this in Creating studies).
  • clean will remove studies from Athena, in case you no longer need them.
  • export will output the data in the tables to both a .csv and a .parquet file. The former is intended for human review, while the latter is more compressed and should be preferred (if supported) when loading data into analytics packages.
  • upload will send data you exported to the Cumulus Aggregator.
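
Since export writes each table in both formats, a script that consumes the output may want to pick the .parquet copy when one exists and fall back to .csv otherwise. The sketch below shows one way to do that with the standard library. It assumes the two files for a table share a base name and sit somewhere under the export directory. The helper name pick_export_files is hypothetical and not part of cumulus-library.

```python
from pathlib import Path


def pick_export_files(export_dir):
    """Map each table name to one exported file, preferring .parquet over .csv."""
    chosen = {}
    # Sort so the result is deterministic; rglob also covers nested folders.
    for path in sorted(Path(export_dir).rglob("*")):
        if path.suffix not in (".csv", ".parquet"):
            continue
        # A .parquet file always wins over a .csv with the same base name.
        if path.stem not in chosen or path.suffix == ".parquet":
            chosen[path.stem] = path
    return chosen
```

Reading the chosen .parquet files needs a third-party package, so that step is left out here.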

By default, build and export use all available studies, but you can pass --target to select a specific study to run. You can repeat it to run several studies in order.
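
If you drive the command from a script, repeating --target can be sketched as follows. The helper cumulus_command is a made-up name for illustration. It only assembles the argument list from the action and the --target option described above, and prints it rather than running it.

```python
import shlex


def cumulus_command(action, targets=()):
    """Build the argument list for one cumulus-library invocation."""
    argv = ["cumulus-library", action]
    # Repeat --target once per study, preserving the order given.
    for study in targets:
        argv += ["--target", study]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so this works without AWS access.
    print(shlex.join(cumulus_command("build", ["core", "template"])))
```

The resulting list can be handed to subprocess.run once your AWS credentials are configured.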

Several pip installable studies will automatically be added to the list of available studies to run. See study list for more details.

There are several other options; use --help to get a detailed list.

Example usage: building and exporting the template study

Let’s walk through configuring and creating a template study in Athena. With this completed, you’ll be ready to move on to Creating studies.

  • First, follow the instructions in the readme of the Sample Database, if you haven’t already. The following steps assume you use the default names and deploy in Amazon’s US-East region.
  • Configure your system to talk to AWS as mentioned in the AWS setup guide.
  • Now we’ll build the tables we’ll need to run the template study. The core study creates tables for commonly used base FHIR resources like Patient and Observation. To do this, run the following command:
    cumulus-library build --target core
    

    This usually takes around five minutes, but once it’s done, you won’t need to build core again unless the data changes. You should see some progress bars like this while the tables are being created:

    Creating core study in db... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
    
  • Now, we’ll build the built-in example template study. Run a very similar command, but targeting template this time:
    cumulus-library build --target template
    

    This should be much faster - these tables will be created in around 15 seconds.

  • You can use the AWS Athena console to view these tables directly, but you can also download designated study artifacts. To do the latter, run the following command:
    cumulus-library export --target template ./path/to/my/data/dir/
    

    This will download some example count aggregates to the directory you specified. There are only a few tables, but this will give you an idea of what kind of output to expect. Here are the first few lines:

    cnt,influenza_lab_code,influenza_result_display,influenza_test_month
    102,,,
    70,,NEGATIVE (QUALIFIER VALUE),
    70,"{code=92142-9, display=Influenza virus A RNA [Presence] in Respiratory specimen by NAA with probe detection, system=http://loinc.org}",,
    70,"{code=92141-1, display=Influenza virus B RNA [Presence] in Respiratory specimen by NAA with probe detection, system=http://loinc.org}",,
    69,"{code=92141-1, display=Influenza virus B RNA [Presence] in Respiratory specimen by NAA with probe detection, system=http://loinc.org}",NEGATIVE (QUALIFIER VALUE),
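
To inspect an export like this from a script, Python's built-in csv module is enough. The sketch below embeds the first rows shown above so it runs without any exported files. The helper name read_counts is made up for illustration. Note that the quoted influenza_lab_code values contain commas, which the csv module handles correctly and a plain split on "," would not.

```python
import csv
import io

# First rows of the export shown above, embedded so the sketch runs offline.
SAMPLE = '''cnt,influenza_lab_code,influenza_result_display,influenza_test_month
102,,,
70,,NEGATIVE (QUALIFIER VALUE),
70,"{code=92142-9, display=Influenza virus A RNA [Presence] in Respiratory specimen by NAA with probe detection, system=http://loinc.org}",,
'''


def read_counts(text):
    """Parse exported CSV text into dicts, converting cnt to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["cnt"] = int(row["cnt"])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in read_counts(SAMPLE):
        # Columns left empty in the export come back as empty strings.
        print(row["cnt"], row["influenza_result_display"] or "(blank)")
```

To read a real export, open the .csv file with newline="" and pass the file object to csv.DictReader instead of the embedded sample.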
    

Next steps

Now that you are all set up, you can learn how to create studies of your own!