Table Exporter Application

The Table Exporter application extracts data from a dataset, cohort, or dashboard into a delimited file for download or further analysis.

This app is intended only for customers with a DNAnexus Apollo license and Org approval (if applicable). Contact [email protected] for more information.

Overview

dx run table-exporter (use -h for help)

Overview of all file inputs for the Table Exporter app.

Inputs

The Table Exporter app requires one input:

  • Dataset, Cohort, or Dashboard - the dataset, cohort, or dashboard to extract from. This input must be version 3.0.

Additional optional inputs are:

  • Output File Name - a custom name for the file generated. The Output File Format will determine the file extension.

  • Output File Format - CSV is the default and generates a comma "," separated file. TSV is also available and generates a tab "\t" separated file.

  • Coding Option - "Replace" is the default and replaces field values with their coding values. If set to "Raw", the raw values are exported. If set to "Exclude", all coded values are excluded.

  • Entity - the name of the entity to extract, if you do not want the cohort table from the input Dataset, Cohort, or Dashboard.

  • Field Titles - the field titles to export, as a comma "," separated string. If this input is blank and an Entity is specified, all fields on the entity are exported. The Entity input must be specified if field titles are provided.

  • See the app documentation for more granular configuration options.
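
The rules connecting these optional inputs can be summarized in a minimal Python sketch. Everything named here (`check_export_inputs` and its parameters) is a hypothetical label for the inputs described above, not the app's actual input names, and the ".csv"/".tsv" extensions are an assumption about how the format maps to a file extension. Run dx run table-exporter -h for the real input names.

```python
# Hypothetical sketch of the input rules described above; not the app's own code.
DELIMITERS = {"CSV": ",", "TSV": "\t"}
CODING_OPTIONS = {"Replace", "Raw", "Exclude"}


def check_export_inputs(output_format="CSV", coding_option="Replace",
                        entity=None, field_titles=None):
    """Validate a combination of optional inputs and return normalized settings."""
    if output_format not in DELIMITERS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    if coding_option not in CODING_OPTIONS:
        raise ValueError(f"unsupported coding option: {coding_option!r}")
    # Field Titles is a comma-separated string; blank means "all fields".
    fields = [t.strip() for t in (field_titles or "").split(",") if t.strip()]
    if fields and not entity:
        raise ValueError("Entity must be specified when Field Titles are provided")
    return {
        "delimiter": DELIMITERS[output_format],
        "extension": "." + output_format.lower(),  # assumed mapping
        "coding_option": coding_option,
        "entity": entity,
        "fields": fields,  # empty list: export every field on the entity
    }
```

For example, `check_export_inputs(field_titles="Age")` raises a `ValueError` because no entity was given, mirroring the requirement that Entity accompany Field Titles.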

Process

  1. If an Entity is specified, the Entity and Field Titles are used to generate the exported file.

  2. If an Entity is not specified, then:

    1. If the input is a Dashboard or a Cohort, the columns specified in the cohort table are used to generate the exported file.

    2. If the input is a Dataset and it has a default dashboard, the columns specified in the cohort table of the default dashboard are used to generate the exported file.

    3. If the input is a Dataset without a default dashboard, the main entity and all of its fields are used to generate the exported file.
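
The decision order above can be sketched as a small Python function. This is only an illustration of the documented precedence, not the app's implementation; `select_export_source`, its parameters, and the returned labels are hypothetical names.

```python
# Hypothetical sketch of the Process steps above; names and labels are illustrative.
def select_export_source(input_kind, entity=None, has_default_dashboard=False):
    """Return a label describing which columns drive the exported file."""
    # Step 1: an explicit Entity always takes precedence.
    if entity:
        return "entity and field titles"
    # Step 2.1: a Dashboard or Cohort uses its own cohort table columns.
    if input_kind in ("Dashboard", "Cohort"):
        return "cohort table columns"
    if input_kind == "Dataset":
        # Step 2.2: a Dataset falls back to its default dashboard, if any.
        if has_default_dashboard:
            return "default dashboard cohort table columns"
        # Step 2.3: otherwise the main entity with all of its fields.
        return "main entity, all fields"
    raise ValueError(f"unknown input kind: {input_kind!r}")
```

Note that specifying an Entity overrides the cohort table even when the input is a Cohort or Dashboard.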

Outputs

  • CSV/TSV file - the delimited file generated.

  • Logs - available under Project: .table-exporter/<job-id>-clusterlogs.tar.gz.

    • Spark cluster logs - for advanced troubleshooting.
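
Once downloaded, the delimited file can be loaded with standard tooling. The following is a minimal sketch using Python's standard csv module; it assumes the file's first row is a header, that it is UTF-8 encoded, and that a ".tsv" extension indicates a tab-separated file. The helper name `read_export` is hypothetical.

```python
import csv
from pathlib import Path


def read_export(path):
    """Load an exported CSV/TSV file into a list of dicts keyed by column header."""
    path = Path(path)
    # Assumption: ".tsv" means tab-separated; anything else is treated as comma-separated.
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```

All values are returned as strings; convert numeric columns explicitly before analysis.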

Best Practices

  1. For extremely large entities (thousands of columns with hundreds of thousands of rows), using "Replace" codings will significantly increase runtime and cost. In those cases, export without coding replacement.

  2. If you are exporting from a dataset that has databases in a controlled project where the DB UI View Only permission is set, the application must be run in the project containing the restricted database to execute successfully.