Clinical Dataset Merger

An Apollo license is required to use Apollo Datasets on the DNAnexus Platform. Org approval may also be required. Contact DNAnexus Sales for more information.

The Clinical Dataset Merger app links Entities from one Apollo Dataset to Entities in another Dataset to create a new merged Dataset. Entities from the source Dataset are linked to the target Dataset according to the inputs described below. The merged Dataset can be used on its own or together with existing Datasets and downstream tools such as the Cohort Browser, JupyterLab, analysis apps, and custom developer-led initiatives.

To launch the Clinical Dataset Merger app via the CLI, enter the command:

dx run clinical_dataset_merger

Overview

Inputs

The Clinical Dataset Merger app requires the following as general input:

  • Source Dataset - The Dataset that contains the Entities that will be added to the target Dataset.

  • Target Dataset - The core Dataset that will be extended. The main Entity of this Dataset becomes the main Entity of the merged Dataset. This Dataset may have zero, one, or more Assays.

  • Output Dataset Name - The name of the new Dataset record to create as output.

  • Linking Information - This section is needed only when the linkage is not between the global key of the source Dataset and the global key of the target Dataset.

    • Source Entity Name - Entity name that will be used to join the source Entity to the target Dataset (when blank, the main Entity of the source Dataset is used).

    • Source Field Name - Field name on the source Entity that will be used to join to the target Dataset (when blank, the global key on the main Entity of the source Dataset is used). If the relationship is one-to-one (1:1) or many-to-one (N:1), this Field must be unique.

    • Target Entity Name - Entity raw name that will be used to join the source Entity to the target Dataset (when blank, the main Entity of the target Dataset is used).

    • Target Field Name - Field raw name on the target Entity that will be used to join to the source Dataset (when blank, the global key on the main Entity of the target Dataset is used). If the relationship is one-to-many (1:N), this Field must be unique.

    • Join Relationship - The relationship between the source and target Entities. This can be automatically derived or can be set. Note that many-to-many is not supported.
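The uniqueness rules for the join Fields can be illustrated with a small Python sketch. It is not the app's implementation: `derive_join_relationship` and `is_unique` are hypothetical helpers, the join values are plain lists, and the mapping from uniqueness to relationship labels is an assumption taken from the wording of the Field descriptions above.

```python
from collections import Counter


def is_unique(values):
    """Return True when no join value occurs more than once."""
    return all(count == 1 for count in Counter(values).values())


def derive_join_relationship(source_values, target_values):
    """Hypothetical helper: infer the relationship from join-field uniqueness.

    Assumption: labels follow the Field descriptions above, where a unique
    source Field goes with 1:1 or N:1 and a unique target Field goes with
    one-to-many.
    """
    source_unique = is_unique(source_values)
    target_unique = is_unique(target_values)
    if source_unique and target_unique:
        return "1:1"
    if source_unique:
        return "N:1"
    if target_unique:
        return "1:N"
    # Neither side is unique, so the join would be many-to-many.
    raise ValueError("many-to-many relationships are not supported")


# Illustrative join values only; not real Dataset content.
relationship = derive_join_relationship(["p1", "p2", "p3"], ["p1", "p1", "p2"])
print(relationship)  # N:1
```

Checking uniqueness on both sides before merging makes it clear in advance whether a given pair of Fields falls into a supported relationship or into the unsupported many-to-many case.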

See the app documentation on the Platform for more granular configuration options.

Process

  1. Validations are performed to ensure that Entity names are unique and that the linkage information is valid.

  2. A new Dataset is created. Its Entities consist of the target Dataset’s Entities together with the source Dataset’s Entities, linked according to the linkage information.

  3. Any other components in the target Dataset are added and the new Dataset is returned to the user.
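The validation and merge steps above can be sketched conceptually in Python. This is an illustration under stated assumptions, not the app's code: Datasets are modeled as plain dictionaries mapping Entity names to Field names, and `merge_entities` is a hypothetical function.

```python
def merge_entities(target_entities, source_entities):
    """Hypothetical sketch: combine two Entity maps after a name check.

    Both arguments are dicts mapping an Entity name to its list of Field
    names. This stands in for real Dataset records, which it does not model.
    """
    # Step 1: Entity names must be unique across the two Datasets.
    clashes = sorted(set(target_entities) & set(source_entities))
    if clashes:
        raise ValueError(f"Entity names are not unique: {clashes}")

    # Step 2: start from the target's Entities, then add the source's.
    merged = dict(target_entities)
    merged.update(source_entities)
    return merged


# Illustrative Entity and Field names only.
target = {"patient": ["patient_id", "age"]}
source = {"lab_result": ["patient_id", "value"]}
merged = merge_entities(target, source)
print(list(merged))  # ['patient', 'lab_result']
```

Running the name check before building the merged structure mirrors the order of the steps: an invalid input fails early, and neither input is modified.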

Outputs

  • Dataset record - The merged Dataset, containing everything in the target Dataset plus the Entities of the source Dataset.

Best Practices

  1. Ensure that the Entity names are unique between the source Dataset and the target Dataset.

  2. This app makes it practical to keep sensitive and non-sensitive data in two separate databases. Ingest each through its own ingestion process, then use this app to create a unified experience for users with permission to both sets of data.
