Assay Dataset Merger

An Apollo license is required to use Apollo Datasets on the DNAnexus Platform. Org approval may also be required. Contact DNAnexus Sales for more information.

The Assay Dataset Merger app creates mixed-modality Apollo Datasets that contain one or more Assays. Provide a target Dataset and a source Dataset that contains the Assay, and the app creates a new Dataset in which the specified Assay is linked to the target Dataset. The new Dataset can be used on its own or together with existing Datasets and downstream tools such as the Cohort Browser, JupyterLab, analysis apps, and custom developer-led initiatives.

To launch the Assay Dataset Merger app via the CLI, enter the command:

dx run assay_dataset_merger

Overview

Inputs

The Assay Dataset Merger app requires the following general inputs:

  • Source Dataset - The Dataset that contains the Assay that will be added to the Target Dataset.

  • Target Dataset - The core Dataset that will be extended. The main Entity of this Dataset becomes the main Entity of the new Dataset. The target Dataset may contain zero, one, or multiple Assays.

  • Output Dataset Name - The name of the new Dataset record to create as output.

  • Linking Information - This section contains optional and required Fields to help support linking the Assay to the correct Entity in the target Dataset.

    • Linking File - (Optional) A CSV file linking the Assay sample IDs with the target Entity IDs. The linking file must have a header row. When this input is empty, a linking table is auto-generated on the assumption that the relationship is one-to-one and that the Assay's sample ID links to the target Dataset's Global Key.

    • Linking Database Name - A user-provided name for the database to be used for the data being ingested. If a database with this name already exists in the context project, the app attempts to create the linkage table in that database, and fails if a table with the same name already exists or if the table cannot be created. The database name must start with a lowercase letter or an underscore. Allowed characters are alphanumeric characters, underscores, and hyphens.
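
The naming rule for the Linking Database Name can be checked locally before launching a run. The following is a minimal Python sketch, not part of the app: the function name is hypothetical, and it reads "alphanumeric" as ASCII letters and digits of either case, which is an assumption. The Platform may apply a stricter rule.

```python
import re

# Assumed reading of the documented rule: the first character is a lowercase
# ASCII letter or an underscore; every later character is an ASCII letter,
# a digit, an underscore, or a hyphen.
_DB_NAME_PATTERN = re.compile(r"[a-z_][A-Za-z0-9_-]*")


def is_valid_linking_db_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed naming rule above."""
    return _DB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Illustrative candidates only; none of these names come from the app.
    for candidate in ["assay_links", "_links-2024", "2024_links", "Links", "my links"]:
        print(candidate, is_valid_linking_db_name(candidate))
```

A name that passes this check can still be rejected by the app, for example when a table with the same name already exists in the database.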

See the app documentation on the Platform for more granular configuration options.

Process

  1. The source Dataset is checked to confirm that it contains the specified Assay.

  2. All validations are performed.

  3. The linkage table is either auto-generated or ingested from the Linking File, and is created in the specified linking database.

  4. A new Dataset is created that contains the target Dataset with the new Assay linked as defined by the inputs, and this Dataset is returned to the user.
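
For the auto-generated case in step 3, the one-to-one assumption can be pictured with a short Python sketch. This illustrates the idea only and is not the app's implementation: the function name and the column names `assay_sample_id` and `target_entity_id` are hypothetical placeholders, and raising an error on duplicate or unmatched IDs is a choice made for this sketch rather than documented app behavior.

```python
import csv
import io


def build_one_to_one_linking_csv(sample_ids, global_keys):
    """Pair each Assay sample ID with the identical target Global Key value.

    Returns CSV text with a header row. Column names are hypothetical.
    """
    # A repeated sample ID would make the relationship one-to-many.
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("duplicate sample IDs break the one-to-one assumption")

    # Every sample ID must appear among the target Dataset's Global Keys.
    known_keys = set(global_keys)
    missing = [s for s in sample_ids if s not in known_keys]
    if missing:
        raise ValueError(f"sample IDs with no matching Global Key: {missing}")

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["assay_sample_id", "target_entity_id"])  # header row
    for sample_id in sample_ids:
        writer.writerow([sample_id, sample_id])  # identical values on both sides
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_one_to_one_linking_csv(["S1", "S2"], ["S1", "S2", "S3"]))
```

When sample IDs and Global Keys do not match one-to-one, the same two-column layout with a header row is what a hand-written Linking File would need to supply, using whatever column names the app documentation specifies.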

Outputs

  • Dataset record - The merged Dataset, containing all of the target Dataset plus the Assay from the source Dataset.

Best Practices

  1. Ensure that the Assay name is unique and descriptive. Overwrite configurations are available if the Assay name matches that of an existing Assay in the target Dataset.

  2. Each run of the Assay Dataset Merger creates a linking table in the linking database. Use a separate database for the linking information to simplify maintenance.
