Using Dataset Extender
Common usage patterns for the Dataset Extender app.
An Apollo license is required to use Dataset Extender on the DNAnexus Platform. Org approval may also be required. Contact DNAnexus Sales for more information.
Adding Derived Phenotypes to an Existing Entity
Identify the dataset you want to extend. If you are using the command line, make sure you retrieve its record ID.
To add data to an existing entity, ensure the following conditions are met:
The data is related to the entity in a one-to-one relationship.
The data contains the unique keys for the entity you are extending, preferably in the first column.
Your column names do not overlap with any column names in the entity you are extending. The key column is the exception and may share its name.
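These conditions can be checked locally before you upload the file. The following is a minimal sketch in Python and is not part of Dataset Extender itself. The function name `check_extension_file` and the `existing_columns` argument are hypothetical, and the sketch assumes a comma-delimited file with a header row and the entity key in the first column. You supply the list of existing column names yourself.

```python
import csv
import io


def check_extension_file(handle, existing_columns):
    """Return (duplicate_keys, overlapping_columns) for a delimited file.

    Assumes a header row with the entity key in the first column.
    """
    reader = csv.reader(handle)
    header = next(reader)

    # The key column may share its name with the entity's key, so only
    # the non-key columns are compared against the existing column names.
    overlapping = sorted(set(header[1:]) & set(existing_columns))

    # A one-to-one relationship means each key appears at most once.
    seen, duplicates = set(), []
    for row in reader:
        if not row:
            continue  # skip blank lines
        key = row[0]
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates, overlapping


# Example with in-memory data (hypothetical column names and keys).
sample = io.StringIO("eid,bmi_category\nP1,normal\nP2,high\nP2,low\n")
duplicates, overlapping = check_extension_file(sample, ["eid", "age", "sex"])
print("Duplicate keys:", duplicates)
print("Overlapping columns:", overlapping)
```

For a real file, pass `open(path, newline="")` as the handle. Two empty lists suggest the file satisfies the uniqueness and column-name conditions.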
Save the data as a file in your project. A comma-delimited file is recommended. Tab-delimited files are also supported but require an extra input setting.
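If your data is currently tab-delimited and you would rather upload it in the recommended comma-delimited form, the conversion can be sketched as follows. This is an illustrative snippet using Python's standard `csv` module. The function name `tsv_to_csv` is hypothetical, and the sample values are made up.

```python
import csv
import io


def tsv_to_csv(tsv_handle, csv_handle):
    """Copy rows from a tab-delimited stream to a comma-delimited one."""
    reader = csv.reader(tsv_handle, delimiter="\t")
    # csv.writer quotes any field that itself contains a comma.
    writer = csv.writer(csv_handle, lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# Example with in-memory data; the second field contains a comma.
source = io.StringIO("eid\tnote\nP1\tfasting, morning\n")
target = io.StringIO()
tsv_to_csv(source, target)
print(target.getvalue())
```

With files on disk, open both handles with `newline=""` so the `csv` module controls line endings.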
Run the Dataset Extender application with the following inputs:
Source Data - This should be set to your data file
Target Dataset - This is the dataset you want to extend
Target Entity Name - Only specify this if you are extending an entity that is not the main entity
Source Data Delimiter - Select "/t" if your file is tab-delimited (TSV). The default is "," (comma).
When running through dx-toolkit, you can use a pattern as follows:
dx run dataset-extender -isource_data=<file path> -itarget_dataset=<record id>
For additional configuration guidance, refer to the Dataset Extender page.
This process will generate:
A new dataset with the original data plus your new data
A new database if the original database cannot be written to
Supplementing a Dataset by Adding a New, Related Entity
Identify the dataset you want to extend. If you are using the command line, make sure you retrieve its record ID.
To add data as a new entity, ensure the following conditions are met:
The data is related to the entity in a one-to-one or many-to-one relationship.
The data has a column whose values correspond to the keys for the entity you are extending, preferably the first column.
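The key correspondence described above can be checked locally before upload. The sketch below is illustrative Python and is not part of Dataset Extender. The function name `find_unmatched_keys` and its arguments are hypothetical, and you supply the list of keys from the target entity yourself. Repeated values are accepted because many-to-one relationships are allowed. Whether the app rejects unmatched values is not stated here, so the check only flags them for review.

```python
import csv
import io


def find_unmatched_keys(handle, entity_keys, key_column=0):
    """Return values in the key column that match no key in the target entity.

    Repeated values are fine here, since many-to-one is allowed.
    """
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    known = set(entity_keys)
    unmatched = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        value = row[key_column]
        if value not in known and value not in unmatched:
            unmatched.append(value)
    return unmatched


# Example with in-memory data (hypothetical keys and column names).
visits = io.StringIO(
    "eid,visit_date\nP1,2021-01-05\nP1,2021-06-11\nP9,2021-02-02\n"
)
print("Unmatched keys:", find_unmatched_keys(visits, ["P1", "P2"]))
```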
Save the data as a file in your project. A comma-delimited file is recommended. Tab-delimited files are also supported but require an extra input setting.
Run the Dataset Extender application with the following inputs:
Source Data - This should be set to your data file
Target Dataset - This is the dataset you want to extend
Build New Entity - This needs to be changed to true.
New Entity Name - The name of the new entity you are creating. This cannot overlap with any other entity title in the Target Dataset.
Target Entity Name - Only specify this if you are extending an entity that is not the main entity
Source Data Delimiter - Select "/t" if your file is tab-delimited (TSV). The default is "," (comma).
When running through dx-toolkit, you can use a pattern as follows:
dx run dataset-extender -isource_data=<file path> -itarget_dataset=<record id> -ibuild_new_entity=true -inew_entity_name=<entity name> -itarget_entity_name=<entity title the data relates to>
For additional configuration guidance, refer to the Dataset Extender page.
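When scripting this step, the command can be assembled so that the target entity name is passed only when the data relates to an entity other than the main one. The following is a minimal sketch in Python. The helper name `build_new_entity_command` and the example entity name are hypothetical, and the input names are taken from the command pattern shown above.

```python
import shlex


def build_new_entity_command(source_data, target_dataset, new_entity_name,
                             target_entity_name=None):
    """Assemble the dx run arguments for adding a new entity."""
    args = [
        "dx", "run", "dataset-extender",
        f"-isource_data={source_data}",
        f"-itarget_dataset={target_dataset}",
        "-ibuild_new_entity=true",
        f"-inew_entity_name={new_entity_name}",
    ]
    # Only needed when the data relates to an entity other than the main one.
    if target_entity_name is not None:
        args.append(f"-itarget_entity_name={target_entity_name}")
    return args


# Placeholder values; substitute your own file path and record ID.
command = build_new_entity_command("<file path>", "<record id>", "visits")
print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` on a machine where dx-toolkit is installed and you are logged in.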
This process will generate:
A new dataset with the original data plus your new data
A new database if the original database cannot be written to