Data Files Used by the Data Model Loader
The information on this page is for the use of customers working with the DNAnexus Professional Services team. Contact DNAnexus Sales for more information.
Overview
Phenotypic data must be delivered in a data model defined by the following files:
A data dictionary CSV file, named data_dictionary.csv in this example.
A data CSV file, data.csv.
A codings CSV file, codings.csv.
Data Dictionary
Description: The data dictionary file describes the data in each data file (*.data.csv), by file name and by column within that file. It lists the metadata for the input data.csv file and for the individual columns within it. This metadata includes human-readable field names, data types, data coding schemes, and the relationships between the data tables. The information supplied in the data dictionary guides the ingestion process.
Deliverable: The data_dictionary.csv file is similar in structure to a spreadsheet, with the columns defined in Table 1 in the Appendix below. Each row in the data dictionary file represents a field for which an entity, such as "patient", "sample", or "encounter", will have a value.
Note: All columns must be present, but a column can be left empty if there is no value for the corresponding entity:field combination. The example below does not include all columns, only enough to illustrate the general format. A complete template can be provided upon request.
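The rule that every column must be present, even when its cells are empty, can be checked before delivery. The following is a minimal sketch, not part of the loader itself: the function name missing_columns and the REQUIRED list are hypothetical, and apart from coding_name the column names shown are placeholders. The authoritative column list is Table 1 in the Appendix.

```python
import csv
import io


def missing_columns(csv_text, required_columns):
    """Return the required column names that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in required_columns if name not in present]


# Hypothetical subset of columns, for illustration only (see Table 1).
REQUIRED = ["entity", "name", "type", "coding_name"]

sample = (
    "entity,name,type,coding_name\n"
    "patient,patient_id,string,\n"      # coding_name left empty: allowed
    "patient,diagnosis,string,icd10\n"
)

print(missing_columns(sample, REQUIRED))                          # []
print(missing_columns("entity,name\npatient,age\n", REQUIRED))   # ['type', 'coding_name']
```

An empty cell passes this check because only the header row is inspected. A missing column is reported by name.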
Example data_dictionary.csv
Entity Dictionary
Description: The optional entity dictionary file describes the entities that the data files (*.data.csv) represent. It lists the metadata for each input data.csv file. This metadata includes a human-readable entity title, singular and plural labels, and a description. The information supplied in the entity dictionary gives the user additional context.
Deliverable: The entity_dictionary.csv file is similar in structure to a spreadsheet, with the columns defined in Table 3 in the Appendix below. Each row in the entity dictionary file represents an entity for which a data file and entries in the entity_dictionary.csv exist.
All columns must be present, but a column can be left empty if there is no value for the corresponding entity:field combination. The example below does not include all columns, only enough to illustrate the general format. A complete template can be provided upon request.
Example entity_dictionary.csv
Data Files
Description: Phenotypic data must be supplied in a CSV file.
Deliverable: The data.csv file is referenced by name in the data dictionary. It must be a flattened data structure (a table) with a header row, columns, and rows. Each value must be of one of the following types: date, datetime, integer, float, or string.
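The two requirements above (a flat table with a header, and values limited to the listed types) can be sketched as a pre-delivery check. This is an illustration only: the helper names check_table and infer_type and the sample column names are hypothetical, and the sketch assumes dates and datetimes are written in ISO 8601 form, which this page does not specify.

```python
import csv
import io
from datetime import date, datetime


def check_table(csv_text):
    """Return (header, problems) for a CSV that should be a flat table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [], ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = [
        f"row {i}: expected {len(header)} fields, found {len(row)}"
        for i, row in enumerate(body, start=2)  # line 1 is the header
        if len(row) != len(header)
    ]
    return header, problems


def infer_type(value):
    """Classify a cell as integer, float, date, datetime, or string."""
    parsers = (
        ("integer", int),
        ("float", float),
        ("date", date.fromisoformat),          # assumes ISO 8601 dates
        ("datetime", datetime.fromisoformat),  # assumes ISO 8601 datetimes
    )
    for label, parse in parsers:
        try:
            parse(value)
            return label
        except ValueError:
            continue
    return "string"


sample = (
    "patient_id,age,weight_kg,visit_date,sampled_at\n"
    "P001,54,71.5,2021-03-04,2021-03-04T10:30:00\n"
    "P002,61,80.2,2021-03-09\n"  # one field short: not a flat table
)

header, problems = check_table(sample)
print(problems)
first_row = sample.splitlines()[1].split(",")
print([infer_type(cell) for cell in first_row])
```

The parsers are tried from most to least specific, so any value that matches none of them falls back to string.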
Example subject data.csv
Codings
Description: The codings.csv file is a single file that contains information for the coded values present in all the *.data.csv files. A meaning is typically written for human consumption, while a code is designed to optimize data storage and retrieval. For example, instead of storing the value "Acquired pure red cell aplasia [erythroblastopenia]" multiple times in a database, it is more efficient to store the shorter ICD10 code, "D60". Data are stored as codes in the encoded fields. When a coded value such as the ICD10 code "D60" is retrieved from a database and displayed, the system converts it back to the full meaning, "Acquired pure red cell aplasia [erythroblastopenia]", so that the information is intelligible to users.
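The code-to-meaning round trip described above can be sketched as a lookup table. This is an illustration under assumptions, not the system's actual implementation: the column names code and meaning and the helpers build_lookup and decode are hypothetical (the real columns are defined in Table 2). Only coding_name is named elsewhere on this page.

```python
import csv
import io

# Hypothetical codings content; column names other than coding_name are assumed.
codings_csv = (
    "coding_name,code,meaning\n"
    "icd10,D60,Acquired pure red cell aplasia [erythroblastopenia]\n"
)


def build_lookup(csv_text):
    """Map (coding_name, code) pairs to their human-readable meaning."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        lookup[(row["coding_name"], row["code"])] = row["meaning"]
    return lookup


def decode(lookup, coding_name, code):
    """Return the meaning for a stored code, or the code itself if it is unknown."""
    return lookup.get((coding_name, code), code)


lookup = build_lookup(codings_csv)
print(decode(lookup, "icd10", "D60"))
```

Keying the lookup on both coding_name and code keeps identical codes from different coding schemes from colliding.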
Hierarchical Data: The codings file supports the specification of a hierarchy. For example, the ICD10 codes A00 to B99 all belong to the category "Certain infectious and parasitic diseases". The code A41 points to "Other sepsis", and the codes A41.X point to specific types of sepsis. Such a hierarchical structure can be defined in the parent_code column of codings.csv (see example below).
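A parent_code column is enough to rebuild the hierarchy as a tree. The sketch below assumes each row names its parent's code and that top-level codes leave parent_code empty. The code and meaning field names and both helper functions are hypothetical, and the rows reuse the D60 family from the description above.

```python
from collections import defaultdict

# Hypothetical rows; only the parent_code column name comes from this page.
rows = [
    {"code": "D60", "meaning": "Acquired pure red cell aplasia [erythroblastopenia]", "parent_code": ""},
    {"code": "D60.0", "meaning": "Chronic acquired pure red cell aplasia", "parent_code": "D60"},
    {"code": "D60.1", "meaning": "Transient acquired pure red cell aplasia", "parent_code": "D60"},
]


def children_by_parent(rows):
    """Group codes under their parent code; top-level codes are grouped under ''."""
    children = defaultdict(list)
    for row in rows:
        children[row["parent_code"]].append(row["code"])
    return dict(children)


def ancestors(rows, code):
    """Walk parent_code links upward from a code, nearest ancestor first."""
    parent_of = {row["code"]: row["parent_code"] for row in rows}
    chain, seen = [], {code}
    current = parent_of.get(code, "")
    while current and current not in seen:  # the seen set guards against cycles
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current, "")
    return chain


tree = children_by_parent(rows)
print(tree["D60"])
print(ancestors(rows, "D60.1"))
```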
Deliverable: The codings.csv file must have the columns defined in Table 2 of the Appendix. Each categorical field in your data needs an associated entry in the codings.csv file. The same coding_name value must be used in the corresponding data_dictionary.csv field so that each field is matched with its codes.
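The requirement that coding_name values match between the two files can be checked with a simple set comparison. This is a minimal sketch: the function unmatched_coding_names and the sample rows are hypothetical, and it assumes both files have been read into dictionaries keyed by column name (for example with csv.DictReader).

```python
def unmatched_coding_names(dictionary_rows, codings_rows):
    """Return coding_name values used in the data dictionary but absent from the codings."""
    defined = {row["coding_name"] for row in codings_rows}
    used = {row["coding_name"] for row in dictionary_rows if row["coding_name"]}
    return sorted(used - defined)


# Hypothetical data dictionary rows; an empty coding_name marks an uncoded field.
dictionary_rows = [
    {"name": "diagnosis", "coding_name": "icd10"},
    {"name": "sex", "coding_name": "sex_codes"},
    {"name": "age", "coding_name": ""},
]

# Hypothetical codings rows; only icd10 is defined here.
codings_rows = [
    {"coding_name": "icd10", "code": "D60"},
]

print(unmatched_coding_names(dictionary_rows, codings_rows))
```

An empty result means every coded field in the data dictionary has at least one entry in the codings file.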
Example codings.csv file
Appendix
Table 1: Data Dictionary File Description (data_dictionary.csv)
* A value is required unless specified as "or left empty."
Table 2: Codings File Description (codings.csv)
Table 3: Entity Dictionary File Description (entity_dictionary.csv)