Grouped Box Plot
Learn to build and use grouped box plots in the Cohort Browser.
Copyright 2024 DNAnexus
The Cohort Browser is accessible to all users of the UK Biobank Research Analysis Platform and the Our Future Health Trusted Research Environment.
For DNAnexus Platform users, an Apollo license is required to access the Cohort Browser. Contact DNAnexus Sales for more information.
Grouped box plots can be used to compare the distribution of values in a field containing numerical data, across different groups in a cohort. In a grouped box plot, each such group is defined by its members sharing the same value in another field that contains categorical data.
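The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Cohort Browser's implementation: the function name grouped_box_stats, the sample records, and the choice of the "inclusive" quartile method are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import quantiles


def grouped_box_stats(records, primary, secondary):
    """Group numeric values by category and summarize each group.

    Hypothetical helper: `records` is a list of dicts, `primary` names the
    categorical field, and `secondary` names the numerical field.
    """
    groups = defaultdict(list)
    for rec in records:
        category = rec.get(primary)
        value = rec.get(secondary)
        if category is None or value is None:
            continue  # null values are not plotted
        groups[category].append(value)

    stats = {}
    for category, values in groups.items():
        values.sort()
        if len(values) >= 2:
            # Quartile method is an assumption; other methods give
            # slightly different cut points.
            q1, median, q3 = quantiles(values, n=4, method="inclusive")
        else:
            q1 = median = q3 = values[0]
        stats[category] = {
            "min": values[0],
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": values[-1],
            "count": len(values),
        }
    return stats


# Made-up sample data using the field names from the example chart.
records = [
    {"Doctor": "A", "Visit Feeling": 1},
    {"Doctor": "A", "Visit Feeling": 2},
    {"Doctor": "A", "Visit Feeling": 3},
    {"Doctor": "A", "Visit Feeling": 4},
    {"Doctor": "A", "Visit Feeling": 5},
    {"Doctor": "B", "Visit Feeling": 7},
]
stats = grouped_box_stats(records, "Doctor", "Visit Feeling")
```

Each entry in stats corresponds to one box in the chart: one group per distinct value of the categorical field, summarized by the numeric values of its members.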
When creating a grouped box plot, note that:
The primary field must contain categorical or categorical multiple data
The primary field must contain no more than 15 distinct category values
The secondary field must contain numerical data
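As a rough pre-check before building a chart, the requirements listed above could be tested against exported data along these lines. The helper check_fields and its messages are hypothetical, and the check only approximates the rules: it counts distinct categories and looks for numeric values, but it cannot tell how a field is typed in the dataset itself.

```python
from numbers import Real

MAX_CATEGORIES = 15  # limit stated in the requirements above


def check_fields(records, primary, secondary):
    """Return a list of problems; an empty list means no problem was found.

    Hypothetical helper, not part of any DNAnexus tool.
    """
    problems = []

    categories = set()
    for rec in records:
        value = rec.get(primary)
        if value is None:
            continue
        # Assumption: a "categorical multiple" value arrives as a
        # list, tuple, or set of categories.
        if isinstance(value, (list, tuple, set)):
            categories.update(value)
        else:
            categories.add(value)
    if len(categories) > MAX_CATEGORIES:
        problems.append(
            f"{primary}: {len(categories)} distinct categories "
            f"(maximum is {MAX_CATEGORIES})"
        )

    def is_number(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, Real) and not isinstance(value, bool)

    if not any(is_number(rec.get(secondary)) for rec in records):
        problems.append(f"{secondary}: no numerical values found")

    return problems


# Made-up data: 16 distinct doctors is one more than the limit allows.
too_many = [{"Doctor": f"Dr. {i}", "Visit Feeling": i} for i in range(16)]
problems = check_fields(too_many, "Doctor", "Visit Feeling")
```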
Supported Data Types
Primary Field: Categorical or Categorical Multiple (<=15 categories)
Secondary Field: Numerical (Integer) or Numerical (Float)
The grouped box plot below shows a cohort that has been broken down into groups according to the value in the Doctor field. For each group, a box plot shows the distribution of reported Visit Feeling values among cohort members who share a doctor:
In some cases, a field containing numeric data may also contain some non-numeric values. These values cannot be represented in a grouped box plot. When this happens, an informational message appears below the chart, as in the example above.
Clicking the "non-numeric values" link displays detail on those values and the number of records in which each appears:
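The kind of breakdown shown in that detail view can be approximated as follows. How the Cohort Browser decides that a value is non-numeric is not documented here, so this sketch assumes that anything Python's float() cannot parse counts as non-numeric. The helper split_numeric and the sample values are made up for illustration.

```python
from collections import Counter


def split_numeric(values):
    """Separate parseable numbers from everything else.

    Returns (numeric_values, counts_of_non_numeric_values).
    Hypothetical helper. Note that float() also accepts strings such as
    "nan" and "inf", which this sketch treats as numeric.
    """
    numeric = []
    non_numeric = Counter()
    for value in values:
        if value is None:
            continue  # nulls are neither plotted nor reported
        try:
            numeric.append(float(value))
        except (TypeError, ValueError):
            non_numeric[str(value)] += 1
    return numeric, non_numeric


# Made-up raw values for a mostly numeric field.
raw = ["3", "4.5", "unknown", "declined", "unknown", None]
numeric, non_numeric = split_numeric(raw)
```

Here numeric holds the values that could be plotted, and non_numeric maps each remaining value to the number of records in which it appears.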
Cohort Browser grouped box plots represent all non-null numeric values. When a field contains one or more outliers (values that are unusually high or low), the resulting grouped box plot can look like this:
This grouped box plot displays data on the number of cups of coffee consumed per day by members of different groups in a particular cohort, with groups defined by a shared value in the Coffee type field. Note that in several groups, one member was recorded as consuming far more cups of coffee per day than others in the group.
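To find such values in exported data before charting, one common convention is to flag anything more than 1.5 times the interquartile range beyond the quartiles. This rule is an assumption chosen for illustration. The Cohort Browser plots all non-null numeric values, and this page does not say which outlier rule, if any, it applies. The helper flag_outliers and the sample numbers are hypothetical.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    Hypothetical helper using the common 1.5 * IQR convention.
    """
    data = sorted(values)
    if len(data) < 2:
        return []  # quartiles are undefined for fewer than two points
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    spread = k * (q3 - q1)
    low, high = q1 - spread, q3 + spread
    return [v for v in data if v < low or v > high]


# Made-up cups-per-day values for one coffee-type group.
cups_per_day = [1, 2, 2, 3, 3, 4, 20]
outliers = flag_outliers(cups_per_day)
```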
In Cohort Compare mode, a grouped box plot can be used to compare the distribution of values in a field that's common to both cohorts, across groups defined using values in a categorical field that is also common to both cohorts.
In this scenario, a separate, color-coded box plot is displayed for each group in each cohort.
Hovering over one of these box plots opens an informational window showing detail on the distribution of values for the group in question.
Clicking the "ˇ" icon, in the lower right corner of the tile containing the chart, opens a tooltip showing the cohort names and the colors used to represent data in each.
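The comparison can be pictured as the same grouping applied once per cohort, with results laid side by side for each group. The sketch below computes only the median per group and cohort to keep it short. The function compare_cohorts, the cohort names, and the field name "Cups per day" are made up for this example.

```python
from collections import defaultdict
from statistics import median


def compare_cohorts(cohorts, primary, secondary):
    """Return {group: {cohort_name: median}} for fields common to all cohorts.

    Hypothetical helper: `cohorts` maps a cohort name to a list of
    record dicts.
    """
    result = defaultdict(dict)
    for name, records in cohorts.items():
        groups = defaultdict(list)
        for rec in records:
            category = rec.get(primary)
            value = rec.get(secondary)
            if category is not None and value is not None:
                groups[category].append(value)
        for category, values in groups.items():
            result[category][name] = median(values)
    return dict(result)


# Made-up cohorts sharing both fields.
cohorts = {
    "Cohort A": [
        {"Coffee type": "Espresso", "Cups per day": 2},
        {"Coffee type": "Espresso", "Cups per day": 4},
    ],
    "Cohort B": [
        {"Coffee type": "Espresso", "Cups per day": 1},
    ],
}
comparison = compare_cohorts(cohorts, "Coffee type", "Cups per day")
```

Each inner dictionary holds one value per cohort for a given group, which mirrors the separate, color-coded box plots shown for each group in each cohort.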
When ingesting data using Data Model Loader, note that the following data types can be visualized in grouped box plots:
Primary Field
String Categorical
String Categorical Multi-Select
String Categorical Sparse
Integer Categorical
Integer Categorical Multi-Select
Secondary Field
Integer
Integer Sparse
Float
Float Sparse
Primary Field: Categorical or Categorical Multiple (<=15 categories)
Secondary Field: Numerical (Integer) or Numerical (Float)