Box Plot

Learn to build and use box plots in the Cohort Browser.

The Cohort Browser is accessible to all users of the UK Biobank Research Analysis Platform and the Our Future Health Trusted Research Environment.

For DNAnexus Platform users, an Apollo license is required to access the Cohort Browser. Contact DNAnexus Sales for more information.

When to Use Box Plots

Box plots can be used to visualize numerical data.

Numerical data can also be visualized using histograms.

Using Box Plots in the Cohort Browser

Box plots provide a range of detail on the distribution of values in a field containing numerical data. Each box plot includes three thin blue horizontal lines, indicating, from top to bottom:

  • Max - The maximum, or highest value

  • Med - The median value

  • Min - The minimum, or lowest value

The blue box straddling the median value line represents the span covered by the middle 50% of values. Of the total number of values, 25% sit above the box, and 25% lie below it.

Hovering over the middle of a box plot opens a window displaying detail on the maximum, median, and minimum values. Also shown are the values at the "top" ("Q3") and "bottom" ("Q1") of the box. "Q1" is the highest value in the first, or lowest, quartile of values; "Q3" is the highest value in the third quartile.

Also shown in this window is a number representing the total number of values covered by the box plot, along with the name of the entity to which the data relates.
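As a rough illustration of the figures shown in the hover window, the sketch below computes the minimum, Q1, median, Q3, maximum, and count for a list of values using Python's standard library. The sample values and the function name five_number_summary are hypothetical, and the "inclusive" quartile method is an assumption; the exact quartile method used by the Cohort Browser is not stated here, so results for Q1 and Q3 may differ slightly.

```python
import statistics

def five_number_summary(values):
    """Return min, Q1, median, Q3, max, and count for a list of numbers.

    Assumption: quartiles use the "inclusive" method; the Cohort Browser
    may compute Q1 and Q3 differently.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "med": med,
        "q3": q3,
        "max": max(values),
        "count": len(values),
    }

# Hypothetical sample values, not taken from any real dataset.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(sample)
print(summary)
```

Here the box would span from the "q1" value to the "q3" value, with the median line inside it.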

Non-Numeric Data in Box Plots

In some cases, a field containing numeric data may also contain some non-numeric values. These values cannot be represented in a box plot. In this scenario, an informational message appears below the chart; see the chart just above for an example.

Clicking the "non-numeric values" link will display detail on those values, and the number of records in which each appears:

Note as well that in this scenario, the "count" figure shown in the chart label will differ from the one shown in the informational window that opens when hovering over the middle of a box plot. The latter figure will be smaller, and the difference equals the number of records whose values can't be displayed in the box plot.
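The count discrepancy described above can be sketched in a few lines of Python. This is only an illustration: the field values, the function name split_numeric, and the non-numeric labels are hypothetical, and the way the Cohort Browser itself classifies values is assumed rather than documented here.

```python
from collections import Counter

def split_numeric(raw_values):
    """Separate numeric values from non-numeric ones.

    Returns the list of numeric values (the ones a box plot could show)
    and a Counter of how many records hold each non-numeric value.
    """
    numeric = []
    non_numeric = Counter()
    for value in raw_values:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            numeric.append(value)
        else:
            non_numeric[value] += 1
    return numeric, non_numeric

# Hypothetical field contents: mostly numbers, plus some coded answers.
field = [2, 3, "Prefer not to answer", 1, "Do not know", 4, "Prefer not to answer"]

numeric, non_numeric = split_numeric(field)
label_count = len(field)        # analogous to the count in the chart label
plotted_count = len(numeric)    # analogous to the count in the hover window
discrepancy = label_count - plotted_count

print(non_numeric)
print(label_count, plotted_count, discrepancy)
```

The per-value tallies in non_numeric correspond to the detail displayed after clicking the "non-numeric values" link.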

Outliers

Cohort Browser box plots represent all non-null numeric values. When a field contains an outlier value or values - that is, values that are unusually high or low - this can result in a box plot that looks like this:

This box plot displays data on the number of cups of coffee consumed per day by members of a particular cohort. One cohort member was recorded as consuming 42 cups of coffee per day, much higher than the value (2 cups/day) at the "top" of the third quartile, and far higher than the median value of 2 cups/day.
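To see why a single outlier produces this shape, the sketch below uses a small, hypothetical set of cups-per-day values (not the actual cohort data) in which one member reports 42. The quartiles use Python's "inclusive" method, which is an assumption and may not match the Cohort Browser's calculation.

```python
import statistics

# Hypothetical cups-of-coffee-per-day values with a single outlier (42).
cups = [0, 1, 1, 2, 2, 2, 2, 2, 2, 3, 42]

# Assumption: "inclusive" quartiles; the Cohort Browser may differ.
q1, med, q3 = statistics.quantiles(cups, n=4, method="inclusive")

box_height = q3 - q1                 # span covered by the blue box
full_range = max(cups) - min(cups)   # span between the Min and Max lines

print("Q1:", q1, "Med:", med, "Q3:", q3, "Max:", max(cups))
print("Box height:", box_height, "Full range:", full_range)
```

Because the maximum line must sit at 42 while the box covers less than one cup, the box is compressed toward the bottom of the chart.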

Box Plots in Cohort Compare Mode

In Cohort Compare mode, a box plot chart can be used to compare the distribution of values in a field that's common to both cohorts. In this scenario, a separate, color-coded box plot is displayed for each cohort.

Hovering over either of the plots opens an informational window showing detail on the distribution of values for the cohort in question.

Clicking the "ˇ" icon, in the lower right corner of the tile containing the chart, opens a tooltip showing the cohort names and the colors used to represent data in each.

Preparing Data for Visualization in Box Plots

When ingesting data using Data Model Loader, note that the following data types can be visualized in box plots:

  • Integer

  • Integer Sparse

  • Float

  • Float Sparse

Copyright 2024 DNAnexus