Box Plot

Learn to build and use box plots in the Cohort Browser.

An Apollo license is required to use Cohort Browser on the DNAnexus Platform. Org approval may also be required. Contact DNAnexus Sales for more information.

When to Use Box Plots

Box plots can be used to visualize numerical data.

Supported Data Types

Numerical (Integer)

Numerical (Float)

Numerical data can also be visualized using histograms.

Using Box Plots in the Cohort Browser

Box plots provide a range of detail on the distribution of values in a field containing numerical data. Each box plot includes three thin blue horizontal lines, indicating, from top to bottom:

Max - The maximum, or highest value
Med - The median value
Min - The minimum, or lowest value

The blue box straddling the median line represents the span covered by the middle 50% of values. Of the total number of values, 25% lie above the box, and 25% lie below it.

Hovering over the middle of a box plot opens a window displaying detail on the maximum, median, and minimum values. Also shown are the values at the "top" ("Q3") and "bottom" ("Q1") of the box. "Q1" is the highest value in the first, or lowest, quartile of values. "Q3" is the highest value in the third quartile.

Also shown in this window is the total count of values covered by the box plot, along with the name of the entity to which the data relates.
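The figures shown in the hover window can be approximated outside the Cohort Browser. The following is a minimal Python sketch, not the Cohort Browser's implementation: the function name `five_number_summary` and the sample values are hypothetical, and it assumes the standard library's inclusive quartile method. This page does not state which quartile method the Cohort Browser uses, so Q1 and Q3 may differ slightly from what a chart reports.

```python
import statistics


def five_number_summary(values):
    """Summarize the non-null numeric values the way a box plot does."""
    # Null values are skipped; only the remaining values are summarized.
    data = sorted(v for v in values if v is not None)
    if len(data) < 2:
        raise ValueError("need at least two non-null values")
    # Assumption: inclusive quartiles; the platform's method is not documented here.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
    }


# Hypothetical field values, including one null.
summary = five_number_summary([1, 2, 2, 3, 4, None, 5, 6, 7, 8])
```

In this sketch, the box spans `summary["q1"]` to `summary["q3"]`, and the three horizontal lines correspond to `max`, `median`, and `min`.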

Customizing Chart Display

You can customize how box plot data is displayed by clicking ⛭ Chart Settings in the chart toolbar.

For data with wide value ranges or skewed distributions, you can apply logarithmic scaling to the Y-axis:

log₂ - Values transformed using $f(y) = \text{sign}(y) \cdot \log_2(|y|+1)$
log₁₀ - Values transformed using $f(y) = \text{sign}(y) \cdot \log_{10}(|y|+1)$

When you apply logarithmic transformation, the Y-axis label updates to show the transformation type (log₂ or log₁₀).
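To see what these transformations do to individual values, the two formulas above can be written as a short Python sketch. The function names are hypothetical; only the formulas come from this page. Adding 1 before taking the logarithm keeps zero mapped to zero, and multiplying by the sign lets negative values be plotted.

```python
import math


def sign(y):
    """Return -1, 0, or 1 according to the sign of y."""
    return (y > 0) - (y < 0)


def signed_log2(y):
    """f(y) = sign(y) * log2(|y| + 1)"""
    return sign(y) * math.log2(abs(y) + 1)


def signed_log10(y):
    """f(y) = sign(y) * log10(|y| + 1)"""
    return sign(y) * math.log10(abs(y) + 1)


# Hypothetical values spanning negative, zero, and positive.
transformed = [signed_log2(y) for y in (-7, 0, 7, 1023)]
```

Because the transformation is symmetric around zero, `signed_log2(-7)` and `signed_log2(7)` have the same magnitude and opposite signs.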

Non-Numeric Data in Box Plots

Fields containing primarily numeric data may also include non-numeric values. These non-numeric values cannot be represented in a box plot. See the chart above for an example of the informational message that shows below the chart when non-numeric values are present.

Clicking the "non-numeric values" link displays detail on those values, and the number of records in which each appears:

In this scenario, a discrepancy exists between the "count" figure shown in the chart label and the one shown in the informational window that opens when hovering over the middle of a box plot. The latter figure is smaller, with the discrepancy determined by the number of records for which values can't be displayed in the box plot.
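The relationship between the two counts can be illustrated with a small Python sketch. The record values and the function name `split_numeric` are hypothetical, and how the platform stores non-numeric values internally is an assumption; the sketch only shows why the hover-window count is smaller than the chart-label count.

```python
from collections import Counter


def split_numeric(values):
    """Separate plottable numbers from non-numeric values, tallying the latter."""
    numeric = []
    non_numeric = Counter()
    for v in values:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(v, (int, float)) and not isinstance(v, bool):
            numeric.append(v)
        else:
            non_numeric[v] += 1
    return numeric, non_numeric


# Hypothetical field with mostly numeric data and no nulls.
records = [2, 3.5, "unknown", 1, "declined", "unknown", 4]
numeric, non_numeric = split_numeric(records)

label_count = len(records)    # count shown in the chart label
tooltip_count = len(numeric)  # count shown in the hover window
discrepancy = label_count - tooltip_count
```

Here `non_numeric` plays the role of the detail view behind the "non-numeric values" link: each distinct value with the number of records in which it appears.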

Outliers

Cohort Browser box plots represent all non-null numeric values. When a field contains one or more outliers (values that are unusually high or low), the result can be a box plot that looks like this:

This box plot displays data on the number of cups of coffee consumed per day by patients in a particular cohort. One patient in the cohort was recorded as consuming 42 cups of coffee per day, far higher than both the value at the "top" of the third quartile (2 cups/day) and the median value (2 cups/day).
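A single extreme value stretches the maximum line without moving the box, which can be reproduced in a short Python sketch. The data below is hypothetical, chosen only so that the median, third quartile, and maximum match the figures in the example. The 1.5 × IQR fence at the end is the conventional Tukey rule for spotting outliers in your own data; this page does not say that the Cohort Browser applies it, and the chart itself plots all non-null values.

```python
import statistics

# Hypothetical cups-per-day values for ten patients, one of them extreme.
cups_per_day = sorted([0, 1, 1, 2, 2, 2, 2, 2, 3, 42])

# Assumption: inclusive quartiles; the platform's method is not documented here.
q1, median, q3 = statistics.quantiles(cups_per_day, n=4, method="inclusive")
maximum = cups_per_day[-1]

# Conventional Tukey fence (not a documented Cohort Browser feature).
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
flagged = [v for v in cups_per_day if v > upper_fence]
```

With this data the box stays between `q1` and `q3` near 2 cups/day, while `maximum` is 42, so the plot's upper line sits far above the box.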

Box Plots in Cohort Compare Mode

In Cohort Compare mode, a box plot chart can be used to compare the distribution of values in a field that's common to both cohorts. In this scenario, a separate, color-coded box plot is displayed for each cohort.

Hovering over either of the plots opens an informational window showing detail on the distribution of values for the cohort.

Clicking the "ˇ" icon, in the lower right corner of the tile containing the chart, opens a tooltip showing the cohort names and the colors used to represent data in each.

Preparing Data for Visualization in Box Plots

When ingesting data using Data Model Loader, the following data types can be visualized in box plots:

Integer
Integer Sparse
Float
Float Sparse
