Apollo Apps

Spark Applications

The Spark application is an extension of the existing app(let) framework. App(let)s already carry a specification for their VM (instance type, OS, packages); this has been extended to allow an additional, optional cluster specification with type=dxspark, as sketched below.
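
The fragment below is a minimal sketch of what such an extended specification could look like, written as a Python dictionary mirroring the systemRequirements portion of an app(let) specification. The entry-point name ("main") and the fields other than "type": "dxspark" are assumptions made for illustration; consult the platform's app specification reference for the authoritative schema.

    # Minimal sketch of a systemRequirements entry, assuming a "main" entry
    # point; field names other than "type": "dxspark" are assumed and should
    # be checked against the app specification reference.
    system_requirements = {
        "main": {
            "instanceType": "mem1_ssd1_v2_x4",   # per-VM specification, as before
            "clusterSpec": {                     # new, optional cluster specification
                "type": "dxspark",               # request a Spark cluster
                "version": "3.2.3",              # assumed: Spark version for the cluster
                "initialInstanceCount": 3,       # assumed: total nodes (master + workers)
            },
        },
    }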

  • Calling /app(let)-xxx/run for a Spark app creates a Spark cluster (plus a master VM); see the sketch after this list.

  • The master VM (where the app shell code runs) acts as the driver node for Spark.

  • Code in the master VM leverages the Spark infrastructure.

  • Job mechanisms (monitoring, termination, etc.) are the same for Spark apps as for regular app(let)s on the Platform.

  • Spark apps use the same "dx" platform communication between the master VM and the DNAnexus API servers.

  • A new log collection mechanism collects logs from all cluster nodes.

  • You can use the Spark UI to monitor a running job via SSH tunneling.
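
Because a Spark app is launched and managed like any other app(let), the usual platform tooling applies. The sketch below uses the dxpy Python bindings; the applet ID, file ID, and input field name are placeholders, not real identifiers.

    import dxpy

    # Launching a Spark app(let) uses the same /run call as a regular
    # app(let); the platform provisions the Spark cluster and master VM.
    # "applet-xxxx", "file-yyyy", and "input_csv" are placeholders.
    applet = dxpy.DXApplet("applet-xxxx")
    job = applet.run({"input_csv": dxpy.dxlink("file-yyyy")})

    # Standard job mechanisms apply: monitoring, waiting, termination.
    print(job.describe()["state"])
    job.wait_on_done()
    # job.terminate()  # same termination path as any other job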

Spark apps can be launched over a distributed Spark cluster. All Spark apps are listed under the Translational Informatics category, and Spark apps suitable for data analysis are grouped in the Apollo Analysis category.

The following Spark apps are available in the App Library.

App                              Description
CSV Loader                       Load CSV files into the database
VCF Loader                       Load the VCF file into the database
Spark SQL Runner                 Run generic Spark SQL commands
Allele frequency calculator      Calculate allele frequencies for a cohort and load the results into the database
GWAS                             Do a GWAS analysis on the specified cohort(s) and load the results into the database
FAM generator                    Generate the phenotype/sex/family relationship table from the phenotype database
PLATO Single Variant Analysis    Perform single-outcome (GWAS) and multiple-outcome (PheWAS) analyses

NOTE: Not all applications are available in all packages. Please contact sales@dnanexus.com for more information.