The Spark application is an extension of the current app(let) framework. Currently, app(let)s have a specification for their VM (instance type, OS, packages). This has been extended to allow for an additional optional cluster specification with
Calling /app(let)-xxx/run for a Spark app creates a Spark cluster in addition to the master VM.
The master VM (where the app shell code runs) acts as the driver node for Spark.
Code in the master VM leverages the Spark infrastructure.
Job mechanisms (monitoring, termination, etc.) are the same for Spark apps as for any other app(let) on the Platform.
Spark apps use the same platform "dx" communication between the master VM and the DNAnexus API servers.
A new log collection mechanism collects logs from all nodes of the cluster.
You can use the Spark UI to monitor a running job through SSH tunneling.
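As an illustration of the SSH tunneling mentioned above, the following minimal sketch builds a generic OpenSSH local port-forwarding command in Python. It is not the Platform's own tooling. The host name, user name, and the helper `spark_ui_tunnel_command` are hypothetical, and the remote port is an assumption: 4040 is Spark's default application UI port, but the port exposed on a given cluster may differ.

```python
import shlex


def spark_ui_tunnel_command(host, user, local_port=4040, remote_port=4040):
    """Build an OpenSSH command that forwards a local port to the Spark UI.

    host and user are placeholders for the master VM's address and login;
    remote_port defaults to 4040, Spark's default application UI port
    (an assumption about the cluster's configuration).
    """
    for port in (local_port, remote_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    return [
        "ssh",
        "-N",  # forward ports only, do not run a remote command
        "-L", f"{local_port}:localhost:{remote_port}",  # local port forwarding
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical host and user, for illustration only.
    cmd = spark_ui_tunnel_command("master.example.com", "someuser")
    print(shlex.join(cmd))
```

With a tunnel like this open, the Spark UI would be reachable in a local browser at http://localhost:4040 for as long as the ssh process runs.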
Spark apps can be launched over a distributed Spark cluster. All Spark apps are listed under the Translational Informatics category. Spark apps suitable for data analysis are grouped in the Apollo Analysis category.
The following Spark apps are available in the App Library.
Load CSV files into the database
Load the VCF file into the database
Run generic Spark SQL commands
Allele frequency calculator
Calculate allele frequencies for a cohort and load the results into the database
Do a GWAS analysis on the specified cohort(s) and load the results into the database
Generate the phenotype/sex/family relationship table from the phenotype database
PLATO Single Variant Analysis
Perform single-outcome (GWAS) and multiple outcome (PheWAS) analyses
NOTE: Not all applications are available in all packages. Please contact DNAnexus Support for more information.