# CSV Loader

{% hint style="info" %}
A license is required to access Spark functionality on the DNAnexus Platform. [Contact DNAnexus Sales](mailto:sales@dnanexus.com) for more information.
{% endhint %}

## Overview

The CSV Loader ingests CSV files into a database. The input CSV files are loaded into Parquet-format tables within a database, and these tables can be queried using Spark SQL.

You can load a single CSV file or many CSV files. When loading multiple files, all files must be syntactically consistent.

For example:

* All files must use the same separator. This can be a comma, tab, or another consistent delimiter.
* All files must include a header line, or all files must omit it.

{% hint style="info" %}
Each CSV file is loaded into its own table within the specified database.
{% endhint %}
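Before loading multiple files, it can be worth checking that they really are consistent. The sketch below (file names and contents are illustrative, not part of the app) verifies that a set of CSV files all share the same header line:

```shell
# Create two small example CSV files (illustrative only).
printf 'id,value\n1,10\n' > a.csv
printf 'id,value\n2,20\n' > b.csv

# Compare the first line of every file against the first file's header.
first_header=$(head -n 1 a.csv)
for f in a.csv b.csv; do
  if [ "$(head -n 1 "$f")" != "$first_header" ]; then
    echo "Header mismatch in $f" >&2
    exit 1
  fi
done
echo "Headers consistent"
```

A similar loop over the real input files catches separator or header mismatches before a load fails partway through.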

## How to Run CSV Loader

*Input:*

* CSV (array of CSV files to load into the database)

*Required Parameters:*

* `database_name` -> name of the database to load the CSV files into.
* `create_mode` -> `strict` mode creates the database and tables from scratch, while `optimistic` mode creates them only if they do not already exist.
* `insert_mode` -> `append` appends data to the end of tables; `overwrite` truncates the tables before appending.
* `table_name` -> array of table names, one for each corresponding CSV file by array index.
* `type` -> the cluster type; use `"spark"` for Spark apps.

*Other Options:*

* `spark_read_csv_header` -> default `false` -- whether the first line of each CSV should be used as column names for the corresponding table.
* `spark_read_csv_sep` -> default `,` -- the separator character used by each CSV.
* `spark_read_csv_infer_schema` -> default `false` -- whether the input schema should be inferred from the data.

### Basic Run

The following example creates a new database and loads data into two new tables:

```shell
dx run app-csv-loader \
   -i database_name=pheno_db \
   -i create_mode=strict \
   -i insert_mode=append \
   -i spark_read_csv_header=true \
   -i spark_read_csv_sep=, \
   -i spark_read_csv_infer_schema=true \
   -i csv=file-xxxx \
   -i table_name=sample_metadata \
   -i csv=file-yyyy \
   -i table_name=gwas_result
```
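Once loaded, the tables can be queried with Spark SQL. As a sketch only, from an environment with Spark access to the database (how that environment is provisioned depends on your platform setup; the database and table names are taken from the run above):

```shell
# Run an ad-hoc Spark SQL query against one of the loaded tables.
spark-sql -e "SELECT * FROM pheno_db.sample_metadata LIMIT 10"
```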

