CSV Loader
The CSV Loader ingests CSV files into a database. The input CSV files are loaded into Parquet-format tables within a database that can be queried using Spark SQL.
You can load a single CSV file or many CSV files. When loading many files, all of them must follow the same syntax.
For example:
All files must have the same separator (e.g. comma, tab)
All files must include a header line, or all files must exclude it
NOTE: Each CSV file is loaded into its own table within the specified database.
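The one-table-per-file behavior can be sketched as follows. This is an illustration only: the real loader writes Parquet tables queryable via Spark SQL, but here the stdlib `csv` module and `sqlite3` stand in so the behavior can be shown end to end.

```python
# Illustrative sketch only: sqlite3 stands in for the Parquet/Spark SQL
# database so the one-table-per-CSV-file behavior can be demonstrated.
import csv
import io
import sqlite3

def load_csvs(conn, csv_texts, table_names, header=True, sep=","):
    """Load each CSV (given as text) into its own table, matched by array index."""
    for text, table in zip(csv_texts, table_names):
        rows = list(csv.reader(io.StringIO(text), delimiter=sep))
        if header:
            cols, data = rows[0], rows[1:]
        else:
            # Spark-style default column names when no header line is present
            cols = [f"_c{i}" for i in range(len(rows[0]))]
            data = rows
        col_defs = ", ".join(f'"{c}" TEXT' for c in cols)
        conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
        placeholders = ", ".join("?" for _ in cols)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)

conn = sqlite3.connect(":memory:")
load_csvs(conn,
          ["id,name\n1,ada\n2,bob\n", "sku,qty\nA1,3\n"],
          ["users", "stock"])
```

Note how the second CSV lands in the second table name: the pairing is purely positional.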
Input:
CSV (array of CSV files to load into the database)
Required Parameters:
database_name
-> name of the database to load the CSV files into.
create_mode
-> "strict" creates the database and tables from scratch; "optimistic" creates the database and tables only if they do not already exist.
insert_mode
-> "append" appends data to the end of the tables; "overwrite" is equivalent to truncating the tables and then appending to them.
table_name
-> array of table names, one for each corresponding CSV file by array index.
type
-> the cluster type; use "spark" for Spark apps.
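The `create_mode` and `insert_mode` semantics described above can be sketched like this. Again, `sqlite3` stands in for the real database; the SQL the loader actually issues is not documented here.

```python
import sqlite3

def ensure_table(conn, name, create_mode):
    # "strict" assumes nothing exists yet and fails if the table already does;
    # "optimistic" tolerates a pre-existing database and tables.
    if create_mode == "strict":
        conn.execute(f'CREATE TABLE "{name}" (v TEXT)')
    elif create_mode == "optimistic":
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" (v TEXT)')
    else:
        raise ValueError(f"unknown create_mode: {create_mode}")

def insert_rows(conn, name, rows, insert_mode):
    # "overwrite" is equivalent to truncating the table, then appending.
    if insert_mode == "overwrite":
        conn.execute(f'DELETE FROM "{name}"')
    elif insert_mode != "append":
        raise ValueError(f"unknown insert_mode: {insert_mode}")
    conn.executemany(f'INSERT INTO "{name}" VALUES (?)', [(r,) for r in rows])

conn = sqlite3.connect(":memory:")
ensure_table(conn, "t", "strict")
insert_rows(conn, "t", ["a", "b"], "append")
ensure_table(conn, "t", "optimistic")       # no error: table already exists
insert_rows(conn, "t", ["c"], "overwrite")  # table now holds only "c"
```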
Other Options:
spark_read_csv_header
-> default false
-- whether the first line of each CSV should be used as column names for the corresponding table.
spark_read_csv_sep
-> default ","
-- the separator character used by each CSV.
spark_read_csv_infer_schema
-> default false
-- whether the input schema should be inferred from the data.
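These option names appear to mirror Spark's DataFrameReader CSV options `header`, `sep`, and `inferSchema`. As a toy illustration of what schema inference implies (without it every column is read as a string; with it, column types are guessed from the data), here is a minimal sketch. Note that Spark samples the data to infer types; this version simply checks every value.

```python
# Toy version of spark_read_csv_infer_schema: guess a column type by trying
# progressively looser casts over all of the column's values.
def infer_column_type(values):
    for caster, type_name in ((int, "int"), (float, "double")):
        try:
            for v in values:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "string"

rows = [["1", "3.5", "ok"],
        ["2", "4.0", "no"]]
# zip(*rows) transposes row-wise data into columns
types = [infer_column_type(col) for col in zip(*rows)]
```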
The following case creates a brand new database and loads data into two new tables:
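The original example is not reproduced here; the parameters for such a case might look like the following sketch. The request shape is an assumption, and the database and table names are purely illustrative.

```python
# Hypothetical parameter payload for creating a brand new database and
# loading two CSV files into two new tables (all names are illustrative).
params = {
    "type": "spark",
    "database_name": "sales_db",
    "create_mode": "strict",        # brand new database: create from scratch
    "insert_mode": "append",
    "table_name": ["orders", "customers"],  # one table per input CSV, by index
    "spark_read_csv_header": True,
    "spark_read_csv_sep": ",",
    "spark_read_csv_infer_schema": True,
}
```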