1. Reading Data from S3
1.1 Supported File Formats
pg_duckdb supports the following file formats:
- CSV: use the `read_csv()` function
- Parquet: use the `read_parquet()` function
- JSON: use the `read_json()` function
1.2 S3 Path Format
Basic Format
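The general form is `s3://<bucket_name>/<path>/<file_name>`. A minimal sketch, with placeholder bucket and object names:

```sql
-- Read a single object by its full S3 path (names are placeholders).
SELECT * FROM read_parquet('s3://my-bucket/path/to/file.parquet');
```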
Reading Files with Wildcards
- Use `*` to match all files in the current directory:
- Use `**` to recursively match all files in subdirectories:
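A sketch of both patterns (bucket and prefix names are placeholders):

```sql
-- `*` matches all Parquet files directly under data/.
SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet');

-- `**` also descends into subdirectories under data/.
SELECT * FROM read_parquet('s3://my-bucket/data/**/*.parquet');
```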
AWS Endpoint Examples
- Format: `s3.<region>.amazonaws.com`, e.g. `s3.ap-east-1.amazonaws.com`
- Reference: https://docs.aws.amazon.com/general/latest/gr/s3.html
1.3 Basic Usage Examples
CSV Files
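A minimal sketch (bucket and file names are placeholders):

```sql
-- Read a CSV object from S3; the format is detected automatically.
SELECT * FROM read_csv('s3://my-bucket/data/users.csv');
```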
Parquet Files
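Likewise for Parquet (placeholder names):

```sql
-- Read a Parquet object from S3.
SELECT * FROM read_parquet('s3://my-bucket/data/users.parquet');
```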
JSON Files
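And for JSON (placeholder names):

```sql
-- Read a JSON object from S3.
SELECT * FROM read_json('s3://my-bucket/data/users.json');
```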
2. Advanced Parameter Configuration
2.1 CSV Parameters
`read_csv()` automatically detects the format by default, but also supports manual parameter specification:
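A sketch with manually specified options; the parameter names below come from DuckDB's `read_csv` and are assumed to be accepted as PostgreSQL named arguments:

```sql
SELECT *
FROM read_csv(
    's3://my-bucket/data/users.csv',
    header := true,   -- first row contains column names
    delim  := ';',    -- field delimiter
    quote  := '"'     -- quote character
);
```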
2.2 JSON Parameters
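A sketch using two common DuckDB `read_json` options (again assumed to pass through as named arguments; names are placeholders):

```sql
SELECT *
FROM read_json(
    's3://my-bucket/data/events.json',
    format := 'newline_delimited',  -- one JSON object per line (NDJSON)
    ignore_errors := true           -- skip records that fail to parse
);
```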
2.3 Parquet Parameters
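A sketch using common DuckDB `read_parquet` options (same assumption about named arguments; names are placeholders):

```sql
SELECT *
FROM read_parquet(
    's3://my-bucket/data/*.parquet',
    filename := true,           -- add a column with each row's source file
    hive_partitioning := true   -- read key=value directory names as columns
);
```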
3. Data Import
Relyt supports two data import methods: `CREATE TABLE AS SELECT` with automatic schema inference, and manual `INSERT INTO SELECT`.
3.1 CREATE TABLE AS SELECT
`CREATE TABLE AS SELECT` creates a new table and automatically infers data types (see the sketch after this list):
- Creates a new table with automatic data type inference
- Simple syntax: creates the table and imports the data in one step
- Suitable for rapid data exploration and initial data imports
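A minimal sketch (table, bucket, and object names are placeholders):

```sql
-- Create the table and import the data in one statement; column names
-- and types are inferred from the Parquet metadata.
CREATE TABLE users AS
SELECT * FROM read_parquet('s3://my-bucket/data/users.parquet');
```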
3.2 INSERT INTO SELECT
`INSERT INTO SELECT` inserts data into an existing table (see the sketch after this list):
- Inserts data into existing tables
- Requires the table structure to be created beforehand
- Suitable for appending data to existing tables, or for scenarios that require precise control over the table structure
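A minimal sketch; because `read_xxx()` returns a `duckdb.row` value (see 3.3 below), each column is extracted and cast explicitly. Table, column, bucket, and object names are placeholders:

```sql
-- The target table must already exist with a matching structure.
CREATE TABLE users (id int, name text);

INSERT INTO users (id, name)
SELECT r['id']::int, r['name']::text
FROM read_parquet('s3://my-bucket/data/users.parquet') r;
```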
3.3 Data Type Handling
Important Note: When using the `INSERT INTO SELECT` method, the `read_xxx()` functions return data as a struct (i.e., `duckdb.row`), so column names and data type conversions must be specified manually.
Example 1: Parquet File Import
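A sketch of a Parquet import with explicit column extraction and casts (table, column, and object names are placeholders):

```sql
INSERT INTO orders (order_id, amount, created_at)
SELECT
    r['order_id']::bigint,      -- extract each field from the duckdb.row
    r['amount']::numeric,       -- and cast it to the target column type
    r['created_at']::timestamp
FROM read_parquet('s3://my-bucket/data/orders.parquet') r;
```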
Example 2: CSV File Import without Header
For CSV files without headers, use `column0`, `column1`, `column2`, … as the column names:
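A sketch (table, column, and object names are placeholders; `header := false` is DuckDB's option for disabling header detection, assumed to be accepted as a named argument here):

```sql
INSERT INTO users (id, name, age)
SELECT
    r['column0']::int,    -- headerless CSV columns are named column0,
    r['column1']::text,   -- column1, column2, ...
    r['column2']::int
FROM read_csv('s3://my-bucket/data/users_no_header.csv', header := false) r;
```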