1. Reading Data from S3
1.1 Supported File Formats
pg_duckdb supports the following file formats:
- CSV: Use the read_csv() function
- Parquet: Use the read_parquet() function
- JSON: Use the read_json() function
1.2 S3 Path Format
Basic Format
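An S3 path follows the standard S3 URI shape; the bucket and key names here are placeholders:

```
s3://<bucket_name>/<path>/<file>
```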
Reading Files with Wildcards
- Use * to match all files in the current directory (first query below)
- Use ** to recursively match all files in subdirectories (second query below)
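A sketch of both wildcard forms, shown with read_parquet(); the bucket and prefix names are placeholders:

```sql
-- * matches all Parquet files directly under the data/ prefix
SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet');

-- ** also descends into subdirectories under data/
SELECT * FROM read_parquet('s3://my-bucket/data/**/*.parquet');
```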
AWS Endpoint Examples
- Format: s3.<region>.amazonaws.com, for example s3.ap-east-1.amazonaws.com
- Reference: https://docs.aws.amazon.com/general/latest/gr/s3.html
1.3 Basic Usage Examples
CSV Files
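A minimal sketch; the bucket and file names are placeholders:

```sql
SELECT * FROM read_csv('s3://my-bucket/data/orders.csv');
```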
Parquet Files
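Likewise for Parquet (placeholder names):

```sql
SELECT * FROM read_parquet('s3://my-bucket/data/orders.parquet');
```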
JSON Files
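And for JSON (placeholder names):

```sql
SELECT * FROM read_json('s3://my-bucket/data/events.json');
```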
2. Advanced Parameter Configuration
2.1 CSV Parameters
read_csv() auto-detects the file format by default, but parameters can also be specified manually:
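A sketch assuming DuckDB's common read_csv options (header, delim, skip) are passed through using PostgreSQL's => named-argument syntax; the path is a placeholder:

```sql
SELECT * FROM read_csv(
    's3://my-bucket/data/orders.csv',
    header => true,  -- first line contains column names
    delim  => ';',   -- semicolon-separated values
    skip   => 1      -- skip one leading line before parsing
);
```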
2.2 JSON Parameters
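A sketch assuming DuckDB's read_json options, such as format, are passed through the same way; the path is a placeholder:

```sql
SELECT * FROM read_json(
    's3://my-bucket/data/events.json',
    format => 'newline_delimited'  -- one JSON object per line (NDJSON)
);
```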
2.3 Parquet Parameters
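A sketch assuming DuckDB's read_parquet options, such as filename and hive_partitioning, are passed through; the path is a placeholder:

```sql
SELECT * FROM read_parquet(
    's3://my-bucket/data/**/*.parquet',
    filename          => true,  -- add a column with each row's source file name
    hive_partitioning => true   -- read key=value directory names as columns
);
```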
3. Data Import
Relyt supports two data import methods: CREATE TABLE AS SELECT with automatic schema inference, and INSERT INTO SELECT into a manually created table.
3.1 CREATE TABLE AS SELECT
CREATE TABLE AS SELECT creates a new table and automatically infers data types, as in the sketch after this list:
- Creates a new table with automatic data type inference
- Simple syntax: table creation and data import complete in one step
- Suitable for rapid data exploration and initial data import
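A minimal sketch; the table, bucket, and file names are placeholders:

```sql
CREATE TABLE orders AS
SELECT * FROM read_parquet('s3://my-bucket/data/orders.parquet');
```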
3.2 INSERT INTO SELECT
INSERT INTO SELECT inserts data into existing tables (see the examples in 3.3):
- Inserts data into existing tables
- Requires the table structure to be created in advance
- Suitable for appending data to existing tables, or for scenarios requiring precise control over the table structure
3.3 Data Type Handling
Important Note: When using the INSERT INTO SELECT method, the read_xxx() functions return each row as a struct (i.e., duckdb.row), so column names and data type casts must be specified manually.
Example 1: Parquet File Import
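A sketch assuming a pre-created table orders(id INT, amount DOUBLE PRECISION); pg_duckdb exposes the struct row through a table alias, with fields accessed as r['column']:

```sql
INSERT INTO orders
SELECT
    r['id']::INT,                  -- cast each struct field to the target column type
    r['amount']::DOUBLE PRECISION
FROM read_parquet('s3://my-bucket/data/orders.parquet') r;
```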
Example 2: CSV File Import without Header
For CSV files without headers, use column0, column1, column2, … as the column names:
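A sketch assuming the same orders(id INT, amount DOUBLE PRECISION) table, with header => false as an assumed pass-through option:

```sql
INSERT INTO orders
SELECT
    r['column0']::INT,               -- first CSV column
    r['column1']::DOUBLE PRECISION   -- second CSV column
FROM read_csv('s3://my-bucket/data/orders.csv', header => false) r;
```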