1. Reading Data from S3

1.1 Supported File Formats

pg_duckdb supports the following file formats:
  • CSV: Use read_csv() function
  • Parquet: Use read_parquet() function
  • JSON: Use read_json() function

1.2 S3 Path Format

Basic Format

s3://[endpoint]/[bucket]/[prefix]/filename

Reading Files with Wildcards

  • Use * to match all files in the current directory:
s3://[endpoint]/[bucket]/my_data/*
  • Use ** to recursively match all files in subdirectories:
s3://[endpoint]/[bucket]/my_data/**

AWS Endpoint Examples

AWS S3 endpoints follow the pattern s3.<region>.amazonaws.com, for example:

s3.us-west-1.amazonaws.com   (US West, N. California)
s3.ap-east-1.amazonaws.com   (Asia Pacific, Hong Kong)

1.3 Basic Usage Examples

CSV Files

SELECT * FROM public.read_csv(
  's3://[endpoint]/[bucket]/[prefix]/* accessid=xxxxxx secret=yyyyyy'
);

Parquet Files

SELECT * FROM public.read_parquet(
  's3://s3.us-west-1.amazonaws.com/my-bucket/parquet_files/*
   accessid=XXXX
   secret=YYYY'
);

JSON Files

SELECT * FROM public.read_json(
  's3://s3.ap-east-1.amazonaws.com/tpc-h/table.json
  accessid=XXXX
  secret=YYYY'
);

2. Advanced Parameter Configuration

2.1 CSV Parameters

read_csv() automatically detects format by default, but also supports manual parameter specification:
SELECT * FROM public.read_csv(
  's3://xxxx...', 
  delim='|',           -- Delimiter
  escape='"',          -- Escape character
  quote='"',           -- Quote character
  header=true          -- Whether header is included
);

2.2 JSON Parameters

SELECT * FROM public.read_json(
  's3://xxxxx...',
  ignore_errors=true   -- Skip rows that fail to parse instead of raising an error
);

2.3 Parquet Parameters

SELECT * FROM public.read_parquet(
  's3://xxxx...',
  filename=true,           -- Include filename in results
  file_row_number=true,    -- Include row number in results
  hive_partitioning=true,  -- Use path as partition values
  union_by_name=true       -- Merge different file schemas by column name
);
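For example, with hive_partitioning=true, key=value directory names in the object path are exposed as queryable columns. The bucket layout, partition names, and credentials below are hypothetical, following the parameter list above:

```sql
-- Hypothetical layout:
--   s3://my-bucket/events/year=2024/month=01/part-0.parquet
--   s3://my-bucket/events/year=2024/month=02/part-0.parquet
SELECT
  r['year']::INTEGER  AS year,   -- partition value taken from the path
  r['month']::INTEGER AS month,  -- partition value taken from the path
  count(*)            AS events
FROM public.read_parquet(
  's3://s3.us-west-1.amazonaws.com/my-bucket/events/** accessid=XXXX secret=YYYY',
  hive_partitioning=true
) r
GROUP BY 1, 2;
```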

3. Data Import

Relyt supports two data import methods: CREATE TABLE AS SELECT with automatic schema inference and manual INSERT INTO SELECT.

3.1 CREATE TABLE AS SELECT

CREATE TABLE AS SELECT creates a new table and automatically infers data types:
-- Automatically create table and import data
CREATE TABLE my_table AS 
SELECT * FROM read_parquet('s3://bucket/data.parquet accessid=xxx secret=yyy');

-- Create table from CSV file
CREATE TABLE csv_table AS 
SELECT * FROM read_csv('s3://bucket/data.csv accessid=xxx secret=yyy');

-- Create table from JSON file
CREATE TABLE json_table AS 
SELECT * FROM read_json('s3://bucket/data.json accessid=xxx secret=yyy');
Features:
  • Creates a new table with automatic data type inference
  • Simple syntax: table creation and data import in one step
  • Suitable for rapid data exploration and initial data import
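Because CREATE TABLE AS SELECT infers the column types, it is worth checking what was inferred. The table name below assumes the my_table example above; the query itself is standard PostgreSQL:

```sql
-- Inspect the column names and types inferred for my_table
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'my_table'
ORDER BY ordinal_position;
```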

3.2 INSERT INTO SELECT

INSERT INTO SELECT can insert data into existing tables:
INSERT INTO target_table 
SELECT * FROM read_xxx('s3://...');
Features:
  • Inserts data into an existing table
  • Requires the target table to be created in advance
  • Suitable for appending to existing tables, or when precise control over the table structure is needed

3.3 Data Type Handling

Important Note: With the INSERT INTO SELECT method, the read_xxx() functions return each row as a single struct value (type duckdb.row). You must therefore extract each column by name and cast it to the target data type explicitly.

Example 1: Parquet File Import

-- Create target table
CREATE TABLE NATION (
    N_NATIONKEY  INTEGER NOT NULL,
    N_NAME       VARCHAR NOT NULL,
    N_REGIONKEY  INTEGER NOT NULL,
    N_COMMENT    VARCHAR,
    N_DUMMY      VARCHAR
);

-- Import data
INSERT INTO nation 
SELECT 
  r['n_nationkey']::INTEGER,
  r['n_name']::VARCHAR,
  r['n_regionkey']::INTEGER,
  r['n_comment']::VARCHAR,
  r['n_dummy']::VARCHAR
FROM read_parquet(
  's3://xxx/nation.parquet accessid=xxx secret=yyyy'
) r;

Example 2: CSV File Import without Header

For CSV files without headers, the columns are named column0, column1, column2, and so on:
INSERT INTO nation 
SELECT 
  r['column0']::INTEGER,
  r['column1']::VARCHAR,
  r['column2']::INTEGER,
  r['column3']::VARCHAR,
  r['column4']::VARCHAR
FROM read_csv(
  's3://xxx/.../nation.tbl accessid=xxx secret=yyy'
) r;
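Conversely, if the CSV file does have a header row, passing header=true lets you reference the detected column names directly instead of column0, column1, and so on. The file path below is an assumption; the column names follow the NATION example above:

```sql
-- Hypothetical CSV whose header row names the columns
INSERT INTO nation
SELECT
  r['n_nationkey']::INTEGER,
  r['n_name']::VARCHAR,
  r['n_regionkey']::INTEGER,
  r['n_comment']::VARCHAR,
  r['n_dummy']::VARCHAR
FROM read_csv(
  's3://xxx/nation.csv accessid=xxx secret=yyy',
  header=true
) r;
```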