1. Reading Data from S3

1.1 Supported File Formats

pg_duckdb supports the following file formats:
  • CSV: Use read_csv() function
  • Parquet: Use read_parquet() function
  • JSON: Use read_json() function

1.2 S3 Path Format

Basic Format

s3://[endpoint]/[bucket]/[prefix]/filename

Reading Files with Wildcards

  • Use * to match all files in the current directory:
s3://[endpoint]/[bucket]/my_data/*
  • Use ** to recursively match all files in subdirectories:
s3://[endpoint]/[bucket]/my_data/**

AWS Endpoint Examples

AWS S3 endpoints follow the pattern s3.<region>.amazonaws.com, for example:

s3.us-west-1.amazonaws.com   (US West, N. California)
s3.ap-east-1.amazonaws.com   (Asia Pacific, Hong Kong)

1.3 Basic Usage Examples

CSV Files

SELECT * FROM public.read_csv(
  's3://[endpoint]/[bucket]/[prefix]/* accessid=xxxxxx secret=yyyyyy'
);

Parquet Files

SELECT * FROM public.read_parquet(
  's3://s3.us-west-1.amazonaws.com/my-bucket/parquet_files/*
   accessid=XXXX
   secret=YYYY'
);

JSON Files

SELECT * FROM public.read_json(
  's3://s3.ap-east-1.amazonaws.com/tpc-h/table.json
  accessid=XXXX
  secret=YYYY'
);

2. Advanced Parameter Configuration

2.1 CSV Parameters

read_csv() automatically detects format by default, but also supports manual parameter specification:
SELECT * FROM public.read_csv(
  's3://xxxx...', 
  delim='|',           -- Delimiter
  escape='"',          -- Escape character
  quote='"',           -- Quote character
  header=true          -- Whether header is included
);

2.2 JSON Parameters

SELECT * FROM public.read_json(
  's3://xxxxx...',
  ignore_errors=true   -- Skip rows that fail to parse instead of raising an error
);

2.3 Parquet Parameters

SELECT * FROM public.read_parquet(
  's3://xxxx...',
  filename=true,           -- Include filename in results
  file_row_number=true,    -- Include row number in results
  hive_partitioning=true,  -- Use path as partition values
  union_by_name=true       -- Merge different file schemas by column name
);
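For example, with hive_partitioning=true, key=value directory names in the object path are exposed as queryable columns. The bucket layout, partition names, and credentials below are hypothetical, following the parameter list above:

```sql
-- Hypothetical layout:
--   s3://my-bucket/events/year=2024/month=01/part-0.parquet
--   s3://my-bucket/events/year=2024/month=02/part-0.parquet
SELECT
  r['year']::INTEGER  AS year,   -- partition value taken from the path
  r['month']::INTEGER AS month,  -- partition value taken from the path
  count(*)            AS events
FROM public.read_parquet(
  's3://s3.us-west-1.amazonaws.com/my-bucket/events/** accessid=XXXX secret=YYYY',
  hive_partitioning=true
) r
GROUP BY 1, 2;
```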

3. Data Import

Relyt supports two data import methods: CREATE TABLE AS SELECT with automatic schema inference and manual INSERT INTO SELECT.

3.1 CREATE TABLE AS SELECT

CREATE TABLE AS SELECT creates a new table and automatically infers data types:
-- Automatically create table and import data
CREATE TABLE my_table AS 
SELECT * FROM read_parquet('s3://bucket/data.parquet accessid=xxx secret=yyy');

-- Create table from CSV file
CREATE TABLE csv_table AS 
SELECT * FROM read_csv('s3://bucket/data.csv accessid=xxx secret=yyy');

-- Create table from JSON file
CREATE TABLE json_table AS 
SELECT * FROM read_json('s3://bucket/data.json accessid=xxx secret=yyy');
Features:
  • Creates a new table with automatic data type inference
  • Simple syntax: table creation and data import in one step
  • Suitable for rapid data exploration and initial data import
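Because CREATE TABLE AS SELECT infers the column types, it is worth checking what was inferred. The table name below assumes the my_table example above; the query itself is standard PostgreSQL:

```sql
-- Inspect the column names and types inferred for my_table
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'my_table'
ORDER BY ordinal_position;
```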

3.2 INSERT INTO SELECT

INSERT INTO SELECT can insert data into existing tables:
INSERT INTO target_table 
SELECT * FROM read_xxx('s3://...');
Features:
  • Inserts data into an existing table
  • Requires the target table to be created in advance
  • Suitable for appending to existing tables, or when precise control over the table structure is needed

3.3 Data Type Handling

Important Note: With the INSERT INTO SELECT method, the read_xxx() functions return each row as a single struct value (type duckdb.row). You must therefore extract each column by name and cast it to the target data type explicitly.

Example 1: Parquet File Import

-- Create target table
CREATE TABLE NATION (
    N_NATIONKEY  INTEGER NOT NULL,
    N_NAME       VARCHAR NOT NULL,
    N_REGIONKEY  INTEGER NOT NULL,
    N_COMMENT    VARCHAR,
    N_DUMMY      VARCHAR
);

-- Import data
INSERT INTO nation 
SELECT 
  r['n_nationkey']::INTEGER,
  r['n_name']::VARCHAR,
  r['n_regionkey']::INTEGER,
  r['n_comment']::VARCHAR,
  r['n_dummy']::VARCHAR
FROM read_parquet(
  's3://xxx/nation.parquet accessid=xxx secret=yyyy'
) r;

Example 2: CSV File Import without Header

For CSV files without headers, the columns are named column0, column1, column2, and so on:
INSERT INTO nation 
SELECT 
  r['column0']::INTEGER,
  r['column1']::VARCHAR,
  r['column2']::INTEGER,
  r['column3']::VARCHAR,
  r['column4']::VARCHAR
FROM read_csv(
  's3://xxx/.../nation.tbl accessid=xxx secret=yyy'
) r;
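Conversely, if the CSV file does have a header row, passing header=true lets you reference the detected column names directly instead of column0, column1, and so on. The file path below is an assumption; the column names follow the NATION example above:

```sql
-- Hypothetical CSV whose header row names the columns
INSERT INTO nation
SELECT
  r['n_nationkey']::INTEGER,
  r['n_name']::VARCHAR,
  r['n_regionkey']::INTEGER,
  r['n_comment']::VARCHAR,
  r['n_dummy']::VARCHAR
FROM read_csv(
  's3://xxx/nation.csv accessid=xxx secret=yyy',
  header=true
) r;
```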