When duckdb.force_execution is enabled, pg_duckdb handles query execution but still relies on PostgreSQL to scan row-store tables. For scans without selective filters, this hybrid flow can bottleneck on PostgreSQL's table scan before DuckDB ever gets to apply its vectorized magic. Converting hot analytical tables into parquet-backed columnstores removes that bottleneck: the data is read natively by DuckDB, which keeps the entire pipeline columnar.
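The switch is an ordinary session setting, so you can toggle it per connection while comparing executors:

```sql
-- Route queries in this session through DuckDB's executor.
SET duckdb.force_execution = true;

-- Check the current value at any time.
SHOW duckdb.force_execution;
```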
## Why columnstore tables help
- Row-store scans stay in PostgreSQL and are serialized tuple-by-tuple into DuckDB batches.
- Columnstore tables (backed by parquet files) are read directly by DuckDB, avoiding the conversion step and enabling parallel columnar IO.
- You can keep the original OLTP table for writes while exporting snapshots to parquet whenever you need faster analytics.
## Real example: clickstream engagement
1. Create the row-store fact table
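The walkthrough assumes a hypothetical clickstream schema; the column names and types below are illustrative, so substitute your own fact-table layout:

```sql
-- Hypothetical clickstream fact table; adjust columns to match your data.
CREATE TABLE event_log (
    event_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id     bigint      NOT NULL,
    session_id  uuid        NOT NULL,
    event_type  text        NOT NULL,    -- e.g. 'page_view', 'click', 'scroll'
    page_url    text,
    occurred_at timestamptz NOT NULL,
    duration_ms integer
);
```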
2. Insert 3 million sample rows
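One way to generate the sample data is generate_series() plus random(); this sketch assumes the hypothetical schema above and spreads 3 million events over 30 days:

```sql
-- Insert 3 million synthetic events (gen_random_uuid() is built in since PostgreSQL 13).
INSERT INTO event_log (user_id, session_id, event_type, page_url, occurred_at, duration_ms)
SELECT
    (random() * 100000)::bigint,                                     -- ~100k distinct users
    gen_random_uuid(),
    (ARRAY['page_view', 'click', 'scroll'])[1 + (random() * 2)::int],
    '/page/' || (random() * 500)::int,
    now() - random() * interval '30 days',
    (random() * 10000)::int
FROM generate_series(1, 3000000);
```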
3. Benchmark aggregates on the row-store table
Run the single-stage aggregation below first with PostgreSQL's executor, then flip `SET duckdb.force_execution = true;` and run it again. DuckDB still trims latency (about 3.0 s → 2.1 s), but the scan stays in PostgreSQL, so both paths remain in the same ballpark.
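The exact query matters less than its shape: a single pass over event_log feeding a grouped aggregate. A sketch using the hypothetical columns from the schema above:

```sql
-- Single-stage engagement rollup: event counts and average dwell time per type per day.
-- Run once with the default executor, then again after SET duckdb.force_execution = true;
SELECT
    event_type,
    date_trunc('day', occurred_at) AS day,
    count(*)                       AS events,
    avg(duration_ms)               AS avg_duration_ms
FROM event_log
GROUP BY event_type, date_trunc('day', occurred_at)
ORDER BY day, event_type;
```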
| Engine | Table scanned | Runtime observed | Notes |
|---|---|---|---|
| PostgreSQL | event_log | ~3.0 seconds | Row-store scan plus aggregate materialization |
| DuckDB | event_log | ~2.1 seconds | DuckDB executes, but scan still comes from PostgreSQL |
## Export to parquet for true columnar IO
4. Copy the dataset to parquet files
Prerequisites:
- Access to an S3-compatible object store that DuckDB can reach.
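A sketch of the export, assuming an `analytics-bucket` placeholder and that S3 credentials have already been registered with pg_duckdb (see its secrets documentation for the mechanism your version supports):

```sql
-- Write the current snapshot of event_log as a parquet file in object storage.
COPY (
    SELECT event_id, user_id, session_id, event_type, page_url, occurred_at, duration_ms
    FROM event_log
) TO 's3://analytics-bucket/event_log/event_log.parquet' (FORMAT parquet);
```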
5. Create an external table
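pg_duckdb reads parquet through its `read_parquet()` table function, so one way to get a table-like handle is a view over it. This is a sketch: the bucket path and column list come from the hypothetical schema above, and the exact `read_parquet()` calling convention (column-definition list vs. row accessor) differs between pg_duckdb versions, so check the docs for the release you run:

```sql
-- Expose the parquet snapshot as event_log_cstore for analytical queries.
CREATE VIEW event_log_cstore AS
SELECT *
FROM read_parquet('s3://analytics-bucket/event_log/*.parquet')
     AS (event_id bigint, user_id bigint, session_id uuid, event_type text,
         page_url text, occurred_at timestamptz, duration_ms integer);
```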
6. Rerun the query against the columnstore
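The aggregation itself is unchanged; only the relation swaps to the parquet-backed view, so DuckDB owns both the scan and the aggregate:

```sql
-- Same rollup as before, now over the parquet-backed columnstore view.
SET duckdb.force_execution = true;

SELECT
    event_type,
    date_trunc('day', occurred_at) AS day,
    count(*)                       AS events,
    avg(duration_ms)               AS avg_duration_ms
FROM event_log_cstore
GROUP BY event_type, date_trunc('day', occurred_at)
ORDER BY day, event_type;
```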
| Engine | Table scanned | Runtime observed | Notes |
|---|---|---|---|
| DuckDB | event_log_cstore | ~0.8 seconds | Fully columnar scan, vectorized aggregation |
Keep the original event_log table for day-to-day writes, and re-export a fresh parquet snapshot whenever the analytical copy needs to catch up.