Does QueryFlow support real-time CDC from Postgres?

Not in v1.5. QueryFlow uses batch sync patterns (snapshot, incremental, upsert) that run on a schedule. Real-time CDC via Postgres logical replication is on the public roadmap at queryflow.featurebase.app/roadmap with active voting. Until then, QueryFlow users who need near-real-time data typically schedule syncs at 5-minute intervals as the closest practical alternative; this does not satisfy a sub-minute replication SLA.

How does QueryFlow handle Postgres tables with hundreds of millions of rows?

An initial snapshot of a 100M+ row table can take hours, depending on Postgres read throughput and Snowflake write throughput. After the initial snapshot, incremental syncs process only changed rows (typically thousands or millions, not hundreds of millions). For tables where an initial snapshot is impractical, you can do a one-time bulk export via pg_dump + Snowflake stage as the initial load, then use QueryFlow for ongoing incremental sync.

Does QueryFlow handle Postgres extensions like PostGIS or hstore?

PostgreSQL data types from extensions (PostGIS GEOMETRY, hstore, etc.) are read as their text representation. Custom handling in Flow Books can parse these to more appropriate Snowflake types. The most common case (PostGIS) is typically mapped to Snowflake GEOGRAPHY for spatial queries.
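As an illustration of what "custom handling" of a text representation can involve, here is a minimal Python sketch that parses hstore text output into a dictionary and a JSON string suitable for a VARIANT column. It is not QueryFlow's Flow Book API; the function names `hstore_to_dict` and `hstore_to_json` are hypothetical. It relies on the documented hstore output format, in which keys and non-NULL values are always double-quoted.

```python
import json
import re

# One "key"=>"value" or "key"=>NULL pair in hstore text output.
# On output, hstore double-quotes keys and non-NULL values and
# backslash-escapes embedded quotes and backslashes.
_PAIR = re.compile(
    r'"((?:[^"\\]|\\.)*)"\s*=>\s*(?:"((?:[^"\\]|\\.)*)"|(NULL))',
    re.DOTALL,
)


def _unescape(s: str) -> str:
    """Turn \\" into " and \\\\ into \\ by dropping the escaping backslash."""
    return re.sub(r"\\(.)", r"\1", s, flags=re.DOTALL)


def hstore_to_dict(text: str) -> dict:
    """Parse hstore text output into a dict (hypothetical helper)."""
    result = {}
    for match in _PAIR.finditer(text):
        key, value, is_null = match.group(1), match.group(2), match.group(3)
        result[_unescape(key)] = None if is_null else _unescape(value)
    return result


def hstore_to_json(text: str) -> str:
    """Serialize the parsed pairs as JSON, e.g. for a VARIANT column."""
    return json.dumps(hstore_to_dict(text), sort_keys=True)
```

For example, `hstore_to_dict('"a"=>"1", "b"=>NULL')` yields `{"a": "1", "b": None}`. All hstore values are strings, so any numeric casting would be a separate step.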

Can multiple Postgres sources sync into the same Snowflake destination?

Yes. You can have multiple pipelines, each with a different Postgres source, all writing to different tables in the same Snowflake destination. Schedule them independently.

What happens if a sync fails halfway through?

QueryFlow uses transactional writes where possible, so failed pipelines don't leave partial data in the destination. For pipelines using Bulk operations (Snowflake's COPY INTO), the operation either fully completes or fully rolls back. Failed runs surface in the Observatory dashboard with full error context, and you can retry with one click.
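The all-or-nothing behavior described above can be sketched with any transactional database. The example below uses Python's built-in sqlite3 module as a stand-in for the destination; it is an illustration of the general technique, not QueryFlow's or Snowflake's implementation, and the `orders` table and `load_batch` function are hypothetical.

```python
import sqlite3


def load_batch(conn: sqlite3.Connection, rows) -> bool:
    """Insert all rows in one transaction; roll back everything on any error."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.executemany(
                "INSERT INTO orders (id, amount) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.Error:
        return False


# In-memory stand-in for the destination (assumption: hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
conn.commit()

ok = load_batch(conn, [(1, 9.5), (2, 3.0)])
# The second batch fails on the duplicate id 2, so row 3 is rolled back too.
failed = load_batch(conn, [(3, 1.0), (2, 7.0)])
count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the failed second batch, the table still holds only the two rows from the first batch, which is what makes a retry safe.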

Postgres to Snowflake Sync — QueryFlow (Mac-Native ETL)

The Postgres to Snowflake sync problem

Most production data lives in operational databases (Postgres being the most common modern choice), but analytics workloads need a warehouse-shaped store. Syncing data from one to the other is a constant operational need: you can't query production for analytics without performance impact, but you need recent data in the warehouse to make decisions. The standard solutions are Fivetran (managed, expensive), Airbyte (open source, requires Docker hosting), AWS DMS (cloud-native, complex config), or custom code (requires maintenance).

QueryFlow as the desktop-native alternative

QueryFlow connects to Postgres via postgres-nio (the same TCP wire protocol library that powers Vapor and other Swift server frameworks). It works with any Postgres deployment: localhost, AWS RDS, AWS Aurora, Neon, Supabase, Heroku Postgres, Crunchy Bridge, anywhere. The Snowflake connector uses the SQL API v2 with Programmatic Access Tokens. A pipeline connects them: read from Postgres, optionally transform, write to Snowflake.

Sync patterns supported

Full snapshot: replicate the entire source table to the destination on each run. Simple and well suited to small dimension tables (under a few million rows); schedule it daily or weekly.
Incremental by timestamp: track an updated_at column and sync only rows changed since the last run. The standard pattern for fact tables.
Incremental by sequence: track an auto-increment ID for append-only tables. The most efficient pattern when applicable.
Upsert: insert new rows and update changed rows in the destination. Requires a primary key.
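To make two of these patterns concrete, here is a minimal Python sketch of incremental-by-timestamp selection followed by an upsert, using in-memory lists and dicts in place of Postgres and Snowflake. The function names, the `id` key, and the sample rows are assumptions for illustration and are not QueryFlow's API.

```python
from datetime import datetime, timezone


def incremental_by_timestamp(source_rows, watermark):
    """Return rows changed after the watermark, plus the new watermark."""
    changed = [r for r in source_rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark


def upsert(destination, rows, key="id"):
    """Merge rows into destination (a dict keyed by primary key)."""
    inserted = updated = 0
    for row in rows:
        if row[key] in destination:
            updated += 1
        else:
            inserted += 1
        destination[row[key]] = dict(row)
    return inserted, updated


def ts(day):
    # Helper for sample data only.
    return datetime(2024, 1, day, tzinfo=timezone.utc)


source = [
    {"id": 1, "name": "a", "updated_at": ts(1)},
    {"id": 2, "name": "b2", "updated_at": ts(5)},  # changed since last run
    {"id": 3, "name": "c", "updated_at": ts(6)},   # new since last run
]
destination = {
    1: {"id": 1, "name": "a", "updated_at": ts(1)},
    2: {"id": 2, "name": "b", "updated_at": ts(2)},
}

changed, watermark = incremental_by_timestamp(source, ts(2))
inserted, updated = upsert(destination, changed)
```

Only rows 2 and 3 are read, row 2 is updated, row 3 is inserted, and the watermark advances to the latest `updated_at` seen. The strict `>` comparison is a simplification: a real pipeline may need to handle rows that share the watermark timestamp, for example by re-reading a small overlap window and relying on the upsert to deduplicate.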

Schema synchronization

QueryFlow can either write to existing Snowflake tables (you create the table once, QueryFlow writes data to it) or auto-create destination tables (QueryFlow generates a Snowflake table from the Postgres source schema). Type mapping: Postgres INTEGER → Snowflake NUMBER(10), Postgres TIMESTAMP WITH TIME ZONE → Snowflake TIMESTAMP_TZ, Postgres JSONB → Snowflake VARIANT, Postgres TEXT → Snowflake VARCHAR. Custom mappings via Flow Books.
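The auto-create path can be pictured as a lookup from Postgres type names to Snowflake type names. The sketch below uses only the four mappings listed above; the `snowflake_ddl` function, its `overrides` parameter (standing in for a custom mapping), and the example table are hypothetical, and identifier quoting is deliberately left out.

```python
# Default mappings taken from the list above.
TYPE_MAP = {
    "integer": "NUMBER(10)",
    "timestamp with time zone": "TIMESTAMP_TZ",
    "jsonb": "VARIANT",
    "text": "VARCHAR",
}


def snowflake_ddl(table, columns, overrides=None):
    """Build a CREATE TABLE statement from (name, postgres_type) pairs.

    `overrides` is a hypothetical stand-in for custom mappings.
    """
    mapping = {**TYPE_MAP, **(overrides or {})}
    parts = []
    for name, pg_type in columns:
        key = pg_type.lower()
        if key not in mapping:
            raise ValueError(f"no Snowflake mapping for Postgres type {pg_type!r}")
        parts.append(f"{name} {mapping[key]}")
    return f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(parts)})"


ddl = snowflake_ddl(
    "events",
    [("id", "INTEGER"), ("created_at", "timestamp with time zone"), ("payload", "jsonb")],
)
```

Raising on an unmapped type, rather than silently falling back to VARCHAR, is one possible design choice; it surfaces columns that need a custom mapping before any data is written.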

Why teams choose QueryFlow over Fivetran for this workflow

Cost: $299.99/year vs Fivetran's $1,500-15,000/month.
Control: pipelines run on your Mac with your code in front of you, not in a vendor's cloud.
Speed of iteration: change a pipeline and re-run in seconds, no waiting for vendor deployment.
Salesforce, Sheets, and CSV use cases: QueryFlow handles all of these without adding connector fees.
Trade-offs: you're responsible for the Mac being on, no vendor SLA, single-machine fault tolerance.
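The cost comparison is easier to judge on an annual basis. The short Python calculation below uses only the prices quoted above and ignores hardware and staff time, which is a simplifying assumption.

```python
QUERYFLOW_PER_YEAR = 299.99          # from the pricing quoted above
FIVETRAN_PER_MONTH = (1500, 15000)   # low and high ends of the quoted range


def cost_ratio_percent(fivetran_monthly):
    """QueryFlow's annual price as a percentage of a Fivetran annual bill."""
    return round(QUERYFLOW_PER_YEAR / (fivetran_monthly * 12) * 100, 2)


low_end, high_end = (cost_ratio_percent(m) for m in FIVETRAN_PER_MONTH)
```

At the low end of the quoted range, $1,500/month is $18,000/year, so the license works out to about 1.67% of that bill; at $15,000/month it is about 0.17%.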

Production patterns observed in QueryFlow users

Dedicated Mac mini as ETL server: $600 hardware, plug into ethernet, leave on 24/7. Many users find this dramatically cheaper than equivalent AWS infrastructure for the same workload.
Laptop + macOS Power Schedule: configure your daily-driver MacBook to wake at scheduled job times. Works for non-mission-critical syncs.
Hybrid: use QueryFlow for less critical syncs, keep Fivetran for the highest-volume tables. Many teams reduce their Fivetran bill by 60-80% this way.

When Fivetran or Airbyte is still the right answer

If you're syncing dozens of tables continuously with sub-minute freshness SLAs and any downtime is unacceptable, vendor-managed CDC infrastructure justifies its cost. If your team has zero capacity to manage ETL infrastructure and can pay for fully-managed, Fivetran is the comfortable choice. For most Postgres-to-Snowflake syncs at small-to-medium scale with daily or hourly freshness requirements, QueryFlow does the work for a small fraction of the cost: $299.99/year is under 2% of even a $1,500/month Fivetran bill.

PostgreSQL to Snowflake, scheduled and synced.