
ETL pipelines to bring all your data into Databricks/Snowflake take a lot of effort. It's much better if your OLTP data already lives in Databricks and you can access it directly from your OLAP layer.


With the push toward open table formats (Iceberg) from both Snowflake and Databricks, it has become even harder to get your Postgres OLTP tables ready for OLAP.

The problem isn't in the CDC / replication tools in the market.

The problem is that columnar stores (especially Iceberg) are not designed for the write/upsert patterns of OLTP systems.

They just can't keep up...
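The mismatch can be sketched with a toy model (all names here are illustrative, not a real Iceberg or Parquet API): data files are immutable, so a single OLTP upsert cannot modify a row in place. It has to write an equality-delete marker to hide the old version plus a tiny new data file, and readers must merge all of that on every scan.

```python
# Toy model of why OLTP-style upserts strain an immutable columnar store.
# This is a hypothetical sketch, not pyiceberg; names are made up.

class ToyColumnStore:
    def __init__(self):
        self.seq = 0
        self.data_files = []   # (seq, [(pk, value), ...]) - immutable batches
        self.eq_deletes = []   # (seq, pk): hides pk in files with a lower seq

    def append_file(self, rows):
        self.seq += 1
        self.data_files.append((self.seq, rows))

    def upsert(self, pk, value):
        # One OLTP upsert = one equality delete + one single-row data file.
        self.seq += 1
        self.eq_deletes.append((self.seq, pk))
        self.data_files.append((self.seq, [(pk, value)]))

    def scan(self):
        # Readers pay the cost: every scan merges delete markers.
        live = {}
        for fseq, rows in self.data_files:
            for pk, value in rows:
                # A row is dead if a later equality delete targets its pk.
                if any(dseq > fseq and dpk == pk
                       for dseq, dpk in self.eq_deletes):
                    continue
                live[pk] = value
        return live


store = ToyColumnStore()
store.append_file([(1, "a"), (2, "b")])
for i in range(3):
    store.upsert(1, f"a{i}")
print(store.scan())            # {2: 'b', 1: 'a2'} - 2 live rows
print(len(store.data_files))   # 4 files written to hold them
```

Three upserts of the same row leave four files and three delete markers behind; a busy OLTP table does this thousands of times per second, which is the write pattern these formats were never built for.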

This is a big problem we're hoping to solve at Mooncake [0]: turn Iceberg into an operational columnstore, so that it can keep up (sub-second freshness) with your Postgres.

https://www.mooncake.dev/


Is Iceberg involved in every read/write? I thought it was mostly metadata?


A DataFile (Parquet) alone is not enough for a table with updates/deletes; delete files are also part of Iceberg's "metadata". For CDC from OLTP use cases, the pattern involves rapidly marking rows as deleted, inserting new rows, and compacting the resulting small files. That's what minutes-latency replication requires.
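The compaction step mentioned above can be sketched in a few lines (a hypothetical helper, not a real Iceberg maintenance API): merge many tiny CDC-produced batches, apply the deleted-pk set, and re-chunk the survivors into fewer, larger files.

```python
# Toy sketch of small-file compaction for a CDC-fed table.
# small_files: list of row batches [(pk, value), ...]; later batches win on pk.
# deleted_pks: primary keys marked deleted since the batches were written.

def compact(small_files, deleted_pks, target_rows=1000):
    latest = {}
    for rows in small_files:
        for pk, value in rows:
            latest[pk] = value  # a later batch overwrites an earlier version
    live = sorted((pk, v) for pk, v in latest.items() if pk not in deleted_pks)
    # Re-chunk the live rows into fewer, larger files.
    return [live[i:i + target_rows] for i in range(0, len(live), target_rows)]


# Three one-row files plus a delete collapse into a single file:
print(compact([[(1, "a")], [(2, "b")], [(1, "a2")]], {2}))
# [[(1, 'a2')]]
```

Without this background rewrite, the file count (and scan cost) grows with every CDC event rather than with the table size.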

And second-latency replication is more involved still: you actually need to build a layer on top of Iceberg that tracks primary keys and applies deletions.
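The pk-tracking layer could look roughly like this (an assumed design, not any vendor's actual implementation): keep an index from primary key to the physical location of the current row version, so a CDC UPDATE can emit a positional delete immediately instead of scanning data files to find the old row.

```python
# Hypothetical pk-index layer on top of an Iceberg-like table.
# Maps each primary key to where its current version physically lives.

class PkIndex:
    def __init__(self):
        self.index = {}  # pk -> (file_id, row_position)

    def insert(self, pk, file_id, pos):
        self.index[pk] = (file_id, pos)

    def update(self, pk, new_file_id, new_pos):
        # Look up the old location, re-point the index, and return the old
        # location so the caller can write it out as a positional delete.
        old = self.index.get(pk)
        self.index[pk] = (new_file_id, new_pos)
        return old


idx = PkIndex()
idx.insert(7, "file-1", 0)
old = idx.update(7, "file-2", 0)
print(old)  # ('file-1', 0) -> becomes a positional delete entry
```

The hard parts this sketch skips are exactly why a layer is needed: keeping the index durable, consistent with commits, and small enough to stay fast.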




