
ETL pipelines to bring all your data into Databricks/Snowflake take a lot of effort. It's much better if your OLTP data already lives in Databricks and you can access it directly from your OLAP layer.


With the push toward open table formats (Iceberg) from both Snowflake and Databricks, it has become even harder to get your Postgres OLTP tables ready for OLAP.

The problem isn't in the CDC / replication tools in the market.

The problem is that columnar stores (especially Iceberg) are not designed for the write/upsert patterns of OLTP systems.

They just can't keep up...
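The mismatch can be sketched with a toy model (all names here are illustrative, not a real Iceberg or Parquet API): data files are immutable, so a single OLTP upsert cannot modify a row in place. It has to write an equality-delete marker to hide the old version plus a tiny new data file, and readers must merge all of that on every scan.

```python
# Toy model of why OLTP-style upserts strain an immutable columnar store.
# This is a hypothetical sketch, not pyiceberg; names are made up.

class ToyColumnStore:
    def __init__(self):
        self.seq = 0
        self.data_files = []   # (seq, [(pk, value), ...]) - immutable batches
        self.eq_deletes = []   # (seq, pk): hides pk in files with a lower seq

    def append_file(self, rows):
        self.seq += 1
        self.data_files.append((self.seq, rows))

    def upsert(self, pk, value):
        # One OLTP upsert = one equality delete + one single-row data file.
        self.seq += 1
        self.eq_deletes.append((self.seq, pk))
        self.data_files.append((self.seq, [(pk, value)]))

    def scan(self):
        # Readers pay the cost: every scan merges delete markers.
        live = {}
        for fseq, rows in self.data_files:
            for pk, value in rows:
                # A row is dead if a later equality delete targets its pk.
                if any(dseq > fseq and dpk == pk
                       for dseq, dpk in self.eq_deletes):
                    continue
                live[pk] = value
        return live


store = ToyColumnStore()
store.append_file([(1, "a"), (2, "b")])
for i in range(3):
    store.upsert(1, f"a{i}")
print(store.scan())            # {2: 'b', 1: 'a2'} - 2 live rows
print(len(store.data_files))   # 4 files written to hold them
```

Three upserts of the same row leave four files and three delete markers behind; a busy OLTP table does this thousands of times per second, which is the write pattern these formats were never built for.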

This is a big problem we're hoping to solve at Mooncake [0]: turn Iceberg into an operational columnstore, so that it can keep up (sub-second freshness) with your Postgres.

https://www.mooncake.dev/


Is Iceberg involved in every read/write? I thought it was mostly metadata?


A DataFile (Parquet) alone is not enough for a table with updates/deletes; delete files are also part of Iceberg's "metadata". For CDC from OLTP use cases, the pattern involves rapidly marking rows as deleted, inserting new rows, and compacting the resulting small files. That's what minutes-latency replication requires.
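The compaction step mentioned above can be sketched in a few lines (a hypothetical helper, not a real Iceberg maintenance API): merge many tiny CDC-produced batches, apply the deleted-pk set, and re-chunk the survivors into fewer, larger files.

```python
# Toy sketch of small-file compaction for a CDC-fed table.
# small_files: list of row batches [(pk, value), ...]; later batches win on pk.
# deleted_pks: primary keys marked deleted since the batches were written.

def compact(small_files, deleted_pks, target_rows=1000):
    latest = {}
    for rows in small_files:
        for pk, value in rows:
            latest[pk] = value  # a later batch overwrites an earlier version
    live = sorted((pk, v) for pk, v in latest.items() if pk not in deleted_pks)
    # Re-chunk the live rows into fewer, larger files.
    return [live[i:i + target_rows] for i in range(0, len(live), target_rows)]


# Three one-row files plus a delete collapse into a single file:
print(compact([[(1, "a")], [(2, "b")], [(1, "a2")]], {2}))
# [[(1, 'a2')]]
```

Without this background rewrite, the file count (and scan cost) grows with every CDC event rather than with the table size.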

And second-latency replication is more involved still: you actually need to build a layer on top of Iceberg that tracks primary keys and applies deletions.
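The pk-tracking layer could look roughly like this (an assumed design, not any vendor's actual implementation): keep an index from primary key to the physical location of the current row version, so a CDC UPDATE can emit a positional delete immediately instead of scanning data files to find the old row.

```python
# Hypothetical pk-index layer on top of an Iceberg-like table.
# Maps each primary key to where its current version physically lives.

class PkIndex:
    def __init__(self):
        self.index = {}  # pk -> (file_id, row_position)

    def insert(self, pk, file_id, pos):
        self.index[pk] = (file_id, pos)

    def update(self, pk, new_file_id, new_pos):
        # Look up the old location, re-point the index, and return the old
        # location so the caller can write it out as a positional delete.
        old = self.index.get(pk)
        self.index[pk] = (new_file_id, new_pos)
        return old


idx = PkIndex()
idx.insert(7, "file-1", 0)
old = idx.update(7, "file-2", 0)
print(old)  # ('file-1', 0) -> becomes a positional delete entry
```

The hard parts this sketch skips are exactly why a layer is needed: keeping the index durable, consistent with commits, and small enough to stay fast.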




