Creating a Lakehouse Service: Background
I haven’t written nearly as much as I’d have liked to. However, I’m starting a new job next week, took the holiday week off between jobs, and decided it was a good time to hack around a bit. This will be part one (zero?) of a multi-part series, as I actually start to piece these things together.

Lakehouse?

Very early in my data scientist-to-engineer transition, the lakehouse became a very appealing architecture pattern and started receiving a lot of attention. At my previous employer, we were working on migrating analytics workloads off of a SQL Server and onto Snowflake. For a hackathon (three-ish years ago!), my team and I decided to take a look at the dbt-external-tables package for some ingestion tasks. At the time, the package was really focused on snowpipes, so that was the focus of our project as well.
Creating a Lakehouse Service: Part 1: Deploying a UI Pod
This is part one of a longer series. On a personal note, I really enjoyed this part of the project because it helped me solidify some existing knowledge and learn some new stuff in the process.

Selecting Tools

DuckDB

I knew that I could use open table formats pretty easily with some of the more popular distributed engines, such as Snowflake. But, after the Small Data 2025 conference, I was particularly motivated to see how much use I could get out of DuckDB and some of its built-in tooling, such as the new UI. Additionally, I have been really impressed with DuckDB’s integrations. Not only have they been focusing on compatibility with different open table formats, but they’ve been putting their engine in some pretty cool places.
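For anyone who hasn’t tried the UI yet, launching it from a DuckDB session is a one-liner. This is just a sketch; the `ui` extension ships with recent DuckDB releases, and the default port may vary by version:

```sql
-- Sketch: launch the DuckDB UI from an interactive session.
-- Assumes a recent DuckDB build where the ui extension is available.
INSTALL ui;
LOAD ui;
CALL start_ui();  -- serves the UI locally (port 4213 by default in recent versions)
```

The CLI also accepts a `-ui` flag that does roughly the same thing at startup, which is the shape that matters later when baking this into a container image.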
Creating a Lakehouse Service: Part 2: Creating K8s Resources
This is part two of a longer series, and here is the link to the previous part.

sitrep

Ok, so after part one, we have:

- an init script that can be used to “pre-configure” our DuckDB session
- a working k8s pod configuration
- a custom duckdb-based image that runs our init script and launches the UI
- an nginx sidecar that actually exposes the DuckDB UI
- a configured Traefik IngressRoute to access the nginx-exposed port

But this is only for one pod. We want this to be “scalable,” so that many users can each have their own pod/session while connected to the same Iceberg tables.
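To make the sitrep concrete, here is a rough sketch of what that single-pod shape could look like. All names, images, paths, and ports below are illustrative placeholders, not the actual manifests from part one:

```yaml
# Illustrative sketch only -- names, images, and ports are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: duckdb-ui
  labels:
    app: duckdb-ui
spec:
  containers:
    - name: duckdb
      image: example/duckdb-ui:latest        # custom image: runs init script, launches UI
      args: ["-init", "/config/init.sql", "-ui"]
      volumeMounts:
        - name: init-script
          mountPath: /config
    - name: nginx                            # sidecar that actually exposes the UI
      image: nginx:stable
      ports:
        - containerPort: 8080
  volumes:
    - name: init-script
      configMap:
        name: duckdb-init
---
# Traefik IngressRoute pointing at the nginx-exposed port; assumes a
# Service named duckdb-ui (omitted here) selecting the pod above.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: duckdb-ui
spec:
  entryPoints:
    - web
  routes:
    - match: PathPrefix(`/duckdb`)
      kind: Rule
      services:
        - name: duckdb-ui
          port: 8080
```

The key point for what comes next is that everything here is a single, hand-named Pod; scaling to per-user sessions means stamping out one of these (plus its Service and route) per user.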