mindflakes

Oct 11, 2022 - 2 minute read

Datasette-lite sqlite-httpvfs experiment POC with the California Unclaimed Property database

screenshot

There’s a web browser version of datasette called datasette-lite which runs on Python ported to WASM with Pyodide which can load SQLite databases. I grafted the enhanced lazyFile implementation from emscripten and then from this implementation to datasette-lite relatively recently as a curious test. Threw in a 18GB CSV from CA’s unclaimed property records here

https://www.sco.ca.gov/upd_download_property_records.html

into a FTS5 Sqlite Database which came out to about 28GB after processing:

POC, non-merging Log/Draft PR for the hack:

https://github.com/simonw/datasette-lite/pull/49

You can run queries through to datasette-lite if you URL hack into it and just get to the query dialog, browsing is kind of a dud at the moment since datasette runs a count(*) which downloads everything.

Elon Musk’s CA Unclaimed Property

Still, not bad for a $0.42/mo hostable cached CDN’d read-only database. It’s on Cloudflare R2, so there’s no BW costs.

Celebrity gawking is one thing, but the real, useful thing that this can do is search by address. If you aren’t sure of the names, such as people having multiple names or nicknames, you can search by address and get a list of properties at a location. This is one thing that the California Unclaimed Property site can’t do.

I am thinking of making this more proper when R2 introduces lifecycle rules to delete old dumps. I could automate the process of dumping with GitHub Actions but I would like R2 to handle cleanup.