Cloudflare

Datasette-lite sqlite-httpvfs experiment POC with the California Unclaimed Property database

[screenshot]

There’s a web browser version of datasette called datasette-lite, which runs on Python ported to WASM with Pyodide and can load SQLite databases. As a curious test, I recently grafted the enhanced lazyFile implementation from Emscripten, by way of this implementation, onto datasette-lite. I threw an 18GB CSV of CA’s unclaimed property records from here

https://www.sco.ca.gov/upd_download_property_records.html

into an FTS5 SQLite database, which came out to about 28GB after processing.
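
Conceptually, the conversion is just a CSV import into an FTS5 virtual table. Here’s a minimal sketch of the idea, with invented column names and assuming a four-column CSV; the real records use a different schema, and the actual processing lives in the PR below:

```python
import csv
import sqlite3

# Sketch only: column names are illustrative, not the real
# CA unclaimed property schema.
db = sqlite3.connect("unclaimed_property.db")
db.execute(
    "CREATE VIRTUAL TABLE properties USING fts5("
    "owner_name, owner_address, property_type, cash_reported)"
)
with open("properties.csv", newline="", encoding="utf-8") as f:
    rows = csv.reader(f)
    next(rows)  # skip the header row
    # Assumes a four-column CSV matching the table above.
    db.executemany("INSERT INTO properties VALUES (?, ?, ?, ?)", rows)
db.commit()
db.close()
```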

POC, non-merging Log/Draft PR for the hack:

https://github.com/simonw/datasette-lite/pull/49

You can run queries through datasette-lite if you URL-hack your way directly to the query dialog. Browsing is kind of a dud at the moment, since datasette runs a count(*) that downloads everything.

Elon Musk’s CA Unclaimed Property

Still, not bad for a $0.42/mo hostable, cached, CDN’d, read-only database. It’s on Cloudflare R2, so there are no bandwidth costs.

Celebrity gawking is one thing, but the really useful thing this enables is searching by address. If you aren’t sure of the names, such as when people have multiple names or nicknames, you can search by address and get a list of properties at a location. This is one thing the official California Unclaimed Property site can’t do.
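
An address search boils down to an FTS5 query with a column filter. A rough sketch, reusing the illustrative schema from the conversion sketch above:

```python
import sqlite3

db = sqlite3.connect("unclaimed_property.db")
# FTS5 column filter: match only against the owner_address column.
# Table and column names are the illustrative ones from above.
query = 'owner_address:"123 MAIN ST"'
for owner, address, cash in db.execute(
    "SELECT owner_name, owner_address, cash_reported "
    "FROM properties WHERE properties MATCH ? ORDER BY rank",
    (query,),
):
    print(owner, address, cash)
```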

I am thinking of making this more proper when R2 introduces lifecycle rules to delete old dumps. I could automate the dumping process with GitHub Actions, but I would like R2 to handle cleanup.

8:36 pm / datasette , datasette-lite , sqlite-httpvfs , experiment , sqlite , cloudflare , r2

Released Gargantuan Takeout Rocket

Finally released GTR, or Gargantuan Takeout Rocket.

GitHub repository


Gargantuan Takeout Rocket (GTR) is a toolkit of guides and software to help you take out your data from Google Takeout and put it somewhere else safe easily, periodically, and fast to make it easy to do the right thing of backing up your Google account and related services such as your YouTube account or Google Photos periodically.


It took a lot of time to research and to work around the issues I found in Azure.

I also needed to take apart existing browser extensions and implement my own to facilitate the transfers.

I also used Cloudflare Workers to work around issues with Azure’s API.

With all of this combined, I was able to take out 1.25TB of data in 3 minutes.

Now I’m showing it around on Twitter, Discord, Reddit, and elsewhere to users who have used VPSes to back up Google Takeout, or who have expressed dismay at the lack of cheap options for storing archives. The response from people who’ve opted to stay on Google has been good!

There is also a project page with additional details here.

9:04 pm / google , takeout , azure , cloudflare , chrome , extensions , dataportability , backup

Gargantuan Takeout Rocket

https://github.com/nelsonjchen/gtr

Gargantuan Takeout Rocket (GTR) is a toolkit of guides and software to help you take out your data from Google Takeout and put it somewhere else safe easily, periodically, and fast to make it easy to do the right thing of backing up your Google account and related services such as your YouTube account or Google Photos periodically.

Used to do this with a VPS. It was still too slow to back up 1.25TB. TOOK HOURS! Built a Rocket.

It is composed of two major components:

  • A “GTR Proxy” Cloudflare Worker that HTTP3-izes the Azure API and is capable of proxying base64-encoded URLs through to Google Takeout URLs (sketched below).
  • A browser extension that coordinates all this and constructs a job plan to execute.
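
The actual GTR Proxy runs as a Cloudflare Worker, not Python, but the core URL-translation trick — rather, the heart of it: turning a URL-safe base64 path segment back into the original Google Takeout download URL before fetching it, can be sketched in a few lines (illustrative only; the real Worker also handles validation, headers, and Azure’s quirks):

```python
import base64

def decode_proxied_url(path_segment: str) -> str:
    # Re-pad and decode a URL-safe base64 path segment back into
    # the original Google Takeout download URL.
    padded = path_segment + "=" * (-len(path_segment) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")

# Round-trip example with a placeholder URL:
encoded = base64.urlsafe_b64encode(
    b"https://takeout.google.com/..."
).decode().rstrip("=")
print(decode_proxied_url(encoded))  # -> https://takeout.google.com/...
```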

These two components are infinitely scalable and can be used to back up Google Takeout at breathtaking, unprecedented speeds.

The GTR repo contains a guide on how to set up all of this.

A future revision or successor may target S3 or S3-like APIs. In particular, can we target Cloudflare’s R2 when it comes out?

12:00 am / google , takeout , azure , cloudflare , chrome , extensions , dataportability , backup

GitHub Wiki SEE

https://github-wiki-see.page/

A project to get GitHub Wikis indexed by Google Search.

Explanation is right on the front page.

If a search engine can’t see it, it may as well be invisible.

Long-term project involving Rust, BigQuery, Cloudflare Workers, and lots of random hosting.

Probably one of the more visited sites on the internet, and a continuous effort to keep costs low.

Has caused GitHub to revisit their indexing policy for a limited subset of Wikis meeting certain criteria.

Will continue until all Wikis are indexed, even if they are publicly editable or don’t meet some star-count criteria.

Personally, I just got really dismayed that all this documentation I was writing on a Wiki wasn’t visible to Google. Then I realized it would be a good idea to make it visible. Then I realized others didn’t realize it was invisible at all. That was a problem.

12:00 am / github , wiki , seo , rust , cloud_run , bigquery , gcp , flyio , cloudflare