Friday, May 30, 2025

Top 5 This Week

Related Posts

Compiler Explorer and the Promise of URLs That Last Forever

- Advertisement -

0:00

- Advertisement -

The history is this: back in the old days (2012), we used to store the entire Compiler Explorer state in the URL. That got unwieldy (who would have thought encoding an entire compiler state in a URL might get a bit long?), so we added support for Google’s link shortener goo.gl in March 2014. That meant short links were of the form goo.gl/abc123. Clicking a goo.gl link would eventually redirect you to the full URL link on our site, and we’d decode the state from the URL.

In 2016, Stack Overflow banned link shorteners because of how they cloak the actual destination of links. Abusers could post innocent goo.gl links that directed folks unwittingly to bad content. However, that meant our Compiler Explorer links were also affected. At the time, we had no intention of storing any user data, so we came up with a hack: we still used goo.gl, but we then rewrote the link we handed out to be godbolt.org/g/abc123 (where the abc123 is the goo.gl unique ID). We then redirected any hits to /g/abc123 to goo.gl/abc123, which then (finally) redirected back to godbolt.org with the appropriate state in the URL. If you’re keeping track, that’s three redirects to show you some assembly code. We were really committed to making things complicated. Later, we used Google’s API to avoid the redirection dance.

- Advertisement -

By 2018, the limitations of storing state in the URL started to bite. There’s a limit to how long a URL can be (and we’d already started compressing the data in the URL), so we needed a better solution. We finally implemented our own storage solution: we hash the input, save the state as a JSON document on S3 under the hash, and then give out a shortened form of the hash as a godbolt.org/z/hashbit URL. We use DynamoDB to store the mapping of shortened hashes to the full paths (accounting for partial collisions, etc.). And, amusingly, we check the short link’s hash for rude words and add deliberate extra information into the document until we no longer get a rude word. Yes, we literally check if your shortened URL contains profanity. Because apparently even random hashes can’t be trusted to keep it clean. This led to bug #1297, which remains one of my favourite issues we’ve ever had to fix.

We still support the godbolt.org/g/abc123 links, but… despite Google solemnly promising that “all existing links will continue to redirect to the intended destination,” it went read-only a few years back, and now they’re finally sunsetting it in August 2025. Here I was in 2014, thinking I was so clever using Google’s shortener. “It’ll be around forever!” I said. “Google never discontinues products!” I said. Er…

That means we’ll no longer be able to resolve goo.gl-based links! Which is, to use technical terminology, a bit pants. One of my founding principles is that Compiler Explorer links should last forever. I can’t do anything about the really legacy actual goo.gl links, but I can do something about the godbolt.org/g/abc123 links!

- Advertisement -

Over the last few days, I’ve been scraping everywhere I can think of, collating the links I can find out in the wild, and compiling my own database of links1 – and importantly, the URLs they redirect to. So far, I’ve found 12,000 links from scraping:

  • Google (using their web search API)
  • GitHub (using their API)
  • Our own (somewhat limited) web logs
  • The archive.org Stack Overflow data dumps
  • Archive.org’s own list of archived webpages

SQLite terminal showing query result: SELECT COUNT(*) from google_links; returning 12298 rescued links

12,298 rescued links and counting – not bad for a few days of digital archaeology

We’re now using the database in preference to goo.gl internally, so I’m also keeping an eye on new “g” links that we don’t yet have.

Thanks to Peter Cordes for reminding us about this issue and bringing it to our attention2.

If you have a secret cache of godbolt.org/g/abc123 links you have lying around, now’s the time to visit each of them! That will ensure they’re in my web logs and I’ll add them to the database. Otherwise, sadly, in August 2025 those links will stop working, joining the great digital graveyard alongside Flash games and GeoCities pages.

The Bigger Picture

This whole saga reinforces why I’m skeptical of relying on third-party services for critical infrastructure. Google’s URL shortener was supposed to be permanent. The redirect chains we built were clever workarounds that bought us time, but ultimately, the only way to truly keep a promise of “URLs that last forever” is to own the entire stack.

It’s been a fascinating archaeological dig through the internet, hunting down these legacy links like some sort of digital Indiana Jones, except instead of ancient artifacts I’m rescuing compiler flags and optimization examples. Each one represents someone’s attempt to share knowledge, ask a question, or demonstrate a concept. Preserving them feels like preserving a small piece of programming history.

So if you’ve got old Compiler Explorer links bookmarked somewhere, dust them off and give them a click. You’ll be helping preserve a little corner of the internet’s shared knowledge – and keeping a promise I made back in 2012. And hey, at least this time I’m in control of the infrastructure. What could possibly go wrong?


Disclaimer

This article was written by a human, but links were suggested by and grammar checked by an LLM.

Posted at 10:12:00 CDT on 28th May 2025.

Read More

- Advertisement -

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Popular Articles