LessWrong is migrating hosting providers (report bugs!)

jimrandomh

[-]the gears to ascension3mo4730

Why?

[-]habryka3mo310

I might respond in more depth later, and I am sure other team members have opinions, but roughly:

- React, our frontend framework, has chosen a kind of weird path where, if you want to use the latest set of features in React 19, you basically have to use NextJS (more concretely, server functions and server components are two features that would be extremely hard to use without framework support, and NextJS is the only framework with support).
- We've been using NextJS for all of the other web projects that we've been building (including AI 2027, the new MIRI book website, our internal Lighthaven booking infrastructure, our conference scheduling software Writehaven, and our internal company infrastructure), and it's generally been a great experience in almost every respect (it's been a less great experience for LessWrong, which isn't surprising since it's a much much bigger and more complicated codebase).
  - Jim also has some not-great experience working on some non-Lightcone projects
- AWS Beanstalk was a kind of terrible deployment/hosting service, or at least we never figured out how to use it properly. Our deploys would routinely take 20+ minutes, and then take another 20+ minutes to roll back, which means we had multiple instances of ~1 hour downtime that could have instead been a 5-minute downtime if deploys and rollbacks had been fast.
- NextJS is a serverless framework. There are some developer experience benefits you get from restructuring things in a serverless way. The one I am most excited about is having preview deployments. PR review is much easier if every pull request just has a link to a deployed version of the app attached to it that you can click to visit, click around in to find any bugs or visual issues, and leave comments on directly.

There are some more reasons, but these are the big ones from my perspective.

[-]kave3mo190

Here are some other reasons, though I think they're a bit less central than the ones in Habryka's comment.

1. I think current AI systems find it much easier to help with NextJS web apps than they did our sui generis palimpsest of frameworks and approaches. It's a bit unclear if this is on a trajectory to fix itself, but for now it seems like a relatively big difference. I think partly they're just way more familiar with this newer stuff, and partly serverless stuff is a bit more architecturally suited to LLMs making narrow changes.

2. Another reason is that we had a lot of technical debt that we wanted to pay down. The project that became the hosting transfer was originally known as the "debungle"^[1].

The codebase had a bunch of very particular ways of doing things. For example, you weren't supposed to just write and export new React components; you were supposed to call a registration function on them. Likewise, you weren't supposed to write direct queries against our GraphQL server, but to use a system of helpers.

I don't think this stuff is necessarily bad. But because Lightcone is largely composed of generalists, onboarding costs are a bit higher. If you have a blessed way to make a query, and that blessed way is itself changing (as it needs to shift for performance or feature reasons), someone who is working on LessWrong one month in three is paying a higher cost to keep up with the internal, undocumented framework magic.

There have been several times I've asked a distracted Habryka what the Standard Way to do something in our codebase is, implemented his quick answer, only to get a PR review from Robert asking why I'm doing stuff in a semi-deprecated way.

3. Habryka mentioned wanting to use newer React features. I think possibly a bigger issue was the transitive out-of-date dependencies you get if you stick on an old React. You can't update Material UI, you can't update some other library, some of them have security holes, so you vendor the old version and patch it by hand, ... That stuff starts to grow as a maintenance and jank burden over time.
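
To make the second point concrete for readers who haven't seen this kind of convention, here is a minimal TypeScript sketch of a "register, then look up by name" pattern next to the plain "define and import" style. It is not the actual LessWrong code: `registerComponent`, `getComponent`, and the `Component` type are hypothetical names, and components are modeled as functions returning HTML strings so the example runs without React.

```typescript
// Hypothetical sketch: names and shapes are illustrative assumptions,
// not LessWrong's real helpers.
type Component<P> = (props: P) => string;

// Central registry keyed by component name.
const registry = new Map<string, Component<any>>();

// Registration convention: every new component must pass through this call.
function registerComponent<P>(name: string, component: Component<P>): Component<P> {
  if (registry.has(name)) {
    throw new Error(`Component "${name}" is already registered`);
  }
  registry.set(name, component);
  return component;
}

// Callers look components up by name instead of importing them.
function getComponent<P>(name: string): Component<P> {
  const component = registry.get(name);
  if (!component) {
    throw new Error(`Component "${name}" was never registered`);
  }
  return component as Component<P>;
}

// Registry style: define, register, and later fetch by string name.
registerComponent("Greeting", (props: { user: string }) => `<p>Hello, ${props.user}</p>`);

// Plain style: define the function and use it directly
// (in a real module this would simply be exported and imported).
const PlainGreeting: Component<{ user: string }> = (props) => `<p>Hello, ${props.user}</p>`;
```

The registry style adds a step a newcomer has to know about, and a forgotten registration only fails at lookup time, which is the kind of onboarding cost described above.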

In general, I'm pro things being crufty and janky and not spending too much time "rewriting things to be nice", and a lot of the stuff I listed above can be worked around. I think probably my list alone wouldn't be worth the effort of the shift. To be clear, I'm unsure if the combination of my list, Habryka's, the other arguments I'm aware of, and the expected strength of the arguments I'm not aware of, overall makes this a worthwhile shift. I'm guessing yes, but it's too soon to say.

^[1] We had our eyes on a NextJS switch early on. But we thought it was valuable to do even without that.

[-]jimrandomh3mo112

My stance at the beginning was that the entire project was a mistake, and going through the process of actually doing it did not change my mind.

[-]habryka3mo40

It's true! May history judge who was right in the end.

[-]Raemon3mo156

My prediction is that a year from now Jim will still think it was a mistake and Habryka will still think it was a good call because they value different things.

[-]cousin_it2mo20

Yeah, I'm also a bit puzzled. Most features of a forum like LW can be implemented as concatenating HTML strings on the server, which is a very simple mental model, and has plenty of simple implementations that can run on generic hosting. The DOM-based mental model of React/Next doesn't seem to bring much benefit in this case, and carries a ton of overhead.

[-]habryka2mo20

I am sure that mental model has nothing to do with why Jim thinks this is/was a bad idea. I think we are all really quite happy we are built on React (or something of that family). Gluing HTML strings together would be a crazy nightmare.

[-]cousin_it2mo20

I see, yeah, then your team is a different culture than me. To me simple server side rendering (well not literally concatenating strings, but using templating and the like) is basically the only non-"crazy nightmare" way to build web stuff. While a lot of React stuff (like hooks, reducers, hydration) gives me crazy nightmare vibes. But since this isn't a programming forum, maybe not much use arguing :-)

[-]habryka2mo20

Yeah, I care a lot about client-side reactivity, which I think you just can't really achieve that way (unless you want to glue together JavaScript strings using templates, which I would not recommend).

I think people should just treat the web as an application platform. Doing a roundtrip for each piece of interactivity, or needing to pre-render each piece of interactivity is IMO really not viable at the complexity level of something like LW.

[-]cousin_it2mo*42

Yeah, this is maybe also about user taste. I use GW because it feels more like a website, while LW feels a bit too much like an application. There's a certain "website UI feel" that's distinct from "application UI feel" and makes me happier somehow. Though of course other people can feel differently.

[-]RobertM2mo20

(Also, to clarify, we were already on React - it's mostly other bits of framework glue that got tossed out/replaced/etc.)

[-]dbohdan3mo10

What do you think about the denial-of-wallet risk with this migration? From what I've read about Vercel on ServerlessHorrors (a partisan source) and in random internet comments, you can make costly mistakes with Vercel, but they'll waive the charges.

[-]jimrandomh3mo100

LW has a continuous onslaught of crawlers that will consume near-infinite resources if allowed (more so than other sites, because of its deep archives), so we've already been through a bunch of iteration cycles on rate limits and firewall rules, and we kept our existing firewall (WAF) in place. When stuff does slip through, while it's true that Vercel will autoscale more aggressively than our old setup, our old setup also had autoscaling. It can't scale to too large a multiple of our normal size before some parts of our setup that don't autoscale (our Postgres DB) fall over and we get paged.
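
For readers unfamiliar with rate limits, here is a minimal TypeScript sketch of one common technique, a per-client token bucket. It is not a description of LessWrong's actual WAF or rate-limit rules, which this comment doesn't spell out; the `TokenBucket` class, the `allowRequest` helper, and the numbers (a burst of 5 requests, refilling at 1 per second) are all assumptions chosen for illustration.

```typescript
// Hypothetical token-bucket limiter; capacity and refill rate are made-up values.
class TokenBucket {
  private capacity: number;
  private refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request is allowed, false if it should be rejected.
  tryConsume(nowMs: number): boolean {
    // Add tokens for the time elapsed since the last call, capped at capacity.
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per client key (for example an IP address or user-agent string).
const buckets = new Map<string, TokenBucket>();

function allowRequest(clientKey: string, nowMs: number): boolean {
  let bucket = buckets.get(clientKey);
  if (!bucket) {
    bucket = new TokenBucket(5, 1, nowMs);
    buckets.set(clientKey, bucket);
  }
  return bucket.tryConsume(nowMs);
}
```

A crawler that hammers the site drains its own bucket and starts getting rejected, while an ordinary reader with a separate key is unaffected. Passing the clock in as `nowMs` keeps the sketch deterministic; a real deployment would more likely rely on the firewall's built-in rules than on application code like this.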

[-]habryka3mo20

Yeah, my model is if someone does this once they'll waive the charges. We already had autoscaling in our previous hosting context and both under the current setup and the previous setup people could DDoS us if they want to take us down. Within a week or so we could likely switch things around to be robust against most forms of DDoS (probably at some cost to user-experience and development experience).

If someone does this a lot, we can just turn on billing limits, and then go down instead of going bankrupt, which is roughly the same situation we were in before.

[-]Garrett Baker3mo*20

~~The intercom button no longer appears in the bottom right of the website. I'm using Firefox on Fedora (I also don't have intercom turned off in my settings).~~

Probably not relevant, but mentioning anyway: A bit ago I participated in LessWrong's test of an AI assistant to help find & summarize posts, which does still appear in the bottom right (though also it doesn't work anymore).

idk what changed, but it's fixed now!

[-]RobertM3mo20

Concerning! Intercom shows up for me on Firefox (macOS), will see if there's anything in the logs. How does the LLM integration being broken present itself?

[-]Garrett Baker3mo20

oh huh, the LLM integration is no longer broken, I noticed that a few weeks ago it wasn't communicating with the API, but now it seems fine.

[-]Nina Panickssery3mo*20

Saw this on mobile and then had to reload page (then it worked)

[-]habryka3mo20

We pushed some changes yesterday that should eliminate that bug!

[-]homosapien973mo10

Very minor: I saw "The Rise of Parasitic AI" twice in a row on my "Enriched" home page yesterday, the first instance with no special icons, the second with the three little stars. When I refreshed the page, the problem went away.

[-]mruwnik3mo20

I've also had double articles a couple of times over the last week or so, so it's probably not because of this?

I can't remember which articles this happened for, but I believe it might have been Four ways learning Econ makes people dumber re: future AI?

[-]RobertM3mo20

Yeah, sadly this is an existing bug.

^{^}

"Why" is a longer story that we might get into in the comments if people are curious.

^{^}

If something goes wrong here, some fraction of users might see downtime on the order of an hour, if they happen to be behind some piece of infrastructure which doesn't honor short DNS TTLs. (We've ensured that the DNS TTL is set to 60 seconds a few days ahead of time.)

^{^}

I expect a moderate performance hit for e.g. initially loading the front page on a new tab, maybe on the order of 20-30%, at least until we get around to optimizing that harder. If you're on a fast internet connection, this should hopefully not be very noticeable - our median front page load is under 1 second. If you run into something that seems to take longer than it used to, in a way that's perceptible to you, please err on the side of letting us know. It might end up being nothing, or might be too costly to fix in the short-term, but we can't perform triage if we don't know about the problems in the first place.

^{^}

Unless you've hidden it in your account settings.
