[ Question ]

[Meta?] Using the LessWrong codebase for a blog

by George, 1 min read, 20th Dec 2020, 18 comments

Software Tools, Site Meta
Personal Blog

I enjoy the experience of using LessWrong. I believe it's one of the best-designed websites I've ever had the pleasure of stumbling upon.

I also have a personal blog, and I've been considering revamping it. While I do enjoy coding from scratch, I've been considering using a pre-existing platform. 

Using WordPress plus some themes is always an option, but I feel like this would be overkill, since the "feel" of LW so closely aligns with my own tastes.

I'm curious if anyone else is doing this (using the LW codebase for their own project, be it a blog, a discussion forum, or something else), and how it's going so far?

More specifically, I'd be curious:

  • What kind of machine (very roughly speaking) would you need to handle peak volumes of ~20,000 visitors/hr (say, max 3,000/min) without the core functionality breaking, and ~500 visitors/hr (say, max 100/min) with top-notch user experience (assuming optimal db, distro and reverse proxy choices)?
  • How hard is it to set up the sysadmin side of things? Deploying a prod server behind nginx with a non-SQLite db and pointing it to your own CDN.
  • How hard is it to modify the theme (e.g. fonts, color scheme, icons)?
  • How hard is it to integrate your own 3rd party services or get rid of the existing ones? Specifically, add your own app credentials for the signups and remove Google Analytics, Intercom and all other 3rd party integrations that would make Richard Stallman cry and which aren't critical to the commenting experience.
  • Are the makers of LW explicitly fine and open to it being used by other people or is it open source mainly for the sake of community debugging?
  • What are particularly difficult/annoying/deal-breaking parts of the setup that were unexpected?

... Hopefully the question is not too out of place; it seemed to fit the forum better than the repo.

3 Answers

I believe the only other [EDIT: fully independent] deployment of the LW codebase today is the EA forum: https://forum.effectivealtruism.org/

My guess is that it is massively overkill for an individual blog, and that both the level of complexity and the level of churn are not what you would want to deal with.

Have you considered either (a) blogging directly on LW or (b) modifying a simpler blogging engine to look like LW?

Most blogs have comment sections that would be much improved by LW style karma. It's also not something that can be easily gotten by a simpler engine. 

[2] jefftk, 1mo: It is possible to have the LW comment section while hosting your own blog; that's what I do. For example, the comments shown on https://www.jefftk.com/p/heel-and-toe-drumming are the comments on the LW crosspost (https://www.lesswrong.com/posts/fBf3MFi3iFEY4AN44). I present mine in a deliberately minimalist way, though you could choose to style them identically to LW. The main downside to this approach is that people have to click through to LW to vote or comment.
[1] George, 1mo: There is remark42 (what I use currently), which is a plug-and-play comment system with upvotes, users and loads of auth options. It doesn't have karma, but I believe that would be trivial to implement. It also has the problem of not supporting CSS, but again, that is fairly trivial to implement (or, at least, simpler than modifying LW). So a semi-static blog plus a remark42 mod as a comment system would probably replicate the effect (and that's what I might go for).

Good to know. I suspect you're right: since many people posting on LW have blogs and I'm unlikely to be the first with this idea, I assume that only 2 deployments in existence means it's annoying to maintain.

Indeed, I took a closer look at the thing yesterday eve and it did seem a bit, ahem, convoluted (not necessarily a bad thing, but I assume it takes a bit of time to get an intuitive feel for it).

Thanks for the feedback

[6] habryka, 1mo: The AI Alignment forum is a separate server but runs on the same database, so about half of its infrastructure is shared (e.g. in terms of upkeep we only have to run migrations once for both the AIAF and the LW content, but the EA Forum team has to run them separately).

I am really glad to hear that you like the site!

Overall, my guess is for a blog where you don't expect many other contributors at the top-level, the current LW codebase is pretty overkill. I am already sad about how bloated our client-bundle is for our use-case, and my feeling is that things like our javascript execution time and bundle size and server overhead would be really overkill for a personal blog. 

There will also be bugs and things will break. We got everything running pretty well for our use-case, but sometimes the EA Forum runs into different problems than we do even though their use-case is only very slightly different, and that sometimes surfaces previously undiscovered bugs. For the EA Forum we have a close collaboration that usually allows us to address these reasonably quickly, but my guess is that wouldn't be an option for you.

As a concrete example: Spammers use all kinds of different tactics to post things. We covered most of the tactics that spammers use on LW in our defenses, but sometimes spammers use different tactics on the EA Forum (like editing wiki articles instead of creating comments or posts), and then we have to quickly build something that prevents that attack vector. Similar things might happen with a personal blog.

That said, because it seems good to have this written down and because I don't know the alternatives and the tradeoffs, here are answers to all your questions:

Are the makers of LW explicitly fine and open to it being used by other people or is it open source mainly for the sake of community debugging?

Yes, we are totally fine and even happy about other people using the code!

What kind of machine (very roughly speaking) would you need to handle peak volumes of ~20,000 visitors/hr (say, max 3,000/min) without the core functionality breaking, and ~500 visitors/hr (say, max 100/min) with top-notch user experience (assuming optimal db, distro and reverse proxy choices)?

This is maybe the part where it starts seeming like a bad idea to do this for a blog. Since we have a dev team of 4-5 people, all of whom could make a decent salary as full-time engineers in the Bay Area, we really didn't optimize much to bring the server costs down (though this might change with some upcoming big refactors that drastically change how we deploy the site and how it runs).

Our current AWS balance runs into about $600 a month for the main site, or something like $6k-$8k annually. This is of course fine for us to pay, given that our staff costs are much higher, but I would of course be a bit hesitant to pay this much for hosting a blog. My guess is to hit your targets, you could maybe pay half of that, so about $3-$5k annually. 
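
As a quick sanity check of the arithmetic above, here is a minimal sketch. The variable names are illustrative only, and the inputs are just the rough figures quoted in this answer, not billing data.

```python
# Rough figure quoted above: about $600 per month for the main site.
monthly_aws_usd = 600

# Twelve months of that lands inside the quoted $6k-$8k annual range.
annual_usd = monthly_aws_usd * 12

# "Half of that" is the guess for the traffic targets in the question.
half_annual_usd = annual_usd // 2

print(f"annual: ${annual_usd}, half: ${half_annual_usd}")
```

Both results fall inside the ranges given in the text ($6k-$8k and $3k-$5k respectively).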

How hard is it to set up the sysadmin side of things? Deploying a prod server behind nginx with a non-SQLite db and pointing it to your own CDN.

I would be happy to help you get set up with stuff. We are currently deploying with AWS Beanstalk, so there is basically just a file where you plop your DB credentials, your AWS credentials and your server settings, and then press the deploy button, and then the site should go up. The codebase doesn't currently touch on any CDN stuff. Images we deliver are currently uploaded to the CKEditor CDN (the people who built our editor framework), and I would be happy to give you a subaccount on that and bill you the costs from time to time (or, if the usage is under a threshold of something like $30 a month, just ignore it and let you use it for free). If you wanted to have your own image CDN for images in posts and comments, you could just deactivate the CKEditor upload plugin and then use externally hosted images in posts.

How hard is it to modify the theme (e.g. fonts, color scheme, icons)?

The very basics of the theme (like the primary color and font) are in a single theme file and very easy to change. This will affect all the things that are different between LW and the EA Forum. 

Most other things would have to be changed in React and in the JSS + SCSS itself. 

How hard is it to integrate your own 3rd party services or get rid of the existing ones? Specifically, add your own app credentials for the signups and remove Google Analytics, Intercom and all other 3rd party integrations that would make Richard Stallman cry and which aren't critical to the commenting experience.

This depends a lot on the specific service. Getting rid of our search service (Algolia) would probably be a big pain, since we use it in a ton of different places around the codebase and all kinds of stuff breaks without it. Getting rid of Google Analytics is just a single line in a config file somewhere. Getting rid of Intercom is also just a single line, I am pretty sure. My guess is you can remove most of them pretty easily, but it might be that something turns out to be more of a pain.

What are particularly difficult/annoying/deal-breaking parts of the setup that were unexpected?

Things that you would probably currently find most annoying: 

  • In terms of using the site on a phone, it really isn't remotely as performant as it could be, and you will get complaints about this. We are improving this, but my guess is it will still be an issue.
  • A bunch of stuff you won't need will be a bit hard to deactivate. Like we have a PM system and an events system and a "put yourself on this map" system, and a "notify me of nearby events" system and a whole set of moderation tools that we use to keep tabs on new posters, and a bunch of code written for Petrov Day and a Question and Answer system and a wiki and tagging system, and a sequences system, and the whole annual review system and so many more small things. My guess is you won't need all of those, but getting rid of them might require some amount of work (I am not sure how much, but would be surprised if it turns out to be less than 40h of time to remove all of them cleanly).
  • We make lots of changes to the codebase, all over the place. Merging with our upstream codebase is a good amount of work if you have your own changes on top of it. Not sure how much, but my guess is it takes the EA Forum team something like 4-5h a week. If you make fewer changes it might be a lot faster, but you do have to keep up with the database migrations we are running, and sometimes we sunset features that nobody uses, and if you needed them, you might be stuck with maintaining them yourself.

Here are some things that are currently annoying, but likely won't be issues in 1-2 months because we are in the middle of a big refactor that fixes them: 

  • Meteor really isn't a good choice these days. We used it because we built off of an existing forum system, but it will confuse and befuddle you in hundreds of ways.
  • Deployment takes a really long time (50 minutes) because of a dumb bug in a script we are using to deploy to AWS Beanstalk. 
  • Restarting a server after making a change also takes far too long (~30 seconds on most machines).

Our current AWS balance runs into about $600 a month for the main site, or something like $6k-$8k annually. This is of course fine for us to pay, given that our staff costs are much higher, but I would of course be a bit hesitant to pay this much for hosting a blog. My guess is to hit your targets, you could maybe pay half of that, so about $3-$5k annually.

To give some perspective on what someone might expect to need for a personal blog, a minimum virtual private server (VPS) runs about five dollars a month, and can easily handle 20k qph / 2k qpm if you... (read more)

[1] George, 1mo: This number is on the money for me actually. With the traffic above (5k visitors/day average plus front page HN + front page of some mid-sized subreddit spikes... which end up in the low x0,000k/hr)... I pay a grand total of ~$30/month (CDN included), but I use the same machine to run coordination scripts for my GPU machines and to host friends' WordPress blogs. Then again, I optimized the website quite a bit, and the comment+auth system (remark42) is also a monster in terms of minimalism and caching.

Thanks a lot for the in-depth reply.

I must agree with you that there are a lot of dealbreakers for me there. But it's interesting to see what goes into deploying & maintaining the website.

I do think that if you guys refactor it, it might be quite nice to put a "generic" version of it out there, including some instructions like these for people that want to run it. There's a lack of well-executed open source community platforms at the moment.

This one may be messy and not the most efficient or prettiest in terms of the underlying code, but the user exper... (read more)

[4] habryka, 1mo: Yeah, I strongly agree with this. There is a reason why we ended up building this one mostly from scratch: really none of the other options seemed very good. Not directly an answer to this suggestion, but I've definitely been considering spending a month or so packaging up what we have so far, and selling hosting + support for it as a way of financing the LW team and getting us more financial independence. That would also have the side-effect of having everything more nicely packaged up for other people like you to run their own versions. But I don't really know what the demand is, and I am worried it would make us a lot more hesitant to do small hacky one-off things that make sense to have on LessWrong, but don't make sense to have for all of our other users (like the annual review system).
[2] Viliam, 22d: Would it be possible to split the current website into "core" (what other people might need) and "plugins" (currently one: the LW and EA specific functionality)? Each plugin could have its own tables, its own pages, and inject some HTML code into specified places in the "core" (or other) pages. The plugin support should not cover all hypothetically possible cases, only what is needed to make the current "plugin" run. I agree that thinking about how changes might impact other users makes development (and release management) way more complicated. Would probably need separate "stable" (bugfixes only) and "experimental" branches. But I find it interesting how many people want to have online debates, and how much the existing solutions suck. I already gave up the idea of hosting my own web forum, because it seems like it would become a full-time job to moderate and fight spam -- and the choices are whether the "full-time job" would consist of developing and maintaining my own solution; or learning how to use an existing one, updating it constantly because every two weeks there is a critical security update, and coding my own plugins to fix or add functionality anyway; I am not even sure which option is less work in the long term. This is really not good.
[2] habryka, 22d: It's possible, but I really expect it to be a lot of work. We have a lot of shared UI and small interleaving pieces of functionality all across the site, and I expect this would cut off a lot of the design space we could explore, or basically just force everyone to run the whole forum anyways. Like, we modify the code responsible for displaying post-items during the review to display the small "Review" button on the right. I don't know how to factor that out so that you don't even get that piece of code shipped, and these kinds of dependencies are all over the place. The comments UI depends on moderation guidelines in subtle ways, the recommendations for the frontpage depend on your view-history, but I would also like it to depend on your comment and vote-history, and this would become harder where everything is plugin-structured. The visual design philosophy of the site is also very much centered around minimalism and trying to only present you with the information you really need in any given moment, which often leads to small design changes based on context that I find hard to factor out (contrast this with the usual phpBB forums that tend to just throw small independent widgets everywhere in order to customize your experience, leading to what I experience as massive information overload, but having the benefit of being much more modular, so you can easily turn off whether you display the last-commented time of a thread, or the total number of comments, or the total number of contributors, etc.).
[2] Viliam, 22d: This is exactly what I imagined. Each plugin would have a function "insert my code" with two parameters: the first would be the name of the place (in this case e.g. "article widgets"), the second would be an object describing the context (in this case it could provide the article name, date, ID, URL). When a page wants to show an article, it calls this function for all registered plugins, in the order they were registered. (There is one source file with a hardcoded list of registered plugins.) Each plugin returns HTML code, or an empty value, and the page inserts the returned values. What places exist? The ones we currently use. What values are provided in the context? The ones we currently use. Later, expand as necessary. The other cases you mentioned sound more complicated...
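
The hook mechanism described in the comment above can be sketched as follows. This is a language-agnostic illustration written in Python, not code from the LW codebase: every name here (`review_plugin`, `REGISTERED_PLUGINS`, `insert_plugin_html`, the "article widgets" place, the context keys) is hypothetical and chosen only to mirror the description.

```python
from html import escape
from typing import Callable, Dict, List, Optional

# Hypothetical types: a context is a flat mapping describing the article,
# and a plugin maps (place name, context) to an HTML fragment or None.
Context = Dict[str, str]
Plugin = Callable[[str, Context], Optional[str]]


def review_plugin(place: str, ctx: Context) -> Optional[str]:
    """Contribute a small Review link at the 'article widgets' place."""
    if place == "article widgets":
        url = escape(ctx["url"], quote=True)
        return f'<a class="review" href="{url}/review">Review</a>'
    return None  # nothing to add anywhere else


def silent_plugin(place: str, ctx: Context) -> Optional[str]:
    """A plugin that never contributes anything."""
    return None


# The single hardcoded list of registered plugins, in registration order.
REGISTERED_PLUGINS: List[Plugin] = [review_plugin, silent_plugin]


def insert_plugin_html(place: str, ctx: Context) -> str:
    """Call every registered plugin in order and join the non-empty results."""
    fragments = (plugin(place, ctx) for plugin in REGISTERED_PLUGINS)
    return "".join(fragment for fragment in fragments if fragment)


article = {
    "name": "Example post",
    "date": "2020-12-20",
    "id": "abc123",
    "url": "/posts/abc123",
}
rendered = insert_plugin_html("article widgets", article)
print(rendered)
```

A page would call `insert_plugin_html` once per place it exposes. Places that no plugin cares about yield an empty string, so the core page renders unchanged.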

How hard is it to set up the sysadmin side of things? Deploying a prod server behind nginx with a non-SQLite db and pointing it to your own CDN.

It's pretty poor, and also in flux. We use Meteor and mup-aws-beanstalk, but are dissatisfied with them (due to slow client-side JavaScript initialization, slow development-cycle build time, and extremely slow server deploy time), so I'm converting it to esbuild and a yet-to-be-chosen deploy/CI setup. So if you deploy an instance of the LW codebase today, you will probably find that the recommended deployment mechanism has changed to something entirely different a month from now. A month from now it'll hopefully be pretty good, though!

What kind of machine (very roughly speaking) would you need to handle peak volumes of ~20,000 visitors/hr (say, max 3,000/min) without the core functionality breaking, and ~500 visitors/hr (say, max 100/min) with top-notch user experience (assuming optimal db, distro and reverse proxy choices)?

Logged-out users visiting the front page and a few post pages (i.e., getting Slashdotted) will all be served from the page cache, so they are pretty much only limited by bandwidth. LessWrong itself runs on a dynamically scaling pool of t2.small instances (though one would probably be enough) and a MongoDB Atlas M30 cluster.
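
For a sense of scale, the traffic targets from the question can be converted to per-second rates. This is only a rough sketch: it assumes one visitor equals one page load, which understates real request counts, and the helper name `per_second` is made up for illustration.

```python
HOUR = 3600   # seconds
MINUTE = 60   # seconds


def per_second(visitors: float, window_seconds: float) -> float:
    """Average arrival rate over the window, in visitors per second."""
    return visitors / window_seconds


# Figures taken from the question; one visitor is assumed to be one page load.
busy_sustained = per_second(20_000, HOUR)   # about 5.6 per second
busy_peak = per_second(3_000, MINUTE)       # 50 per second
quiet_sustained = per_second(500, HOUR)     # about 0.14 per second
quiet_peak = per_second(100, MINUTE)        # about 1.7 per second

print(f"busy:  {busy_sustained:.1f}/s sustained, {busy_peak:.0f}/s peak")
print(f"quiet: {quiet_sustained:.2f}/s sustained, {quiet_peak:.1f}/s peak")
```

Even the busy peak works out to about 50 page loads per second under this assumption.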

How hard is it to modify the theme (e.g. fonts, color scheme, icons)?

If you're familiar with CSS/JSS, this should be pretty straightforward.

How hard is it to integrate your own 3rd party services or get rid of the existing ones? Specifically, add your own app credentials for the signups and remove Google Analytics, Intercom and all other 3rd party integrations that would make Richard Stallman cry and which aren't critical to the commenting experience.

OAuth, Google Analytics, and Intercom are all already used, so integrating them is just a matter of getting an API key and putting it into the right config setting.

Are the makers of LW explicitly fine and open to it being used by other people or is it open source mainly for the sake of community debugging?

We are explicitly fine with this. We just haven't gotten around to optimizing it much for this use case.

What are particularly difficult/annoying/deal-breaking parts of the setup that were unexpected?

Site search is powered by Algolia, which is kind of expensive and not especially good.

Site search is powered by Algolia, which is kind of expensive and not especially good.

Why not use something like https://github.com/meilisearch/meilisearch ?

I had to use it for a project (completely unrelated), and integrating it into a website is a sub-hour task. It's also very portable, since with Rust you can get a statically compiled build that only needs a dynamically linked libc.

[4] habryka, 1mo: Yeah, I expect we will eventually deploy our own version here, but Algolia was pretty straightforward to set up initially, and I was familiar with it from a smaller project earlier (it works great for small corpuses, and their free-tier is great, but the pricing is steep and the performance gets worse in larger corpuses). I appreciate the pointer to meilisearch. Will look into it.