We've identified two problems, both now fixed:

  1. memcached died. This significantly increased server load. Our monitoring noticed, but we've had a small plague through the Trike offices coinciding with some holiday leave and enough people were down sick that it took us about a day to respond.
  2. A very far future meetup surfaced a bug in our code. Every page that rendered the meetups sidebar module (= every page) caused extra server load. Our plague delayed our response to this too.

Apologies to all affected.
If you think you've diagnosed further problems, please direct message me or raise a ticket.

ETA: Site very slow at Fri Sep 16 18:35:23 UTC 2011 - we've obviously missed something.
ETA: Site fast now, and can't reproduce Friday's slowness. We're waiting and watching - please let us know (see above) if you see any other problems.

Thanks for tracking this down!

Hooray! Hope everyone feels better soon.

Yeah. I convinced myself that the prior suggested that it wasn't just me. Yay adjusting for paranoia.

It seems to have been slow or broken more often than normal for me too.

I like that that url is so easy to remember. That way I'm not going to be trying to get it off of LW while LW is down.

I've been getting a lot of errors lately also. Sometimes the server just seems to time out. Other times I get the generic reddit error message. It seems that the error image file is also not being found when it does that. The current error messages also seem to be reddit specific (one version includes a link to the reddit store). Aside from fixing whatever is going on, it might be nice if we had error messages that were both more informative and more pertinent. It would be nice to be able to say "I got error message _" rather than "I got a stupid joke about free software which was randomly chosen by the server."

It's been slow or broken quite often in the past few days again. And it's not just me.

It is very slow for me currently.

Yeah, I've also just had problems in the last hour or so where it is repeatedly timing out.

It seems to have very recently become faster, so possibly some action has been taken.

There's a DNS record for but whatever machine is there doesn't seem to be serving pages. (When I try to connect to it, some proxy between me and it says it can't connect. downforeveryoneorjustme says it's down.)

I don't think you'd be able to get that to work. Different websites have different usual speeds, so you'd need a database of typical speeds. Maybe a browser plugin?

For me, yes, for the discussion area and user pages. The main posts area has seemed normal.

Does the site server software have any sort of monitoring interface? Can it, for instance, publish data about the number of page views on the different types of page, including error pages served? I notice there are stats being exported to Sitemeter, but that only breaks down by hour so far as I can tell, and doesn't report on error pages.
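To make the question concrete, here is a rough sketch of the kind of tally I have in mind: counts per minute, per page type, per HTTP status, so error pages show up separately. The class name, method names, and page-type labels ("discussion", "user") are all made up for illustration; I have no idea what the site software actually exposes.

```python
from collections import Counter
from datetime import datetime, timezone


class RequestStats:
    """Hypothetical tally of served pages per (minute, page type, HTTP status)."""

    def __init__(self):
        self.counts = Counter()

    def record(self, page_type, status, when=None):
        # Bucket by minute rather than by hour, so short outages are visible.
        when = when or datetime.now(timezone.utc)
        minute = when.replace(second=0, microsecond=0)
        self.counts[(minute, page_type, status)] += 1

    def errors_by_page_type(self):
        """Total responses with status >= 500, grouped by page type."""
        totals = Counter()
        for (_, page_type, status), n in self.counts.items():
            if status >= 500:
                totals[page_type] += n
        return totals


# Example with invented requests at the time the slowness was reported.
stats = RequestStats()
t = datetime(2011, 9, 16, 18, 35, 23, tzinfo=timezone.utc)
stats.record("discussion", 200, t)
stats.record("discussion", 503, t)
stats.record("user", 500, t)
print(dict(stats.errors_by_page_type()))
```

Something along these lines would let reports like "the discussion area is slow but main posts are fine" be checked against the numbers.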

Was unable to load it for a couple of hours today (11 AM - 1 PM, estimated). I would just get a blank page with no HTML source. OvercomingBias had some latency as well. At one point I could load the wiki but not the main page.

(EDIT: Portland, Oregon. Pacific Daylight Time, GMT-7. Don't know the ISP work uses)

I have noticed the site often (but not always) loading very slowly. No error pages, but sometimes it seems to time out or only returns a blank page.

LW has seemed normal to me recently.

I was confused by the error message "Conde Nast certainly got their money's worth".

In case you didn't figure it out, LW uses the reddit codebase, and reddit used to be owned by Condé Nast.

Yup. I thought "error on LessWrong" not "error in the Reddit codebase".

I've gotten a lot of errors tonight, but not much else recently.

I got an error page tonight.

It's been slow for me too.

Very slow. But my computer is a mess so I thought it was on my end.