[ Question ]

Why are all these domains called from Less Wrong?

by Viliam, 1 min read, 27th Jun 2020, 11 comments


Site Meta
Personal Blog

When I visit a Less Wrong page, the browser also attempts to load content from the following domains:

* algolia.net
* algolianet.com
* cloudflare.com
* cloudinary.com
* dl.drop
* dropbox.com
* dropboxusercontent.com
* google.com
* googleapis.com
* googletagmanager.com
* intercom.io
* jsdelivr.net
* lr-ingest.io
* typekit.net

Why is this so? I don't want to advertise to half of the internet (and specifically to Google) the fact that I read Less Wrong. What happens if I simply block all these domains? What service do they provide if I don't block them?
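A list like the one above can be approximated by scanning a page's HTML for resources served from other hosts. The following is a minimal sketch, assuming only static `src`/`href` references on `script`, `img`, `iframe`, and `link` tags matter; requests made by scripts at runtime would not show up. The sample markup and the helper names (`ResourceHostCollector`, `third_party_hosts`) are hypothetical, not taken from the site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags whose attribute makes the browser fetch a resource (assumed subset).
LOADING_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}


class ResourceHostCollector(HTMLParser):
    """Collect hostnames of resources that a page references statically."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        wanted = LOADING_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative and protocol-relative URLs against the page.
                host = urlparse(urljoin(self.page_url, value)).hostname
                if host:
                    self.hosts.add(host)


def third_party_hosts(page_url, html):
    """Return the sorted hosts referenced by the page, excluding its own host."""
    collector = ResourceHostCollector(page_url)
    collector.feed(html)
    own_host = urlparse(page_url).hostname
    return sorted(h for h in collector.hosts if h != own_host)


# Hypothetical sample markup for illustration only.
SAMPLE_HTML = """
<link rel="stylesheet" href="https://use.typekit.net/example.css">
<script src="//cdn.jsdelivr.net/npm/example/lib.js"></script>
<img src="/static/logo.png">
<a href="https://example.com/">An ordinary link, not loaded automatically</a>
"""

print(third_party_hosts("https://www.lesswrong.com/", SAMPLE_HTML))
```

The browser's network inspector remains the more reliable source, since it also shows requests that scripts trigger after the page has loaded.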

2 Answers

I'm not sure, but here are some guesses.

Algolia provides site search, I believe, which seems reasonable. Cloudflare is generally used for DDoS protection, which is reasonable even if I personally think that Cloudflare is getting too monopolistic. Cloudinary is for image and video storage, probably for images and videos embedded in posts. I'm not sure about dl.drop, and I'm not sure why LW needs to use Dropbox. The Google connections are probably for analytics, although LW could and should definitely do that in-house. Intercom.io is for the messaging-with-developers feature that you can reach by clicking the button in the bottom right corner of the screen. Jsdelivr.net is a caching service for JavaScript, which helps you save internet bandwidth, which is reasonable. lr-ingest.io is apparently for analytics on user interaction with the site, which seems like a stretch as far as "does LW need it" goes. Typekit.net (presumably) provides the fonts used, which is useful for caching, although LW could also do it locally.

So to summarize, dl.drop, Dropbox, and some of the Google usage are for unknown reasons. Using Google Analytics and lr-ingest.io makes me uncomfortable personally. And Typekit and Jsdelivr provide marginal benefit at some cost, which isn't worth it in my opinion.

LessWrong developer here. Here's an overview of what all those domains are. The code is open source, so you should be able to verify these, with some effort.

Algolia (algolia.net, algolianet.com) is a service we use for site search (what you get when you click the magnifying glass icon on the top bar). They have a mirror of all searchable data (i.e., non-draft posts and comments, tag pages, user bios); they receive a copy of searches that are performed through the site search box, which they can associate with IP addresses but not with usernames.

Cloudflare is a CDN that is hosting components of MathJax, the JavaScript library that renders LaTeX in posts and comments, and some libraries we use for integrating MathJax with the comment/post editors. The CDN URLs were defaults that came with libraries we're using; we could probably move them to our own domain with a little effort. JsDelivr is hosting some things that similarly came with library defaults, as parts of MathJax3 and Algolia.

Cloudinary is an image-hosting CDN that we use for images in some posts and images that are part of the site UI.

dropbox.com and dropboxusercontent.com are hosting images that were used in posts; they presumably loaded because those posts were visible in the Recent Discussion section when you loaded the front page. Currently, when users insert images into posts, depending on how they do it and which editor they're using, the image may point to its original domain. Also, for authors whose posts we crosspost automatically, the crossposts use the original image URLs. We will hopefully switch this to always upload those images to Cloudinary and host them there instead, partially for privacy reasons but mostly to prevent link rot in archives of old posts.

dl.drop is not a valid domain name; it's either a broken image link in some post that was in Recent Discussion, or a typo in this post.

The Google domains are from Google Analytics, Google Tag Manager, Google Fonts, and ReCaptcha. Google Analytics and Google Tag Manager measure site traffic and aggregate usage patterns.

intercom.io is for the chat icon in the bottom-right corner, used for messaging the admins about the site.

lr-ingest.io is LogRocket. We (the devs) use it to see how the site is being used; we can watch anonymized replays of sessions (anonymized in that the username in the corner is edited out). As policy, we don't read people's direct messages or unpublished drafts, or deanonymize votes, though in principle we have the capability to (either with this tool or with direct database access).

TypeKit, aka Adobe Fonts, is a font library and font hosting service. We could probably consolidate this with one of the other CDNs being used, but font-hosting involves some user-agent-string based compatibility polyfills, which would be somewhat annoying to reproduce ourselves.
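For anyone deciding what to block, the overview above can be condensed into a lookup table. This is only a sketch: the purposes are paraphrased from this answer, the three Google domains are grouped together because the answer does not say which domain serves which Google product, and the `purpose_of` helper is a hypothetical name introduced for illustration.

```python
# Domain -> purpose, paraphrased from the overview in this answer.
GOOGLE = "Google Analytics, Tag Manager, Fonts, or ReCaptcha"
DOMAIN_PURPOSES = {
    "algolia.net": "site search (Algolia)",
    "algolianet.com": "site search (Algolia)",
    "cloudflare.com": "CDN for MathJax components and editor integration",
    "jsdelivr.net": "CDN for parts of MathJax3 and Algolia",
    "cloudinary.com": "image hosting for posts and site UI",
    "dropbox.com": "images that authors embedded in posts",
    "dropboxusercontent.com": "images that authors embedded in posts",
    "google.com": GOOGLE,
    "googleapis.com": GOOGLE,
    "googletagmanager.com": GOOGLE,
    "intercom.io": "chat widget for messaging the admins",
    "lr-ingest.io": "session replay (LogRocket)",
    "typekit.net": "font hosting (TypeKit, aka Adobe Fonts)",
}


def purpose_of(hostname):
    """Return the purpose of a hostname, matching the domain or any subdomain."""
    host = hostname.lower().rstrip(".")
    for domain, purpose in DOMAIN_PURPOSES.items():
        # Match on a label boundary so "notalgolia.net" is not treated as Algolia.
        if host == domain or host.endswith("." + domain):
            return purpose
    return None  # not covered by the overview


for name in ("use.typekit.net", "lr-ingest.io", "dl.drop"):
    print(name, "->", purpose_of(name))
```

Here `use.typekit.net` is just an example subdomain; `dl.drop` returns `None` because, as noted above, it is not a valid domain name.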