AI safety and cybersecurity



The link in "this is a linkpost for" is not the correct one.

Here are the same two GIFs but with a consistent speed (30ms/frame) and an infinite loop, in case anyone else is interested for e.g. presentations:

CoinRun training (in-distribution) CoinRun test (out-of-distribution)

I think you can guess a reasonable answer even as a complete outsider (like me), considering the purpose of these funds, which is to maximize the amount of expected good they cause by the allocation of their money. A few things that must come into consideration:

  1. is it productive to pay very good researchers the bare minimum they need to survive? No:
    • it makes the path of independent research unattractive to most;
    • it produces some level of adverse selection in future applicants, i.e. you'll move toward getting more applications from people who couldn't really be hired in the industry, which isn't always bad but could be thought to correlate with not being an excellent researcher;
    • you don't want your researchers to waste too much time and energy on things that could be straightforwardly solved with a bit more money.
  2. on the other hand, paying too much has a few issues:
    • the money spent on a researcher hits diminishing returns after a certain point;
    • some level of adverse selection here too: if people know they can make the same amount of money here as in industry, and your fund's approval process is lighter than industry interviews (due to limited resources and a hits-based giving approach), you might get an influx of candidates who aren't altruistically motivated and were rejected from industry for perhaps good reasons;
    • damaging the reputation of the movement by appearing to allocate your money frivolously.

In terms of existing data, you can have a look at past public payouts, for example the December 2021 LTFF report. In it, you'll find a 6-month salary of $50,000 for a well-known independent researcher, which is above the bare minimum needed to survive, probably allowing for some level of travel and luxury, while staying well below industry standards.

Google can decide to verify existing accounts. I think it's likely that in its efforts to avoid platforming spam, cybercrime and astroturfing, Google may decide to leave existing "reputable" accounts alone, but will ask for e.g. a phone number or other KYC for existing accounts that were basically never used, as an easy fix to the issue you're describing here.

Another mitigation strategy against the "LLM learns steganography from other LLM's output" problem is to add some noise to the training data (performing a number of transformations that preserve meaning) prior to training, to break the embedded steganography schemes. I feel like this should be relatively easy and effective?

Note that the loss would obviously be worse on the outputs of the previous LLM.

Which makes me think that comparing losses could be a way to detect that some text contains steganography, even if we can't pinpoint how exactly (but our transformations break it).

EDIT: in case of disagreement, I can try being more specific about my claim that this should be relatively easy and effective, using concrete examples. Basically I believe that after scrambling, there can only remain a small number of bits of steganographic information per token, even for a steganographic encoding that originally used the null space of natural language optimally.
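To make "relatively easy" concrete: here is a toy sketch of the cheapest family of meaning-preserving transformations, whitespace normalization. Real scrambling would also need paraphrasing and synonym substitution (which requires an actual language model), but even this already destroys any scheme hiding bits in spacing. The function name and inputs are illustrative, not from the original comment:

```shell
# normalize(): collapse whitespace-based side channels while preserving meaning
#   - tabs become single spaces
#   - runs of spaces collapse to one
#   - trailing spaces are stripped
normalize() { tr '\t' ' ' | sed -e 's/  */ /g' -e 's/ $//'; }

# a "stego" message hiding bits in tab-vs-space and run lengths
printf 'hidden\t\tbits  in   spacing \n' | normalize
# prints: hidden bits in spacing
```

Any encoder that modulated spacing now collides with every other spacing choice, so those bits are gone; what survives after all such transformations bounds the steganographic capacity per token.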


Thanks for the replies and sorry for the inaccuracies. I initially reported 4,331 blog posts and 890k words; the real results are that Robin wrote 3,302 blog posts (thanks DominikPeters for pointing this out, and for finding these better urls) and 1.5M words.

(4,331 blog posts corresponds to all authors on overcomingbias. 890k words doesn't represent anything, because the posts were truncated when accessed from the monthly archive urls.)

# Get the real number of words from Robin
$ n_current_pages=331
$ echo > /tmp/page_urls
$ for i in $(seq 2 $n_current_pages); do echo $i >> /tmp/page_urls; done
$ getwords() { curl "$1" | pup '#content' | html2text --ignore-links | wc -w; }
$ export -f getwords
$ parallel getwords < /tmp/page_urls > /tmp/words_by_page
$ awk '{sum += $1} END {print sum}' /tmp/words_by_page


Scoping: 4331 blog posts and 890k words (for overcomingbias only).

# Number of blog posts
$ curl | pup '#monthly-archives' | rg '\(\d+\)' | tr -d ' ()' | awk '{sum += $1} END {print sum}'

# Rough number of words (bash)
$ curl | pup '#monthly-archives a attr{href}' > /tmp/urls_monthly_archives
$ getwords() { curl "$1" | pup '#content' | html2text --ignore-links | wc -w; }
$ export -f getwords
$ parallel getwords < /tmp/urls_monthly_archives > /tmp/words_per_month
$ awk '{sum += $1} END {print sum}' /tmp/words_per_month
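For anyone reproducing this, the count-and-sum step can be sanity-checked offline before involving curl/pup/html2text; a minimal sketch with fake local "pages" (filenames hypothetical):

```shell
# two fake "pages" stand in for the fetched-and-stripped HTML content
printf 'one two three\n' > /tmp/demo_page1.txt
printf 'four five\n'     > /tmp/demo_page2.txt

# count words per page, one count per line, then sum with awk
countwords() { wc -w < "$1"; }
for f in /tmp/demo_page1.txt /tmp/demo_page2.txt; do countwords "$f"; done > /tmp/words_demo
awk '{sum += $1} END {print sum}' /tmp/words_demo
# prints: 5
```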

Interesting... Can you tell us more about what your self-control training looked like? When in the day, how long, how hard, what tasks, etc.? Was the most productive period of your life during or after this training? Why did you stop?

To continue the strength-training comparison: we're usually trying to maximize deployed strength over a lifetime. Perhaps we're already deploying as much strength as we can every day on useful tasks, so adding strength training on pointless tasks would take strength away from the useful ones?

In my opinion, the 4 killer features of vim beyond moving around and basic editing are:

  • macros
  • the s command used with regular expressions
  • the g command (see e.g. Power of g)
  • the ability to run text through any Unix utility (Vim and the shell)

If you know your Unix utilities (I often use awk inside of vim; it's also great outside of it), your regular expressions, and these features, you can sculpt any text as you wish. What I mean is that you can take the output of pretty much anything, in a messed up format, and transform it into the content and format you want. This is supposed to be inspiring but I'm not sure how good a job I'm doing.
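As an illustration of the last feature: in vim you'd visually select some lines and type !awk ... to filter them through a utility; the equivalent from a plain shell, on a hypothetical messy input (stray spaces, unordered rows), is:

```shell
# messy "name , score" lines -> cleaned, aligned, sorted by score descending
printf 'bob , 3\nalice,10\ncarol ,7\n' \
  | awk -F',' '{gsub(/ /, ""); printf "%-6s %s\n", $1, $2}' \
  | sort -k2 -rn
# prints:
# alice  10
# carol  7
# bob    3
```

The same awk program typed after ! inside vim transforms the selected lines in place, which is exactly the "sculpt any text" workflow described above.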

Also, if anyone's interested, here are my current vim Anki cards. I use Anki for keyboard shortcuts that are supposed to be muscle memory; AMA.

"Twitter" has a high variance. For some people (probably the vast majority of them), the comparison to smoking is certainly relevant; for a few others, Twitter is very beneficial. Here are a few variables that I think have a huge impact on the overall value a user derives from Twitter:

  1. Who are you? What's your mental state like?
  2. What do you want from Twitter (read thoughtful discussions? participate in them? be aware of relevant news and opportunities? meet like-minded people? influence people? increase your follower count / gain status? escape from boredom? etc)?
  3. Who do you follow, unfollow, mute, partially mute?
  4. When and how often do you engage with Twitter?
  5. How frequently are you reading vs writing publicly vs writing DMs?
  6. When you're only reading, how fast are you scrolling and how much are you thinking?
  7. What client do you use?
  8. Do you read tweets from your home timeline, from lists or from specific users' profiles?
  9. Are you continually trying to improve the quality of your experience?

I'm probably missing many others.

Meta-problems in general [...] are issues outside the Overton window.

Does anyone have a theory about why this is the case? Thinking out loud:

  • voting systems: I guess any mention from a politician would be immediately dismissed as agenda-driven. And, well, probably any mention from anyone. Changing political systems also has a Chesterton's-fence component: we know how our current system behaves, we don't know how a new one would behave, and we know from history that we should be quite happy to have political systems that kind of work and haven't already led us to totalitarianism. I'm not sure this aspect contributes to keeping these ideas outside the Overton window, though.

  • academic science being focused more on prestige than on producing knowledge: it's about prestige and social status, which are a bit taboo? Fear of fueling science deniers? Respectable newspapers have themselves been pursuing prestige for decades?

  • other relevant meta-problems?

In general, if you talk about changing the rules of the game you're playing in your own career, you'll be dismissed as making up new rules just to get an edge; if you talk about the rules of a game you're not playing, you're not an expert.

Any thoughts?
