reallyeli

Comments
AI Safety Field Growth Analysis 2025
reallyeli · 4d

While I like the idea of the comparison, I don't think the government's definition of "green jobs" is the right comparison point (e.g. those are not research jobs).

leogao's Shortform
reallyeli · 15d

> one very easy way to trick our own calibration sensors is to add a bunch of caveats or considerations that make it feel like we've modeled all the uncertainty (or at least, more than other people who haven't). so one thing i see a lot is that people are self-aware that they have limitations, but then over-update on how much this awareness makes them calibrated

Agree, and well put. I think the language of "my best guess," "it's plausible that," etc. can be a bit thought-numbing for this and other reasons. It can function as plastic bubble wrap around the true shape of your beliefs, preventing their sharp corners from coming into contact with reality. Thoughts coming into contact with reality is good, so sometimes I try to deliberately strip away my precious caveats when I talk.

I most often do this when writing or speaking to think, not to communicate, since by doing this you pay the cost of not conveying your true confidence level, which can of course be bad.

reallyeli's Shortform
reallyeli · 1mo

(This is a brainstorm-type post which I'm not highly confident in; I'm putting it out there so I can iterate. Thanks for replying and helping me think about it!)

I don't mean that the entire proof fits into working memory, but that the abstractions involved in the proof do. Philosophers might work with a concept like "the good," which has a few properties immediately apparent but others available only on further deep thought. Mathematicians work with concepts like "group" or "4," whose properties are immediately apparent, and these are what's involved in proofs. Call the former fuzzy concepts and the latter non-fuzzy concepts.

(Philosophers often reflect on their concepts, like "the good," and uncover new important properties, because philosophy is interested in intuitions people have from their daily experience. But math requires clear up-front definitions; if you reflect on your concept and uncover new important properties not logically entailed by the others, you're supposed to use a new definition.)
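
To make the contrast concrete, here is a minimal sketch in Lean 4 (my illustration, not part of the original shortform) of what a non-fuzzy concept looks like. Everything true of a group is logically entailed by the explicitly listed fields; reflection can never surface a new, unentailed property the way it can for "the good."

```lean
-- A "non-fuzzy" concept: a group, defined by explicit up-front axioms.
-- Anything true of `MyGroup` must be entailed by exactly these fields;
-- there is nothing left to uncover by further reflection.
structure MyGroup (G : Type) where
  mul : G → G → G   -- the group operation
  one : G           -- the identity element
  inv : G → G       -- inverses
  mul_assoc : ∀ a b c : G, mul (mul a b) c = mul a (mul b c)
  one_mul   : ∀ a : G, mul one a = a
  inv_mul   : ∀ a : G, mul (inv a) a = one
```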

reallyeli's Shortform
reallyeli · 1mo

Human minds form various abstractions over our environment. These abstractions are sometimes fuzzy (too large to fit into working memory) or leaky (they can fail).

Mathematics is the study of what happens when your abstractions are completely non-fuzzy (always fit in working memory) and completely non-leaky (never fail). And also the study of which abstractions can do that.

plex's Shortform
reallyeli · 1mo

I think this is a good metaphor, but note that it is still very possible to be a dick, hurt other people, etc. while communicating in NVC style. It's not a silver bullet because nothing is.

reallyeli's Shortform
reallyeli · 2mo

It might be important for AI strategy to track approximately how many people have daily interactions with AI boyfriends / girlfriends. Or, in more generalized form, how many people place a lot of emotional weight and trust in AIs (and which AIs they trust, and on what topics).

This could be a major vector via which AIs influence politics, get followers to do things for them, and generally cross the major barrier from digital influence to real-world action. The AIs could be misaligned and scheming, could be acting as tools of some scheme-y humans, or could be somewhere in between.

(Here I'm talking about AIs which have many powerful capabilities, but aren't able to act on the world themselves, e.g. via nanotechnology or robot bodies; this might happen for a variety of reasons.)

How Does A Blind Model See The Earth?
reallyeli · 2mo

If any post ever deserved the "World modeling" tag it's this one.

reallyeli's Shortform
reallyeli · 2mo

LLMs are trained on a human-generated text corpus. Imagine an LLM agent deciding whether or not to be a communist. It seems likely (though not certain) that it would be strongly influenced by the existing human literature on communism, i.e. all the text humans have produced about communism arguing its pros/cons and empirical consequences.

Now replace 'communism' with 'plans to take over.' Humans have also produced a literature on this topic. Shouldn't we expect that literature to strongly influence LLM-based decisions on whether to take over?

That's the argument I'm more confident in. Now an argument I'm less confident in.

'The literature' would seem to have a stronger effect on LLMs than it does on humans. Their knowledge is more crystallized, less abstract, more like "learning to play the persona that does X" than "learning to do X." So maybe by the time LLM agents have other powerful capabilities, their thinking on whether to take over will still be very 'stuck in its ways,' i.e. very dependent on 'traditional' ideas from the human text corpus. They might not be able to see beyond the abstractions and arguments used in the human discourse, even if those are flawed.

Per both arguments, if you're training an LLM, you might want to care a lot about what its training data says about the topic of AIs taking over from humans.

reallyeli's Shortform
reallyeli · 4mo

A Berkeley professor speculates that LLMs are doing something more like "copying the human mind" than "learning from the world." This seems like it would imply some things we already see (e.g. fuzzily, they're "not very creative"), and it seems like it would imply nontrivial things for what we should expect out of LLMs in the future, though I'm finding it hard to concretize this.

> That is, if LLMs are trained with a simple algorithm and acquire functionality that resembles that of the mind, then their underlying algorithm should also resemble the algorithm by which the mind acquires its functionality. However, there is one very different alternative explanation: instead of acquiring its capabilities by observing the world in the same way as humans, LLMs might acquire their capabilities by observing the human mind and copying its function. Instead of implementing a learning process that can learn how the world works, they implement an incredibly indirect process for scanning human brains to construct a crude copy of human cognitive processes.

reallyeli's Shortform
reallyeli · 6mo

If AI turns out to be very useful for cheaply writing formally verified code, what does that do for AI control? We could then require that the untrusted AI produce, along with any code it writes, a spec and a certificate verifying that the code matches the spec (see the sketch after the list below).

How bad of a position does this put the untrusted model in, when it's trying to write code that does bad stuff? Some sub-questions towards answering that question:

  • How tight are the constraints that the formal verification process, at its AI-driven heights, can put on the code? Seems like once something is formally verified at all this rules out large classes of vulnerabilities an AI might exploit. Can we push this further to remove almost all attack vectors?
  • How easy is it to write a spec that looks good to humans, but actually contains a significant omission?
  • How feasible is it for the spec writer to collude with the code writer?
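As a toy illustration of the code-plus-spec-plus-certificate pattern, here is a hypothetical Lean 4 sketch of my own (the post doesn't commit to any particular formalism). The function plays the role of the untrusted code, the theorem statements are the spec, and the proofs, checked mechanically by Lean's kernel, are the certificate.

```lean
-- Untrusted code: claims to compute the maximum of two naturals.
def myMax (a b : Nat) : Nat :=
  if a ≤ b then b else a

-- The spec is the theorem statements; the kernel-checked proofs
-- are the certificate that the code meets the spec.
theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
  unfold myMax; split <;> omega

theorem myMax_ge_right (a b : Nat) : b ≤ myMax a b := by
  unfold myMax; split <;> omega
```

This tiny example also illustrates the second bullet above: the spec is quietly incomplete, since `fun a b => a + b` also satisfies both theorems. A scheming code writer would retain that slack unless the spec additionally pins the result down, e.g. by requiring `myMax a b = a ∨ myMax a b = b`.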
Posts

reallyeli's Shortform · 3 karma · 7mo · 13 comments
Funding for programs and events on global catastrophic risk, effective altruism, and other topics · 9 karma · 1y · 0 comments
Funding for work that builds capacity to address risks from transformative AI · 16 karma · 1y · 0 comments
Are "superforecasters" a real phenomenon? · 36 karma · 6y · 29 comments