Roman Malov

Bachelor's degree in general and applied physics. AI safety / agent foundations researcher wannabe.

I love talking to people, and if you are an alignment researcher we will have at least one common topic (but I am also very interested in talking about topics unknown to me!), so I encourage you to book a call with me: https://calendly.com/roman-malov27/new-meeting

Email: roman.malov27@gmail.com
GitHub: https://github.com/RomanMalov
TG channels (in Russian): https://t.me/healwithcomedy, https://t.me/ai_safety_digest

Comments (sorted by newest)

Towards a scale-free theory of intelligent agency
Roman Malov · 3d

I would like at some point to develop a theory of an agent who has “other stuff to do” besides the decision problem presented to them. Maybe this agent has some macro-scale quantities, like the (current) amount of compute, the (current) speed of self-improvement, or the (current) rate of gaining utility (analogous to macro-scale variables like “temperature” and “pressure” in thermodynamics). So when you present this agent with a decision problem, it can decide that it’s not even worth its time, or it can spend years of time and gazillions of flops of compute if the query is actually worth it (though I expect the first version of the theory to only deal with queries much smaller than the overall stuff the agent deals with). Continuing the analogy with thermodynamics, I would like the macro-scale properties of the whole agent to somehow emerge from micro-scale properties of the decision problems it faces plus some uniformity assumptions.

I hope that this would help develop a scale-free theory of agency, in the same sense that thermodynamics is scale-free.
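
A toy sketch of the kind of decision rule I have in mind, purely illustrative: the agent's macro-scale quantities (available compute, background rate of gaining utility) determine whether a presented query is worth engaging at all. Everything here, the class names, the fields, and the numbers, is a placeholder assumption of mine rather than part of the theory:

```python
# Hypothetical sketch: an agent with "other stuff to do" compares the expected
# utility of solving a query against the opportunity cost of the compute it
# would burn. All quantities and thresholds here are placeholders.

from dataclasses import dataclass


@dataclass
class AgentState:
    compute_flops: float       # macro-scale: compute currently available
    utility_rate: float        # macro-scale: utility per second from "other stuff"
    flops_per_second: float    # how fast the agent can spend its compute


@dataclass
class DecisionProblem:
    expected_utility_if_solved: float
    estimated_flops_to_solve: float


def worth_engaging(agent: AgentState, problem: DecisionProblem) -> bool:
    """Engage only if the query beats the utility the agent would have
    gained by spending that time on everything else it is doing."""
    if problem.estimated_flops_to_solve > agent.compute_flops:
        return False  # cannot even attempt it with the compute on hand
    time_needed = problem.estimated_flops_to_solve / agent.flops_per_second
    opportunity_cost = time_needed * agent.utility_rate
    return problem.expected_utility_if_solved > opportunity_cost


# Example: a query that is "not even worth the agent's time"
agent = AgentState(compute_flops=1e18, utility_rate=100.0, flops_per_second=1e15)
tiny_query = DecisionProblem(expected_utility_if_solved=5.0,
                             estimated_flops_to_solve=1e16)
print(worth_engaging(agent, tiny_query))  # False: 10 s of compute costs 1000 utility
```

The missing (and interesting) part is where utility_rate and the per-query estimates come from; the hope is that they emerge from the micro-scale statistics of the decision problems the agent faces, the way temperature emerges from molecular motion.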

Roman Malov's Shortform
Roman Malov · 5d

Your definition seems sensible to me. Humans are not Bayesians; they are not built as probabilistic machines with all of their probabilities stored explicitly in memory. So I usually think in terms of a Bayesian approximation, which is basically what you've said. It's unconscious when you don't try to model those beliefs as Bayesian, and conscious otherwise.

Roman Malov's Shortform
Roman Malov · 5d

Just as you can unjustly privilege a low-likelihood hypothesis just by thinking about it, you can in the exact same way unjustly unprivilege a high-likelihood hypothesis just by thinking about it. Example: I believe that when I press a key on a keyboard, the letter on the key is going to appear on the screen. But I do not consciously believe that; most of the time I don't even think about it. And so, just by thinking about it, I am questioning it, separating it from all hypotheses which I believe and do not question.

Some breakthroughs were in the form of "Hey, maybe something which nobody ever thought of is true," but some very important breakthroughs were in the form of "Hey, maybe this thing which everybody just assumes to be true is false."

Roman Malov's Shortform
Roman Malov · 23d

People often say, "Oh, look at this pathetic mistake the AI made; it will never be able to do X, Y, or Z." But they would never say to a child who made a similar mistake that they will never amount to doing X, Y, or Z, even though the theoretical limits on humans are much lower than those on AI.

Roman Malov's Shortform
Roman Malov · 24d

Idea status: butterfly idea

In real life, there are too many variables to optimize each one. But if a variable is brought to your attention, it is probably important enough to consider optimizing it.

Negative example: you don’t see your eyelids; they are doing their job of protecting your eyes, so there’s no need to optimize them.

Positive example: you tie your shoelaces; they are the focus of your attention. Can this process be optimized? Can you learn to tie shoelaces faster, or learn a more reliable knot?

Humans already do something like this, but mostly consider optimizing a variable when it annoys them. I suggest widening the consideration space because the “annoyance” threshold is mostly emotional and therefore probably optimized for a world with far fewer variables and much smaller room for improvement (though I only know evolutionary psychology at a very surface level and might be wrong).

The Observer Effect for belief measurement
Roman Malov · 2mo

To the first: I already addressed it in the "Why not just...?" part:

Add "hey I'm just probing you, please don't update on that query" in the query 


That might decrease the update a bit, but insofar as the inquirer only counterfactually adds that disclaimer in cases where they need the answer for some hypothesis-specific reason, the oracle would still update somewhat.

To the second: that one might actually work; I don't see an obvious way it fails. Perhaps it only fails in the scenario with an extremely smart oracle, which could somehow predict the question you actually want answered. But at that point it would be hard to stop it from updating on anything, so updating on the query would be the least of your problems. Though it only gives you an answer to 1 question traded for 1000 queries. If we want the full distribution, that would require 1000 * (number of hypotheses) queries, which is O(n) and beats my O(n!) suggestion, but it is still far from ~1 query per hypothesis (which would be ideal).

Doomsday Argument and the False Dilemma of Anthropic Reasoning
Roman Malov · 2mo

P(First|Second)

I think you meant P(Second|Find) here.

Daniel Kokotajlo's Shortform
Roman Malov · 2mo

I am a bit confused about what a 10x slowdown means. I assumed you meant going from e^(λt) to e^(0.1λt) in the R&D coefficient, but the definition from the comment by @ryan_greenblatt seems to imply going from e^(λt) to 0.1·e^(λt) (which, according to AI 2027 predictions, would result in a 6-month delay). The arithmetic behind the distinction is spelled out below the quote.

The definition I'm talking about:

8x slowdown in the rate of research progress around superhuman AI researcher level averaged over some period
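
To spell out why I read these two options so differently (my own arithmetic, under the assumption that the progress metric grows as e^(λt); the symbols below are just my notation):

```latex
% Two readings of a "10x slowdown", assuming the progress metric P(t) = e^{\lambda t}:

% (1) rate slowdown: the growth rate itself is divided by 10,
P_{\mathrm{rate}}(t) = e^{0.1 \lambda t},
% so the gap from e^{\lambda t} keeps growing without bound;

% (2) level reduction: the curve is divided by 10,
P_{\mathrm{level}}(t) = 0.1\, e^{\lambda t} = e^{\lambda t - \ln 10} = e^{\lambda (t - \ln 10 / \lambda)},
% which is the original curve shifted by the constant delay \ln(10)/\lambda,
% i.e. a fixed calendar delay (presumably where the ~6-month figure comes from)
% rather than a persistent slowdown.
```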

LessWrong Feed [new, now in beta]
Roman Malov · 2mo

i.e. to see fresh comments

Posts (sorted by new)

9 karma · Two Types of (Human) Uncertainty · 1mo · 2 comments
8 karma · The Observer Effect for belief measurement · 2mo · 4 comments
12 karma · An Analogy for Interpretability · 3mo · 2 comments
5 karma · Question to LW devs: does LessWrong tries to be facebooky? · 4mo · 1 comment
12 karma · Could we go another route with computers? [Question] · 4mo · 4 comments
5 karma · Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space · 7mo · 0 comments
9 karma · Is "hidden complexity of wishes problem" solved? [Question] · 8mo · 4 comments
3 karma · Roman Malov's Shortform · 9mo · 33 comments
25 karma · Visual demonstration of Optimizer's curse · 10mo · 3 comments