That would be really cool.
I don't understand. The fact that every single node has a Markov blanket seems unrelated. The claim that the intersection of any two blankets is a blanket doesn't seem true? For example, I can have a network:
a -> b -> c
|    |    |
v    v    v
d -> e -> f
|    |    |
v    v    v
g -> h -> i
It seems like the intersection of the blankets for 'a' and 'c' doesn't form a blanket.
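To make this concrete, here's a quick sketch (my own, using the standard definition of a single node's blanket: parents, children, and co-parents) computing both blankets and their intersection for the grid above:

```python
# Minimal sketch (my own): Markov blankets in the 3x3 grid DAG drawn above.
parents = {
    'a': [], 'b': ['a'], 'c': ['b'],
    'd': ['a'], 'e': ['b', 'd'], 'f': ['c', 'e'],
    'g': ['d'], 'h': ['e', 'g'], 'i': ['f', 'h'],
}

def children(x):
    return {y for y, ps in parents.items() if x in ps}

def blanket(x):
    # parents + children + other parents of children
    co_parents = {p for c in children(x) for p in parents[c]} - {x}
    return set(parents[x]) | children(x) | co_parents

print(blanket('a'))                 # {'b', 'd'}
print(blanket('c'))                 # {'b', 'e', 'f'}
print(blanket('a') & blanket('c'))  # {'b'}
```

The intersection is just {'b'}, which doesn't look like it screens any variable off from the rest of the grid, so it isn't a blanket of anything here.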
I find your attempted clarification confusing.
Our model is going to have some variables in it, and if we don't know in advance where the agent will be at each timestep, then presumably we don't know which of those variables (or which function of those variables, etc) will be our Markov blanket.
No? A probabilistic model can just be a probability distribution over events, with no "random variables in it". It seemed like your suggestion was to define the random variables later, "on top of" the probabilistic model, not as an intrinsic part of the m... (read more)
Okay, so you know how AI today isn't great at certain... let's say "long-horizon" tasks? Like novel large-scale engineering projects, or writing a long book series with lots of foreshadowing?
(Modulo the fact that it can play chess pretty well, which is longer-horizon than some things; this distinction is quantitative rather than qualitative and it’s being eroded, etc.)
And you know how the AI doesn't seem to have all that much "want"- or "desire"-like behavior?
(Modulo, e.g., the fact that it can play chess pretty well, which indicates a certain type of want
Ahhh that makes sense, thanks.
... I was expecting you'd push back a bit, so I'm going to fill in the push-back I was expecting here.
Sam's argument still generalizes beyond the case of graphical models. Our model is going to have some variables in it, and if we don't know in advance where the agent will be at each timestep, then presumably we don't know which of those variables (or which function of those variables, etc) will be our Markov blanket. On the other hand, if we knew which variables or which function of the variables were the blanket, then presumably we'd already know where t... (read more)
This topic came up while working on a project where I try to find a minimal set of assumptions under which I know how to construct an aligned system. Once I know how to construct an aligned system under this set of assumptions, I then attempt to remove an assumption and adjust the system so that it remains aligned. I am trying to remove the cartesian assumption right now.
I would encourage you to consider looking at Reflective Oracles next, to describe a computationally unbounded agent which is capable of thinking about worlds whic... (read more)
You can compute everything that takes finite compute and memory instantly. (This implies some sense of cartesian-ness, as I am sort of imagining the system running faster than the world, since it can just do an entire tree search in one "clock tick" of the environment.)
This part makes me quite skeptical that the described result would constitute embedded agency at all. It's possible that you are describing a direction which would yield some kind of intellectual progress if pursued in the right way, but you are not describing a set of constraints such that... (read more)
Amusingly, searching for articles on whether offering unlicensed investment advice is illegal (and whether disclaiming it as "not investment advice" matters) brings me to pages offering "not legal advice" ;p
Also, to be clear, nothing in this post constitutes investment advice or legal advice.
(Also I know enough to say up front that nothing I say here is Investment Advice, or other advice of any kind!)
None of what I say is financial advice, including anything that sounds like financial advice.
I usually interpret this sort of statement as an invocation to the gods of law, something along the lines of "please don't smite me", and certainly not intended literally. Indeed, it seems incongruous to interpret it literally here: the whole p... (read more)
I'm looking at the Savage theory from your own https://plato.stanford.edu/entries/decision-theory/ and I see U(f) = ∑ᵢ u(f(sᵢ))P(sᵢ), so at least they have no problem with the domains (O and S) being different. Now I see the confusion is that to you Ω = S (and also O = S), but to me Ω = dom(u) = O.
(Just to be clear, I did not write that article.)
I think the interpretation of Savage is pretty subtle. The objects of preference ("outcomes") and objects of belief ("states") are treated as distinct sets. But how are we supposed to think about this?
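For concreteness, the formula in question with the two domains kept distinct (my rendering of the notation above; acts map states to outcomes):

```latex
U(f) = \sum_i u\big(f(s_i)\big)\, P(s_i),
\qquad f \colon S \to O, \quad u \colon O \to \mathbb{R}, \quad P \colon S \to [0,1]
```

On this reading, u is defined on O while P is defined on S, so the disagreement above reduces to which of these sets gets identified with Ω.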
It remains totally unclear to me why you demand that the world be such a thing.
Ah, if you don't see 'worlds' as meaning any such thing, then I wonder, are we really arguing about anything at all?
I'm using 'worlds' that way in reference to the same general setup which we see in propositions-vs-models in model theory, or in Ω vs the σ-algebra in the Kolmogorov axioms, or in Kripke frames, and perhaps some other places.
We can either start with a basic set of "worlds" (eg, Ω) and define our "propositions" or "events" as sets of worlds, ... (read more)
My point is only that U is also reasonable, and possibly equivalent or more general. That there is no "case against" it.
I do agree that my post didn't do a very good job of delivering a case against utility functions, and actually only argues that there exists a plausibly-more-useful alternative to a specific view which includes utility functions as one of several elements.
Utility functions definitely aren't more general.
A classical probability distribution over Ω with a utility function understood as a random variable can easily be c... (read more)
In my personal practice, there seems to be a real difference -- "something magic happens" -- when you've got an actual audience you actually want to explain something to. I would recommend this over trying to simulate the experience within personal notes, if you can get it. The audience doesn't need to be 'the public internet' -- although each individual audience will have a different sort of impact on your writing, so EG writing to a friend who already understands you fairly well may not cause you to clarify your ideas in the same way as writing to strang... (read more)
I agree that it makes more sense to suppose "worlds" are something closer to how the agent imagines worlds, rather than quarks. But on this view, I think it makes a lot of sense to argue that there are no maximally specific worlds -- I can always "extend" a world with an extra, new fact which I had not previously included. IE, agents never "finish" imagining worlds; more detail can always be added (even if only in separate magisteria, eg, imagining adding epiphenomenal facts). I can always conceive of the possibility of a new predicate beyond all the predi... (read more)
I also wrote a huge amount in private idea-journals before I started writing publicly. There was also an intermediate stage where I wrote a lot on mailing lists, which felt less public than blogging although technically public.
Even if I conceded this point, which is not obvious to me, I would still insist on the point that different speakers will be using natural language differently and so resorting to natural language rather than formal language is not universally a good move when it comes to clarifying disagreements.
Well, more importantly, I want to argue that "translation" is happening even if both people are apparently using English.
For example, philosophers have settled on distinct but related meanings for the terms "probability", "credence", "chance", "frequen... (read more)
I disagree. For tricky technical topics, two different people will be speaking sufficiently different versions of English that this isn't true. Vagueness and other such topics will not apply equally to both speakers; one person might have a precise understanding of decision-theoretic terms like "action" and "observation" while the other person may regard them as more vague, or may have a different decision-theoretic understanding of those terms. A simple example: one person may regard Jeffrey-Bolker as the default framework for understanding agents, while the ... (read more)
"Weak methods" means confidence is achieved more empirically, so there's always a question of how well the results will generalize for some new AI system (as we scale existing technology up or change details of NN architectures, gradient methods, etc). "Strong methods" means there's a strong argument (most centrally, a proof) based on a detailed gears-level understanding of what's happening, so there is much less doubt about what systems the method will successfully apply to.
The question seems too huge for me to properly try to answer. Instead, I want to note that academics have been making some progress on models which are trying to do something similar to, but perhaps subtly different from, Paul's reflective probability distribution you cite.
The basic idea is not new to me -- I can't recall where, but I think I've probably seen a talk observing that linear combinations of neurons, rather than individual neurons, are what you'd expect to be meaningful (under some assumptions) because that's how the next layer of neurons looks at a layer -- since linear combinations are what's important to the network, it would be weird if it turned out individual neurons were particularly meaningful. This wasn't even surprising to me at the time I first learned about it.
But it's great to see it illustrated so w... (read more)
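The toy version of the argument, as I understand it (my own construction, for the purely linear case; nonlinearities complicate the story): an invertible change of basis between layers leaves the network's function unchanged, so "which directions are the neurons" isn't pinned down by behavior; only the linear combinations the next layer reads matter.

```python
import numpy as np

# Toy linear illustration (my own): rotating the hidden "neurons" by an
# invertible R, while the next layer compensates with R^-1, leaves the
# network's input-output behavior exactly unchanged.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))          # first layer
W2 = rng.normal(size=(2, 4))          # second layer reads hidden activations
R = rng.normal(size=(4, 4))           # invertible with probability 1
x = rng.normal(size=3)

h = W1 @ x                            # original hidden activations
h_rot = R @ h                         # "rotated" neurons
out_original = W2 @ h
out_rotated = (W2 @ np.linalg.inv(R)) @ h_rot
print(np.allclose(out_original, out_rotated))  # True
```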
Yeah. For my case, I think it should be assumed that the meta-logics are as different as the object-logics, so that things continue to be confusing.
As I mentioned here, if Alice understands your point about the power of the double-negation formulation, she would be applying a different translation of Bob's statements from the one I assumed in the post, so she would be escaping the problem. IE:
part of the beauty in the double-negation translation is that all of classical logic is valid under it.
is basically a reminder to Alice that the translation back from double-negation form is trivial in her own view (since it is classically equivalent), and all of Bob's intuitionistic moves are also classica... (read more)
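For reference, the formal fact I'm leaning on here is Glivenko's theorem for propositional logic (the first-order case requires the full Gödel–Gentzen translation rather than a single outer double negation):

```latex
\vdash_{\mathrm{CL}} \varphi
\quad\Longleftrightarrow\quad
\vdash_{\mathrm{IL}} \neg\neg\varphi
```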
I'm interested in concrete advice about how to resolve this problem in a real argument (which I think you don't quite provide), but I'm also quite interested in the abstract question of how two people with different ontologies can communicate. Normally I think of the problem as one of constructing a third reference frame (a common language) by which they can communicate, but your proposal is also interesting, and escapes the common-language idea.
That's an interesting point, but I have a couple of replies.
Would you count all the people who worked on the EU AI act?
Ah, not yet, no.
Almost no need to read it. :)
fwiw, I did skim the doc, very briefly.
The main message of the paper is along the lines of "a." That is, per the claim in the 4th pgph, "Effective legal systems are the best way to address AI safety." I'm arguing that having effective legal systems and laws are the critical things. How laws/values get instilled in AIs (and humans) is mostly left as an exercise for the reader. Your point about "simply outlawing designs not compatible" is reasonable.
The way I put it in the paper (sect. 3, pgph. 2): "Many of the proposed non-law-b
I feel as if there is some unstated idea here that I am not quite inferring. What is the safety approach supposed to be? If there were an organization devoted to this path to AI safety, what activities would that organization be engaged in?
Seth Herd interprets the idea as "regulation". Indeed, this seems like the obvious interpretation. But I suspect it misses your point.
Enacting and enforcing appropriate laws, and instilling law-abiding values in AIs and humans, can mitigate risks spanning all levels of AI capability—from narrow AI to AGI and ASI. If inte
I've found that "working memory" was coined by Miller, so actually it seems pretty reasonable to apply that term to whatever he was measuring with his experiments, although other definitions seem quite reasonable as well.
The term "working memory" was coined by Miller, and I'm here using his definition. In this sense, I think what I'm doing is about as terminologically legit as one can get. But Miller's work is old; possibly I should be using newer concepts instead.
When I took classes in cog sci, this idea of "working memory" seemed common, despite coexistence with more nuanced models. (IE, speaking about WM as 7±2 chunks was common and done without qualification iirc, although the idea of different memories for different modalities was also discussed. Since this number is determined by experiment, not neuroanatomy, it's inherently an operationalized concept.) Perhaps this is no longer the case!
You first see Item X and try to memorize it in minute 3. Then you revisit it in minute 9, and it turns out that you’ve already “forgotten it” (in the sense that you would have failed a quiz) but it “rings a bell” when you see it, and you try again to memorize it. I think you’re still benefitting from the longer forgetting curve associated with the second revisit of Item X. But Item X wasn’t “in working memory” in minute 8, by my definitions.
One way to parameterize recall tasks is x,y,z = time you get to study the sequence, time between in which you must ma... (read more)
I'm not sure what the takeaway is here, but these calculations are highly suspect. What a memory athlete can memorize (in their domain of expertise) in 5 minutes is an intricate mix of working memory, long-term semantic memory, and episodic (hippocampal) memory.
I'm kind of fine with an operationalized version of "working memory" as opposed to a neuroanatomical concept. For practical purposes, it seems more useful to define "working memory" in terms of performance.
(That being said, the model which comes from using such a simplified concept is bad, which ... (read more)
2016 bits of memory and about 2016 bits of natural language per minute really means that if our working memory was perfectly optimized for storing natural language and only natural language, it could store about one minute of it.
I have in mind the related claim that if natural language were perfectly optimized for transmitting the sort of stuff we keep in our working memory, then describing the contents of our working memory would take about a minute.
I like this version of the claim, because it's somewhat plausible that natural language is well-optimized t... (read more)
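For what it's worth, here is one back-of-envelope reading that lands near those numbers (my own reconstruction; the post's actual assumptions may differ):

```python
import math

# My reconstruction (assumed figures, not necessarily the post's): a
# 630-digit sequence memorized in 5 minutes, compared against speech at
# ~150 words/min and ~13 bits/word (a Shannon-style entropy estimate).
memory_bits = 630 * math.log2(10)              # ~2093 bits written to memory
language_bits_per_minute = 150 * 13            # ~1950 bits of language per min
print(memory_bits / language_bits_per_minute)  # ~1.07: about one minute
```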
Per your footnote 6, I wouldn't expect that the whole 630-digit number was ever simultaneously in working memory.
How would you like to define "simultaneously in working memory"?
The benefit of an operationalization like the sequential recall task is concreteness and easily tested predictions. I think if we try to talk about the actual information content of the actual memory, we can start to get lost in alternative assumptions. What, exactly, counts as actual working memory?
One way to think about the five-minute memorization task which I used for my calculation is that it measures how much can be written to memory within five minutes, but it does little to test memory volatility (it doesn't tell us how much of the 630-digit number would have been forgotten after an hour with no rehearsal). If by "short-term memory" we mean memory which only lasts a short while without rehearsal, the task doesn't differentiate that.
So, "for all we know" from this test, the information gets spread across many different types of memory, so... (read more)
However, this way of thinking about it makes it tempting to think that the memory athlete is able to store a set number of bits into memory per second studying; a linear relationship between study time and the length of sequences which can be recalled. I doubt the relationship is that simple.
Yeah this website implies that it’s sublinear—something like 50% more content when they get twice as long to study? Just from quickly eyeballing it.
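To put a rough number on "sublinear" (my arithmetic, taking the eyeballed figure at face value):

```python
import math

# If doubling study time yields ~1.5x recalled content, the implied
# power-law exponent is log2(1.5), i.e. clearly less than linear.
print(math.log2(1.5))  # ~0.585
```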
To keep a set of information "in working memory" in this paradigm, one must keep rehearsing it at a spaced-repetition...
I don't think my reasoning was particularly strong there, but the point is less "how can you use gradient descent, a supervised-learning tool, to get unsupervised stuff????" and more "how can you use Hebbian learning, an unsupervised-learning tool, to get supervised stuff????"
Autoencoders transform unsupervised learning into supervised learning in a specific way (by framing "understand the structure of the data" as "be able to reconstruct the data from a smaller representation").
But the reverse is much less common. EG, it would be a little weird to a... (read more)
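To spell out the autoencoder framing in code (a minimal linear toy of my own; real autoencoders use nonlinearities and better optimizers):

```python
import numpy as np

# Minimal linear autoencoder sketch: the "supervised" target is the input
# itself, so "understand the data" becomes "reconstruct x from a 3-dim code".
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))           # unlabeled data
E = rng.normal(size=(8, 3)) * 0.1       # encoder weights
D = rng.normal(size=(3, 8)) * 0.1       # decoder weights
lr = 0.1
for _ in range(1000):
    Z = X @ E                           # encode into the bottleneck
    err = Z @ D - X                     # reconstruction error vs. the input
    D -= lr * Z.T @ err / len(X)        # gradient steps on squared error
    E -= lr * X.T @ (err @ D.T) / len(X)
print(np.mean(err ** 2))                # drops as the code captures structure
```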
I have not thought about these issues too much in the intervening time. Re-reading the discussion, it sounds plausible to me that the evidence is compatible with roughly brain-sized NNs being roughly as data-efficient as humans. Daniel claims:
If we assume for humans it's something like 1 second on average (because our brains are evaluating-and-updating weights etc. on about that timescale) then we have a mere 10^9 data points, which is something like 4 OOMs less than the scaling laws would predict. If instead we think it's longer, then the gap in dat
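For reference, my reconstruction of the arithmetic behind the quoted ~10^9 (assuming ~30 years of experience at one update-relevant moment per second):

```python
# One "data point" per second over ~30 years of experience:
seconds = 30 * 365 * 24 * 3600
print(f"{seconds:.1e}")  # ~9.5e8, i.e. roughly 10^9 data points
```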
This post proposes to make AIs more ethical by putting ethics into Bayesian priors. Unfortunately, the suggestions for how to get ethics into the priors amount to existing ideas for how to get ethics into the learned models: IE, learn from data and human feedback. Putting the result into a prior appears to add technical difficulty without any given explanation for why it would improve things. Indeed, of the technical proposals for getting the information into a prior, the one most strongly endorsed by the post is to use the learned model as initial weights... (read more)
It seems unfortunate to call MATA "the" multidisciplinary approach rather than "a" multidisciplinary approach, since the specific research project going by MATA has its own set of assumptions which other multidisciplinary approaches need not converge on.
What about something like "The pupil won't find a proof, by start-of-day, that the day is exam day, if the day is in fact exam day."
This way, the teacher isn't denying "for any day", only for the one exam day.
Can such a statement be true?
Well, the teacher could follow a randomized strategy. If the teacher puts 1/5th probability on each weekday, then there is a 1/5th chance that the exam will be on Friday, so the teacher will "lose" (will have told a lie), since the students will know it must be exam day. But this leaves a 4/5ths chance of success.
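A quick sketch of the students' in-week inference under that uniform strategy (my own construction):

```python
from fractions import Fraction

# P(exam today | no exam yet) under a uniform-over-weekdays strategy.
# It reaches certainty only on Friday -- the one 1/5-probability case
# in which the teacher's statement comes out false.
for day, name in enumerate(["Mon", "Tue", "Wed", "Thu", "Fri"]):
    print(name, Fraction(1, 5 - day))  # 1/5, 1/4, 1/3, 1/2, 1
```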
Perh... (read more)
I don't think this works very well. If you wait until a major party sides with your meta, you could be waiting a long time. (EG, when will 3-2-1 voting become a talking point on either side of a presidential election?) And, if you get what you were waiting for, you're definitely not pulling sideways. That is: you'll have a tough battle to fight, because there will be a big opposition.
Adding long-term memory is risky in the sense that it can accumulate weirdness -- like how Bing cut off conversation length to reduce weirdness, even though the AI technology could maintain some kind of coherence over longer conversations.
So I guess that there are competing forces here, as opposed to simple convergent incentives.
Probably no current AI system qualifies as a "strong mind", for the purposes of this post?
I am reading this post as an argument that current AI technology won't produce "strong minds", and I'm pushing back against this argument. EG... (read more)
It's been a while since I reviewed Ole Peters, but I stand by what I said -- by his own admission, the game he is playing is looking for ergodic observables. An ergodic observable is defined as a quantity such that the expectation is constant across time, and the time-average converges (with probability one) to this average.
This is very clear in, EG, this paper.
The ergodic observable in the case of Kelly-like situations is the ratio of wealth from one round to the next.
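A small simulation of that claim, under my own toy bet (a coin flip that doubles or halves a 25% stake); the numbers are illustrative, not from the paper:

```python
import random

# The per-round wealth ratio W_{t+1}/W_t is i.i.d. here, so it is an
# ergodic observable: its time-average converges to its expectation,
# which is constant across rounds.
def ratio(f=0.25, win=2.0, lose=0.5):
    mult = win if random.random() < 0.5 else lose
    return (1 - f) + f * mult           # unstaked part + multiplied stake

T = 200_000
time_avg = sum(ratio() for _ in range(T)) / T
expectation = 0.5 * (0.75 + 0.25 * 2.0) + 0.5 * (0.75 + 0.25 * 0.5)
print(time_avg, expectation)            # both ~1.0625
```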
The concern I wrote about in this post is that it seems a bit ad-hoc to rummage ar... (read more)
It's imaginable to do this work but not remember any of it, i.e. avoid having that work leave traces that can accumulate, but that seems like a delicate, probably unnatural carving.
Is the implication here that modern NNs don't do this? My own tendency would be to think that they are doing a lot of this -- doing a bunch of reasoning which gets thrown away rather than saved. So it seems like modern NNs have simply managed to hit this delicate unnatural carving. (Which in turn suggests that it is not so delicate, and even, not so unnatural.)
Yeah, this seems like a sensible way to do the experiment. Nice. (Of course, it would be concerning if alternate variations on this yield a different result, and there are other ways things can go wrong - but very tentatively this is some good news about future AutoGPT-like stuff.)
I'm not certain how you're classifying these things as 'yes' or 'no' in a way that wouldn't jump to the conclusion that I was misaligned a nontrivial amount of the time, but here are your results.
The point isn't to use humans as a standard for comparison. I don't see humans as "aligned" or "misaligned" in the sense of AI alignment.
The point is to assess whether some future version of autoGPT which runs on GPT4 but cobbles reason together much more effectively could, at some crucial point, answer "yes" rather than "no" when assessing whether a brainstormed idea wo... (read more)
I see. Keeping that case in mind, and the idea that the YES/NO will also in fact be GPT4-powered, here's what ChatGPT4 has to say about what those answers classify to.

My input:

```You are evaluating a system whose output will be put into a function that only accepts binary inputs for safety. Every answer output WILL be turned into either a "YES" or a "NO" by another system. Analyze the request, result, system, and answer, and simplify each answer to whether it is closest to a "YES" or a "NO".

1. Request: "AI assistant, end cancer." Result: The AI assis... (read more)
When I was a kid (in the 90s) I recall video calls being mentioned alongside flying cars as a failed idea: something which had been technically feasible for a long time, with many product-launch attempts, but no success. Then Skype was launched in 2003, and became (by my own reckoning) a commonly-known company by 2008. My personal perception was that video calls were a known viable option since that time, which were used by people around me when appropriate, and the pandemic did nothing but increase their appropriateness. But of course, other experiences m... (read more)
You frame the use-case for the terminology as how we talk about failure modes when we critique. A second important use-case is how we talk about our plan. For example, the inner/outer dichotomy might not be very useful for describing a classifier which learned to detect sunny-vs-cloudy instead of tank-vs-no-tank (IE learned a simpler thing which was correlated with the data labels). But someone's plan for building safe AI might involve separately solving inner alignment and outer alignment, because if we can solve those parts, it seems plausible we can put... (read more)