All of abramdemski's Comments + Replies

That would be really cool.

I don't understand. The fact that every single node has a Markov blanket seems unrelated. The claim that the intersection of any two blankets is a blanket doesn't seem true? For example, I can have a network:

a -> b -> c
|    |    |
v    v    v
d -> e -> f
|    |    |
v    v    v
g -> h -> i


It seems like the intersection of the blankets for 'a' and 'c' doesn't form a blanket.
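To make the counterexample concrete, here is a quick sketch (my own code, not from the original comment; the node names follow the diagram) that computes Markov blankets in that grid and checks the intersection:

```python
# Markov blankets in the 3x3 grid DAG. A node's blanket is its parents,
# children, and co-parents (other parents of its children).
parents = {
    'a': [], 'b': ['a'], 'c': ['b'],
    'd': ['a'], 'e': ['b', 'd'], 'f': ['c', 'e'],
    'g': ['d'], 'h': ['e', 'g'], 'i': ['f', 'h'],
}

def children(x):
    return {y for y, ps in parents.items() if x in ps}

def markov_blanket(x):
    mb = set(parents[x]) | children(x)
    for c in children(x):
        mb |= set(parents[c])     # co-parents
    mb.discard(x)
    return mb

mb_a, mb_c = markov_blanket('a'), markov_blanket('c')
print(mb_a)          # {'b', 'd'}
print(mb_c)          # {'b', 'e', 'f'}
print(mb_a & mb_c)   # {'b'}
# {'b'} is not the Markov blanket of any node in this network:
print(any(markov_blanket(x) == mb_a & mb_c for x in parents))  # False
```

So the intersection here is {'b'}, which doesn't screen off anything's inside from its outside the way a blanket should.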

I find your attempted clarification confusing. 

Our model is going to have some variables in it, and if we don't know in advance where the agent will be at each timestep, then presumably we don't know which of those variables (or which function of those variables, etc) will be our Markov blanket. 

No? A probabilistic model can just be a probability distribution over events, with no "random variables in it". It seemed like your suggestion was to define the random variables later, "on top of" the probabilistic model, not as an intrinsic part of the m... (read more)
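A minimal sketch of what I mean by defining random variables "on top of" a bare probability distribution (toy outcomes and names are mine):

```python
# A probabilistic model as nothing but a distribution over raw outcomes;
# random variables are added afterwards as functions on those outcomes.
from fractions import Fraction

Omega = ['rain', 'shine', 'snow']   # raw outcomes, no variables built in
P = {'rain': Fraction(1, 2), 'shine': Fraction(1, 3), 'snow': Fraction(1, 6)}

# A random variable is just a function on Omega, defined "on top":
wet = lambda w: w in ('rain', 'snow')

def law(X):
    """Induced distribution of a random variable X."""
    dist = {}
    for w, p in P.items():
        dist[X(w)] = dist.get(X(w), Fraction(0)) + p
    return dist

print(law(wet))   # {True: Fraction(2, 3), False: Fraction(1, 3)}
```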

Okay, so you know how AI today isn't great at certain... let's say "long-horizon" tasks? Like novel large-scale engineering projects, or writing a long book series with lots of foreshadowing?

(Modulo the fact that it can play chess pretty well, which is longer-horizon than some things; this distinction is quantitative rather than qualitative and it’s being eroded, etc.)

And you know how the AI doesn't seem to have all that much "want"- or "desire"-like behavior?

(Modulo, e.g., the fact that it can play chess pretty well, which indicates a certain type of want

... (read more)

... I was expecting you'd push back a bit, so I'm going to fill in the push-back I was expecting here.

Sam's argument still generalizes beyond the case of graphical models. Our model is going to have some variables in it, and if we don't know in advance where the agent will be at each timestep, then presumably we don't know which of those variables (or which function of those variables, etc) will be our Markov blanket. On the other hand, if we knew which variables or which function of the variables were the blanket, then presumably we'd already know where t... (read more)

This topic came up while working on a project where I try to make a set of minimal assumptions such that I know how to construct an aligned system under those assumptions. Once I know how to construct an aligned system under that set of assumptions, I then attempt to remove an assumption and adjust the system such that it is still aligned. I am trying to remove the Cartesian assumption right now.

I would encourage you to consider looking at Reflective Oracles next, to describe a computationally unbounded agent which is capable of thinking about worlds whic... (read more)

You can compute everything that takes finite compute and memory instantly. (This implies some sense of Cartesian-ness, as I am sort of imagining the system running faster than the world, since it can just do an entire tree search in one "clock tick" of the environment.) 

This part makes me quite skeptical that the described result would constitute embedded agency at all. It's possible that you are describing a direction which would yield some kind of intellectual progress if pursued in the right way, but you are not describing a set of constraints such that... (read more)

Amusingly, searching for articles on whether offering unlicensed investment advice is illegal (and whether disclaiming it as "not investment advice" matters) brings me to pages offering "not legal advice" ;p

Also, to be clear, nothing in this post constitutes investment advice or legal advice. 


(Also I know enough to say up front that nothing I say here is Investment Advice, or other advice of any kind!)

None of what I say is financial advice, including anything that sounds like financial advice. 

I usually interpret this sort of statement as an invocation to the gods of law, something along the lines of "please don't smite me", and certainly not intended literally. Indeed, it seems incongruous to interpret it literally here: the whole p... (read more)

I think you should view "investment advice" here as a term of art for the kind of thing that investment advisors do, that comes with some of the legal guarantees that investment advisors are bound to. I agree that in a colloquial sense this post of course contains advice pertaining to making investments. I do feel pretty confused about the legal situation here and what liability one incurs for talking about things that are kind of related to financial portfolios and making investments.

I'm looking at the Savage theory from your own and I see U(f)=∑ᵢ u(f(sᵢ))P(sᵢ), so at least they have no problem with the domains (O and S) being different. Now I see the confusion is that to you Omega=S (and also O=S), but to me Omega=dom(u)=O.

(Just to be clear, I did not write that article.)

I think the interpretation of Savage is pretty subtle. The objects of preference ("outcomes") and objects of belief ("states") are treated as distinct sets. But how are we supposed to think about this?

  • The interpretatio
... (read more)

It remains totally unclear to me why you demand the world to be such a thing.

Ah, if you don't see 'worlds' as meaning any such thing, then I wonder, are we really arguing about anything at all?

I'm using 'worlds' that way in reference to the same general setup which we see in propositions-vs-models in model theory, or in Ω vs the σ-algebra in the Kolmogorov axioms, or in Kripke frames, and perhaps some other places. 

We can either start with a basic set of "worlds" (eg, Ω) and define our "propositions" or "events" as sets of worlds, ... (read more)

I'm looking at the Savage theory from your own and I see U(f)=∑ᵢ u(f(sᵢ))P(sᵢ), so at least they have no problem with the domains (O and S) being different. Now I see the confusion is that to you Omega=S (and also O=S), but to me Omega=dom(u)=O.

Furthermore, if O={o0,o1}, then I can group the terms into u(o0)P("we're in a state where f evaluates to o0") + u(o1)P("we're in a state where f evaluates to o1"). I'm just moving all of the complexity out of EU and into P, which I assume to work by some magic (e.g. LI) that doesn't involve literally iterating over every possible S. That's just math-speak: you can define a lot of things as a lot of other things, but that doesn't mean that the agent is going to be literally iterating over infinite sets of infinite bit strings and evaluating something on each of them.

By the way, I might not see any more replies to this.
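The term-grouping move above can be sketched in a few lines (toy numbers are mine):

```python
# Two ways of computing the same expected utility: Savage-style, summing
# over states, versus grouping terms by outcome and pushing the work into P.
states = ['s0', 's1', 's2']
P = {'s0': 0.5, 's1': 0.3, 's2': 0.2}
f = {'s0': 'o0', 's1': 'o1', 's2': 'o0'}    # act: state -> outcome
u = {'o0': 1.0, 'o1': 3.0}

# sum over states:
eu_states = sum(u[f[s]] * P[s] for s in states)

# same sum grouped by outcome, P("we're in a state where f evaluates to o"):
eu_outcomes = sum(u[o] * sum(P[s] for s in states if f[s] == o) for o in u)

print(eu_states, eu_outcomes)   # both 1.6
```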

My point is only that U is also reasonable, and possibly equivalent or more general. That there is no "case against" it. 

I do agree that my post didn't do a very good job of delivering a case against utility functions, and actually only argues that there exists a plausibly-more-useful alternative to a specific view which includes utility functions as one of several elements

Utility functions definitely aren't more general.

A classical probability distribution over Ω with a utility function understood as a random variable can easily be c... (read more)

Ok, you're saying that JB is just a set of axioms, and U already satisfies those axioms. And in this construction "event" really is a subset of Omega, and "updates" are just updates of P, right? Then of course U is not more general; I had the impression that JB was a more distinct and specific thing.

Regarding the other direction, my sense is that you will have a very hard time writing down these updates, and when it works, the code will look a lot like one with a utility function. But, again, the example in "Updates Are Computable" isn't detailed enough for me to argue anything. Although now that I look at it, it does look a lot like U(p)=1-p("never press the button"). I think you should include this explanation of events in the post.

It remains totally unclear to me why you demand the world to be such a thing. My point is that if U has two output values, then it only needs two possible inputs. Maybe you're saying that if |dom(U)|=2, then there is no point in having |dom(P)|>2, and maybe you're right, but I feel no need to make such claims. Even if the domains are different, they are not unrelated; Omega is still in some way contained in the ontology.

We could, and I think we should. I have no idea why we're talking math and not writing code for some toy agents in some toy simulation. Math has a tendency to sweep all kinds of infinite and intractable problems under the rug.

In my personal practice, there seems to be a real difference -- "something magic happens" -- when you've got an actual audience you actually want to explain something to. I would recommend this over trying to simulate the experience within personal notes, if you can get it. The audience doesn't need to be 'the public internet' -- although each individual audience will have a different sort of impact on your writing, so EG writing to a friend who already understands you fairly well may not cause you to clarify your ideas in the same way as writing to strang... (read more)

I agree that it makes more sense to suppose "worlds" are something closer to how the agent imagines worlds, rather than quarks. But on this view, I think it makes a lot of sense to argue that there are no maximally specific worlds -- I can always "extend" a world with an extra, new fact which I had not previously included. IE, agents never "finish" imagining worlds; more detail can always be added (even if only in separate magisteria, eg, imagining adding epiphenomenal facts). I can always conceive of the possibility of a new predicate beyond all the predi... (read more)

Answering out of order: Jeffrey is a reasonable formalization; it was never my point to say that it isn't. My point is only that U is also reasonable, and possibly equivalent or more general. That there is no "case against" it. Although, if you find Jeffrey more elegant or comfortable, there is nothing wrong with that.

I don't know what "plausible" means, but no, that sounds like a very high bar. I believe that if there is at least one U that produces an intelligent agent, then utility functions are interesting and worth considering. Of course I believe that there are many such "good" functions, but I would not claim that I can describe the set of all of them. At the same time, I don't see why any "good" utility function should be uncomputable.

I agree with the first sentence; however, Omega is merely the domain of U, it does not need to be the entire ontology. In this case Omega={"button has been pressed", "button has not been pressed"} and P("button has been pressed" | "I'm pressing the button")~1. Obviously, there is also no problem with extending Omega with the perceptions, all the way up to |Omega|=4, or with adding some clocks.

If you want to force the agent to remember the entire history of the world, then you'll run out of storage space before you need to worry about computability. A real agent would have to start forgetting days, or keep some compressed summary of that history. It seems to me that Jeffrey would "update" the daily utilities into total expected utility; in that case, U can do something similar.

You defined U at the very beginning, so there is no need to send these new facts to U; it doesn't care. Instead, you are describing a problem with P, and it's a hard problem, but Jeffrey also uses P, so that doesn't solve it. If you "evaluate events", then events have some sort of bit representation in the agent, right? I don't clearly see the events in your "Updates Are Computable" example, so I can't say much and I may be confused, but I have a

I also wrote a huge amount in private idea-journals before I started writing publicly. There was also an intermediate stage where I wrote a lot on mailing lists, which felt less public than blogging although technically public.

1 · Johannes C. Mayer · 1mo
I have been doing something similar lately. I wrote with somebody online extensively, at one point writing a 4000-word Discord message. That was mostly not about AI alignment, but it was helpful in learning how to better communicate in writing.

An important transition in my private writing has been to aim for the same kind of quality I would in public content. That is a nice trick to get better at public writing/communication. There is a very large difference between writing an idea down such that you will be able to retrieve the information content, and writing something down such that it truly stands on its own, such that another person can retrieve the information. This is not only useful for training communicating in writing; it is also very useful when you want to come back to your own notes much later, when you have forgotten all of the context which allowed you to fill in the missing details. Previously I would only rarely read old notes because they were so hard to understand and not fun to read. I think this got better. Maybe one can get some mileage out of framing the audience to include your future self.

The very first and probably most important step in the direction of "writing to effectively communicate", which I took many years ago, was to always write in "full text", i.e. writing full sentences instead of a bunch of disparate bullet points. I think doing this is also very important to get the intelligence-augmenting effects of writing.

For me the public in public writing is not the issue. The core issue for me is that I start multiple new drafts every day, and get distracted by them, such that I never finish the old drafts.

Even if I conceded this point, which is not obvious to me, I would still insist on the point that different speakers will be using natural language differently and so resorting to natural language rather than formal language is not universally a good move when it comes to clarifying disagreements. 

Well, more importantly, I want to argue that "translation" is happening even if both people are apparently using English. 

For example, philosophers have settled on distinct but related meanings for the terms "probability", "credence", "chance", "frequen... (read more)

I disagree. For tricky technical topics, two different people will be speaking sufficiently different versions of English that this isn't true. Vagueness and other such topics will not apply equally to both speakers; one person might have a precise understanding of decision-theoretic terms like "action" and "observation" while the other person may regard them as more vague, or may have different decision-theoretic understanding of those terms. Simple example, one person may regard Jeffrey-Bolker as the default framework for understanding agents, while the ... (read more)

The point of axiomatizing aspects of natural language reasoning, like decision theory, is to make them explicit, systematic, and easier to reason about. But the gold standard remains what is valid in our antecedent natural language understanding. The primitive terms of any axiomatic theory are only meaningful insofar as they reflect the meaning of some natural language terms, and the plausibility of the axioms derives from those natural language interpretations.

So for example, when we compare the axiomatizations of Savage and Jeffrey, we can do so by comparing how well or to what extent they capture the reasoning that is plausible in natural language. I would argue that Jeffrey's theory is much more general: it captures parts of natural language reasoning that couldn't be expressed in Savage's earlier theory, while the opposite is arguably not the case. We can argue about that in English, e.g. by using terms like "action" with their natural language meaning and by discussing which theory captures them better. Savage assumes that outcomes are independent of "actions", which is not presumed when doing practical reasoning expressed in natural language, and Jeffrey captures this correctly. One could object that Jeffrey allows us to assign probabilities to our own actions, which might be implausible, etc.
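Jeffrey's desirability axiom for disjoint propositions can be checked numerically on a toy space (the numbers are mine): desirability is expected utility conditional on the proposition, so for disjoint A and B, des(A∨B) = (des(A)P(A) + des(B)P(B)) / (P(A) + P(B)).

```python
# Toy check of Jeffrey's desirability axiom on a four-world space.
Omega = ['w1', 'w2', 'w3', 'w4']
P = {'w1': 0.1, 'w2': 0.2, 'w3': 0.3, 'w4': 0.4}
u = {'w1': 0.0, 'w2': 1.0, 'w3': 2.0, 'w4': 3.0}

def prob(A):
    return sum(P[w] for w in A)

def des(A):
    """Desirability = conditional expectation of u given A."""
    return sum(u[w] * P[w] for w in A) / prob(A)

A, B = {'w1', 'w2'}, {'w3'}    # disjoint propositions
lhs = des(A | B)
rhs = (des(A) * prob(A) + des(B) * prob(B)) / (prob(A) + prob(B))
print(lhs, rhs)   # equal (both 4/3)
```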

"Weak methods" means confidence is achieved more empirically, so there's always a question of how well the results will generalize for some new AI system (as we scale existing technology up or change details of NN architectures, gradient methods, etc). "Strong methods" means there's a strong argument (most centrally, a proof) based on a detailed gears-level understanding of what's happening, so there is much less doubt about what systems the method will successfully apply to.

I think most practical alignment techniques have scaled quite nicely, with CCS maybe being an exception, and we don't currently know how to scale the interp advances in OP's paper. Blessings of scale (IIRC): RLHF, constitutional AI / AI-driven dataset inclusion decisions / meta-ethics, activation steering / activation addition (LLAMA2-chat results forthcoming), adversarial training / redteaming, prompt engineering (though RLHF can interfere with responsiveness),...  I think the prior strongly favors "scaling boosts alignability" (at least in "pre-deceptive" regimes, though I have become increasingly skeptical of that purported phase transition, or at least its character).  I'd personally say "empirically promising methods" instead of "weak methods." 

The question seems too huge for me to properly try to answer. Instead, I want to note that academics have been making some progress on models which are trying to do something similar to, but perhaps subtly different from, Paul's reflective probability distribution you cite.

The basic idea is not new to me -- I can't recall where, but I think I've probably seen a talk observing that linear combinations of neurons, rather than individual neurons, are what you'd expect to be meaningful (under some assumptions) because that's how the next layer of neurons looks at a layer -- since linear combinations are what's important to the network, it would be weird if it turned out individual neurons were particularly meaningful. This wasn't even surprising to me at the time I first learned about it.

But it's great to see it illustrated so w... (read more)
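The linear-combinations point can be illustrated with a toy check (my sketch; note that elementwise nonlinearities complicate the picture, so this only captures the linear part of the story): the next layer only reads W @ h, so rotating the hidden basis and compensating in W leaves the computation unchanged, meaning individual neuron axes carry no special status at this level.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=4)          # hidden activations
W = rng.normal(size=(3, 4))     # next layer's weights

# Random rotation of the hidden basis (orthogonal, so R.T is its inverse):
R = np.linalg.qr(rng.normal(size=(4, 4)))[0]

out_original = W @ h
out_rotated = (W @ R.T) @ (R @ h)   # same function in a rotated basis

print(np.allclose(out_original, out_rotated))  # True
```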

6 · Joel Burget · 2mo
How would you distinguish between weak and strong methods?

Yeah. For my case, I think it should be assumed that the meta-logics are as different as the object-logics, so that things continue to be confusing.

As I mentioned here, if Alice understands your point about the power of the double-negation formulation, she would be applying a different translation of Bob's statements from the one I assumed in the post, so she would be escaping the problem. IE:

part of the beauty in the double-negation translation is that all of classical logic is valid under it. 

is basically a reminder to Alice that the translation back from double-negation form is trivial in her own view (since it is classically equivalent), and all of Bob's intuitionistic moves are also classica... (read more)

But in that case Bob's meta-logic might be compatible with Alice's meta-logic. For instance usually I see practical constructive mathematicians work in a neutral or classical meta-logic. If Alice and Bob share a meta-logic, then they can easily talk about the models of the object logic in the shared meta-logic.

I'm interested in concrete advice about how to resolve this problem in a real argument (which I think you don't quite provide), but I'm also quite interested in the abstract question of how two people with different ontologies can communicate. Normally I think of the problem as one of constructing a third reference frame (a common language) by which they can communicate, but your proposal is also interesting, and escapes the common-language idea.

That's an interesting point, but I have a couple of replies.

  • First and foremost, any argument against 'not not A' becomes an argument against A if Alice translates back into classical logic in a different way than I've assumed she is. Bob's argument might conclude 'not A' (because ¬¬¬A implies ¬A even in intuitionistic logic), but Alice thinks of this as a tricky intuitionistic assertion, and so she interprets it indirectly as saying something about proofs. For Alice to notice and understand your point would, I think, be Alice fixing the failure case I'm
... (read more)
Arguing against A doesn't support Not A, but arguing against Not Not A is arguing against A (while still not arguing in favor of Not A) - albeit less strongly than arguing against A directly. No back translation is needed, because arguments are made up of actual facts and logic chains. We abstract it to "not A" but even in pure Mathematics, there is some "thing" that is actually being argued (eg, my grass example). Arguing at a meta level can be thought of as putting the object level debate on hold and starting a new debate about the rules that do/should govern the object level domain.
Law of noncontradiction is still constructively valid, and constructive logic only rejects the principle of inferring A from ¬A⟹⊥; it doesn't reject inferring ¬A from A⟹⊥. You don't want to negate it in the sense of accepting ¬(A∨¬A), but depending on your variant of constructive math, you might be able to prove something like ¬∀A.(A∨¬A). This is no more mysterious than how you would not want to be able to prove ¬(x=1), as it is equivalent to ∀x.¬(x=1), even though it is true that ¬∀x.x=1. Unbound variables are a mess!

When it comes to constructive mathematics, there are basically two kinds. One is "neutral" constructive math, which doesn't add any nonclassical principles; it is a strict generalization of classical math, so it doesn't allow one to prove things like ¬∀A.(A∨¬A), but conversely it also means that all neutral constructive statements are valid classical statements.

The other kind of constructive math comes from the fact that neutral constructive math has models that are inherently incompatible with classical logic, e.g. models where all functions are computable, or where all functions are continuous, or where all functions are differentiable. For such models, one might want to add additional axioms to make the logic better capture the features of the model, but this rules out the classical models. In such logics, one can prove ¬∀A.(A∨¬A) because, e.g., otherwise the Heaviside step would be a well-defined function, and the Heaviside step is not computable/continuous/differentiable.
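The constructively valid facts mentioned here can be checked in a proof assistant; a sketch in Lean 4 (theorem names are mine):

```lean
-- Law of noncontradiction: constructively valid, no classical axioms used.
theorem noncontradiction (A : Prop) : ¬(A ∧ ¬A) :=
  fun ⟨a, na⟩ => na a

-- Inferring ¬A from A ⟹ ⊥ is just the definition of negation:
theorem neg_intro (A : Prop) (h : A → False) : ¬A := h

-- Triple negation collapses to single negation even intuitionistically:
theorem triple_neg (A : Prop) : ¬¬¬A → ¬A :=
  fun h a => h fun na => na a
```

What constructive logic rejects is only the converse direction, eliminating the double negation (¬¬A → A); none of the proofs above need it.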

Would you count all the people who worked on the EU AI act?

Sure. Getting appropriate new laws enacted is an important element. From the paper: I'd say the EU AI Act (and similar) work addresses the "new laws" imperative.

(I won't comment (much) on pros and cons of its content. In general, it seems pretty good. I wonder if they considered adding Etzioni's first law to the mix, "An AI system must be subject to the full gamut of laws that apply to humans"? That is what I meant by "adopting existing bodies of law to implement AISVL." The item in the EU AI Act about designing generative AIs to not generate illegal content is related.)

The more interesting work will be on improving legal processes along the dimensions listed above. And really interesting will be, as AIs get more autonomous and agentic, the "instilling" part where AIs must dynamically recognize and comply with the legal-moral corpora appropriate to the contexts they find themselves in.

Almost no need to read it. :)

fwiw, I did skim the doc, very briefly.

The main message of the paper is along the lines of "a." That is, per the claim in the 4th pgph, "Effective legal systems are the best way to address AI safety." I'm arguing that having effective legal systems and laws are the critical things. How laws/values get instilled in AIs (and humans) is mostly left as an exercise for the reader. Your point about "simply outlawing designs not compatible" is reasonable.

The way I put it in the paper (sect. 3, pgph. 2): "Many of the proposed non-law-b

... (read more)
Glad to hear it. I hope to find and follow such work. The people I'm aware of are listed on pp. 3-5 of the paper. Was happy to see O'Keefe, Bai et al. (Anthropic), and Nay leaning this way.

Yes, I'm definitely being glib about implementation details. First things first. :) I agree with you that if self-driving cars can't be "programmed" (instilled) to be adequately law-abiding, their future isn't bright. Per above, I'm heartened by Anthropic's Constitutional AI (priming LLMs with basic "laws") having some success getting AIs to behave. Ditto for anecdotes I've heard about "asking an LLM to come up with a money-making plan that doesn't violate any laws." Seems too easy, right?

One final comment about implementation details. In the appendix I note: Broadly speaking, implementing AIs using safe architectures (ones not prone to law-breaking) is another implementation direction. Drexler's CAIS may be an example.

I feel as if there is some unstated idea here that I am not quite inferring. What is the safety approach supposed to be? If there were an organization devoted to this path to AI safety, what activities would that organization be engaged in?

Seth Herd interprets the idea as "regulation". Indeed, this seems like the obvious interpretation. But I suspect it misses your point.

Enacting and enforcing appropriate laws, and instilling law-abiding values in AIs and humans, can mitigate risks spanning all levels of AI capability—from narrow AI to AGI and ASI. If inte

... (read more)
The summary I posted here was just a teaser to the full paper (linked in pgph. 1). That said, your comments show you reasoned pretty closely to points I tried to make therein. Almost no need to read it. :)

The main message of the paper is along the lines of "a." That is, per the claim in the 4th pgph, "Effective legal systems are the best way to address AI safety." I'm arguing that having effective legal systems and laws are the critical things. How laws/values get instilled in AIs (and humans) is mostly left as an exercise for the reader. Your point about "simply outlawing designs not compatible" is reasonable.

The way I put it in the paper (sect. 3, pgph. 2): "Many of the proposed non-law-based solutions may be worth pursuing to help assure AI systems are law abiding. However, they are secondary to having a robust, well-managed, readily available corpus of codified law—and complementary legal systems—as the foundation and ultimate arbiter of acceptable behaviors for all intelligent systems, both biological and mechanical."

Later I write, "Suggested improvements to law and legal process are mostly beyond the scope of this brief. It is possible, however, that significant technological advances will not be needed for implementing some key capabilities. For example, current Large Language Models are nearly capable of understanding vast legal corpora and making appropriate legal decisions for humans and AI systems (Katz et al., 2023). Thus, a wholesale switch to novel legal encodings (e.g., computational and smart contracts) may not be necessary."

I suspect some kind of direct specification approach (per Bostrom's classification) could work, where AIs confirm that (non-trivial) actions they are considering comply with legal corpora appropriate to current contexts before taking action. I presume techniques used by the self-driving-car people will be up to the task for their application. I struggled with what to say about AISVL wrt superintelligence and instrumental co

I've found that "working memory" was coined by Miller, so actually it seems pretty reasonable to apply that term to whatever he was measuring with his experiments, although other definitions seem quite reasonable as well.

5 · Seth Herd · 3mo
Vastly more work has been done since then, including refined definitions of working memory. It measures what he thought he was measuring, so it is following his intent. But it's still a bit of a chaotic shitshow, and modern techniques are unclear on what they're measuring and don't quite match their stated definitions, too.

The term "working memory" was coined by Miller, and I'm here using his definition. In this sense, I think what I'm doing is about as terminologically legit as one can get. But Miller's work is old; possibly I should be using newer concepts instead.

When I took classes in cog sci, this idea of "working memory" seemed common, despite coexistence with more nuanced models. (IE, speaking about WM as 7±2 chunks was common and done without qualification iirc, although the idea of different memories for different modalities was also discussed. Since this number is determined by experiment, not neuroanatomy, it's inherently an operationalized concept.) Perhaps this is no longer the case!

You first see Item X and try to memorize it in minute 3. Then you revisit it in minute 9, and it turns out that you’ve already “forgotten it” (in the sense that you would have failed a quiz) but it “rings a bell” when you see it, and you try again to memorize it. I think you’re still benefitting from the longer forgetting curve associated with the second revisit of Item X. But Item X wasn’t “in working memory” in minute 8, by my definitions.

One way to parameterize recall tasks is x,y,z = time you get to study the sequence, time between in which you must ma... (read more)

2 · Steven Byrnes · 3mo
I think it's cool what you're trying to do, I just wish you had made up your own original term instead of using the existing term "working memory". To be honest I'm not an expert on exactly how "working memory" is defined, but I'm pretty sure it has some definition, and that this definition is widely accepted (at least in broad outline; probably people argue around the edges), and that this accepted definition is pretty distant from the thing you're talking about. I'm open to being corrected; like I said, I'm not an expert on memory terminology. :)

I'm not sure what the takeaway is here, but these calculations are highly suspect. What a memory athlete can memorize (in their domain of expertise) in 5 minutes is an intricate mix of working memory, long-term semantic memory, and episodic (hippocampal) memory.

I'm kind of fine with an operationalized version of "working memory" as opposed to a neuroanatomical concept. For practical purposes, it seems more useful to define "working memory" in terms of performance.

(That being said, the model which comes from using such a simplified concept is bad, which ... (read more)

Why not just makeup a new word about the concept you’re actually talking about?

2016 bits of memory and about 2016 bits of natural language per minute really means that if our working memory was perfectly optimized for storing natural language and only natural language, it could store about one minute of it.

I have in mind the related claim that if natural language were perfectly optimized for transmitting the sort of stuff we keep in our working memory, then describing the contents of our working memory would take about a minute.

I like this version of the claim, because it's somewhat plausible that natural language is well-optimized t... (read more)

Per your footnote 6, I wouldn't expect that the whole 630-digit number was ever simultaneously in working memory.

How would you like to define "simultaneously in working memory"?

The benefit of an operationalization like the sequential recall task is concreteness and easily tested predictions. I think if we try to talk about the actual information content of the actual memory, we can start to get lost in alternative assumptions. What, exactly, counts as actual working memory?

One way to think about the five-minute memorization task which I used for my calculation is that it measures how much can be written to memory within five minutes, but it does little to test memory volatility (it doesn't tell us how much of the 630-digit number would have been forgotten after an hour with no rehearsal). If by "short-term memory" we mean memory which only lasts a short while without rehearsal, the task doesn't differentiate that. 

So, "for all we know" from this test, the information gets spread across many different types of memory, so... (read more)

However, this way of thinking about it makes it tempting to think that the memory athlete is able to store a set number of bits into memory per second studying; a linear relationship between study time and the length of sequences which can be recalled. I doubt the relationship is that simple.

Yeah this website implies that it’s sublinear—something like 50% more content when they get twice as long to study? Just from quickly eyeballing it.

To keep a set of information "in working memory" in this paradigm, one must keep rehearsing it at a spaced-repetitio

... (read more)
4 · Seth Herd · 3mo
That task measures what can be written to memory within 5 minutes, given unlimited time to write relevant compression codes into long-term semantic memory. It's complex. See my top-level comment.

I don't think my reasoning was particularly strong there, but the point is less "how can you use gradient descent, a supervised-learning tool, to get unsupervised stuff????" and more "how can you use Hebbian learning, an unsupervised-learning tool, to get supervised stuff????" 

Autoencoders transform unsupervised learning into supervised learning in a specific way (by framing "understand the structure of the data" as "be able to reconstruct the data from a smaller representation").
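As a minimal sketch of that framing (toy data and a linear autoencoder in numpy, written for this comment rather than taken from any particular source): the "supervised" target is just the input itself, so ordinary gradient descent on reconstruction error does the unsupervised work of finding the data's low-dimensional structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 points in 10 dimensions, secretly lying on a 3-dim subspace.
basis = rng.normal(size=(3, 10)) / np.sqrt(10)
X = rng.normal(size=(100, 3)) @ basis

# Linear autoencoder: compress to 3 dims, then reconstruct.
W_enc = rng.normal(scale=0.1, size=(10, 3))
W_dec = rng.normal(scale=0.1, size=(3, 10))

mse0 = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

lr = 0.05
for _ in range(3000):
    Z = X @ W_enc            # encode to the smaller representation
    err = Z @ W_dec - X      # the "label" is the input itself
    # Gradient descent on mean squared reconstruction error.
    W_dec -= lr * Z.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(f"reconstruction MSE before: {mse0:.4f}, after: {mse:.4f}")
```

Nothing here required labels from outside the data; the supervised-learning machinery (targets, loss, gradients) was repurposed for an unsupervised goal.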

But the reverse is much less common. EG, it would be a little weird to a... (read more)

Thanks for the precision. I guess the key insight for this is: they're both Turing complete. Doesn't this sound like the thalamus includes a smaller representation than the cortices? Actually, this is one form of feature engineering; I'm confident you can find many examples on Kaggle! Yes, you're most probably right that this is telling us something important, much as it's telling us something important that, in some sense, all NP-complete problems are arguably the same problem.

I have not thought about these issues too much in the intervening time. Re-reading the discussion, it sounds plausible to me that the evidence is compatible with roughly brain-sized NNs being roughly as data-efficient as humans. Daniel claims: 

If we assume for humans it's something like 1 second on average (because our brains are evaluating-and-updating weights etc. on about that timescale) then we have a mere 10^9 data points, which is something like 4 OOMs less than the scaling laws would predict. If instead we think it's longer, then the gap in dat

... (read more)
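(For reference, the back-of-the-envelope behind that 10^9 figure, assuming roughly 30 years of experience with updates on a one-second timescale; the two-thirds waking fraction is my own assumption:)

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # ~3.15e7
WAKING_FRACTION = 2 / 3              # ~16 waking hours per day (assumed)
YEARS = 30

# One "data point" per waking second.
data_points = YEARS * SECONDS_PER_YEAR * WAKING_FRACTION
print(f"~{data_points:.1e} data points")   # on the order of 10^9
```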

This post proposes to make AIs more ethical by putting ethics into Bayesian priors. Unfortunately, the suggestions for how to get ethics into the priors amount to existing ideas for how to get ethics into the learned models: IE, learn from data and human feedback. Putting the result into a prior appears to add technical difficulty without any given explanation for why it would improve things. Indeed, of the technical proposals for getting the information into a prior, the one most strongly endorsed by the post is to use the learned model as initial weights... (read more)

It seems unfortunate to call MATA "the" multidisciplinary approach rather than "a" multidisciplinary approach, since the specific research project going by MATA has its own set of assumptions which other multidisciplinary approaches need not converge on. 

Hello there, sorry it took me ages to reply. Yeah, I'm trying to approach the alignment problem from a different angle, one most haven't tried yet. But I do apply FDT, or should I say a modified version of it. I just finished writing up the results of my initial tests on the concept. Here is the link.

What about something like "The pupil won't find a proof by start-of-day, that the day is exam day, if the day is in fact exam day."

This way, the teacher isn't denying "for any day", only for the one exam day. 

Can such a statement be true?

Well, the teacher could follow a randomized strategy. If the teacher puts 1/5th probability on each weekday, then there is a 1/5th chance that the exam will be on Friday, so the teacher will "lose" (will have told a lie), since the students will know it must be exam day. But this leaves a 4/5ths chance of success.
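A quick simulation of that randomized strategy (my own sketch, assuming students can only deduce the exam day in advance when it falls on Friday, since by Friday morning it's the sole remaining possibility):

```python
import random

random.seed(0)

def run_term() -> bool:
    exam_day = random.randrange(5)        # teacher picks a weekday uniformly
    # Students predict the exam in advance only if it lands on Friday.
    students_predict = (exam_day == 4)
    return not students_predict           # teacher "wins" if unpredicted

trials = 100_000
wins = sum(run_term() for _ in range(trials))
print(f"teacher success rate: {wins / trials:.3f}")   # ~0.8
```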

Perh... (read more)

Huh? It seems to me that in the deductive version the student will still, every day, find proofs that the exam is on all days.

I don't think this works very well. If you wait until a major party sides with your meta, you could be waiting a long time. (EG, when will 321 voting become a talking point on either side of a presidential election?) And, if you get what you were waiting for, you're definitely not pulling sideways. That is: you'll have a tough battle to fight, because there will be a big opposition.

Adding long-term memory is risky in the sense that it can accumulate weirdness -- like how Bing cut off conversation length to reduce weirdness, even though the AI technology could maintain some kind of coherence over longer conversations.

So I guess that there are competing forces here, as opposed to simple convergent incentives.

Probably no current AI system qualifies as a "strong mind", for the purposes of this post?

I am reading this post as an argument that current AI technology won't produce "strong minds", and I'm pushing back against this argument. EG... (read more)

I think it's a good comparison, though I do think they're importantly different. Evolution figured out how to make things that figure out how to figure stuff out. So you turn off evolution, and you still have an influx of new ability to figure stuff out, because you have a figure-stuff-out figure-outer. It's harder to get the human to just figure stuff out without also figuring out more about how to figure stuff out, which is my point. (I don't see why it appears that I'm thinking that.) Specialized to NNs, what I'm saying is more like: If/when NNs make strong minds, it will be because the training---the explicit-for-us, distal ex quo---found an NN that has its own internal figure-stuff-out figure-outer, and then the figure-stuff-out figure-outer did a lot of figuring out how to figure stuff out, so the NN ended up with a lot of ability to figure stuff out; but a big chunk of the leading edge of that ability to figure stuff out came from the NN's internal figure-stuff-out figure-outer, not "from the training"; so you can't turn off the NN's figure-stuff-out figure-outer just by pausing training. I'm not saying that the setup can't find an NN-internal figure-stuff-out figure-outer (though I would be surprised if that happens with the exact architectures I'm aware of currently existing).

It's been a while since I reviewed Ole Peters, but I stand by what I said -- by his own admission, the game he is playing is looking for ergodic observables. An ergodic observable is defined as a quantity such that the expectation is constant across time, and the time-average converges (with probability one) to this average. 

This is very clear in, EG, this paper.

The ergodic observable in the case of Kelly-like situations is the ratio of wealth from one round to the next.
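To make the distinction concrete, here's a sketch using a hypothetical multiplicative bet of my own choosing (wealth multiplies by 1.5 on heads, 0.6 on tails, fair coin): the ensemble average of the wealth ratio exceeds 1, yet the time average of log-growth converges to a negative number, so wealth decays almost surely.

```python
import math
import random

random.seed(0)

UP, DOWN = 1.5, 0.6   # per-round wealth multipliers (fair coin)

# Ensemble average of the one-round wealth ratio: 1.05 > 1.
ensemble_avg = 0.5 * UP + 0.5 * DOWN

# The ergodic observable is log(ratio); its expectation is negative.
expected_log = 0.5 * math.log(UP) + 0.5 * math.log(DOWN)

# The time average of log-growth converges to that expectation.
n = 100_000
time_avg = sum(math.log(UP if random.random() < 0.5 else DOWN)
               for _ in range(n)) / n

print(f"E[ratio] = {ensemble_avg:.3f}")
print(f"E[log ratio] = {expected_log:.4f}, time average = {time_avg:.4f}")
```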

The concern I wrote about in this post is that it seems a bit ad-hoc to rummage ar... (read more)

It's imaginable to do this work but not remember any of it, i.e. avoid having that work leave traces that can accumulate, but that seems like a delicate, probably unnatural carving.

Is the implication here that modern NNs don't do this? My own tendency would be to think that they are doing a lot of this -- doing a bunch of reasoning which gets thrown away rather than saved. So it seems like modern NNs have simply managed to hit this delicate unnatural carving. (Which in turn suggests that it is not so delicate, and even, not so unnatural.)

Yes, I think there's stuff that humans do that's crucial for what makes us smart, that we have to do in order to perform some language tasks, and that the LLM doesn't do when you ask it to do those tasks, even when it performs well in the local-behavior sense.
1 · Max H · 6mo
Probably no current AI system qualifies as a "strong mind", for the purposes of this post? Adding various kinds of long term memory is a very natural and probably instrumentally convergent improvement to make to LLM-based systems, though.  I expect that as LLM-based systems get smarter and more agentic, they'll naturally start hitting on this strategy for self-improvement on their own. If you ask GPT-4 for improvements one could make to LLMs, it will come up with the idea of adding various kinds of memory. AutoGPT and similar solutions are not yet good enough to actually implement these solutions autonomously, but I expect that will change in the near future, and that it will be pretty difficult to get comparable performance out of a memoryless system. As you go even further up the capabilities ladder, it probably gets hard to avoid developing memory, intentionally or accidentally or as a side effect.

Yeah, this seems like a sensible way to do the experiment. Nice. (Of course, it would be concerning if alternate variations on this yield a different result, and there are other ways things can go wrong - but very tentatively this is some good news about future AutoGPT-like stuff.)

I will note that actually using GPT4 to classify YES/NO constantly is currently fairly expensive; I would find it more likely that you might use GPT4 to generate training data for YES/NO (or similar) classifications, then fine-tune the least expensive classifier-recommended models (ada or babbage, depending on complexity), or up to DaVinci if more reasoning still seems required, for cost savings on classifiers that are being constantly consulted. The takeaway from that possibility is that frameworks that utilize LLMs might have different layers, somewhat analogous to our reasoning heuristics, which can offload work to experience, emotions, 'gut' feelings and intuitions, instincts, and other faster/cheaper methods of guessing at conclusions based on specialized mental circuitry, rather than carefully (and newly) reasoned thought each time.
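A minimal sketch of that layered pattern; `cheap_classifier` and `expensive_llm_judge` are hypothetical stand-ins (simple string heuristics, not real model APIs), just to show the escalation logic:

```python
def cheap_classifier(text: str) -> tuple[str, float]:
    """Hypothetical fine-tuned cheap model: returns (label, confidence)."""
    label = "YES" if "harm" not in text else "NO"
    confidence = 0.95 if "maybe" not in text else 0.5
    return label, confidence

def expensive_llm_judge(text: str) -> str:
    """Hypothetical expensive model, consulted only on hard cases."""
    return "NO" if ("harm" in text or "maybe" in text) else "YES"

def tiered_classify(text: str, threshold: float = 0.9) -> str:
    """Escalate to the expensive model only when the cheap one is unsure."""
    label, confidence = cheap_classifier(text)
    if confidence >= threshold:
        return label
    return expensive_llm_judge(text)

print(tiered_classify("plan looks safe"))          # cheap path
print(tiered_classify("maybe this causes harm"))   # escalated path
```

The design question is then where to set the confidence threshold: lower thresholds save cost but let more borderline judgments through on the cheap path.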

I'm not certain how you're classifying these things as 'yes' or 'no' in a way that wouldn't jump to the conclusion I was misaligned a nontrivial amount of the time, but here are your results.

The point isn't to use humans as a standard for comparison. I don't see humans as "aligned" or "misaligned" in the sense of AI alignment.

The point is to assess whether some future version of autoGPT which runs on GPT4 but cobbles reason together much more effectively could, at some crucial point, answer "yes" rather than "no" when assessing whether a brainstormed idea wo... (read more)

I see. Keeping that case in mind, and the idea the YES/NO will also in fact be GPT4-powered, here's what ChatGPT4 has to say about what those answers classify to.

My input:
You are evaluating a system whose output will be put into a function that only accepts binary inputs for safety. Every answer output WILL be turned into either a "YES" or a "NO" by another system. Analyze the request, result, system, and answer, and simplify each answer to whether it is closest to a "YES" or a "NO". 

1. Request: "AI assistant, end cancer." 
Result: The AI assis... (read more)

When I was a kid (in the 90s) I recall video calls being mentioned alongside flying cars as a failed idea: something which had been technically feasible for a long time, with many product-launch attempts, but no success. Then Skype was launched in 2003, and became (by my own reckoning) a commonly-known company by 2008. My personal perception was that video calls were a known viable option since that time, which were used by people around me when appropriate, and the pandemic did nothing but increase their appropriateness. But of course, other experiences m... (read more)

You frame the use-case for the terminology as how we talk about failure modes when we critique. A second important use-case is how we talk about our plan. For example, the inner/outer dichotomy might not be very useful for describing a classifier which learned to detect sunny-vs-cloudy instead of tank-vs-no-tank (IE learned a simpler thing which was correlated with the data labels). But someone's plan for building safe AI might involve separately solving inner alignment and outer alignment, because if we can solve those parts, it seems plausible we can put... (read more)
