Would you count all the people who worked on the EU AI act?

Almost no need to read it. :)

fwiw, I did skim the doc, very briefly.

The main message of the paper is along the lines of "a." That is, per the claim in the 4th paragraph, "Effective legal systems are the best way to address AI safety." I'm arguing that having effective legal systems and laws is the critical thing. How laws/values get instilled in AIs (and humans) is mostly left as an exercise for the reader. Your point about "simply outlawing designs not compatible" is reasonable.

The way I put it in the paper (sect. 3, pgph. 2): "Many of the proposed non-law-based solutions may be worth pursuing to help assure AI systems are law abiding. However, they are secondary to having a robust, well-managed, readily available corpus of codified law—and complementary legal systems—as the foundation and ultimate arbiter of acceptable behaviors for all intelligent systems, both biological and mechanical."

In that case, I agree with Seth Herd that this approach is not being neglected. Of course it could be done better. I'm not sure exactly how many people are working on it, but I have the impression that it is more than a dozen, since I've met some of them without trying. 

I suspect some kind of direct specification approach (per Bostrom's classification) could work, where AIs confirm that (non-trivial) actions they are considering comply with legal corpora appropriate to current contexts before taking action. I presume techniques used by the self-driving-car people will be up to the task for their application.
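For concreteness, a minimal sketch of what such a pre-action compliance gate might look like. Everything here (the `Rule` type, `is_compliant`, the toy rules) is a hypothetical illustration, not anything from the paper; a real system would need far richer representations of law, actions, and context:

```python
# Hypothetical sketch of a "direct specification" compliance gate:
# before executing a non-trivial action, the agent checks it against
# the legal rules applicable in its current context.

from dataclasses import dataclass

@dataclass
class Rule:
    context: str           # situation/jurisdiction the rule applies to
    forbidden_action: str  # action type the rule prohibits

def is_compliant(action: str, context: str, corpus: list[Rule]) -> bool:
    """Return True if no applicable rule in the corpus forbids the action."""
    return not any(
        r.context == context and r.forbidden_action == action
        for r in corpus
    )

corpus = [
    Rule(context="public_road", forbidden_action="exceed_speed_limit"),
    Rule(context="public_road", forbidden_action="run_red_light"),
]

# The agent only proceeds with actions that pass the gate.
assert is_compliant("signal_turn", "public_road", corpus)
assert not is_compliant("run_red_light", "public_road", corpus)
```

The hard part, of course, is not the gate itself but formalizing "applicable legal corpora" and "current context" well enough that the check is meaningful.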

I think this underestimates the difficulty of self-driving cars. In the application of self-driving airplanes (on runways, not in the air), it is indeed possible to make an adequate model of the environment, such that neural networks can be verified to follow a formally specified set of regulations (and self-correct from undesired states to desired states). With self-driving cars, the environment is far too complex to formally model in that way. You get to a point where you are trusting one AI model (of the complex environment) to verify another. And you can't explore the whole space effectively, so you still can't provide really strong guarantees (and this translates to errors in practice).

I struggled with what to say about AISVL wrt superintelligence and instrumental convergence. Probably should have let the argument ride without hedging, i.e., superintelligences will have to comply with laws and the demands of legal systems. They will be full partners with humans in enacting and enforcing laws. It's hard to just shrug off the concerns of the Yudkowskys, Bostroms, and Russells of the world.

It seems to me like you are somewhat shrugging off those concerns, since the technological interventions (eg smart contracts, LLMs understanding laws, whatever self-driving-car people get up to) are very "light" in the face of those "heavy" concerns. But a legal approach need not shrug off those concerns. For example, law could require that the kind of verification we can now apply to airplane autopilots be applied to self-driving cars as well. This would make self-driving illegal in effect until a large breakthrough in ML verification takes place, but it would work!

I feel as if there is some unstated idea here that I am not quite inferring. What is the safety approach supposed to be? If there were an organization devoted to this path to AI safety, what activities would that organization be engaged in?

Seth Herd interprets the idea as "regulation". Indeed, this seems like the obvious interpretation. But I suspect it misses your point.

Enacting and enforcing appropriate laws, and instilling law-abiding values in AIs and humans, can mitigate risks spanning all levels of AI capability—from narrow AI to AGI and ASI. If intelligent agents stray from the law, effective detection and enforcement must occur.

The first part is just "regulation". The second part, "instilling law-abiding values in AIs and humans", seems like a significant departure. It seems like the proposal involves both (a) designing and enacting a set of appropriate laws, and (b) finding and deploying a way of instilling law-abiding values (in AIs and humans). Possibly (a) includes a law requiring (b): AIs (and AI-producing organizations) must be designed so as to have law-abiding values within some acceptable tolerances.

This seems like a very sensible demand, but it does seem like it has to piggyback on some other approach to alignment, which would solve the object-level instilling-values problem.

Even the catastrophic vision of smarter-than-human-intelligence articulated by Yudkowsky (2022, 2023) and others (Bostrom, 2014; Russell, 2019) can be avoided by effective implementation of AISVL. It may require that the strongest version of the instrumental convergence thesis (which they rely on) is not correct. Appendix A suggests some reasons why AI convergence to dangerous values is not inevitable.

AISVL applies to all intelligent systems regardless of their underlying design, cognitive architecture, and technology. It is immaterial whether an AI is implemented using biology, deep learning, constructivist AI (Johnston, 2023), semantic networks, quantum computers, positronics, or other methods. All intelligent systems must comply with applicable laws regardless of their particular values, preferences, beliefs, and how they are wired.

If the approach does indeed require "instilling law-abiding values in AI", it is unclear why "AISVL applies to all intelligent systems regardless of their underlying design". The technology to instill law-abiding values may apply to specific underlying designs, specific capability ranges, etc. I guess the idea is that part (a) of the approach, the laws themselves, apply regardless. But if part (b), the value-instilling part, has limited applicability, then this has the effect of simply outlawing designs not compatible. That's fine, but "AISVL applies to all intelligent systems regardless of their underlying design" seems to dramatically over-sell the applicability of the approach in that case. Or perhaps I'm misunderstanding.

Similarly, "AI safety via law can address the full range of safety risks" seems to over-sell the whole section, a major point of which is to claim that AISVL does not apply to the strongest instrumental-convergence concerns. (And why not, exactly? It seems like, if the value-instilling tech existed, it would indeed avert the strongest instrumental-convergence concerns.)

I've found that "working memory" was coined by Miller, so actually it seems pretty reasonable to apply that term to whatever he was measuring with his experiments, although other definitions seem quite reasonable as well.

The term "working memory" was coined by Miller, and I'm here using his definition. In this sense, I think what I'm doing is about as terminologically legit as one can get. But Miller's work is old; possibly I should be using newer concepts instead.

When I took classes in cog sci, this idea of "working memory" seemed common, despite coexistence with more nuanced models. (IE, speaking about WM as 7±2 chunks was common and done without qualification iirc, although the idea of different memories for different modalities was also discussed. Since this number is determined by experiment, not neuroanatomy, it's inherently an operationalized concept.) Perhaps this is no longer the case!

You first see Item X and try to memorize it in minute 3. Then you revisit it in minute 9, and it turns out that you’ve already “forgotten it” (in the sense that you would have failed a quiz) but it “rings a bell” when you see it, and you try again to memorize it. I think you’re still benefitting from the longer forgetting curve associated with the second revisit of Item X. But Item X wasn’t “in working memory” in minute 8, by my definitions.

One way to parameterize recall tasks is x, y, z = the time you get to study the sequence, the time during which you must maintain the memory, and the time you get to try and recall the sequence.

During "x", you get the case you described. I presume it makes sense to do the standard spaced-rep study schedule, where you re-study information at a time when you have some probability of having already forgotten it. (I also have not looked into what memory champions do.)

During "y", you have to maintain. You still want to rehearse things, but you don't want to wait until you have some probability of having forgotten, at this point, because the study material is no longer in front of you; if you forget something, it is lost. This is what I was referring to when I described "keeping something in working memory".

During "z", you need to try and recall all of the stored information and report it in the correct sequence. I suppose having longer z helps, but the amount it helps probably drops off pretty sharply as z increases. So x and y are in some sense the more important variables.
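To make the x/y distinction above concrete, here is a toy model assuming an Ebbinghaus-style exponential forgetting curve, where each rehearsal multiplies a memory's "stability". The functional form, the `rehearse` boost, and all the numbers are illustrative assumptions, not measured values:

```python
import math

def recall_prob(elapsed: float, stability: float) -> float:
    """Ebbinghaus-style forgetting curve: P(recall) = exp(-t / s)."""
    return math.exp(-elapsed / stability)

def rehearse(stability: float, boost: float = 2.0) -> float:
    """Toy assumption: each successful rehearsal multiplies stability."""
    return stability * boost

# Maintenance phase ("y"): rehearsing every minute vs. not at all.
stability = 1.0  # minutes; illustrative starting value
for minute in range(5):
    stability = rehearse(stability)

p_with_rehearsal = recall_prob(1.0, stability)  # 1 min since last rehearsal
p_without = recall_prob(5.0, 1.0)               # 5 min with no rehearsal

assert p_with_rehearsal > p_without
```

This cartoon also shows why the optimal schedule differs between x and y: during x, letting the recall probability drop before re-study is fine (the material is still in front of you), whereas during y a failed rehearsal means the item is simply lost.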

I still feel like you’re using the term “working memory” in a different way from how I would use it.

So how do you want to use it?

I think my usage is mainly weird because I'm going hard on the operationalization angle, using performance on memory experiments as a definition. I think this way of defining things is particularly practical, but does warp things a lot if we try to derive causal models from it.

I'm not sure what the takeaway is here, but these calculations are highly suspect. What a memory athlete can memorize (in their domain of expertise) in 5 minutes is an intricate mix of working memory, long-term semantic memory, and episodic (hippocampal) memory.

I'm kind of fine with an operationalized version of "working memory" as opposed to a neuroanatomical concept. For practical purposes, it seems more useful to define "working memory" in terms of performance.

(That being said, the model which comes from using such a simplified concept is bad, which I agree is concerning.)

As for the takeaway, for me the one-minute number is interesting both because it's kind of a lot, but not so much. When I'm puttering around my house balancing tasks such as making coffee, writing on LessWrong, etc., I have roughly one objective or idea in conscious view at a time, but the number of tasks and ideas swirling around "in circulation" (being recalled every so often) seems like it can be pretty large. The original idea for this post came from thinking about how psychology tasks like Miller's seem more liable to underestimate this quantity than overestimate it.

On the other hand, it doesn't seem so large. Multi-tasking significantly degrades task performance, suggesting that there's a significant bottleneck. 

The "about one minute" estimate fits my intuition: if that's the amount of practically-applicable information actively swirling around in the brain at a given time (in some operationalized sense), well. It's interesting that it's small enough to be easily explained to another person (under the assumption that they share the requisite background knowledge, so there's no inferential gap when you explain what you're thinking in your own terms). Yet, it's also 'quite a bit'. 
