JWJohnston — LessWrong

Current technical interests: future visioning, AGI development (via Piaget-inspired constructivism), AI alignment/safety (via law)

I think claiming the above is a "radical solution to all problems related to the Intelligence Curse" is an overstatement. The three treaty elements you mention could be useful as part of AI-human social contracts--thus getting at a part of the Averting (i.e., AI Safety) piece. But many more treaty elements (Laws, Rules) are also needed IMO.

The Diffusing and Democratizing (and maybe other) pieces are also needed for an effective solution.

(Also, unclear what you mean by "obeying all orders except the ones determined by the Spec." What Spec?)

Really enjoyed your essay series. Appreciated it offered a positive future vision and then a roadmap for how to get there. Both are important. Too many people seem to be sleepwalking into a sketchy AGI future.

Here's my vision from a 2022 Future of Life Institute contest: "A future where sentient beings thrive due to widespread agreement on core values; improvements in education; Personal Agent AIs; social simulations; and updated legal systems (based on the core values) that are fair, nimble, and capable of controlling dangerous humans and AGIs. Of the core values, Truth and Civility are particularly impactful in keeping the world moving in a positive direction." Full scenario here.

Compare with yours:

We want to live in a world where:
Humans can create economic value for themselves and can disrupt existing elites well after AGI.
Everyone has an unprecedentedly high standard of living, both to meet their needs and to keep money flowing in the human economy.
No single actor or oligarchy—whether that be governments, companies, or a handful of individuals—monopolizes AGI. By extension, no single actor monopolizes power.
Regular people are in control of their destiny. We hold as a self-evident truth that humans should be the masters of their own futures.

Close enough.

Reflections and findings about the FLI contest are here.

Thoughts on Averting the Intelligence Curse via AI Safety via Law here.

Thoughts on Diffusing and Democratizing AI through next-generation virtual assistants (Personal Agents) here.

Anthony Aguirre's argument for pursuing narrow(er) AI over AGI here.

Hopefully something of interest.

Fixed link: https://medium.com/@jeffj4a/personal-agents-will-enable-direct-democracy-9413f5607c15

I developed this idea here: https://medium.com/@jeffj4a/personal-agents-will-enable-direct-democracy-9413f5607c15 Pretty much the same byline as yours. :-)

[Fixed link]

Sure. Getting appropriate new laws enacted is an important element. From the paper:

Initially, in addition to adopting existing bodies of law to implement AISVL, existing processes for how laws are drafted, enacted, enforced, litigated, and maintained would be preserved.
Thereafter, new laws and improvements to existing laws and processes must continually be introduced to make the systems more robust, fair, nimble, efficient, consistent, understandable, accepted, complied with, and enforced.

I'd say the EU AI Act (and similar) work addresses the "new laws" imperative. (I won't comment (much) on pros and cons of its content. In general, it seems pretty good. I wonder if they considered adding Etzioni's first law to the mix, "An AI system must be subject to the full gamut of laws that apply to humans"? That is what I meant by "adopting existing bodies of law to implement AISVL." The item in the EU AI Act about designing generative AIs to not generate illegal content is related.)

The more interesting work will be on improving legal processes along the dimensions listed above. And really interesting will be, as AIs get more autonomous and agentic, the "instilling" part where AIs must dynamically recognize and comply with the legal-moral corpora appropriate to the contexts they find themselves in.

If an AI can be Aligned externally, then it's already safe enough. It feels like...
You're not talking about solving Alignment, but talking about some different problem. And I'm not sure what that problem is.
For your proposal to work, the problem needs to be already solved. All the hard/interesting parts need to be already solved.

I'm talking about the need for all AIs (and humans) to be bound by legal systems that include key consensus laws/ethics/values. It may seem obvious, but I think this position is under-appreciated and not universally accepted.

By focusing on the external legal system, many key problems associated with alignment (as recited in the Summary of Argument) are addressed. One worth highlighting is 4.4, which suggests AISVL can assure alignment in perpetuity despite changes in values, environmental conditions, and technologies, i.e., a practical implementation of Yudkowsky's CEV.

I'm not sure exactly how many people are working on it, but I have the impression that it is more than a dozen, since I've met some of them without trying.

Glad to hear it. I hope to find and follow such work. The people I'm aware of are listed on pp. 3-5 of the paper. Was happy to see O'Keefe, Bai et al. (Anthropic), and Nay leaning this way.

It seems to me like you are somewhat shrugging off those concerns, since the technological interventions (eg smart contracts, LLMs understanding laws, whatever self-driving-car people get up to) are very "light" in the face of those "heavy" concerns. But a legal approach need not shrug off those concerns. For example, law could require the kind of verification we can now apply to airplane autopilot be applied to self-driving-cars as well. This would make self-driving illegal in effect until a large breakthrough in ML verification takes place, but it would work!

Yes. I'm definitely being glib about implementation details. First things first. :)

I agree with you that if self-driving-cars can't be "programmed" (instilled) to be adequately law-abiding, their future isn't bright. Per above, I'm heartened by Anthropic's Constitutional AI (priming LLMs with basic "laws") having some success getting AIs to behave. Ditto for anecdotes I've heard about "asking an LLM to come up with a money-making plan that doesn't violate any laws." Seems too easy, right?

One final comment about implementation details. In the appendix I note:

We suspect emergence of instrumental values is not inevitable for any “sufficiently advanced AI system.” Rather, whether such values emerge depends on what cognitive architecture and environmental conditions (training regimens) are used.

Broadly speaking, implementing AIs using safe architectures (ones not prone to law-breaking) is another implementation direction. Drexler's CAIS may be an example.

I believe you have to argue two things:
Laws are distinct enough from human values (1), but following laws / caring about laws / reporting about predicted law violations prevents the violation of human values (2).
I think the post fails to argue both points. I see no argument that instilling laws is distinct enough from instilling values/corrigibility/human semantics in general (1) and that laws actually prevent misalignment (2).

My argument goes in a different direction. I reject premise (1) and claim there is an "essential equivalence and intimate link between consensus ethics and democratic law [that] provide a philosophical and practical basis for legal systems that marry values and norms (“virtue cores”) with rules that address real world situations (“consequentialist shells”)."

In the body of the paper I characterize democratic law and consensus ethics as follows:

Both are human inventions intended to facilitate the wellbeing of individuals and the collective. They represent shared values culturally determined through rational consideration and negotiation. To be effective, democratic law and consensus ethics should reflect sufficient agreement of a significant majority of those affected. Democratic law and consensus ethics are not inviolate physical laws, instinctive truths, or commandments from deities, kings, or autocrats. They do not represent individual values, which vary from person to person and are often based on emotion, irrational ideologies, confusion, or psychopathy.

That is, democratic law corresponds to the common definition of Law. Consensus ethics is essentially equivalent to human values when understood in the standard philosophical sense as "shared values culturally determined through rational consideration and negotiation." In short, I'm of the opinion "Law = Ethics."

Regarding your premise (2): See my reply to Abram's comment. I'm mostly ducking the "instilling" aspects. I'm arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary.

If AI can be just asked to follow your clever idea, then AI is already safe enough without your clever idea. "Asking AI to follow something" is not what Bostrom means by direct specification, as far as I understand.

My reference to Bostrom's direct specification was not intended to match his use, i.e., hard coding (instilling) human values in AIs. My usage refers to specifying rules/laws/ethics externally so they are available and usable by all intelligent systems. Of the various alignment approaches Bostrom mentioned (and deprecated), I thought direct specification came closest to AISVL.

I feel as if there is some unstated idea here that I am not quite inferring. What is the safety approach supposed to be? If there were an organization devoted to this path to AI safety, what activities would that organization be engaged in?

The summary I posted here was just a teaser to the full paper (linked in pgph. 1). That said, your comments show you reasoned pretty closely to points I tried to make therein. Almost no need to read it. :)

The first part is just "regulation". The second part, "instilling law-abiding values in AIs and humans", seems like a significant departure. It seems like the proposal involves both (a) designing and enacting a set of appropriate laws, and (b) finding and deploying a way of instilling law-abiding values (in AIs and humans). Possibly (a) includes a law requiring (b): AIs (and AI-producing organizations) must be designed so as to have law-abiding values within some acceptable tolerances.

The main message of the paper is along the lines of "a." That is, per the claim in the 4th pgph, "Effective legal systems are the best way to address AI safety." I'm arguing that having effective legal systems and laws is the critical thing. How laws/values get instilled in AIs (and humans) is mostly left as an exercise for the reader. Your point about "simply outlawing designs not compatible" is reasonable.

The way I put it in the paper (sect. 3, pgph. 2): "Many of the proposed non-law-based solutions may be worth pursuing to help assure AI systems are law abiding. However, they are secondary to having a robust, well-managed, readily available corpus of codified law—and complimentary legal systems—as the foundation and ultimate arbiter of acceptable behaviors for all intelligent systems, both biological and mechanical."

Later I write, "Suggested improvements to law and legal process are mostly beyond the scope of this brief. It is possible, however, that significant technological advances will not be needed for implementing some key capabilities. For example, current Large Language Models are nearly capable of understanding vast legal corpora and making appropriate legal decisions for humans and AI systems (Katz et al., 2023). Thus, a wholesale switch to novel legal encodings (e.g., computational and smart contracts) may not be necessary."

I suspect some kind of direct specification approach (per Bostrom's classification) could work where AIs confirm that (non-trivial) actions they are considering comply with legal corpora appropriate to current contexts before taking action. I presume techniques used by the self-driving-car people will be up to the task for their application.
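
To make the "confirm compliance before acting" idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the `Action` and `Rule` types, the `act_if_lawful` gate, and the toy speed-limit rule are all hypothetical names I made up for this example, not anything from the paper or from a real system. A real legal corpus would obviously not reduce to simple predicates like this.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Action:
    """A candidate action an AI is considering (hypothetical type)."""
    name: str
    context: str                      # e.g. "driving", "finance"
    attributes: dict = field(default_factory=dict)
    trivial: bool = False             # trivial actions skip the legal check


@dataclass(frozen=True)
class Rule:
    """One externally specified rule (hypothetical stand-in for a law)."""
    rule_id: str
    context: str                      # the context this rule applies to
    permits: Callable[[Action], bool]


def applicable_rules(corpus: List[Rule], context: str) -> List[Rule]:
    """Select the part of the corpus appropriate to the current context."""
    return [rule for rule in corpus if rule.context == context]


def check_compliance(action: Action, corpus: List[Rule]) -> List[str]:
    """Return the ids of all applicable rules the action would violate."""
    if action.trivial:
        return []
    return [
        rule.rule_id
        for rule in applicable_rules(corpus, action.context)
        if not rule.permits(action)
    ]


def act_if_lawful(
    action: Action,
    corpus: List[Rule],
    execute: Callable[[Action], None],
) -> Tuple[bool, List[str]]:
    """Execute the action only if no applicable rule is violated."""
    violations = check_compliance(action, corpus)
    if violations:
        return False, violations      # refuse and report which rules blocked it
    execute(action)
    return True, []


# Toy corpus: a single made-up urban speed limit of 50 km/h.
CORPUS = [
    Rule(
        rule_id="speed-limit-50",
        context="driving",
        permits=lambda a: a.attributes.get("speed_kph", 0) <= 50,
    ),
]

executed: List[str] = []


def run(action: Action) -> None:
    """Stand-in for actually performing the action."""
    executed.append(action.name)


slow = Action("cruise", "driving", {"speed_kph": 40})
fast = Action("overtake", "driving", {"speed_kph": 70})

slow_result = act_if_lawful(slow, CORPUS, run)
fast_result = act_if_lawful(fast, CORPUS, run)
```

The point of the design is that the corpus lives outside the agent: the same rules are available to any intelligent system, and the gate only decides whether a given action may proceed in a given context. The hard parts (interpreting real law, choosing the right context) are not addressed by this sketch.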

"AI safety via law can address the full range of safety risks" seems to over-sell the whole section, a major point of which is to claim that AISVL does not apply to the strongest instrumental-convergence concerns. (And why not, exactly? It seems like, if the value-instilling tech existed, it would indeed avert the strongest instrumental-convergence concerns.)

I struggled with what to say about AISVL wrt superintelligence and instrumental convergence. Probably should have let the argument ride without hedging, i.e., superintelligences will have to comply with laws and the demands of legal systems. They will be full partners with humans in enacting and enforcing laws. It's hard to just shrug off the concerns of the Yudkowskys, Bostroms, and Russells of the world.

Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:

"For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with."
