LESSWRONG

Harrison G
Karma: 42250

Interested in AI alignment, thinking about ethics, tap dancing, playing instruments, and wearing sandals year-round.

Comments

[Linkpost] Introducing Superalignment
Harrison G · 2y · 3 · 0

The quote: "Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing)."

More ways to spot abysses
Harrison G · 3y · 3 · 0

Super helpful; thanks for writing!

On sincerity
Harrison G · 3y · 3 · 0

(read: The Athena-Parfit Long-Term Institute for Raising for Effectively Prioritizing Global Alignment Challenges)

I laughed about this for a while. Thank you for this thought-provoking post, and for incorporating occasional humor throughout.

Things I carry almost every day, as of late December 2022
Harrison G · 3y · 1 · 0

"At the top right is a pocket constitution made by Legal Impact for Chickens. I received this at an Effective Altruism Global conference, during the career fair. What actually happened was that someone came up to the booth I was at holding the pocket constitution, I noted that it looked cool, and they were kind enough to offer it to me. Unfortunately, I have never knowingly met anybody from Legal Impact for Chickens. I have not actually used this pocket constitution, but I carry it anyway in my winter jacket’s inner breast pocket since (a) it fits very unobtrusively and (b) it seems cool to carry around a pocket constitution."

If this was EAG SF, I remember an experience that sounds very similar to this, and I think I was this person! Ha

A Proof Against Oracle AI
Harrison G · 3y · 0 · 0

" [...] since every string can be reconstructed by only answering yes or no to questions like 'is the first bit 1?' [...]"

Why would humans ever ask this question, and (furthermore) why would we ever ask it n times in a row? That seems unlikely, and easy to prevent. Is there something I'm not understanding about this step?
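
For concreteness, here is a minimal sketch of the reconstruction step the quote describes, with a hypothetical yes/no oracle standing in for the boxed AI (the oracle function, the question format, and the secret string are all illustrative assumptions, not anything from the original post):

def oracle(question: str) -> bool:
    # Hypothetical stand-in for the boxed AI: it may only answer yes/no.
    secret = "101101"  # illustrative string the oracle "knows"
    i = int(question.split()[2])  # extract the bit index from the question
    return secret[i] == "1"

def reconstruct(n: int) -> str:
    # n yes/no questions recover an n-bit string exactly, one bit per query.
    return "".join(
        "1" if oracle(f"is bit {i} one?") else "0" for i in range(n)
    )

print(reconstruct(6))  # prints: 101101

The point of the quoted argument, as I understand it, is just that n such queries suffice in principle; my question is whether anyone would actually pose them.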

Posts

16 · Apply to the Cambridge ERA:AI Fellowship 2025 · 4mo · 0 comments
10 · Thinking About Propensity Evaluations · 11mo · 0 comments
13 · A Taxonomy Of AI System Evaluations · 11mo · 0 comments
11 · Distilled - AGI Safety from First Principles · 3y · 1 comment