In my experience, people who say they "updated" did not literally change a probability or propagate a specific fact through their model. Maybe it's unrealistic to expect that level of granularity, but to me it devalues the phrase, so I try to avoid it unless I can point to a reasonably specific change in my model. Usually my model (e.g. of a subset of AI risks) isn't detailed enough to actually perform a Bayesian update; what really happens is that I change my mind in some general way, learn something new, and maybe gradually/subconsciously rethink my position.
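For concreteness, here's what a literal update would look like, with made-up numbers: say my prior on some hypothesis $H$ is $P(H) = 0.3$, and I encounter evidence $E$ with $P(E \mid H) = 0.8$ and $P(E \mid \neg H) = 0.2$. Then

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)} = \frac{0.8 \cdot 0.3}{0.8 \cdot 0.3 + 0.2 \cdot 0.7} \approx 0.63.$$

That's the kind of explicit before-and-after number I almost never actually have when I say I changed my mind.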
Maybe I have too high a bar for what counts as a Bayesian update - not sure. But if not, then I think "I updated" often functions more as social signaling, or as the appropriation of a technical term for non-technical usage. Which is fine, but seems less than ideal for LW/AI safety people.
So I would say that jargon does sometimes have this problem (being used too casually or in a technically imprecise way), even if I agree with your inferential-distance point.
As far as LW jargon being interchangeable with existing language goes - one case I can think of is "murphyjitsu", which is basically exactly a "premortem" (an existing term) - so maybe there's a bit of over-eagerness to invent a new term instead of looking for an existing one.
Congrats! Can confirm that it is a great office :)
> my dream dwelling was a warehouse filled with whatever equipment one could possibly need to make things and run experiments in a dozen different domains
I had a similar dream, though I mostly thought about it in the context of "building cool fun mechanical stuff" and working on cars/welding bike frames. I think the actual usefulness might be a bit overrated, but it would still be fun to have.
I do have a 3D printer though, and a welder (though I don't have anywhere to use it - it needs a high-voltage plug). Again, though, I'm not sure how useful these things are - it seems like they're mostly for fun, and in the end the novelty wears off a bit once I realize that building something actually useful would take way more time than I want to spend on non-AI-safety work.
But maybe that's something I shouldn't have given up on so quickly - perhaps it's a bit of "magic" that makes life fun, and maybe a few genuinely cool inventions could come from this kind of tinkering. And maybe that would also carry over into how I approach my AI safety work.
Sure! And yeah, regarding edits - I haven't gone through the full request for feedback yet; I expect to have a better sense late next week of which contributions are most needed and how to prioritize. I mainly wanted to comment first on the obvious things that stood out to me from the post.
There is also an Evals workshop in Brussels on Monday where we might learn more. I know of some non-EU-based technical safety researchers who are attending, which is great to see.
I'd suggest updating the language in the post to clarify this and avoid overstating :)
Regarding the 3rd draft - opinions varied among the people I work with, but we are generally happy. Loss of Control is included in the selected systemic risks, as is CBRN. Appendix 1.2 also has useful things, though some valid concerns were raised there about compatibility with the AI Act language that still need tweaking (possibly merging parts of 1.2 into the selected systemic risks). As for interpretability - the Code is meant to be outcome-based, and the main reason evals are mentioned is that they are in the Act. Prescribing interpretability isn't something the Code can do, and it probably shouldn't, as these techniques aren't good enough yet to be mandated for mitigating systemic risks.
FYI, I wouldn't say at all that AI safety is under-represented in the EU (if anything, it would be easier to argue the opposite). Many safety orgs (including mine) supported the Codes of Practice, and almost all of the Chairs and Vice Chairs are respected governance researchers. It's probably still good for people to give feedback - I just don't want to give the impression that this is neglected.
Also, as far as I know, no public mention has been made of an intention to sign the Code. Though apart from the copyright section, most of it is in line with RSPs, which makes signing more reasonable.
Good point. Thinking of robotics overall, it's much more a bunch of small stuff than one big thing. Though it depends how far you "zoom out", I guess. Technically linear algebra itself, or the Jacobian, is an essential element of robotics. But you could also zoom in on a different aspect and say that "zero-backlash gearboxes" (where the Harmonic Drive is notable, as it's much more compact and accurate than previous designs - but perhaps still a small effect in the big picture) are the main element. Or PID control, or high-resolution encoders.
I'm not quite sure how to think about how these all fit together to form "robotics", and whether they are small elements of a larger thing or large breakthroughs stacked over the course of many years (where they might appear small at that zoomed-out level).
I think that if we take a snapshot of robotics at a specific time (e.g. a 5-year window), there will often be one or a very few large bottlenecks holding it back. Right now it is mostly ML/vision and batteries. 10-15 years ago, maybe it was CPU real-time processing latency or motor power density. A bit earlier it might have been gearboxes. These things were fairly major bottlenecks until they got good enough that development switched to a minor-revision/iteration regime (nowadays there's not much left to improve on gearboxes, for example, except maybe in very specific use cases).
> Other examples of fields like this include: medicine, mechanical engineering, education, SAT solving, and computer chess.
To give a maybe-helpful anecdote - I am a mechanical engineer (though I now work in AI governance), and in my experience that isn't true, at least for R&D (e.g. a surgical robot) where you aren't just iterating or working in a highly standardized field (aerospace, HVAC, mass manufacturing, etc.). The "bottleneck" in that case is usually figuring out the requirements (e.g. which surgical tools to support? what's the motion range, or the design envelope for interferences?). If those are wrong, the best design will still be wrong.
In more standardized engineering fields the requirements (and user needs) are much better known, so perhaps the bottleneck then becomes a bunch of small things rather than one big thing.
I had a great time at AISC8. Perhaps I would still have found my way into a full-time AI safety position without it, but I'd guess at least a year later, and into a significantly less neglected opportunity. My AI Safety Camp project later became the AI Standards Lab.
I know several others who benefitted quite a bit from it.
Yeah, I agree "premortem" is not super commonly used. Not sure where I learned it - maybe an org design course. I mainly gave it as an example of over-eagerness to name existing things - perhaps there aren't that many examples which are as clear-cut; in many of them the new term may actually be subtly different from the existing one.
But I would guess that a quick Google search could have turned up the "premortem" term and saved one piece of jargon.