The shortform of Ole Q Doc

Quinn

The shortform of Ole Q Doc — LessWrong

170 comments, sorted by top scoring

[-]Quinn1y7135

`<standup_comedian>` What's the deal with evals `</standup_comedian>`

epistemic status: tell me I'm wrong.

Funders seem particularly enchanted with evals, which seem to be defined as "benchmark but probably for scaffolded systems and scoring that is harder than scoring most of what we call benchmarks".

I can conjure a theory of change. It's like, 1. if measurement is bad then we're working with vibes, so we'd like to make measurement good. 2. if measurement is good then we can demonstrate to audiences (especially policymakers) that warning shots are substantial signals and not base it on vibes. (question: what am I missing?)

This is an at least coherent reason why dangerous capability evals pay into governance strats in such a way that maybe philanthropic pressure is correct. It relies on cruxes that I don't share, like that a principled science of measurement would outperform vibes in a meme war in the first place, but it at least has a crux that works as a fulcrum.

Everything worth doing is at least a little dual use, I'm not attacking anybody. But it's a faustian game where, like benchmarks, evals pump up races cuz everyone loves it when number go up. The primal urge to see... (read more)

[-]MichaelDickens1y1213

If I'm allowed to psychoanalyze funders rather than discussing anything at the object level, I'd speculate that funders like evals because:

If you funded the creation of an eval, you can point to a concrete thing you did. Compare to funding theoretical technical research, which has a high chance of producing no tangible outputs; or funding policy work, which has a high chance of not resulting in any policy change. (Streetlight Effect.)
AI companies like evals, and funders seem to like doing things AI companies like, for various reasons including (a) the thing you funded will get used (by the AI companies) and (b) you get to stay friends with the AI companies.

7Lech Mazur1y

This might blur the distinction between some evals. While it's true that most evals are just about capabilities, some could be positive for improving LLM safety. I've created 8 (soon to be 9) LLM evals (I'm not funded by anyone, it's mostly out of my own curiosity, not for capability or safety or paper publishing reasons).

Using them as examples, improving models to score well on some of them is likely detrimental to AI safety:

https://github.com/lechmazur/step_game - to score better, LLMs must learn to deceive others and hold hidden intentions
https://github.com/lechmazur/deception/ - the disinformation effectiveness part of the benchmark

Some are likely somewhat negative because scoring better would enhance capabilities:

https://github.com/lechmazur/nyt-connections/
https://github.com/lechmazur/generalization

Others focus on capabilities that are probably not dangerous:

https://github.com/lechmazur/writing - creative writing
https://github.com/lechmazur/divergent - divergent thinking in writing

However, improving LLMs to score high on certain evals could be beneficial:

https://github.com/lechmazur/goods - teaching LLMs not to overvalue selfishness
https://github.com/lechmazur/deception/?tab=readme-ov-file#-disinformation-resistance-leaderboard - the disinformation resistance part of the benchmark
https://github.com/lechmazur/confabulations/ - reducing the tendency of LLMs to fabricate information (hallucinate)

I think it's possible to do better than these by intentionally designing evals aimed at creating defensive AIs. It might be better to keep them private and independent. Given the rapid growth of AI capabilities, the lack of apparent concern for an international treaty (as seen in the recent Paris AI summit), and the competitive race dynamics among companies and nations, specifically developing an AI to protect us from threats from other AIs or AIs + humans might be the best we can hope for.

5ozziegooen1y

(potential relevant meme)

3Shankar Sivarajan1y

That's an edited version of this: My neighbor told me coyotes keep eating his outdoor cats so I asked how many cats he has and he said he just goes to the shelter and gets a new cat afterwards so I said it sounds like he’s just feeding shelter cats to coyotes and then his daughter started crying.

5ozziegooen1y

Yep - I saw other meme-takes like this, assumed people might be familiar enough with it.

3MichaelDickens1y

I was familiar enough to recognize that it was an edit of something I had seen before, but not familiar enough to remember what the original was

[-]Quinn1y4023

$PERSON at $LAB once showed me an internal document saying that there are bad benchmarks - dangerous capability benchmarks - that are used negatively, so unlike positive benchmarks where the model isn't shipped to prod if it performs under a certain amount, these benchmarks could block a model from going to prod that performs over a certain amount. I asked, "you create this benchmark like it's a bad thing, and it's a bad thing at your shop, but how do you know it won't be used in a sign-flipped way at another shop?" and he said "well we just call it EvilBench and no one will want to score high on EvilBench".

It sounded like a ridiculous answer, but is maybe actually true in the case of labs. It is extremely not true in the open weight case, obviously huggingface user Yolo4206969 would love to score high on EvilBench.

[-]Nathan Helm-Burger1y141

This is exactly why the bio team for WMDP decided to deliberately include distractors involving relatively less harmful stuff. We didn't want to publicly publish a benchmark which gave a laser-focused "how to be super dangerous" score. We aimed for a fuzzier decision boundary. This brought criticism from experts at the labs who said that the benchmark included too much harmless stuff. I still think the trade-off was worthwhile.

[-]Quinn1y*270

august 2024 guaranteed safe ai newsletter

in case i forgot last month, here's a link to july

A wager you say

One proof of concept for the GSAI stack would be a well-understood mechanical engineering domain automated to the next level and certified to boot. How about locks? Needs a model of basic physics, terms in some logic for all the parts and how they compose, and some test harnesses that simulate an adversary. Can you design and manufacture a provably unpickable lock?

Zac Hatfield-Dodds (of hypothesis/pytest and Anthropic, was offered and declined authorship on the GSAI position paper) challenged Ben Goldhaber to a bet after Ben coauthored a post with Steve Omohundro. It seems to resolve in 2026 or 2027, the comment thread should get cleared up once Ben gets back from Burning Man. The arbiter is Raemon from LessWrong.

Zac says you can’t get a provably unpickable lock on this timeline. Zac gave (up to) 10:1 odds, so recall that the bet can be a positive expected value for Ben even if he thinks the event is most likely not going to happen.
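To make the expected value point concrete, here is a minimal sketch of the arithmetic. The stakes are an assumption: it takes the full 10:1 odds with Zac risking $10k against a hypothetical $1k from Ben, and it treats Ben as risk neutral. The actual terms are whatever the comment thread settles on.

```python
def ben_expected_value(p_lock, zac_stake=10_000.0, ben_stake=1_000.0):
    """Expected dollar value of the bet to Ben.

    p_lock is Ben's probability that a provably unpickable lock arrives
    in time. The stake sizes are illustrative assumptions, not the real terms.
    """
    return p_lock * zac_stake - (1.0 - p_lock) * ben_stake


def break_even_probability(zac_stake=10_000.0, ben_stake=1_000.0):
    """Probability at which the bet has zero expected value for Ben."""
    return ben_stake / (zac_stake + ben_stake)


if __name__ == "__main__":
    print(f"break-even probability: {break_even_probability():.3f}")
    print(f"EV at p = 0.2: {ben_expected_value(0.2):.0f} dollars")
```

Under these assumed stakes the break-even probability is 1/11, so Ben can think the lock is unlikely (say 20%) and still find the bet worth taking.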

For funsies, let’s map out one path of what has to happen for Zac to pay Ben $10k. This is not the canonical path, but it is a path:

Physics to

... (read more)

2habryka1y

Oh, I liked this one. Mind if I copy it into your shortform (or at least like the first few paragraphs so people can get a taste?)

1Quinn1y

By all means. Happy for that

[-]Quinn10mo190

I get pretty intense visceral outrage at overreaches in immigration enforcement, just seems the height of depravity. I've looked for a lot of different routes to mental coolness over the last decade (since Trump started his speeches), they mostly amount to staying busy and distracted. Just seems like a really cost ineffective kind of activism to get involved in. Bankrolling lawyers for random people isn't really in my action space and if it was I'd have opportunity cost to consider.

3Cole Wyeth10mo

Unfortunately, it seems that my action space doesn’t include options that matter in this current battle. Personally, my reaction to this kind of insanity is to keep climbing my local status/influence/wealth/knowledge gradient, in the hopes that my actions are relevant in the future. But perhaps it’s a reason to prioritize gaining power - this reminds me of https://www.lesswrong.com/posts/ottALpgA9uv4wgkkK/what-are-you-getting-paid-in

[-]Quinn1y152

Feels like a MATS-like program in India is a big opportunity. When I went to EAG in Singapore a while ago there were so many people underserved by the existing community building and mentorship organizations cuz of visa issues.

3Chris_Leong1y

Impact Academy was doing this, before they pivoted towards the Global AI Safety Fellowship. It's unclear whether any further fellowships should be in India or a country that is particularly generous with its visas.

1Milan W1y

A while ago there was a MATS-like AIS fellowship in Mexico. I think Mexico may have been selected partly because of this.

1samuelshadrach1y

As someone from India. Important consideration: Probably not worth asking people to uproot their social lives two times instead of once unless absolutely necessary. Long-term social bonds matter more than short-term ones, and long-term bonds are also more dependent on living in the same place than short-term bonds.

3Chris_Leong1y

Fellowships are typically only for a few months and even if you're in India, you'd likely have to move for the fellowship unless it happened to be in your exact city.

[-]Quinn5y110

Good arguments - notes on Craft of Research chapter 7

Arguments take place in 5 parts.

Claim: What do you want me to believe?

Reasons: Why should I agree?

Evidence: How do you know? Can you back it up?

Acknowledgment and Response: But what about ... ?

Warrant: How does that follow?

This can be modeled as a conversation with readers, where the reader prompts the writer to take the next step on the list.

Claims ought to be supported with reasons. Reasons ought to be based on evidence. Arguments are recursive: a part of an argument is an acknowledgment of an anticipated response, and another argument addresses that response. Finally, when the distance between a claim and a reason grows large, we draw connections with something called warrants.

The logic of warrants proceeds in generalities and instances. A general circumstance predictably leads to a general consequence, and if you have an instance of the circumstance you can infer an instance of the consequence.
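One way to read that paragraph is as universal instantiation followed by modus ponens. A small Lean sketch, where the names `Circumstance`, `Consequence`, and `warrant` are mine and not the book's:

```lean
-- The warrant is the general rule; the reason supplies an instance of the
-- circumstance; the claim is the matching instance of the consequence.
example {α : Type} (Circumstance Consequence : α → Prop)
    (warrant : ∀ x, Circumstance x → Consequence x)
    (a : α) (reason : Circumstance a) :
    Consequence a :=
  warrant a reason
```

This only captures the deductive skeleton. Warrants in real papers are usually defeasible generalizations, which a strict implication does not model.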

Arguing in real life papers is complexified from the 5 steps, because

Claims should be supported by two or more reasons
A writer can anticipate and address numerous responses. As I mentioned, arguments are recursive, especi

... (read more)

[-]Quinn3mo100

sorry about the spam from my profile. the automated RSS ingest freaked out when I changed my substack domain and sent all those duplicates.

[-]Quinn2y100

Thinking about a top-level post on FOMO and research taste

Fear of missing out defined as inability to execute on a project cuz there's a cooler project if you pivot
but it also gestures at more of a strict negative, where you think your project sucks before you finish it, so you never execute
was discussing this with a friend: "yeah I mean lesswrong is pretty egregious cuz it sorta promotes this idea of research taste as the ability to tear things down, which can be done armchair"
I've developed strategies to beat this FOMO and gain more depth and detail with projects (too recent to see returns yet, but getting there) but I also suspect it was nutritious of me to develop discernment about what projects are valuable or not valuable for various threat models and theories of change (in such a way that being a phd student off of lesswrong wouldn't have been as good in crucial ways, tho way better in other ways).
- but I think the point is you have to turn off this discernment sometimes, unless you want to specialize in telling people why their plans won't work, which I'm more dubious on the value of than I used to be

Idk maybe this shortform is most of the value of the top level post

[-]Quinn2y100

A trans woman told me

I get to have all these talkative blowhard traits and no one will punish me for it cuz I'm a girl. This is one major reason detrans would make my life worse. Society is so cruel to men, it sucks so much for them

And another trans woman had told me almost the exact same thing a couple months ago.

My take is that roles have upsides and downsides, and that you'll do a bad job if you try to say one role is better or worse than another on net or say that a role is more downside than upside. Also, there are versions of "women talk too much" as a stereotype in many subcultures, but I don't have a good inside view about it.

3Elizabeth2y

This may be true, but it might be that she's incurring a bunch of social penalties she isn't aware of. Women are less likely to overtly punish, so if she's spending more time with women that could already explain it. No one yells at you to STFU, but you miss out on the party invite you would have gotten if you shared the conversation better. I suspect men are also more willing to tell other men to STFU than they are to say it to women, but will let someone else speak to that question.

2Viliam2y

The fact that both roles have advantages and disadvantages doesn't necessarily prove that neither is better on net. Then again, "better" by what preferences? Lucky are the people whose preferences match the role they were assigned. To me it seems that women have a greater freedom of self-expression, as long as they are not competitive. Men are treated instrumentally: they are socially allowed to work and to compete against each other, anything else is a waste of energy. For example, it is okay for a man to talk a lot, if he is a politician, manager, salesman, professor, priest... simply, if it is a part of his job. And when he is seducing a woman. Otherwise, he should be silent. Women are expected to chit-chat all the time, but they should never contradict men, or say anything controversial.

3Quinn2y

one may be net better than the other, I just think the expected error washes out all of one's reasoning so individuals shouldn't be confident they're right.

[-]Quinn5y100

thoughts on chapter 9 of Craft of Research

Getting the easy things right shows respect for your readers and is the best training for dealing with the hard things.

If they don't believe the evidence, they'll reject the reasons and, with them, your claim.

We saw previously that claims ought to be supported with reasons, and reasons ought to be based on evidence. Now we will look closer at reasons and evidence.

Reasons must be in a clear, logical order. Atomically, readers need to buy each of your reasons, but compositionally they need to buy your logic. Storyboarding is a useful technique for arranging reasons into a logical order: physical arrangements of index cards, or some DAG-like syntax. Here, you can list evidence you have for each reason or, if you're speculating, list the kind of evidence you would need.

When storyboarding, you want to read out the top level reasons as a composite entity without looking at the details (evidence), because you want to make sure the high-level logic makes sense.
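For the "DAG-like syntax" version of a storyboard, something like the following sketch might do. The card names and the have/need labels are made up for illustration, and `graphlib` is just one convenient way to get an order in which every card comes after the cards it rests on.

```python
from graphlib import TopologicalSorter

# Hypothetical storyboard: each card maps to the cards it rests on.
depends_on = {
    "claim": {"reason_1", "reason_2"},
    "reason_1": {"evidence_a"},
    "reason_2": {"evidence_b", "evidence_c"},
}

# "have" = evidence in hand, "need" = the kind of evidence still to find.
evidence_status = {"evidence_a": "have", "evidence_b": "have", "evidence_c": "need"}


def reading_order(graph):
    """Order the cards so each one appears after everything it depends on."""
    return list(TopologicalSorter(graph).static_order())


def top_level_reasons(graph, claim="claim"):
    """The reasons alone, to check the high-level logic without the evidence."""
    return sorted(graph[claim])


def missing_evidence(status):
    """Evidence cards that are still speculative."""
    return sorted(name for name, state in status.items() if state == "need")


if __name__ == "__main__":
    print(reading_order(depends_on))
    print(top_level_reasons(depends_on))
    print(missing_evidence(evidence_status))
```

Reading out `top_level_reasons` on its own is the code analogue of checking the composite logic before looking at the details.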

Readers will not accept a reason until they see it anchored in what they consider to be a bedrock of established fact. ... To count as evidence, a statement must report something tha

... (read more)

[-]Quinn5y100

Sources - notes on Craft of Research chapters 5 and 6

Primary, secondary, and tertiary sources

Primary sources provide you with the "raw data" or evidence you will use to develop, test, and ultimately justify your hypothesis or claim. Secondary sources are books, articles, or reports that are based on primary sources and are intended for scholarly or professional audiences. Tertiary sources are books and articles that synthesize and report on secondary sources for general readers, such as textbooks, articles in encyclopedias, and articles in mass-circulation publications.

The distinction between primary and secondary sources comes from 19th century historians, and the idea of tertiary sources came later. The boundaries can be fuzzy, and are certainly dependent on the task at hand.

I want to reason about what these distinctions look like in the alignment community, and whether or not they're important.

The rest of chapter five is about how to use libraries and information technologies, and evaluating sources for relevance and reliability.

Chapter 6 starts off with the kind of thing you should be looking for while you read

Look for creative agreement

Offer additional support. You can

... (read more)

[-]Quinn4y90

Yesterday I quit my job for direct work on epistemic public goods! Day one of direct work trial offer is April 4th, and it'll take 6 weeks after that to know if I'm a fulltime hire.

I'm turning down

raise to 200k/yr usd
building lots of skills and career capital that would give me immense job security in worlds where investment into one particular blockchain doesn't go entirely to zero
having fun on the technical challenges

for

confluence of my skillset and a theory of change that could pay huge dividends in the epistemic public goods space
0.35x paycut

... (read more)

[-]Quinn1y80

did anyone draw up an estimate of how much the proportion of code written by LLMs will increase? or even what the proportion is today

[-]Quinn3y80

How are people mistreated by bellcurves?

I think this is a crucial part of a lot of psychological maladaptation and social dysfunction, very salient to EAs. If you're way more trait xyz than anyone you know for most of your life, your behavior and mindset will be massively affected, and depending on when in life / how much inertia you've accumulated by the time you end up in a different room where suddenly you're average on xyz, you might lose out on a ton of opportunities for growth.

In other words, the concept of "big fish small pond" is deeply insightful a... (read more)

5Viliam3y

So, being a "big fish in a small pond" teaches you habits that become harmful when you later move to a larger pond. But if you don't move, you can't grow further. I think the specific examples are better known than the generalization. For example: Many people in Mensa are damaged this way. They learned to be the smartest ones, which they signal by solving pointless puzzles, or by talking about "smart topics" (relativity, quantum, etc.) despite the fact that they know almost nothing about these topics. Why did they learn these bad habits? Because this is how you most efficiently signal intelligence to people who are not themselves intelligent. But it fails to impress the intelligent people used to meeting other intelligent people, because they see the puzzles as pointless, they see the smart talk as bullshit if they ever read an introductory textbook on the topic, and will ask you about your work and achievements instead. The useful thing would instead be to learn how to cooperate with other intelligent people on reaching worthy goals. People who are too smart or too popular at elementary school (or high school) may be quite shocked when they move to a high school (or university) and suddenly their relative superpowers are gone. If they learned to rely on them too much, they may have a problem adapting to normal hard work or normal friendships. Staying at the same job for too long might have a similar effect. You feel like an expert because you are familiar with all systems in the company. Then at some moment fate makes you change jobs, and suddenly you realize that you know nothing, that the processes and technologies used in your former company were maybe obsolete. But the more you delay changing jobs, the harder it becomes. I remember reading in a book by László Polgár, father of the famous female chess players, how he wanted his girls to play in the "men's" chess league since the beginning, because that's what he wanted them to win. He was afraid that playing

4Elizabeth2y

I think this wasn't true at the time, at least in Hungary. The oldest sister and their father spent a lot of time fighting this, so it was ~true by the time the youngest sister got really competitive. This might prove the larger point, since the youngest sister also went the farthest.

2Viliam2y

Uh, good catch! Then I am surprised that they actually succeeded in winning this. It would be too easy and possibly very tempting to just say "you broke the rules, disqualified!" Or at least, I would expect a debate to last for a decade, and then it would be too late for the Polgár sisters.

3Quinn2y

yeah IQ ish things or athletics are the most well-known examples, but I only generalized in the shortform cuz I was looking around at my friends and thinking about more Big Five oriented examples. Certainly "conscientiousness seems good but I'm exposed to the mistake class of unhelpful navelgazing, so maybe I should be less conscientious" is so much harder to take seriously if you're in a pond that tends to struggle with low conscientiousness. Or being so low on neuroticism that your redteam/pentest muscles atrophy.

2Viliam2y

That sounds intriguing. I would like to read an article with many specific (even if fictional) examples.

[-]Quinn5y80

nonprosaic ai will not be on short timelines

I think a property of my theory of change is that academic and commercial speed is a bottleneck. I recently realized that my mass assignment for timelines synchronized with my mass assignment for the prosaic/nonprosaic axis. The basic idea is that let's say a radical new paper that blows up and supplants the entire optimization literature gets pushed to the arxiv tomorrow, signaling the start of some paradigm that we would call nonprosaic. The lag time for academics and industry to figure out what's going on, fi... (read more)

2ChristianKl5y

The reasoning assumes that ideas are first generated in academia and don't arise inside of companies. With DeepMind outperforming the academic protein folding community when protein folding isn't even the main focus of DeepMind I consider it plausible that new approaches arise within a company and get only released publicly when they are strong enough to have an effect. Even if there's a paper most radical new papers get ignored by most people and it might be that in the beginning only one company takes the idea seriously and doesn't talk about it publicly to keep a competitive edge.

1Quinn5y

That's totally fair, but I have a wild guess that the pipeline from google brain to google products is pretty nontrivial to traverse, and not wholly unlike the pipeline from arxiv to product.

2Steven Byrnes5y

How short is "short" for you? Like, AlexNet was 2012, DeepMind patented deep Q learning in 2014, the first TensorFlow release was 2015, the first PyTorch release was 2016, the first TPU was 2016, and by 2019 we had billion-parameter GPT-2 … So if you say "Short is ≤2 years", then yeah, I agree. If you say "Short is ≤8 years", I think I'd disagree, I think 8 years might be plenty for a non-prosaic approach. (I think there are a lot of people for whom AGI in 15-20 years still counts as "short timelines". Depends on who you're talking to, I guess.)

1Quinn5y

I should've mentioned in OP but I was lowkey thinking upper bound on "short" would be 10 years. I think developer ecosystems are incredibly slow (longer than ten years for a new PL to gain penetration, for instance). I guess under a singleton "one company drives TAI on its own" scenario this doesn't matter, because tooling tailored for a few teams internal to the same company is enough which can move faster than a proper developer ecosystem. But under a CAIS-like scenario there would need to be a mature developer ecosystem, so that there could be competition.

[-]Steven Byrnes5y100

I feel like 7 years from AlexNet to the world of PyTorch, TPUs, tons of ML MOOCs, billion-parameter models, etc. is strong evidence against what you're saying, right? Or were deep neural nets already a big and hot and active ecosystem even before AlexNet, more than I realize? (I wasn't paying attention at the time.)

Moreover, even if not all the infrastructure of deep neural nets transfers to a new family of ML algorithms, much of it will. For example, the building up of people and money in ML, the building up of GPU / ASIC servers and the tools to use them, the normalization of the idea that it’s reasonable to invest millions of dollars to train one model and to fab ASICs tailored to a particular ML algorithm, the proliferation of expertise related to parallelization and hardware-acceleration, etc. So if it took 7 years from AlexNet to smooth turnkey industrial-scale deep neural nets and billion-parameter models and zillions of people trained to use them, then I think we can guess <7 years to get from a different family of learning algorithms to the analogous situation. Right? Or where do you disagree?

4Quinn5y

No you're right. I think I'm updating toward thinking there's a region of nonprosaic short-timelines universes. Overall it still seems like that region is relatively much smaller than prosaic short-timelines and nonprosaic long-timelines, though.

[-]Quinn4mo71

Social graph density leads to millions of acquaintances and few close friends, because you don’t need to treasure each other

2Viliam4mo

Yeah, that's basically "people in big cities". But if you care about some special trait that only a few people have, those people become precious again.

[-]Quinn4mo72

epistemic status: jotting down casually to make sure I have a specific "I Told You So" hyperlink in a year or so

I think 9-10 figures have gone into math automation in 2025, across VC, philanthropy, and a percentage of frontier company expenditure (though if we want to look at the latter, a proper fermstimate would I think get much wider than if you were just counting up all the VC bucks). In the startup case, it looks an awful lot like creating press releases to attract funding and talent, with not a lot of product clarity.

I have been guilty in the past of... (read more)

3Vladimir_Nesov4mo

I think this only helps with security vulnerabilities (bugs that are not vulnerabilities are not that impactful, and I don't see this generalizing beyond ordinary software security pre-takeoff). Plausibly after LLMs can rewrite whole codebases in a different language without making them worse, as the next step after that they could also be taught to rewrite them in a much richer dependently typed language in a way that obsessively tracks all security properties that come to mind. LLMs still can't rewrite codebases (well enough to matter). So this is probably not yet a 2026 thing, but 2027-2028 is plausible, and it probably mostly helps with the still-mostly-speculative future problem of more effective LLM-enabled hacking.

[-]Quinn2y70

Cope isn't a very useful concept.

For every person who has a bad reason that they catch because you say "sounds like cope", there are 10x as many people who find their reason actually compelling. Saying "if that was my reason it would be a sign I was in denial of how hard I was coping" or "I don't think that reason is compelling" isn't really relevant to the person you're ostensibly talking to, who's trying to make the best decisions for the best reasons. Just say you don't understand why the reason is compelling.

2Dagon2y

I'm not sure I have experienced a "sounds like cope" reasoning, or at least it doesn't match to discussions I've noted. Is this similar to "people under stress are bad at updating"? Why would you expect them to be better at communicating than they are at reasoning?

[-]Quinn5y60

Excellence and adequacy

I asked a friend whether I should TA for a codeschool called ${{codeschool}}.

You shouldn't hang around ${{codeschool}}. People at ${{codeschool}} are not pursuing excellence.

A hidden claim there that I would soak up the pursuit of non-excellence by proximity or osmosis isn't what's interesting (though I could see that turning out either way). What's interesting is the value of non-excellence, which I'll call adequacy.

${{codeschool}} in this case is effective and impactful at putting butts in seats at companies, and is thereby re... (read more)

2Viliam5y

Seems to me that on the market there are very few jobs for the SICP types. The more meta something is, the less of that is needed. If you can design an interactive website, there are thousands of job opportunities for you, because thousands of companies want an interactive website, and somehow they are willing to pay for reinventing the wheel. If you can design a new programming language and write a compiler for it... well, it seems that world already has too many different programming languages, but sure there is a place for maybe a dozen more. The probability of success is very small even if you are a genius. The best opportunity for developers who think too meta is probably to design a new library for an already popular programming language, and hope it becomes popular. The question is how exactly you plan to get paid for that. Probably another problem is that it requires intelligence to recognize intelligence, and it requires expertise to recognize expertise. The SICP type developer seems to most potential employers and most potential colleagues as... just another developer. The company does not see individual output, only team output; it does not matter that your part of code does not contain bugs, if the project as a whole does. You cannot use solutions that are too abstract for your colleagues, or for your managers. Companies value replaceability, because it is less fragile and helps to keep developer salaries lower than they might be otherwise. (In theory, you could have a team full of SICP type developers, which would allow them to work smarter, and yet the company would feel safe. In practice, companies can't recognize this type and don't appreciate it, so this is not going to happen.) Again, probably the best position for a SICP type developer in a company would be to develop some library that the rest of the company would use. That is, a subproject of a limited size that the developer can do alone, so they are not limited in the techniques they use,

[-]Quinn1y50

I want a name for the following principle:

the world-spec gap hurts you more than the spec-component gap

I wrote it out much like this a couple years ago and Zac recently said the same thing.

I'd love to be able to just say "the <one to three syllables> principle", yaknow?

1Archimedes1y

How about the "World-Map [Spec] Gap" with [Spec] optional?

[-]Quinn2y50

I used to think "community builder" was a personality trait I couldn't switch off, but once I moved to the bay I realized that I was just desperate for serendipity and knew how to take it from 0% to 1%. Since the bay is constantly humming at 70-90% serendipity, I simply lost the urge to contribute.

Benefactors are so over / beneficiaries are so back / etc.

[-]Quinn3y50

Let FairBot be the player that sends an opponent to Cooperate (C) if it is provable that they cooperate with FairBot, and sends them to Defect (D) otherwise.

Let FairBot_k be the player that searches for proofs of length <= k that its input cooperates with FairBot_k, and cooperates if it finds one, returning defect if all the proofs of length <= k are exhausted without one being valid.

Critch writes that "100%" of the time, mathematicians and computer scientists report believing that FairBot_k(FairBot_k) = D, owing to the basic vision of a stack overf... (read more)
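A minimal sketch of the stack-overflow intuition, under a loud assumption: the bot below replaces proof search with direct simulation of the opponent. It is a stand-in for the naive mental picture, and it does not model FairBot_k itself, whose search is bounded by k and so always terminates. All names are hypothetical.

```python
C, D = "C", "D"


def naive_fairbot(opponent):
    """Simulation stand-in (NOT proof search): run the opponent against us
    and cooperate exactly when it cooperates."""
    return C if opponent(naive_fairbot) == C else D


def cooperate_bot(_opponent):
    """Always cooperates, whoever the opponent is."""
    return C


def defect_bot(_opponent):
    """Always defects, whoever the opponent is."""
    return D


def self_play_overflows():
    """True if naive self-play recurses until Python gives up."""
    try:
        naive_fairbot(naive_fairbot)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    print(naive_fairbot(cooperate_bot), naive_fairbot(defect_bot))
    print(self_play_overflows())
```

Against fixed opponents the simulation version behaves like you'd want. Against itself each call spawns another call, which is the picture that makes "D" feel like the obvious answer. A bounded search over proofs is a different kind of object, so the picture does not transfer automatically.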

4JBlack3y

It is almost certainly true that setting k=1, FairBot_1 defects against FairBot_1 because there are no proofs of cooperation that are 1 bit in length. There can be exceptions: for instance, where FairBot_1(FairBot_1) = C is actually an axiom, and represented with a 1-bit string. It is definitely not true that FairBot_k cooperates with FairBot_k for all k and all implementations of FairBot_k, with or without Löb's theorem. It is also definitely not true that FairBot_k defects against FairBot_k in general. Whether they cooperate or defect depends upon exactly what proof system and encoding they are using.

3Jalex S3y

I think that to get the type of the agent, you need to apply a fixpoint operator. This also happens inside the proof of Löb for constructing a certain self-referential sentence. (As a breadcrumb, I've heard that this is related to the Y combinator.)

[-]Quinn4y50

I find myself, just as a random guy, deeply impressed at the operational competence of airports and hospitals. Any good books about that sort of thing?

1JBlack4y

It is pretty impressive that they function as well as they do, but seeing how the sausage is made (at least in hospitals) does detract from it quite substantially. You get to see not only how an enormous number of battle hardened processes prevent a lot of lethal screw-ups, but also how sometimes the very same processes cause serious and very occasionally lethal screw-ups. It doesn't help that hospitals seem to be universally run with about 90% of the resources they need to function reasonably effectively. This is possibly because there is relentless pressure to cut costs, but if you strip any more out of them then people start to die from obviously preventable failures. So it stabilizes at a point where everything is much more horrible than it could be, but not quite to an obviously lethal extent. As far as your direct question goes, I don't have any good books to recommend.

[-]Quinn4y50

Rats and EAs should help with the sanity levels in other communities

Consider politics. You should take your political preferences/aesthetics, go to the tribes that are based on them, and help them be more sane. In the politics example, everyone's favorite tribe has failure modes, and it is sort of the responsibility of the clearest-headed members of that tribe to make sure that those failure modes don't become the dominant force of that tribe.

Speaking for myself, having been deeply in an activist tribe before I was a rat/EA, I regret I wasn't there to hel... (read more)

0Viliam4y

But what if that makes my tribe lose the political battle? I mean, if rationality actually helped win political fights, by the power of evolution we already would have been all born rational...

2Pattern4y

1. Evolution does not magically get from A to B instantly. 2. Evolution does not necessarily care about X for many values of X. This can include: winning political fights, whether or not nukes are built and many other things.

[-]Quinn5y50

Claims - thoughts on chapter eight of Craft of Research

Broadly, the two kinds of claims are conceptual and practical.

Conceptual claims ask readers not to act, but to understand. The flavors of conceptual claim are as follows:

Claims of fact or existence
Claims of definition and classification
Claims of cause and consequence
Claims of evaluation or appraisal

There's essentially one flavor of practical claim

Claims of action or policy.

If you read between the lines, you might notice that a kind of claim of fact or cause/consequence is that a policy work... (read more)

2Viliam5y

This may be context-dependent. Different countries probably have different cultural norms. Norms may differ for higher-status and lower-status speakers. Humble speech may impress some people, but others may perceive it as a sign of weakness. Also, is your audience fellow scientists or are you writing a popular science book? (More hedging for the former, less hedging for the latter.)

[-]Quinn5y50

notes (from a very jr researcher) on alignment training pipeline

Training for alignment research is one part competence (at math, cs, philosophy) and another part having an inside view / gears-level model of the actual problem. Competence can be outsourced to universities and independent study, but inside view / gears-level model of the actual problem requires community support.

A background assumption I'm working with is that training as a longtermist is not always synchronized with legible-to-academia training. It might be the case that jr researchers oug... (read more)

2ChristianKl5y

I don't think Critch's saying that the best way to get his attention is through cold emails backed up by credentials. The whole post is about him not using that as a filter to decide who's worth his time but that people should create good technical writing to get attention.

1philip_b5y

Critch's written somewhere that if you can get into UC Berkeley, he'll automatically allow you to become his student, because getting into UC Berkeley is a good enough filter.

2ChristianKl5y

Where did he say that? Given that he's working at UC Berkeley I would expect him to treat UC Berkeley students preferentially for reasons that aren't just about UC Berkeley being able to filter. It's natural that you can sign up for one of the classes he teaches at UC Berkeley by being a student of UC Berkeley. Being enrolled into MIT might be just as hard as being enrolled into UC Berkeley but it doesn't give you the same access to courses taught at UC Berkeley by its faculty.

2philip_b5y

http://acritch.com/ai-berkeley/ and also

2ChristianKl5y

Okay, he does speak about using Berkeley as a filter but he doesn't speak about taking people as his student. It seems about helping people in UC Berkeley to connect with other people in UC Berkeley.

[-]Quinn1mo40

Effortposts I keep not getting around to

the ROI of specialization is dominated by a term that's uncorrelated with the upsides of what you specifically choose to specialize in
steganography-free certificates: review no-go theorems from information theory in the general case, inspect how onerous the assumptions you need to bound steg capacity are. Is real life a special case where the no-go theorems don't apply?
taelin, lafont, higher order computing, and AI safety. some logic programming module (which is performant and parallel for complicated linear logic r

... (read more)

[-]Quinn1mo40

(Finally read If Anyone Builds It): the fable about defensive acceleration in biotech spooked me pretty good, insofar as I think synthesizing an SL5 grade cloud stack is a good idea. This idea of "we think we're doing monotonic defensive acceleration, and in a very real sense we actually are, but nevertheless the gameboard inexorably marches toward Everyone Dies routing through that very defensive acceleration" could soooooo easily be applied to cybersecurity.

[-]Quinn26d30

I want to do a full post on "taking ownership of a niche" and against "if you're good at something never do it for free", and that'll come later I hope. Today, I just wanted to let you know that Gemini had this banger quote when I was consolidating my notes on this topic:

ownership is bought with the "inefficient" hours you spend doing what you know is right before anyone else is smart enough to pay you for it.

[-]Quinn11mo31

are SOTA configuration languages sufficient for AI proliferation?

My main aim is to work on "hardening the box" i.e. eliminating software bugs so containment schemes don't fail for preventable reasons. But in the famous 4o system card example, the one that looks a little like docker exfiltration, the situation arose from user error, wild guess in compose.yaml or the shell script invoking docker run.

In a linux machine

Here's an example nix file

users.users =
    let
      authorized-key-files = [
        "${keyspath}/id_server_ed25519.pub"
        "${keyspat

... (read more)

2Quinn11mo

seems like there's more prior literature than I thought https://en.wikipedia.org/wiki/Role-based_access_control

[-]Quinn1y30

The more I learn about measurement, the less seriously I take it

I'm impressed with models that accomplish tasks in zero or one shot with minimal prompting skill. I'm not sure what galaxy brained scaffolds and galaxy brained prompts demonstrate. There's so much optimization in the measurement space.

I shipped a benchmark recently, but it's secretly a synthetic data play so regardless of how hard people try in order to score on it, we get synthetic data out of it which leads to finetune jobs which leads to domain specific models that can do such tasks hopefully with minimal prompting effort and no scaffolding.

[-]Quinn2y30

I'm excited for language model interpretability to teach us about the difference between compilers and simulations of compilers. In the sense that chatgpt and I can both predict what a compiler of a suitably popular programming language will do on some input, what's going on there---- surely we're not reimplementing the compiler on our substrate, even in the limit of perfect prediction? Will be an opportunity for a programming language theorist in another year or two of interp progress

[-]Quinn2y32

Proof cert memo

In the Safeguarded AI programme thesis^[1], proof certificates or certifying algorithms are relied upon in the theory of change. Let's discuss!

From the thesis:

Proof certificates are a quite broad concept, introduced by [33]: a certifying algorithm is defined as one that produces enough metadata about its answer that the answer's correctness can be checked by an algorithm which is so simple that it is easy to understand and to formally verify by hand.

The abstract of citation 33 (McConnell et al)^[2]

A certifying algorithm is an algorithm

... (read more)

[-]Quinn2y30

any interest in a REMIX study group? in meatspace in berkeley, a few hours a week. https://github.com/redwoodresearch/remix_public/

2Zack_M_Davis2y

Maybe! (I recently started following the ARENA curriculum, but there's probably a lot of overlap.)

[-]Quinn2y30

Any tips for getting out of a "rise to the occasion mindset" and into a "sink to your training" mindset?

I'm usually optimizing for getting the most out of my A-game bursts. I want to start optimizing for my baseline habits, instead. I should cover the B-, C-, and Z-game; the A-game will cover itself.

Mathaphorically, "rising to the occasion" is taking a max of a max, whereas "sinking to the level of your habits" looks like a greatest lower bound.

[-]Quinn2y30

Yall, this is a rant. It will be sloppy.

I'm really tired of high functioning super smart "autism" like ok we all have madeup diagnoses--- anyone with a IQ slightly above 90 knows that they can learn the slogans to manipulate gatekeepers to get performance enhancement, and they decide not to if they think theyre performing well enough already. That doesn't mean "ADHD" describes something in the world. Similarly, there's this drift of "autism" getting more and more popular. It's obnoxious because labels and identities are obnoxious, but i only find it repuls... (read more)

[-]Quinn2y30

For the record, to mods: I waited till after petrov day to answer the poll because my first guess upon receiving a message on petrov day asking me to click something is that I'm being socially engineered. Clicking the next day felt pretty safe.

[-]Quinn3y30

"EV is measure times value" is a sufficiently load-bearing part of my worldview that if measure and value were correlated or at least one was a function of the other I would be very distressed.

Like in a sense, is John threatening to second-guess hundreds of years of consensus on is-ought?

1Quinn3y

oh dear

1Noosphere893y

I'm not sure what measure is referring to here.

1Quinn3y

probability density

[-]Quinn3y30

messy, jotting down notes:

I saw this thread https://twitter.com/alexschbrt/status/1666114027305725953 which my housemate had been warning me about for years.
failure mode can be understood as trying to aristotle the problem, lack of experimentation
thinking about the nanotech ASI threat model, where it solves nanotech overnight and deploys adversarial proteins in all the bloodstreams of all the lifeforms.
These are sometimes justified by Drexler's inside view of boundary conditions and physical limits.
But to dodge the aristotle problem, there would have

... (read more)

[-]Quinn4y30

Methods, famously, includes the line "I am a descendant of the line of Bacon", tracing empiricism to either Roger (13th century) or Francis (16th century) (unclear which).

Though a cursory wikiing shows an 11th century figure providing precedents for empiricism! Alhazen or Ibn al-Haytham worked mostly on optics apparently but had some meta-level writings about the scientific method itself. I found this shockingly excellent quote

The duty of the man who investigates the writings of scientists, if learning the truth is his goal, is to make himself an enemy of a

... (read more)

[-]Quinn5y30

New discord server dedicated to multi-multi delegation research

DM me for invite if you're at all interested in multipolar scenarios, cooperative AI, ARCHES, social applications & governance, computational social choice, heterogeneous takeoff, etc.

(side note I'm also working on figuring out what unipolar worlds and/or homogeneous takeoff worlds imply for MMD research).

[-]Quinn5y30

Questions and Problems - thoughts on chapter 4 of Craft of Doing Research

Last time we discussed the difference between information and a question or a problem, and I suggested that the novelty-satisfied mode of information presentation isn't as good as addressing actual questions or problems. In chapter 3, which I have not typed up thoughts about, a three-step procedure is introduced:

Topic: "I am studying ..."
Question: "... because I want to find out what/why/how ..."
Significance: "... to help my reader understand ..." As we elaborate on the different k

... (read more)

[-]Quinn1mo20

step-function research and monotonic research.

Some agendas are step-function. Maybe interuniversal teichmuller theory is an example: if it reaches the threshold where it solves ABC then it pays out, and if it doesn't, its payout is approximately zero ^[1] . Something something MIRI's logic team ^[2] .

Other agendas are monotonic. The more you do, the more you pay out. I think what I do, program synthesis security (which Mike Dodds calls "scalable formal oversight") for defensive acceleration, is like this. I might be able to reduce some mo... (read more)

[-]Quinn3mo20

is there a quick link I can point someone to if they don't speak Berkelese and I want to say "bayes points"?

2mattmacdermott3mo

My first suggestion is to not use the phrase if they don’t know it and just say “points for making a correct prediction”. But if you do want to link them to something you could send this slight edit of what I wrote elsewhere in the thread:

2Dagon3mo

I don't know the reference myself, and I'd probably recommend against using insider shortcut phrases with people who aren't already aware. For most people who don't already have the background knowledge to understand it from your explanation, a link isn't going to help them much. For the kind of person who WILL benefit from a link, I'd recommend a more general one - perhaps LessWrong overall, or https://www.lesswrong.com/w/probability-theory.

2MichaelDickens3mo

I think the term means something like "you demonstrated truth-seeking character/virtue". Example: Someone (I forget who it was, sorry) came up with a novel AI alignment theory, and then they wrote a long post about how their own theory was deeply flawed. That post earned them Bayes points.

4mattmacdermott3mo

I’ve always interpreted it more literally. Like, if we’ve just seen some evidence which Hypothesis A predicted with twice as much probability as Hypothesis B, then the probability of Hypothesis A grows by a factor of two relative to Hypothesis B. This doubling adds one bit in logspace, and we can think of this bit as a point scored by Hypothesis A. By analogy, if Alice predicted the evidence with twice as much probability as Bob, we can pretend we’re scoring people like hypotheses and give Alice one ‘Bayes point’. If Alice and Bob each subscribe to a fixed hypothesis about How Stuff Works then this is not even an analogy, we’re just Bayesian updating about their hypotheses.
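A minimal sketch of the arithmetic in the comment above, in Python. The function names and the example probabilities (0.5 and 0.25) are illustrative assumptions, not anything from the thread; the point is only that a likelihood ratio of two multiplies the odds by two and adds one bit in log space.

```python
import math


def bayes_points(p_alice: float, p_bob: float) -> float:
    """Bits scored by Alice over Bob: log2 of the ratio of the probabilities
    each assigned to the evidence that was actually observed."""
    return math.log2(p_alice / p_bob)


def update_odds(prior_odds: float, p_a: float, p_b: float) -> float:
    """Posterior odds of hypothesis A over B after seeing evidence that
    A predicted with probability p_a and B with probability p_b."""
    return prior_odds * (p_a / p_b)


# Hypothetical numbers: Alice gave the observed evidence 50%, Bob gave it 25%.
print(bayes_points(0.5, 0.25))      # one bit in Alice's favor
print(update_odds(1.0, 0.5, 0.25))  # even odds become 2:1
```

Scores add across independent observations because log likelihood ratios add, which is what makes a running tally of "points" coherent.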

1sjadler3mo

Maybe I’ve been misusing it or seeing it misused, but I thought it meant something more like “called a thing ahead of time” or “made a good prediction” and therefore treated as more credible in the future?

2MichaelDickens3mo

Or maybe I'm the one who's been misunderstanding it! I don't think I have a great understanding of the term tbh so you're probably right. If that's what it means then instead of "Bayes points", Quinn could call it "credibility" or "predictive accuracy" or something.

[-]Quinn9mo20

i'm hearing the new movie "Mountainhead" has thinly veiled musk and altman characters. can anyone confirm or offer takes? I might watch it.

[-]Quinn10mo20

there's an analogy between the zurich r/changemyview curse of evals and the metr/epoch curse of evals. You do this dubiously ethical (according to more US-pilled IRBs or according to more paranoid/pure AI safety advocates) measuring/elicitation project because you might think the world deserves to know. But you had to do dubiously ethical experimentation on unconsenting reddizens / help labs improve capabilities in order to get there--- but the catch is, you only come out net positive if the world chooses to act on this information

[-]Quinn1y20

talk to friends as a half measure

When it comes to your internal track record, it is often said that finding what you wrote at time t-k beats trying to remember what you thought at t-k. However, the activation energy to keep such a journal is kinda a hurdle (which is why products like https://fatebook.io are so good!).

I find that a nice midpoint between the full and correct internal track record practices (rigorous journaling) and completely winging it (leaving yourself open to mistakes and self delusion) is talking to friends, because I think my memory of... (read more)

[-]Quinn1y20

what's the best essay on asking for advice?

Going over etiquette and the social contract, perhaps if it's software specific it talks about minimal reproducers, whatever else the author thinks is involved.

2Quinn1y

A sketch I'm thinking of: asking people to consume information (a question, in this case) is asking them to do you a favor, so you should do your best to ease this burden. However, don't be paralyzed either: budget some leeway to be less than maximally considerate when you really need to.

[-]Quinn1y20

Guaranteed Safe AI paper club meets again this thursday

Event for the paper club: https://calendar.app.google/2a11YNXUFwzHbT3TA

blurb about the paper in last month's newsletter:

... If you’re wondering why you just read all that, here’s the juice: often in GSAI position papers there’ll be some reference to expectations that capture “harm” or “safety”. Preexpectations and postexpectations with respect to particular pairs of programs could be a great way to cash this out, cuz we could look at programs as interventions and simulate RCTs (labeling one program

... (read more)

[-]Quinn1y20

Yoshua Bengio is giving a talk online tomorrow https://lu.ma/4ylbvs75

[-]Quinn1y20

GSAI paper club is tomorrow (gcal ticket), summary (by me) and discussion of this paper

[-]Quinn2y20

i'm getting back into composing and arranging. send me rat poems to set to music!

[-]Quinn2y20

Does anyone use vim / mouse-minimal browser? I like Tridactyl better than the other one I tried, but it's not great when there's already a vim mode in the browser window: everything starts to step on each other (like in jupyter, colab, leetcode, codesignal).

1Morpheus2y

Tridactyl is amazing. You can disable the mode on specific websites by running the blacklistadd command. If you have configured that already, these settings can also be saved in your config file. Here's my config (though be careful before copying it: it has fixamo_quiet enabled, a command that almost got Tridactyl removed when it was enabled by default. You should read what it does before you enable it.)

Here are my ignore settings:

autocmd DocStart https://youtube.com mode ignore
autocmd DocStart https://todoist.com mode ignore
autocmd DocStart mail.google.com mode ignore
autocmd DocStart calendar.google.com mode ignore
autocmd DocStart keyma.sh mode ignore
autocmd DocStart monkeytype.com mode ignore
autocmd DocStart https://www.youtube.com mode ignore
autocmd DocStart https://ilias.studium.kit.edu/ mode ignore
autocmd DocStart localhost:888 mode ignore
autocmd DocStart getguestimate.com mode ignore
autocmd DocStart localhost:8888 mode ignore

[-]Quinn2y20

I'm halfway through how to measure anything: cybersecurity, which doesn't have a lot of specifics to cybersecurity and mostly reviews the first book. I never finished the first one, and it was about four years ago that I read the parts that I did.

I think for top of the funnel EA recruiting it remains the best and most underrated book. Basically anyone worried about any kind of problem will do better if they read it, and most people in memetically adaptive / commonsensical activist or philanthropic mindsets probably aren't measuring enough.

However, the mate... (read more)

1matto2y

What's different there compared to the first book? I read the first one and found it to resonate strongly, but also found my mental models to not fit well with the general thrust. Since then I've been studying stats and thinking more about measurement with the intent to reread the first book. Curious if the cybersecurity one adds something more though

2Quinn2y

In terms of the parts where the books overlap, I didn't notice anything substantial. If anything the sequel is less, cuz there wasn't enough detail to get into tricks like the equivalent bet test.

[-]Quinn3y20

preorders as the barest vocabulary for emergence

We can say "a monotonic map, $Φ \in \mathrm{mono}(Q^{P})$, is a phenomenon of $P$ as observed by $Q$"; then, emergence is simply the impreservation of joins.

Given preorders $(P, \leq_{P})$ and $(Q, \leq_{Q})$, we say a map in $\mathrm{mono}(Q^{P})$ "preserves" joins (which, recall, are least upper bounds) iff $\forall a, b \in P,\ Φ a \lor_{Q} Φ b = Φ (a \lor_{P} b)$, where by "$x = y$" we mean $x \leq y \land y \leq x$.

Suppose $Φ$ is a measurement taken from a particle. We would like for our measurement system to be robust against emergence, which is literally operationalized by measuring one particle, measuring another, t... (read more)
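To make the join-preservation condition concrete, here is a minimal Python sketch on a toy example. Everything in it is an assumption chosen for illustration and not part of the original note: $P$ is the set of subsets of two particles ordered by inclusion (join is union), $Q$ is the integers ordered by $\leq$ (join is max), and `contains_a` and `has_pair` are two hypothetical measurements.

```python
from itertools import combinations

ATOMS = ("a", "b")  # hypothetical particles

# P: all subsets of ATOMS, ordered by inclusion. The join in P is set union.
P = [frozenset(c) for r in range(len(ATOMS) + 1) for c in combinations(ATOMS, r)]


def join_P(x, y):
    return x | y


def join_Q(m, n):
    # Q: integers ordered by <=. The join in Q is max.
    return max(m, n)


def is_monotone(phi):
    # x <= y on frozensets is the subset relation.
    return all(phi(x) <= phi(y) for x in P for y in P if x <= y)


def preserves_joins(phi):
    return all(join_Q(phi(x), phi(y)) == phi(join_P(x, y)) for x in P for y in P)


def contains_a(s):
    # Measurement that only looks at one particle.
    return int("a" in s)


def has_pair(s):
    # Measurement that only fires once both particles are present.
    return int(len(s) >= 2)


for phi in (contains_a, has_pair):
    print(phi.__name__, "monotone:", is_monotone(phi), "preserves joins:", preserves_joins(phi))
```

Both maps are monotone, but only `contains_a` preserves joins. `has_pair` reads 0 on each particle alone and 1 on their union, which is the "emergence" in the sense defined above.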

[-]Quinn4y20

Jotted down some notes about the law of mad science on the EA Forum. Looks like some pretty interesting open problems in the global priorities, xrisk strategy space. https://forum.effectivealtruism.org/posts/r5GbSZ7dcb6nbuWch/quinn-s-shortform?commentId=DqSh6ifdXpwHgXnCG

[-]Quinn4y20

Ambition, romance, kids

Two premises of mine are that I'm more ambitious than nearly everyone I meet in meatspace and normal distributions. This implies that in any relationship, I should expect to be the more ambitious one.

I do aspire to be a nagging voice increasing the ambitions of all my friends. I literally break the ice with acquaintances by asking "how's your master plan going?" because I try to create vibes like we're having coffee in the hallway of a supervillain conference, and I like to also ask "what harder project is your current project a war... (read more)

[-]Quinn4y20

Positive and negative longtermism

I'm not aware of a literature or a dialogue on what I think is a very crucial divide in longtermism.

In this shortform, I'm going to take a polarity approach. I'm going to bring each pole to its extreme, probably each beyond positions that are actually held, because I think median longtermism or the longtermism described in the Precipice is a kind of average of the two.

Negative longtermism is saying "let's not let some bad stuff happen", namely extinction. It wants to preserve. If nothing gets better for the poor or the an... (read more)

[-]Quinn5y20

The audience models of research - thoughts on Craft of Doing Research chapter 2

Writers can't avoid creating some role for themselves and their readers, planned or not

Before considering the role you're creating for your reader, consider the role you're creating for yourself. Your broad options are the following

I've found some new and interesting information - I have information for you
I've found a solution to an important practical problem - I can help you fix a problem
I've found an answer to an important question - I can help you understand somethi

... (read more)

[-]Quinn2y10

He had become so caught up in building sentences that he had almost forgotten the barbaric days when thinking was like a splash of color landing on a page.

Edward St Aubyn

[-]Quinn2y10

a $B$ -valued quantifier is any function $(A \to B) \to B$ , so when $B$ is bool quantifiers are the functions that take predicates as input and return bool as output (same for prop). the standard max and min functions on arrays count as real-valued quantifiers for some index set $A$ .

I thought I had seen $\forall$ as the max of the Prop-valued quantifiers, and exists as the min somewhere, which has a nice mindfeel since forall has this "big" feeling (if you determined for $P : A \to \mathrm{Prop}$ that $\forall P$ (of which $\forall x : A, P x$ is just syntax sugar since the variable name $x$ is irrelevant) by exhaustive ... (read more)

3Gurkenglas2y

Among monotonic, boolean quantifiers that don't ignore their input, exists is maximal because it returns true as often as possible; forall is minimal because it returns true as rarely as possible.
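The claim above can be checked by brute force on a small index set. This is a minimal Python sketch under stated assumptions: $A$ is taken to be a two-element set, predicates are represented as tuples of booleans, and "doesn't ignore its input" is read as "is not a constant function". All names are illustrative.

```python
from itertools import product

A = [0, 1]  # hypothetical two-element index set

# A predicate A -> bool is a tuple of booleans, one entry per element of A.
predicates = list(product([False, True], repeat=len(A)))


def leq(p, q):
    # Pointwise order on predicates: p <= q iff p(x) implies q(x) for all x.
    return all((not a) or b for a, b in zip(p, q))


def forall(p):
    return all(p)


def exists(p):
    return any(p)


def all_quantifiers():
    # Every function (A -> bool) -> bool, stored as a dict keyed by predicate.
    for outputs in product([False, True], repeat=len(predicates)):
        yield dict(zip(predicates, outputs))


def is_monotone(q):
    return all(q[p1] <= q[p2] for p1 in predicates for p2 in predicates if leq(p1, p2))


def is_nonconstant(q):
    return len(set(q.values())) == 2


candidates = [q for q in all_quantifiers() if is_monotone(q) and is_nonconstant(q)]

# forall is the pointwise minimum and exists the pointwise maximum among candidates.
sandwiched = all(forall(p) <= q[p] <= exists(p) for q in candidates for p in predicates)
print(len(candidates), sandwiched)
```

On this two-element set the candidates are forall, exists, and the two "evaluate at a fixed point" quantifiers, and every one of them sits between forall and exists pointwise.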

[-]Quinn2y10

claude and chatgpt are pretty good at ingesting textbooks and papers and making org-drill cards.

here's my system prompt https://chat.openai.com/g/g-rgeaNP1lO-org-drill-card-creator though i usually tune it a little further per session.

Here are takes on the idea from the anki ecosystem

I tried a little ankigpt and it was fine, i haven't tried the direct plugin from ankiweb. I'm opting for org-drill here cuz I really like plaintext.

[-]Quinn2y10

consider how our nonconstructive existence proof of nash equilibria creates an algorithmic search problem, which we then study with computational complexity. For example, 2-player 0-sum games are in P but for three or more players general sum games are NP-hard. I wonder if every nonconstructive existence proof is like this? In the sense of inducing a computational complexity exercise to find what class it's in, before coming up with greedy heuristics to accomplish an approximate example in practice.

[-]Quinn2y10

I like thinking about "what it feels like to write computer programs if you're a transformer".

Does anyone have a sense of how to benchmark or falsify Nostalgebraist's post on the subject?

[-]Quinn2y10

Quick version of conversations I keep having, might be worth a top level effortpost.

A prediction market platform giving granular permission systems would open up many use cases for many people

whistleblower protections at large firms, dating, project management and internal company politics--- all userbases with underserved opinions about transparency. Manifold could pivot to this but have a lot of other stuff they could do instead.

Think about how slack admins are confused about how to prevent some usergroups from using @channel, and discord admins aren't.

[-]Quinn2y10

what are your obnoxious price systems for tutoring?

There's a somewhat niche CS subtopic that a friend wants to learn, I'm really well positioned to teach her. More discussion on the manifold bounty:

[-]Quinn3y10

10^93 is a fun and underrated number https://en.wikipedia.org/wiki/Transcomputational_problem

3niplav3y

I like 10^120 more.

[-]Quinn3y10

Jargon is not due to status scarcity, but it sometimes makes unearned requests for attention

When you see a new intricate discipline, and you're reticent to invest in navigating it, asking to be convinced that your attention has been earned is fine, but I don't recall seeing a valid or interesting complaint about jargon that deviates from this.

Some elaboration here

2Dagon3y

Like most wide-scale social phenomena, jargon is shaped by multiple incentives, with a pretty wide variance in the narrowness of consumer (insider, outsider, elite, median) and type of value provided (clarity, obfuscation, reinforcement of values, chunking of concepts). Understanding a field VERY OFTEN requires understanding the people and social structures that shape the field. Jargon is useful in this dimension, as well as the surface-level content of the jargon.

[-]Quinn3y10

There's a remarkable TNG episode about enfeeblement and paul-based threatmodels, if I recall correctly.

There's a post-scarcity planet with some sort of Engine of Prosperity in the townsquare, and it doesn't require maintenance for enough generations that engineering itself is a lost oral tradition. Then it starts showing signs of wear and tear...

If paul was writing this story, they would die. I think in the actual episode, there's a disagreeable autistic teenager who expresses curiosity about the Engine mechanisms, and the grownups basically shame him, lik... (read more)

[-]Quinn3y10

We need a cool one-word snappy thing to say for "just what do you think you know and how do you think you know it" or like "I'm requesting more background about this belief you've stated, if you have time".

I want something that has the same mouthfeel as "roll to disbelieve" for this.

[-]Quinn4y10

Is there an EV monad? I'm inclined to think there is not, because EV(EV(X)) is a way simpler structure than a "flatmap" analogue.

[-]Quinn4y10

Would there be a way of estimating how many people within the amazon organization are fanatical about same day delivery ratio against how many are "just working a job"? Does anyone have a guess? My guess is that an organization of that size with a lot of cash only needs about 50 true fanatics, the rest can be "mere employees". What do yall think?

3gwern4y

I can't really think of any research bearing on this, and unclear how you'd measure it anyway. One way to go might be to note that there is a wide (and weird) variance between the efficiency of companies: market pressures are slack enough that two companies doing as far as can be told the exact same thing in the same geographic markets with the same inputs might be almost 100% different (I think was the range in the example of concrete manufacturing in one paper I read); a lot of that difference appears to be explainable by the quality of the management, and you can do randomized experiments in management coaching or intensity of management and see substantial changes in the efficiency of a company (Bloom - the other one - has a bunch of studies like this). Presumably you could try to extrapolate from the effects of individuals to company-wide effects, and define the goal of the 'fanatical' as something like 'maintaining top-10% industry-wide performance': if educating the CEO is worth X percentiles and hiring a good manager is worth 0.0Y percentiles and you have such and such a number of each, then multiply out to figure out what will bump you 40 percentiles from an imagined baseline of 50% to the 90% goal. Another argument might be a more Fermi estimate style argument from startups. A good startup CEO should be a fanatic about something, otherwise they probably aren't going to survive the job. So we can assume one fanatic at least. People generally talk about startups beginning to lose the special startup magic of agility, focus, and fanaticism at around Dunbar's number level of employees like 300, or even less (eg Amazon's two-pizza rule which is I guess 6 people?). In the 'worst' case that the founder has hired 0 fanatics, that implies 1 fanatic can ride herd over no more than ~300 people; in the 'best' case that he's hired dozens, then each fanatic can only cover for more like 2 or 3 non-fanatics. I'm not sure how we should count Amazon's employees: do the wa

2Dagon4y

I'm not sure "fanatical" is well-defined enough to mean anything here. I doubt there are any who'd commit terrorist acts to further same-day delivery. There are probably quite a few who believe it's important to the business, and a big benefit for many customers. You're absolutely right that a lot of employees and contractors can be "mere employees", not particularly caring about long-term strategy, customer perception, or the like. That's kind of the nature of ALL organizations and group behaviors, including corporate, government, and social groupings. There's generally some amount of influencers/selectors/visionaries, some amount of strategists and implementers, and a large number of followers. Most organizations are multidimensional enough that the same people can play different roles on different topics as well.

1JBlack4y

I don't think it needs any true fanatics. It just needs incentives. This isn't to say there won't be fanatics anyway. There probably aren't many things that nobody can get fanatical about. This is even more true if they're given incentives to act fanatical about it.

3RHollerith4y

Sure, but the incentive structure needs continual maintenance to keep it aligned with or pointing at the goal, which naturally leads to the questions of how many people are needed to keep the structure pointing at the goal, and what the motivation of those people will be.

[-]Quinn4y10

We need a name for the following heuristic. I think of it as one of those "tribal knowledge" things that gets passed on like an oral tradition without being citeable in the sense of being part of a literature. If you come up with a name I'll certainly credit you in a top level post!

I heard it from Abram Demski at AISU'21.

Suppose you're either going to end up in world A or world B, and you're uncertain about which one it's going to be. Suppose you can pull lever $L_{A}$ which will be 100 valuable if you end up in world A, or you can pull lever $L_{B}$ whi... (read more)

3Dagon4y

Why are you specifying 100 or 0 value, and using fuzzy language like "acceptably small" for disvalue? Is this based on "value" and "disvalue" being different dimensions, and thus incomparable? Wouldn't you just include both in your prediction, and run it through your (best guess of) utility function and pick highest expectation, weighted by your probability estimate of which universe you'll find yourself in?

1TLW4y

100 and 0 in this context make sense. Or at least in my initial reading: arbitrarily-chosen values that are in a decent range to work quickly with (akin to why people often work in percentages instead of 0..1) It is - I'm going to say "often", although I am aware this is suboptimal phrasing - often the case that you are confident in the sign of an outcome but not the magnitude of the outcome. As such, you can often end up with discontinuities at zero. Dropping the entire probability distribution of outcomes through your utility function doesn't even necessarily have a closed-form result. In a universe where computation itself is a cost, finding a cheaper heuristic (and working through if said heuristic has any particular basis or problems) can be valuable. The heuristic in the grandparent comment is just what happens if you are simultaneously very confident in the sign of positive results, and have very little confidence in the magnitude of negative results.

1TLW4y

It is often the case that you are confident in the sign of an outcome but not the magnitude of the outcome. This heuristic is what happens if you are simultaneously very confident in the sign of positive results, and have very little confidence in the magnitude of negative results.

1Measure4y

I'm not sure I understand. If the lever is +100 in world A and -90 in world B, it seems like a good bet if you don't know which world you're in. Or is that what you mean by "acceptably small amount of disvalue"?

1Quinn4y

Obviously there are considerations downstream of articulating this. One is that when $P(A) > P(B)$ but $V(L_{A} \mid A) < V(L_{B} \mid B)$, it's reasonable to hedge on ending up in world B even though B is not the more probable world.
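A minimal sketch of the expected-value comparison this thread keeps circling, in case it helps. The first number reproduces Measure's +100 / -90 lever at 50/50 odds. The second case uses made-up values (`hedge_levers`, and the 0.6 probability) purely to illustrate the hedging point, and the helper names `expected_value` and `best_lever` are hypothetical, not anything from the original heuristic.

```python
def expected_value(p_a: float, value_in_a: float, value_in_b: float) -> float:
    """Expected value of a lever when P(world A) = p_a and P(world B) = 1 - p_a."""
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must be a probability in [0, 1]")
    return p_a * value_in_a + (1.0 - p_a) * value_in_b


def best_lever(p_a: float, levers: dict) -> str:
    """Name of the lever with the highest expected value."""
    return max(levers, key=lambda name: expected_value(p_a, *levers[name]))


# Measure's lever: +100 in world A, -90 in world B, no idea which world we're in.
measure_ev = expected_value(0.5, 100.0, -90.0)

# Hypothetical levers, written as (value in A, value in B).
# World A is more likely, but the payoff of L_B in world B is much larger.
hedge_levers = {"L_A": (10.0, 0.0), "L_B": (0.0, 100.0)}
hedge_choice = best_lever(0.6, hedge_levers)
```

Under these assumed numbers the +100 / -90 lever is mildly positive in expectation, and `L_B` wins the second comparison even though world B is the less likely one. This is only the plain expected-value reading that Dagon describes; it does not capture TLW's point about being unsure of the magnitude of the downside.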

[-]Quinn4y10

critiques and complaints

I think one of the most crucial meta skills i've developed is honing my sense of who's criticizing me vs. who's complaining.

A criticism is actionable, and often it implicitly comes from someone who wants you to win. A complaint is when you can't figure out how you'd actionably fix something or improve based on what you're being told.

This simple binary story is problematic. It can empower you to ignore criticism you don't like by providing a set of excuses, if you're not careful. Sometimes it's operationally impossible to parse out a critic... (read more)

[-]Quinn4y10

hmu for a haskell job in decentralized finance. Super fun zero knowledge proof stuff, great earning to give opportunity.

[-]Quinn4y10

Are Schelling points the occam's razor of mechanism design?

In game theory, a focal point (or Schelling point) is a solution that people tend to choose by default in the absence of communication. (wikipedia)

Intuitively I think simplicity is a good explanation for a solution being converged upon.

Does anyone have any crisp examples that violate the Schelling point - occam's razor correspondence?

[-]Quinn4y10

Disvalue via interpersonal expected value and probability

My deontologist friend just told me that treating people like investments is no way to live. The benefits of living by that take are that your commitments are more binding, you actually do factor out uncertainty, because when you treat people like investments you always think "well someday I'll no longer be creating value for this person and they'll drop me from their life". It's hard to make long term plans, living like that.

I've kept friends around out of loyalty to what we shared 5-10 years ago w... (read more)

3Dagon4y

One thing to be careful about in such decisions - you don't know your own utility function very precisely, and your modeling of both future interactions and your value from such are EXTREMELY lossy. The best argument for deontological approaches is that you're running on very corrupt hardware, and rules that have evolved and been tested over a long period of time are far more trustworthy than your ad-hoc analysis which privileges obvious visible artifacts over more subtle (but often more important) considerations.

[-]Quinn4y10

I may refine this into a formal bounty at some point.

I'm curious if censorship would actually work in the context of blocking deployment of superpowerful AI systems. Sometimes people will mention "matrix multiplication" as a sort of goofy edge case, which isn't very plausible, but that doesn't mean there couldn't be actual political pressure to censor it. A more plausible example would be attention. Say the government threatens soft power against arXiv if they don't pull "Attention Is All You Need", or threatens soft power against Harvard if their linguistic... (read more)

[-]Quinn4y10

any literature on estimates of social impact of businesses divided by their valuations?

the idea that dollars are a proxy for social impact is neat, but leaves a lot of room for goodhart and I think it's plausible that they diverge entirely in cases. It would be useful to know, if possible to know, what's going on here.

1Josh Jacobson4y

there's paid tools that estimate this, probably poorly

1Quinn4y

thinking about this comment

[-]Quinn4y10

Why have I heard about Tyson investing into lab grown, but I haven't heard about big oil investing in renewable?

Tyson's basic insight here is not to identify as "an animal agriculture company". Instead, they identify as "a feeding people company". (Which happens to align with doing the right thing, conveniently!)

It seems like big oil is making a tremendous mistake here. Do you think oil execs go around saying "we're an oil company" when they could instead be going around saying "we're a powering stuff company"? Being a powering stuff company means you hav... (read more)

5ChristianKl4y

Yes, this is more about you not hearing about it. Shell Has A Bigger Clean Energy Plan Than You Think — CleanTechnica Interview BP Bets Future on Green Energy, but Investors Remain Wary It seems that Tyson invested 150 million into a fund for new food solutions. In contrast to that Exxon invested 600 million in algae biofuels back in 2009 and more afterward.

2Yoav Ravid4y

I do vaguely remember hearing of big oil doing that, though perhaps not as much as meat producers do with lab grown meat, try looking into it.

2Pattern4y

1. Might be a little bit harder in that industry. 2. Are they in charge (of that)? Who chose them?

1Quinn4y

you're most likely right about it being harder in the industry! I don't think they need permission or an external mandate to do the right thing!

1JBlack4y

The main problem is that prior investment into the oil method of powering stuff doesn't translate into having a comparative advantage in a renewable way of powering stuff. They want a return on their existing massive investments. While this looks superficially like a sunk cost fallacy, it isn't. If a comparatively small investment (mere billions) can ensure continued returns on their trillions of sunk capital for another decade, it's worth it to them. Investment into renewable powering stuff would require substantially different skill sets in employees, in very different locations, and highly non-overlapping investment. At best, such an endeavour would constitute a wholly owned subsidiary that grows while the rest of the company withers. At worst, a parasite that hastens the demise of the parent while eventually failing in the face of competition anyway.

[-]Quinn4y10

I've had a background assumption in my interpretation of and beliefs about reward functions for as long as I can remember (i.e. since first reading the sequences), that I suddenly realized I don't believe is written down. Over the last two years I've gained experience writing coq sufficient to inspire a convenient way of framing it.

Computational vs axiomatic reward functions

Computational vs axiomatic in proof engineering

A proof engineer calls a proposition computational if its proof can be broken down into parts.

For example, a + (b + c) = (a + b) + c i... (read more)

1Quinn4y

I should be more careful not to imply I think that we have solid specimens of computational reward functions; more that I think it's a theoretically important region of the space of possible minds, and might factor in idealizations of agency

[-]Quinn5y10

capabilities-prone research.

I come to you with a dollar I want to spend on AI. You can allocate p pennies to go to capabilities and 100-p pennies to go to alignment, but only if you know of a project that realizes that allocation. For example, we might think that GAN research sets p = 98 (providing 2 cents to alignment) while interpretability research sets p = 10 (providing 90 cents to alignment).

Is this remotely useful? This is a really rough model (you might think it's more of a venn diagram and that this model doesn't provide a way of reasoning about t... (read more)
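A rough sketch of the penny model, just to make the bookkeeping concrete. The two `p` values are the ones from the example above; the `Project` class and its `split` method are hypothetical names made up for illustration, and nothing here says how you would actually estimate `p` for a real project.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Project:
    name: str
    p: int  # pennies of each dollar that end up funding capabilities (0..100)

    def split(self, dollars: int) -> Tuple[int, int]:
        """Return (capabilities_cents, alignment_cents) for a donation in whole dollars."""
        if not 0 <= self.p <= 100:
            raise ValueError("p must be between 0 and 100 pennies")
        return dollars * self.p, dollars * (100 - self.p)


# The two illustrative allocations from the post.
gan = Project("GAN research", 98)
interp = Project("interpretability research", 10)

gan_split = gan.split(1)
interp_split = interp.split(1)
```

With one dollar each, `gan_split` is 98 cents to capabilities and 2 to alignment, and `interp_split` is 10 and 90. The model is one-dimensional by construction, so it shares the roughness already flagged above.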

[-]Quinn5y10

Question your argument as your readers will - thoughts on chapter 10 of Craft of Research

Three predictable disagreements are

There are causes in addition to the one you claim
What about these counterexamples?
I don't define X as you do, to me X means...

There are roughly two kinds of queries readers will have about your argument

intrinsic soundness - "challenging the clarity of a claim, relevance of reasons, or quality of evidence"
extrinsic soundness - "different ways of framing the problem, evidence you've overlooked, or what others have written on t

... (read more)

[-]Quinn5y10

there's a gap in my inside view of the problem, part of me thinks that capabilities progress such as out-of-distribution robustness or the 4 tenets described in open problems in cooperative ai is necessary for AI to be transformative, i.e. a prereq of TAI, and another part of me that thinks AI will be xrisky and unstable if it progresses along other aspects but not along the axis of those capabilities.

There's a geometry here of transformative / not transformative cross product with dangerous / not dangerous.

To have an inside view I must be able to adequately navigate between the quadrants with respect to outcomes, interventions, etc.
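For what it's worth, the "cross product" is literally a Cartesian product of the two axes. A tiny sketch, with the variable names being my own placeholders:

```python
from itertools import product

# The two axes named above.
transformative_axis = ["transformative", "not transformative"]
danger_axis = ["dangerous", "not dangerous"]

# Each pair is one quadrant to reason about (outcomes, interventions, etc.).
quadrants = list(product(transformative_axis, danger_axis))

for transformative, dangerous in quadrants:
    print(f"{transformative} x {dangerous}")
```

This only enumerates the four quadrants; which of them are reachable from where is the part the inside view still has to supply.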

2Pattern5y

If something can learn fast enough, then its out-of-distribution performance won't matter as much. (OOD performance will still matter, but it'll have less to learn where it's good, and more to learn where it's not.*) *Although generalization ability seems like the reason learning matters. So I see why it seems necessary for 'transformation'.

[-]Quinn5y10

testing latex in spoiler tag

Testing code block in spoiler tag

1Quinn5y

::: latex Ax+1:={} :::

[-]Quinn4y00

missed opportunities to build a predictive track record and trump

I was reminiscing about my prediction market failures. The clearest "almost won a lot of mana dollars" (if manifold markets had existed back then) was this executive order. The campaign speeches made it fairly obvious, and I'm still salty about a few idiots telling me "stop being hysterical" when, pre inauguration, I accused him of being exactly what he wrote on the tin that he is, even though overall I remember that as a time when my epistemics were way worse than they are now.

However, there d... (read more)

[-]Quinn23d-10

Me: was chatting a bunch with Gemini and Claude the past week or two about flops accounting, and shipped this prototype for software-side estimation via the CuTe layouts that i don't expect to be that useful in an adversarial setting but i'm open minded and just learning. I'm not gonna do this for Apart's hackathon next weekend, but someone could do some of these basic exercises involving hardware-level flops accounting primitives and related attacks, given simply a gamer laptop.

Gemini:

Hackathon: Adversarial FLOPs Accounting & SAGE Defense

Scope: Week... (read more)
