simeon_c
Idea: Daniel Kokotajlo probably lost quite a bit of money by not signing an OpenAI NDA before leaving, which I consider a public service at this point. Could some of the funders in the AI safety landscape offer some money or social reward for this? I guess reimbursing everything Daniel lost might be a bit too much for funders, but providing some money, both to reward the act and to incentivize future safety people not to sign NDAs, would have very high value.
A list of some contrarian takes I have:
  • People are currently predictably too worried about misuse risks.
  • What people really mean by "open source" vs "closed source" labs is actually "responsible" vs "irresponsible" labs, which is not affected by regulations targeting open source model deployment.
  • Neuroscience as an outer alignment[1] strategy is embarrassingly underrated.
  • Better information security at labs is not clearly a good thing, and if we're worried about great power conflict, probably a bad thing.
  • Much research on deception (Anthropic's recent work, trojans, jailbreaks, etc.) is not targeting "real" instrumentally convergent deception reasoning, but learned heuristics. Not bad in itself, but IMO this places heavy asterisks on the results they can get.
  • ML robustness research (like FAR Labs' Go stuff) does not help with alignment, and helps moderately for capabilities.
  • The field of ML is a bad field to take epistemic lessons from. Note I don't talk about the results from ML.
  • ARC's MAD seems doomed to fail.
  • People in alignment put too much faith in the general factor g. It exists, and is powerful, but is not all-consuming or all-predicting. People are often very smart, but lack social skills, or agency, or strategic awareness, etc. And vice-versa. They can also be very smart in a particular area, but dumb in other areas. This is relevant for hiring & deference, but less for object-level alignment.
  • People are too swayed by rhetoric in general, and alignment, rationality, & EA too, but in different ways, and admittedly to a lesser extent than the general population. People should fight against this more than they seem to (which is not really at all, except for the most overt of cases). For example, I see nobody saying they don't change their minds on account of Scott Alexander because he's too powerful a rhetorician. Ditto for Eliezer, since he is also a great rhetorician. In contrast, Robin Hanson is a famously terrible rhetorician, so people should listen to him more.
  • There is a technocratic tendency in strategic thinking around alignment (I think partially inherited from OpenPhil, but also smart people are likely just more likely to think this way) which biases people towards more simple & brittle top-down models without recognizing how brittle those models are.

1. A non-exact term
Quote from Cal Newport's Slow Productivity book: "Progress in theoretical computer science research is often a game of mental chicken, where the person who is able to hold out longer through the mental discomfort of working through a proof element in their mind will end up with the sharper result."
RobertM
EDIT: I believe I've found the "plan" that Politico (and other news sources) managed to fail to link to, maybe because it doesn't seem to contain any affirmative commitments by the named companies to submit future models to pre-deployment testing by UK AISI. I've seen a lot of takes (on Twitter) recently suggesting that OpenAI and Anthropic (and maybe some other companies) violated commitments they made to the UK's AISI about granting them access for e.g. predeployment testing of frontier models.  Is there any concrete evidence about what commitment was made, if any?  The only thing I've seen so far is a pretty ambiguous statement by Rishi Sunak, who might have had some incentive to claim more success than was warranted at the time.  If people are going to breathe down the necks of AGI labs about keeping to their commitments, they should be careful to only do it for commitments they've actually made, lest they weaken the relevant incentives.  (This is not meant to endorse AGI labs behaving in ways which cause strategic ambiguity about what commitments they've made; that is also bad.)
Thomas Kwa
I started a dialogue with @Alex_Altair a few months ago about the tractability of certain agent foundations problems, especially the agent-like structure problem. I saw it as insufficiently well-defined to make progress on anytime soon. I thought the lack of similar results in easy settings, the fuzziness of the "agent"/"robustly optimizes" concept, and the difficulty of proving things about a program's internals given its behavior all pointed against working on this. But it turned out that we maybe didn't disagree much on tractability; it's just that Alex had somewhat different research taste, and also thought that fundamental problems in agent foundations must be figured out to make it to a good future, so that working on fairly intractable problems can still be necessary. That disagreement seemed pretty out of scope for the dialogue, so I likely won't publish it. Now that this post is out, I feel I should at least make this known. I don't regret attempting the dialogue; I just wish we had something more interesting to disagree about.


Recent Discussion

The curious tale of how I mistook my dyslexia for stupidity - and talked, sang, and drew my way out of it. 

Sometimes I tell people I’m dyslexic and they don’t believe me. I love to read, I can mostly write without error, and I’m fluent in more than one language.

Also, I don’t actually technically know if I’m dyslectic cause I was never diagnosed. Instead I thought I was pretty dumb but if I worked really hard no one would notice. Later I felt inordinately angry about why anyone could possibly care about the exact order of letters when the gist is perfectly clear even if if if I right liike tis.

I mean, clear to me anyway.

I was 25 before it dawned on me that all the tricks...

keltan

Is it not normal to subvocalise?

Could people react to this comment with a tick if they do, and a cross if they don't?

keltan
I was diagnosed as a kid. I went through a. lot. of. therapy. Lots of special classes, and making two thumbs up then pushing your knuckles together to make a bed shape that spells "bed". That all helped a lot. But three things helped to the point where I hardly think about it these days.
1. Minecraft PVP servers. You need to be able to effectively communicate with your team and taunt the enemy. And you need to be able to do it while someone is running at you with a sword.
2. Fighting with antivax people as a teenager on Facebook. The biggest slip-up someone could make in a Facebook argument was mixing up "you're" and "your".
3. Talking to girls I liked who could actually spell things correctly. I got very good at rapidly googling how to spell words as I was typing a response.
philip_b
I think that's pretty easy :)
AnthonyC
This was really interesting! You probably already know this, but reading out loud was the norm, and silent reading unusual, for most of history: https://en.wikipedia.org/wiki/Silent_reading That didn't really start to change until well after the invention of the printing press. For most of my life, even now once in a while, I would subvocalize my own inner monologue. Definitely had to learn to suppress that in social situations.
This is a linkpost for https://ailabwatch.org

I'm launching AI Lab Watch. I collected actions for frontier AI labs to improve AI safety, then evaluated some frontier labs accordingly.

It's a collection of information on what labs should do and what labs are doing. It also has some adjacent resources, including a list of other safety-ish scorecard-ish stuff.

(It's much better on desktop than mobile — don't read it on mobile.)

It's in beta—leave feedback here or comment or DM me—but I basically endorse the content and you're welcome to share and discuss it publicly.

It's unincorporated, unfunded, not affiliated with any orgs/people, and is just me.

Some clarifications and disclaimers.

How you can help:

  • Give feedback on how this project is helpful or how it could change to be much more helpful
  • Tell me what's wrong/missing; point me to sources
...
ryan_greenblatt
I initially thought this was wrong, but on further inspection, I agree and this seems to be a bug. The deployment criteria starts with: This criteria seems to allow the lab to meet it by having a good risk assessment criteria, but the rest of the criteria contains specific countermeasures that:
1. Are impossible to consistently impose if you make weights open (e.g. Enforcement and KYC).
2. Don't pass cost-benefit for current models which pose low risk. (And it seems the criteria is "do you have them implemented right now?")

If the lab had an excellent risk assessment policy and released weights if the cost/benefit seemed good, that should be fine according to the "deployment" criteria IMO. Generally, the deployment criteria should be gated behind "has a plan to do this when models are actually powerful and their implementation of the plan is credible". I get the sense that this criteria doesn't quite handle the necessary edge cases to handle reasonable choices orgs might make. (This is partially my fault, as I didn't notice this when providing feedback on this project.)

(IMO making weights accessible is probably good on current margins, e.g. llama-3-70b would be good to release so long as it is part of an overall good policy, is not setting a bad precedent, and doesn't leak architecture secrets.)

(A general problem with this project is somewhat arbitrarily requiring specific countermeasures. I think this is probably intrinsic to the approach, I'm afraid.)
Zach Stein-Perlman
[edited] Thanks. I agree you're pointing at something flawed in the current version and generally thorny. Strong-upvoted and strong-agreevoted. I didn't put much effort into clarifying this kind of thing because it's currently moot—I don't think it would change any lab's score—but I agree.[1] I think e.g. a criterion "use KYC" should technically be replaced with "use KYC OR say/demonstrate that you're prepared to implement KYC and have some capability/risk threshold to implement it and [that threshold isn't too high]." Yeah. The criteria can be like "implement them or demonstrate that you could implement them and have a good plan to do so," but it would sometimes be reasonable for the lab to not have done this yet. (Especially for non-frontier labs; the deployment criteria mostly don't work well for evaluating non-frontier labs. Also if demonstrating that you could implement something is difficult, even if you could implement it.) I'm interested in suggestions :shrug: 1. ^ And I think my site says some things that contradict this principle, like 'these criteria require keeping weights private.' Oops.

Hmm, yeah it does seem thorny if you can get the points by just saying you'll do something.

Like I absolutely think this shouldn't count for security. I think you should have to demonstrate actual security of model weights and I can't think of any demonstration of "we have the capacity to do security" which I would find fully convincing. (Though setting up some inference server at some point which is secure to highly resourced pen testers would be reasonably compelling for demonstrating part of the security portfolio.)

Akash
Could consider “frontier AI watch”, “frontier AI company watch”, or “AGI watch.” Most people in the world (including policymakers) have a much broader conception of AI. AI means machine learning, AI is the thing that 1000s of companies are using and 1000s of academics are developing, etc etc.
This is a linkpost for https://arxiv.org/abs/2405.05673

Linked is my MSc thesis, where I do regret analysis for an infra-Bayesian[1] generalization of stochastic linear bandits.

The main significance that I see in this work is:

  • Expanding our understanding of infra-Bayesian regret bounds, and solidifying our confidence that infra-Bayesianism is a viable approach. Previously, the most interesting IB regret analysis we had was Tian et al., which deals (essentially) with episodic infra-MDPs. My work here doesn't supersede Tian et al. because it only talks about bandits (i.e. stateless infra-Bayesian laws), but it complements it because it deals with a parametric hypothesis space (i.e. it fits into the general theme in learning theory that generalization bounds should scale with the dimension of the hypothesis class; the classical baseline is sketched after this excerpt for comparison).
  • Discovering some surprising features of infra-Bayesian learning that have no analogues in classical theory. In particular, it
...
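For context on the first bullet above, here is the standard classical result (not a result from the thesis, just the well-known baseline that the infra-Bayesian analysis generalizes): in ordinary d-dimensional stochastic linear bandits, optimism-based algorithms achieve regret of order d√T up to logarithmic factors, so the bound scales with the dimension of the parametric hypothesis class rather than with the number of arms.

```latex
% Classical baseline for d-dimensional stochastic linear bandits
% (high-probability regret after T rounds, up to logarithmic factors):
\[
  R_T \;=\; \sum_{t=1}^{T} \Big( \langle \theta^*, x^* \rangle - \langle \theta^*, x_t \rangle \Big)
  \;=\; \tilde{O}\!\left( d \sqrt{T} \right),
\]
% where \theta^* is the unknown parameter vector, x^* the optimal action,
% and x_t the action chosen at round t.
```

The thesis's contribution, as described above, is establishing analogous dimension-dependent bounds in the infra-Bayesian setting.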

Predicting the future is hard, so it’s no surprise that we occasionally miss important developments.

However, several times recently, in the contexts of Covid forecasting and AI progress, I noticed that I missed some crucial feature of a development I was interested in getting right, and it felt to me like I could’ve seen it coming if only I had tried a little harder. (Some others probably did better, but I could imagine that I wasn't the only one who got things wrong.)

Maybe this is hindsight bias, but if there’s something to it, I want to distill the nature of the mistake.

First, here are the examples that prompted me to take notice:

Predicting the course of the Covid pandemic:

  • I didn’t foresee the contribution from sociological factors (e.g., “people not wanting
...
Jsevillamol
Here is a "predictable surprise" I don't discussed often: given the advantages of scale and centralisation for training, it does not seem crazy to me that some major AI developers will be pooling resources in the future, and training jointly large AI systems.
habryka
I have a lot of uncertainty about the difficulty of robotics, and the difficulty of e.g. designing superviruses or other ways to kill a lot of people. I do agree that in most worlds robotics will be solved to a human level before AI will be capable of killing everyone, but I am generally really averse to unnecessarily constraining my hypothesis space when thinking about this kind of stuff. >90% seems quite doable with a well-engineered virus (especially one with a long infectious incubation period). I think 99%+ is much harder and probably out of reach until after robotics is thoroughly solved, but like, my current guess is a motivated team of humans could design a virus that kills 90% - 95% of humanity.
O O
Can a motivated team of humans design a virus that spreads rapidly but stays dormant for a while, and then kills most humans via a mechanism that's difficult to stop before we can respond? And it has to happen before we develop AIs that can detect these sorts of latent threats anyway. You have to realize that if Covid were like this, we would mass-trial mRNA vaccines as soon as they were available, plus a lot of Hail Mary procedures, since the alternative is extinction. These slightly-smarter-than-human AIs will be monitored by other such AIs, and will probably be rewarded if they defect. (The AIs they defect on get wiped out, and they possibly get to replicate more, for example.) I think such a takeover could be quite difficult to pull off in practice. The world with lots of slightly-smarter-than-human AIs will be more robust to takeover, there's a limited time window to even attempt it, failure would be death, and humanity would be far more disciplined against this than against Covid.

Despite my general interest in open inquiry, I will avoid talking about my detailed hypothesis of how to construct such a virus. I am not confident this is worth the tradeoff, but the costs of speculating about the details here in public do seem non-trivial.

simeon_c
Idea: Daniel Kokotajlo probably lost quite a bit of money by not signing an OpenAI NDA before leaving, which I consider a public service at this point. Could some of the funders in the AI safety landscape offer some money or social reward for this? I guess reimbursing everything Daniel lost might be a bit too much for funders, but providing some money, both to reward the act and to incentivize future safety people not to sign NDAs, would have very high value.
habryka

@Daniel Kokotajlo If you indeed avoided signing an NDA, would you be able to share how much you passed up as a result of that? I might indeed want to create a precedent here and maybe try to fundraise for some substantial fraction of it.

Epistemic Status: At midnight three days ago, I saw some of the GPT-4 Byproduct Recursively Optimizing AIs below on Twitter, which freaked me out a little and lit a fire under me to write up this post, my first on LessWrong. Here, my main goal is to start a dialogue on this topic, which from my (perhaps secluded) vantage point nobody seems to be talking about. I don't expect to currently have the optimal diagnosis of the issue and prescription of end solutions.

Acknowledgements: Thanks to my fellow Wisconsin AI Safety Initiative (WAISI) group organizers Austin Witte and Akhil Polamarasetty for giving feedback on this post. Organizing the WAISI community has been incredibly fruitful for sparring over ideas with others and seeing which of the strongest survive....

Empathizing with AGI will not align it nor will it prevent any existential risk. Ending discrimination would obviously be a positive for the world, but it will not align AGI.

It may not align it, but I do think it would prevent certain unlikely existential risks.

If AI/AGI/ASI is truly intelligent, and not just knowledgeable, we should definitely empathize and be compassionate with it. If it ends up being non-sentient, so be it, guess we made a perfect tool. If it ends up being sentient and we've been abusing a being that is super-intelligent, then good luck...


The first speculated on why you’re still single. We failed to settle the issue. A lot of you were indeed still single. So the debate continues.

The second gave more potential reasons, starting with the suspicion that you are not even trying, and also many ways you are likely trying wrong.

The definition of insanity is trying the same thing over again expecting different results. Another definition of insanity is dating in 2024. Can’t quit now.

You’re Single Because Dating Apps Keep Getting Worse

A guide to taking the perfect dating app photo. This area of your life is important, so if you intend to take dating apps seriously then you should take photo optimization seriously, and of course you can then also use the photos for other things.

I love the...

I think this list will successfully convince many to stay off the dating market indefinitely (I feel inclined that way myself after reading all this). Who in the world has time to work on all of this? At best, this is just a massive set of to-dos; at worst, it's an enormous list of all the ways the dating world sucks and reasons why you'll fail.

exanova
If it helps, I am willing to match people in the rationality-adjacent circles in the Bay Area and give you personal feedback. You can find my contact information in my profile.
rotatingpaguro
Thinking about it, I suspect I was not getting what "authenticity and openness" means. Like, it's not "being yourself and letting go", but more "being honest", I guess? Could you give me two or more examples of a person being "authentic and open"?
RamblinDash
So I guess I'm not sure what you mean by that. I think it might be easier to support what I'm saying in the negative. Some examples of inauthenticity or un-openness might be:
  • Consciously faking your personality (in a way that you wouldn't want to maintain as an essentially permanent change)
  • Lying about what you want out of the relationship
  • Pretending to like/dislike hobbies or interests that you actually strongly dislike/like

The problem with doing these things is that, to the extent that doing them was necessary to gain the relationship, you are now stuck with a relationship that is built on a papered-over incompatibility. If your plan is to fake a completely different personality/goals/interests, then you will be in a relationship where you have to permanently keep faking that stuff while constantly being wary that your new partner might find out you were faking, plus you have to spend a lot of time and energy doing stuff and/or interacting with someone you don't actually like, or else end the relationship and be back at square one, except that you've invested time/energy that you won't get back.

There can be toned-down good versions of this bad strategy though, I think, which are more like "putting your best foot forward" than like "being inauthentic."

Truth: Looking for a life partner, getting desperate.
Good strategy [probably depends on age, for this one]: Open to various possibilities, see how it goes.
Bad strategy: Your date says they are really only looking for short-term fun, and you agree that's all you are looking for too.

Truth: A talkative person who loves debating ideas.
Good strategy: Tone it down a little, try to listen as much as you talk, and try to "yes, and" or "that's interesting, tell me more about what led you to that" your date's points rather than "no but" (you can often make similar points either way).
Bad strategy: Just agree with everything your date says, even if you actually have a strong opposing view.

1. If you find that you’re reluctant to permanently give up on to-do list items, “deprioritize” them instead

I hate the idea of deciding that something on my to-do list isn't that important, and then deleting it off my to-do list without actually doing it. Because once it's off my to-do list, then quite possibly I'll never think about it again. And what if it's actually worth doing? Or what if my priorities will change such that it will be worth doing at some point in the future? Gahh!

On the other hand, if I never delete anything off my to-do list, it will grow to infinity.

The solution I’ve settled on is a priority-categorized to-do list, using a kanban-style online tool (e.g. Trello). The left couple columns (“lists”) are very active—i.e., to-do list...

MondSemmel
I've found that there's value in having short to-do lists, because short lists fit much better into working memory and are thus easier to think about. If items are deprioritized rather than getting properly deleted from the system, this increases the total number of to-dos one could think about. On the other hand, maybe moving tasks to offscreen columns is sufficient to get them off one's mind? It seems to me like a both easier and more comprehensive approach would be to use a text editor with proper version control and diff features, and then to name particular versions before making major changes.
Steven Byrnes
IMO the main point of a to-do list is to not have the to-do list in working memory. The only thing that should be in working memory is the one thing you're actually supposed to be focusing on and doing, right now. Right? Or if you're instead in the mode of deciding what to do next, or making a schedule for your day, etc., then that's different, but working memory is still kinda irrelevant because presumably you have your to-do list open on your computer, right in front of your eyes, while you do that, right? Is that what you do? It's not a good fit to my typical workflow. But I'm definitely in favor of trying different things and seeing what works best for you. :)

Or if you're instead in the mode of deciding what to do next, or making a schedule for your day, etc., then that's different, but working memory is still kinda irrelevant because presumably you have your to-do list open on your computer, right in front of your eyes, while you do that, right?

Whenever I look at a to-do list, I've personally found it noticeably harder to decide which of e.g. 15 tasks to do, than which of <10 tasks to do. And this applies to lists of all kinds. A related difficulty spike appears once a list no longer fits on a single screen and requires scrolling.

Summary: Evaluations provide crucial information to determine the safety of AI systems which might be deployed or (further) developed. These development and deployment decisions have important safety consequences, and therefore they require trustworthy information. One reason why evaluation results might be untrustworthy is sandbagging, which we define as strategic underperformance on an evaluation. The strategic nature can originate from the developer (developer sandbagging) and the AI system itself (AI system sandbagging). This post is an introduction to the problem of sandbagging.

The Volkswagen emissions scandal

There are environmental regulations which require the reduction of harmful emissions from diesel vehicles, with the goal of protecting public health and the environment. Volkswagen struggled to meet these emissions standards while maintaining the desired performance and fuel efficiency of their diesel engines (Wikipedia). Consequently, Volkswagen...

I am not sure I fully understand your point, but the problem with detecting sandbagging is that you do not know the actual capability of a model. And I guess that you mean "an anomalous decrease in capability" and not increase?

Regardless, could you spell out more how exactly you'd detect sandbagging?

LessOnline & Manifest Summer Camp

June 3rd to June 7th

Between LessOnline and Manifest, stay for a week of experimental events, chill coworking, and cozy late night conversations.

Prices raise $100 on May 13th