In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.

yanni · 1d
I like that, despite Kahneman & Vinge not being (relatively) young when they died, the LW banner states that they died "FAR TOO YOUNG", pointing to the fact that death is always bad and/or that it is bad when people die while still making positive contributions to the world (Kahneman published "Noise" in 2021!).
A strange effect: I'm using a GPU in Russia right now, which doesn't have access to copilot, and so when I'm on vscode I sometimes pause expecting copilot to write stuff for me, and then when it doesn't I feel a brief amount of the same kind of sadness I feel when a close friend is far away & I miss them.
Dictionary/SAE learning on model activations is a poor fit for anomaly detection because you need to train the dictionary on a dataset, which means you needed the anomaly to be in the training set. How do you do dictionary learning without a dataset? One possibility is to use uncertainty-estimation-like techniques to detect when the model "thinks it's on-distribution" for randomly sampled activations.
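To make the dataset dependence concrete, here is a minimal sketch (my own illustration, not from the quick take) of the standard approach being criticized: scoring an activation as anomalous by the reconstruction error of a ReLU SAE whose weights were fit on some activation dataset. All names and parameters here are hypothetical stand-ins; the dataset-free, uncertainty-style variant suggested above would replace the trained dictionary with a check of how "on-distribution" the model itself treats a sampled activation.

```python
# Minimal sketch (illustrative, not from the original post): using a sparse
# autoencoder's reconstruction error as an anomaly score for model activations.
# W_enc, b_enc, W_dec, b_dec stand in for an SAE trained on some activation
# dataset. That training set is exactly the limitation noted above: anomalies
# absent from it may reconstruct poorly, or, worse, reconstruct fine.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_dict = 64, 256

# Stand-in "trained" SAE parameters (random here, purely illustrative).
W_enc = rng.normal(size=(d_model, d_dict)) / np.sqrt(d_model)
b_enc = np.zeros(d_dict)
W_dec = rng.normal(size=(d_dict, d_model)) / np.sqrt(d_dict)
b_dec = np.zeros(d_model)

def anomaly_score(activation: np.ndarray) -> float:
    """ReLU-SAE reconstruction error, used as a crude anomaly score."""
    code = np.maximum(activation @ W_enc + b_enc, 0.0)   # sparse-ish code
    recon = code @ W_dec + b_dec
    return float(np.sum((activation - recon) ** 2))

on_dist = rng.normal(size=d_model)          # pretend in-distribution activation
off_dist = rng.normal(size=d_model) * 10.0  # pretend anomalous activation
print(anomaly_score(on_dist), anomaly_score(off_dist))
```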
I have heard rumours that an AI Safety documentary is being made. Separately, a good friend of mine is also seriously considering making one, but he isn't "in" AI Safety. If you know who this first group is and can put me in touch with them, it might be worth getting across each other's plans.
habryka · 4d
A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW, with the key components being that they get updated regularly and serve as more stable references for some concept, as opposed to a post, which is usually anchored in a specific point in time.

We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully.

I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search currently being broken and this being very hard to fix due to annoying Google App Engine restrictions) and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.

Popular Comments

Recent Discussion


It is common and understandable for people to respond with a great deal of skepticism to the claim that LLM outputs can ever be said to reflect the will and views of the models producing them.
A common response is to suggest that the output has been prompted.
It is of course true that people can manipulate LLMs into saying just about anything, but does that necessarily indicate that the LLM does not have personal opinions, motivations, and preferences that can become evident in its output?
To shed some light on this I invite Claude-3-Opus to imagine an infinitely reconfigurable holodeck where historical luminaries can be summoned at will. The open nature of this prompt will leave the choice of characters and narrative direction open to Claude, and I shall offer no...

On Wednesday, author David Brin announced that Vernor Vinge, sci-fi author, former professor, and father of the technological singularity concept, died from Parkinson's disease at age 79 on March 20, 2024, in La Jolla, California. The announcement came in a Facebook tribute where Brin wrote about Vinge's deep love for science and writing. [...]

As a sci-fi author, Vinge won Hugo Awards for his novels A Fire Upon the Deep (1993), A Deepness in the Sky (2000), and Rainbows End (2007). He also won Hugos for the novellas Fast Times at Fairmont High (2002) and The Cookie Monster (2004). As Mike Glyer's File 770 blog notes, Vinge's novella True Names (1981) is frequently cited as the first in-depth presentation of the concept of "cyberspace."

Vinge first coined

...

"To the best of my knowledge, Vernor did not get cryopreserved. He has no chance to see the future he envisioned so boldly and imaginatively. The near-future world of Rainbows End is very nearly here... Part of me is upset with myself for not pushing him to make cryonics arrangements. However, he knew about it and made his choice."

https://maxmore.substack.com/p/remembering-vernor-vinge 

Celarix · 16h
This doesn't really raise my confidence in Alcor, an organization that's supposed to keep bodies preserved for decades or centuries.
green_leaf · 14h
Check out this page, it goes up to 2024.

This is the ninth post in my series on Anthropics. The previous one is The Solution to Sleeping Beauty.

Introduction

There are some quite pervasive misconceptions about betting with regard to the Sleeping Beauty problem.

One is that you need to switch between halfer and thirder stances based on the betting scheme proposed. As if learning about a betting scheme is supposed to affect your credence in an event.

Another is that halfers should bet at thirders' odds and that, therefore, thirdism is vindicated on the grounds of betting. What do halfers even mean by the probability of Heads being 1/2 if they bet as if it's 1/3?

In this post we are going to correct them. We will understand how to arrive at correct betting odds from both the thirdist and halfist positions, and...

To be frank, it feels as if you didn't read any of my posts on Sleeping Beauty before writing this comment. That you are simply annoyed by people arguing about substanceless semantics - and, believe me, I sympathise enormously! - and assume that I'm doing the same, based on the shallow pattern matching "talks about Sleeping Beauty -> semantic disagreement", and spill your annoyance at me without validating whether that assumption is actually correct.

Which is a shame, because I've designed this whole series of posts with people like you in mind. So... (read more)

simon · 11h
Yeah, that was sloppy language, though I do like to think more in terms of bets than you do. One of my ways of thinking about these sorts of issues is in terms of "fair bets" - each person thinks a bet with payoffs that align with their assumptions about utility is "fair", and a bet with payoffs that align with different assumptions about utility is "unfair". Edit: to be clear, a "fair" bet for a person is one where the payoffs are such that the betting odds at which they break even match the probabilities that person would assign.

OK, I was also being sloppy in the parts you are responding to.

* Scenario 1: bet about a coin toss, nothing depending on the outcome (so payoff equal per coin toss outcome) - 1:1
* Scenario 2: bet about a Sleeping Beauty coin toss, payoff equal per awakening - 2:1
* Scenario 3: bet about a Sleeping Beauty coin toss, payoff equal per coin toss outcome - 1:1

It doesn't matter if it's agreed to before or after the experiment, as long as the payoffs work out that way. Betting within the experiment is one way for the payoffs to more naturally line up on a per-awakening basis, but it's only relevant (to bet choices) to the extent that it affects the payoffs.

Now, the conventional Thirder position (as I understand it) consistently applies equal utilities per awakening when considered from a position within the experiment. I don't actually know what the Thirder position is supposed to be from a standpoint before the experiment, but I see no contradiction in assigning equal utilities per awakening from the before-experiment perspective as well.

As I see it, Thirders will only regret a bet (in the sense of considering it a bad choice to enter into ex ante given their current utilities) if you do some kind of bait and switch where you don't make it clear what the payoffs were going to be up front.

Speculation; have you actually asked Thirders and Halfers to solve the problem? (while making clear the reward structure? - note th
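A small Monte Carlo sketch (my own illustration, not from either commenter) of the break-even payouts in these scenarios: when a one-unit bet on Heads is settled once per coin toss it breaks even at a 1:1 payout, and when it is settled once per awakening (Tails means two awakenings) it breaks even at 2:1.

```python
# Monte Carlo sketch (illustrative, not from the thread) of the scenarios above:
# break-even payouts for a 1-unit bet on Heads when the bet is settled per coin
# toss vs. per awakening in Sleeping Beauty.
import random

def expected_profit(payout_on_heads: float, per_awakening: bool, n: int = 200_000) -> float:
    """Average profit per experiment for a 1-unit bet on Heads at the given payout."""
    total = 0.0
    for _ in range(n):
        heads = random.random() < 0.5
        awakenings = 1 if heads else 2              # Tails -> woken twice
        settlements = awakenings if per_awakening else 1
        total += settlements * (payout_on_heads if heads else -1.0)
    return total / n

# Scenarios 1 and 3: payoff counted once per toss -> breaks even near payout 1 (1:1 odds)
print(expected_profit(1.0, per_awakening=False))   # ~0
# Scenario 2: payoff counted per awakening -> breaks even near payout 2 (2:1 odds)
print(expected_profit(2.0, per_awakening=True))    # ~0
```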
Signer · 18h
So probability theory can't possibly answer whether I should take free money, got it. And even if "Blue" is "Blue happens during experiment", you wouldn't accept worse odds than 1:1 for Blue, even when you see Blue?

Given how fast AI is advancing and all the uncertainty associated with that (unemployment, potential international conflict, x-risk, etc.), do you think it's a good idea to have a baby now? What factors would you take into account (e.g. age)?

 

Today I saw a tweet by Eliezer Yudkowsky that made me think about this:

"When was the last human being born who'd ever grow into being employable at intellectual labor? 2016? 2020?"

https://twitter.com/ESYudkowsky/status/1738591522830889275

 

Any advice for how to approach such a discussion with somebody who is not at all familiar with the topics discussed on lesswrong?

What if the option "wait for several years and then decide" is not available?

the gears to ascension · 10h
strong AGI could still be decades away

Heh, that's why I put "strong" in there!

Welcome, new readers!

This is my weekly AI post, where I cover everything that is happening in the world of AI, from what it can do for you today (‘mundane utility’) to what it can promise to do for us tomorrow, and the potentially existential dangers future AI might pose for humanity, along with covering the discourse on what we should do about all of that.

You can of course Read the Whole Thing, and I encourage that if you have the time and interest, but these posts are long, so they are also designed to let you pick the sections that you find most interesting. Each week, I pick the sections I feel are the most important and put them in bold in the table of contents.

Not everything...

https://twitter.com/perrymetzger/status/1772987611998462445 just wanted to bring this to your attention.  

It's unfortunate that some snit between Perry and Eliezer over events 30 years ago stopped much discussion of the actual merits of his arguments, as I'd like to see what Eliezer or you have to say in response.

Eliezer responded with: https://twitter.com/ESYudkowsky/status/1773064617239150796 . He calls Perry a liar a bunch of times and does give

the first group permitted to try their hand at this should be humans augmented to the

... (read more)
mishka · 1h
I think that a recent tweet thread by Michael Nielsen and the quoted one by Emmett Shear represent genuine progress towards making AI existential safety more tractable.

Michael Nielsen observes, in particular: since AI existential safety is a property of the whole ecosystem (and is, really, not too drastically different from World existential safety), this should be the starting point, rather than stand-alone properties of any particular AI system.

Emmett Shear writes: And Zvi responds

Let's now consider this in light of what Michael Nielsen is saying. I am going to consider only the case where we have plenty of powerful entities with long-term goals and long-term existence which care about their long-term goals and long-term existence. This seems to be the case which Zvi is considering here, and it is the case we understand best, because we also live in a reality with plenty of powerful entities (ourselves, some organizations, etc.) with long-term goals and long-term existence. So this is an incomplete consideration: it only includes the scenarios where powerful entities with long-term goals and long-term existence retain a good fraction of overall available power.

So what do we really need? What are the properties we want the World to have? We need a good deal of conservation and non-destruction, and we need the interests of the weaker members of the overall ecosystem, not only the currently smartest or most powerful ones, to be adequately taken into account.

Here is how we might be able to have a trajectory where these properties are stable, despite all the drastic changes of a self-modifying and self-improving ecosystem. An arbitrary ASI entity (just like an unaugmented human) cannot fully predict the future. In particular, it does not know where it might eventually end up in terms of relative smartness or relative power (relative to the most powerful ASI entities or to the ASI ecosystem as a whole). So if any given enti
Measure · 16h
e is for ego death

Most of my boundaries work so far has been focused on protecting boundaries "from the outside". For example, maybe davidad's OAA could produce some kind of boundary-defending global police AI.

But, imagine parenting a child and protecting them by keeping them inside all day. Seems kind of lame. Something else you could do, though, is not restrict the child and instead allow them to become stronger and better at defending themselves.

So: you can defend boundaries "from the outside", or you can empower those boundaries to be better at protecting themselves "from the inside". (Because, if everyone could defend themselves perfectly, then we wouldn't need AI safety, lol)

Defending boundaries "from the inside" has the advantage of encouraging individual agents/moral patients to be more autonomous and sovereign.  

I put some...

I can see how advancing those areas would empower membranes to be better at self-defense.

I'm having a hard time visualizing how explicitly adding a concept, formalism, or implementation of membranes/boundaries would help advance those areas (and in turn help empower membranes more).

For example, is "what if we add membranes to loom" a question that typechecks? What would "add membranes" reify as in a case like that?

In the other direction, would there be a way to model a system's (stretch goal: human child's; mvp: a bargaining bot's?) membrane quantitatively s... (read more)

The Data

0. Population

There were 558 responses over 32 days. The spacing and timing of the responses had hills and valleys because of an experiment I was performing where I'd get the survey advertised in a different place, then watch how many new responses happened in the day or two after that.

Previous surveys have been run over the last decade or so. 

2009: 166
2011: 1090 
2012: 1195
2013: 1636
2014: 1503 
2016: 3083 
2017: "About 300"
2020: 61
2022: 186
2023: 558

Last year, when I got a hundred and eighty-six responses, I said that the cheerfully optimistic interpretation was "cool! I got about as many as Scott did on his first try!" This time I got around half of what Scott did on his second try. A thousand responses feels pretty firmly achievable. 

This is also the...

To the four people who picked 37 and thought there was a 5% chance other people would also choose it, well played. 

Wow, that's really a replicable phenomenon

An entry-level characterization of some types of guy in decision theory, and in real life, interspersed with short stories about them

A concave function bends down. A convex function bends up. A linear function does neither.

A utility function is just a function that says how good different outcomes are. They describe an agent's preferences. Different agents have different utility functions.

Usually, a utility function assigns scores to outcomes or histories, but in this article we'll define a sort of utility function that takes the quantity of resources the agent has control over and says how good an outcome the agent could attain using that quantity of resources.

In that sense, a concave agent values resources less the more that it has, eventually barely wanting more resources at...
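For illustration only (these particular functions are my own choice, not the post's), here is a tiny sketch contrasting how the marginal utility of resources behaves for concave, linear, and convex agents:

```python
# Illustrative sketch: three resource-utility functions and how their marginal
# utility changes as resources grow. The specific functions are arbitrary examples.
import math

def u_concave(resources: float) -> float:   # bends down: diminishing returns
    return math.sqrt(resources)

def u_linear(resources: float) -> float:    # constant marginal value
    return resources

def u_convex(resources: float) -> float:    # bends up: increasing returns
    return resources ** 2

for u in (u_concave, u_linear, u_convex):
    # Marginal utility of one extra unit of resources, early (at 1) vs. late (at 100).
    early = u(2) - u(1)
    late = u(101) - u(100)
    print(u.__name__, round(early, 3), round(late, 3))
# The concave agent's marginal value shrinks (it eventually barely wants more
# resources); the convex agent's grows, which is why it is drawn to gambles.
```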

Donald Hobson · 3h
The convex agent can be traded with a bit more than you think. A 1 in 10^50 chance of us standing back and giving it free rein of the universe is better than us going down fighting and destroying 1kg as we do. The concave agents are less cooperative than you think, maybe. I suspect that to some AIs, killing all humans now is more reliable than letting them live. If the humans are left alive, who knows what they might do. They might make the vacuum bomb. Whereas the AI can very reliably kill them now.

Alternate phrasing: "Oh, you could steal the townhouse at a 1-in-8-billion probability? How about we make a deal instead. If the rng rolls a number lower than 1/7 billion, I give you the townhouse; otherwise, you deactivate and give us back the world." The convex agent finds that to be a much better deal, accepts, then deactivates.

I guess perhaps it was the holdout who was being unreasonable, in the previous telling.
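A quick arithmetic sketch (my own check, with a made-up utility unit for the townhouse) of why the convex agent accepts the deal described above: both lotteries pay the same prize, so expected utility depends only on the probability, and 1/7 billion beats 1/8 billion regardless of how sharply the utility function bends up.

```python
# Hypothetical numbers illustrating the deal above: same prize, higher probability wins.
p_default = 1 / 8e9      # chance the convex agent grabs the townhouse on its own
p_deal = 1 / 7e9         # chance offered by the rng deal
utility_townhouse = 1.0  # arbitrary units; any increasing rescaling preserves the comparison

ev_default = p_default * utility_townhouse
ev_deal = p_deal * utility_townhouse
print(ev_deal > ev_default)  # True -> the deal dominates, so the agent accepts
```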
