In this post, I proclaim/endorse forum participation (aka commenting) as a productive research strategy that I've managed to stumble upon, and recommend it to others (at least to try). Note that this is different from saying that forum/blog posts are a good way for a research community to communicate. It's about individually doing better as researchers.

Recent Discussion

I will try to keep this short; I just want to use some simple problems to point out what I think is a commonly overlooked point in anthropic discussions.

1. The Room Assignment Problem

You are among 100 people waiting in a hallway. The hallway leads to a hundred rooms numbered from 1 to 100. All of you are knocked out by a sleeping gas and each put into a random/unknown room. After waking up, what is the probability that you are in room No. 1?

This is just an ordinary probability question. All room numbers are symmetric, so the answer is simply 1%. It is also easy to imagine taking part in similar room-assigning experiments a great number of times and tracking the relative fraction of times you wake up in room No. 1, or...
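A quick sanity check of the 1% answer, read as a frequency over many repetitions, is a direct simulation. This is a minimal sketch; the uniform random assignment and the trial count are my own illustrative assumptions.

```python
import random

def simulate(trials: int = 100_000, n: int = 100) -> float:
    """Fraction of trials in which 'you' wake up in room No. 1 when
    n people are assigned uniformly at random to n rooms."""
    hits = 0
    for _ in range(trials):
        rooms = list(range(1, n + 1))
        random.shuffle(rooms)   # a uniformly random assignment of people to rooms
        if rooms[0] == 1:       # treat index 0 as "you"; by symmetry the index doesn't matter
            hits += 1
    return hits / trials

print(simulate())  # hovers around 0.01, i.e. the 1% answer
```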

Doesn't example 3 show that one and two are actually the same? What difference does it make whether you start inside or outside the room?

To quickly recap my main intellectual journey so far (omitting a lengthy side trip into cryptography and Cypherpunk land), with the approximate age that I became interested in each topic in parentheses:

  • (10) Science - Science is cool!
  • (15) Philosophy of Science - The scientific method is cool! Oh look, there's a whole field studying it called "philosophy of science"!
  • (20) Probability Theory - Bayesian subjective probability and the universal prior seem to constitute an elegant solution to the philosophy of science. Hmm, there are some curious probability puzzles involving things like indexical uncertainty, copying, forgetting... I and others make some progress on this but fully solving anthropic reasoning seems really hard. (Lots of people have worked on this for a while and have failed, at least according to my
...

ensuring AI philosophical competence won't be very hard. They have a specific (unpublished) idea that they are pretty sure will work.


Cool, can you please ask them if they can send me the idea, even if it's just a one-paragraph summary or a pile of crappy notes-to-self?

Daniel Kokotajlo · 2h
Facile response: I think lots of people (maybe a few hundred a year?) take this path, and end up becoming philosophy grad students like I did. As you said, the obvious next step for many domains of intellectual inquiry is to go meta / seek foundations / etc., and that leads you into increasingly foundational, increasingly philosophical questions until you decide you'll never be able to answer all the questions, but maybe at least you can get some good publications in prestigious journals like Analysis and Phil Studies, and contribute to humanity's understanding of some sub-field.
romeostevensit · 2h
When I look at metaphilosophy, the main places I go looking are places with large confusion deltas: where, who, and why did someone become dramatically less philosophically confused about something, turning unfalsifiable questions into technical problems? Kuhn was too caught up in the social dynamics to want to do this from the perspective of pure ideas. A few things to point to:

  1. Wittgenstein noticed that many philosophical problems attempt to intervene at the wrong level of abstraction and posited that awareness of abstraction as a mental event might help.
  2. Korzybski noticed that many philosophical problems attempt to intervene at the wrong level of abstraction and posited that awareness of abstraction as a mental event might help.
  3. David Marr noticed that many philosophical and technical problems attempt to intervene at the wrong level of... you get the idea.
  4. Hassabis cites Marr as of help in deconfusing AI problems.
  5. Eliezer's Technical Explanation of Technical Explanation doesn't use the term compression and seems the worse for it, using many, many words to describe things that compression would render easier to reason about, afaict.
  6. Hanson, in The Elephant in the Brain, posits that if we mysteriously don't make progress on something that seems crucial, maybe we have strong motivations for not making progress on it.

Question: what happens to people when they gain consciousness of abstraction? My first-pass attempt at an answer is that they become a lot less interested in philosophy.

Question: if someone had quietly made progress on metaphilosophy, how would we know? First guess is that we would only know if their solution scaled well, or caused something to scale well.
Vanessa Kosoy · 3h
First, I think that the theory of agents is a more useful starting point than metaphilosophy. Once we have a theory of agents, we can build models, within that theory, of agents reasoning about philosophical questions. Such models would be answers to special cases of metaphilosophy. I'm not sure we're going to have a coherent theory of "metaphilosophy" in general, distinct from the theory of agents, because I'm not sure that "philosophy" is an especially natural category[1]. Some examples of what that might look like:

  • An agent inventing a theory of agents in order to improve its own cognition is a special case of recursive metalearning (see my recent talk on metacognitive agents).
  • There might be theorems about convergence of learning systems to agents of a particular type (e.g. IBP agents), formalized using some brand of ADAM, in the spirit of John's Selection Theorems programme. This can be another model of agents discovering a theory of agents and becoming more coherent as a result (broader in terms of its notions of "agent" and "discovering" and narrower in terms of what the agent discovers).
  • An agent learning how to formalize some of its intuitive knowledge (e.g. about its own values) can be described in terms of metacognition, or, more generally, the learning of some formal symbolic language. Indeed, understanding is translation, and formalizing intuitive knowledge means translating it from some internal opaque language to an external observable language.

Second, obviously in order to solve philosophical problems (such as the theory of agents), we need to implement a particular metaphilosophy. But I don't think it has to be extremely rigorous. (After all, if we tried to solve metaphilosophy instead, we would have the same problem.) My informal theory of metaphilosophy is something like: an answer to a philosophical question is good when it seems intuitive, logically consistent and parsimonious[2] after suffi

Summary

EA Funds aims to empower thoughtful individuals and small groups to carry out altruistically impactful projects - in particular, enabling and accelerating small/medium-sized projects (with grants <$300K). We are looking to increase our level of independence from other actors within the EA and longtermist funding landscape and are seeking to raise ~$2.7M for the Long-Term Future Fund and ~$1.7M for the EA Infrastructure Fund (~$4.4M total) over the next six months.

Why donate to EA Funds? EA Funds is the largest funder of small projects in the longtermist and EA infrastructure spaces, and has had a solid operational track record of giving out hundreds of high-quality grants a year to individuals and small projects. We believe that we’re well-placed to fill the role of a significant independent grantmaker, because...

Thank you, that would be great! 

Daniel_Eth · 6h
One way I think about this is there are just so many weird (positive and negative) feedback loops and indirect effects, so it's really hard to know if any particular action is good or bad. Let's say you fund a promising-seeming area of alignment research – just off the top of my head, here are several ways that grant could backfire:

  • the research appears promising but turns out not to be, but in the meantime it wastes the time of other alignment researchers who otherwise would've gone into other areas
  • the research area is promising in general, but the particular framing used by the researcher you funded is confusing, and that leads to slower progress than counterfactually
  • the researcher you funded (unbeknownst to you) turns out to be toxic or otherwise have bad judgment, and by funding him, you counterfactually poison the well on this line of research
  • the area you fund sees progress and grows, which counterfactually sucks up lots of longtermist money that otherwise would have been invested and had greater effect (say, during crunch time)
  • the research is somewhat safety-enhancing, to the point that labs (facing safety-capabilities tradeoffs) decide to push capabilities further than they otherwise would, and safety is hurt on net
  • the research is somewhat safety-enhancing, to the point that it prevents a warning shot, and that warning shot would have been the spark that would have inspired humanity to get its game together regarding combatting AI X-risk
  • the research advances capabilities, either directly or indirectly
  • the research is exciting and draws the attention of other researchers into the field, but one of those researchers happens to have a huge, tail negative effect on the field outweighing all the other benefits (say, that particular researcher has a very extreme version of one of the above bullet points)
  • Etcetera – I feel like I could do this all day.

Some of the above are more likely than others, but there are just so many differen
Linch · 7h
Really? Without giving away names, can you tell me roughly what cluster they are in? Geographical area, age range, roughly what vocation (technical AI safety/AI policy/biosecurity/community building/earning-to-give)?

Definitely closer to the former than the latter! Here are some steps in my thought process:

  • The standard longtermist cluelessness arguments ("you can't be sure if e.g. improving labor laws in India is good because it has uncertain effects on the population and happiness of people in Alpha Centauri in the year 4000") don't apply in full force if you buy a high near-term (10-100 years) probability of AI doom, and that AI doom is bad and avoidable.
    • or (less commonly on LW but more common in some other EA circles) other sources of hinge of history like totalitarian lock-in, s-risks, etc.
  • If you assign low credence to any hinge-of-history hypothesis, I think you are still screwed by the standard cluelessness arguments, unfortunately.
  • But even with a belief in an x-risk hinge of history, cluelessness still applies significantly. Knowing whether an action reduces x-risk is much easier in relative terms than knowing whether an action will improve the far future in the absence of x-risk, but it's still hard in absolute terms.
  • If we drill down on a specific action and a specific theory of change ("I want to convince a specific Senator to sign a specific bill to regulate the size of LLM models trained in 2024", "I want to do this type of technical research to understand this particular bug in this class of transformer models, because better understanding of this bug can differentially advance alignment over capabilities at Anthropic if Anthropic will scale up this type of model"), any particular action's impact is just built on a tower of conjunctions and it's really hard to get any grounding to seriously argue that it's probably positive.
  • So how do you get any robustness? You imagine the set
Linch · 11h
I think this is true at the current margin, because we have so little money. But if we receive, say, enough funding to lower the bar to roughly where our early 2023 bar was, I will still want to make skill-up grants to fairly talented/promising people, and I still think they are quite cost-effective. I do expect those grants to have more capabilities externalities (at least in terms of likelihood, maybe in expectation as well) than when we give grants to people who currently could be hired at (e.g.) Anthropic but choose not to. It's possible you (and maybe Oli?) disagree and think we should fund moderate-to-good direct work projects over all (or almost all) skill-up grants; in that case this is a substantive disagreement about what we should do in the future.

By all reports, and as one would expect, Google's Gemini looks to be substantially superior to GPT-4. We now have more details on that, and also word that Google plans to deploy it in December; Manifold gives an 82% chance of this happening this year, and a similar probability that it is superior to GPT-4 on release.

I indeed expect this to happen on both counts. This is not too long from now, but also this is AI #27 and Bard still sucks; Google has been taking its sweet time getting its act together. So now we have both the UK Summit and Gemini coming up within a few months, as well as a major acceleration of chip shipments. If you are preparing to try and impact how things go, now might be...

You store everything on a cloud instance, where you don’t get to see the model weights and they don’t get to see your data either, and checks are made only to ensure you are within terms of service or any legal restrictions.

Is it actually possible to build a fine-tuning-and-model-hosting product such that

  1. The customer can't access the model weights
  2. The host can't access the training data, or the inputs or outputs of inference (and this "can't" is in the cryptography sense not the legal sense, because otherwise the host is a giant juicy target for hacki
...
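For concreteness, here is a toy sketch of the trust boundaries that question asks for. This is my own illustration, not a description of any existing product: the `Enclave` and `Host` classes are hypothetical, and the cipher is a placeholder rather than real cryptography. Plaintext and weights coexist only inside the enclave stand-in, the host routes ciphertext and checks coarse metadata, and the customer only ever holds a key and ciphertext.

```python
from dataclasses import dataclass, field

def toy_cipher(text: str, key: str) -> str:
    """Placeholder XOR 'cipher' -- only marks which party can read what; not real cryptography."""
    return "".join(chr(ord(c) ^ ord(key[i % len(key)])) for i, c in enumerate(text))

@dataclass
class Enclave:
    """Stand-in for trusted hardware / FHE: the only place where weights and plaintext coexist."""
    weights: list = field(default_factory=lambda: [0.1, 0.2, 0.3])  # never exported to anyone
    customer_key: str = "customer-secret"                           # shared with the customer only

    def run(self, ciphertext: str) -> str:
        prompt = toy_cipher(ciphertext, self.customer_key)           # decrypt inside the boundary
        completion = f"[toy model output for: {prompt!r}]"           # placeholder inference step
        return toy_cipher(completion, self.customer_key)             # re-encrypt before it leaves

class Host:
    """The host only routes ciphertext and runs policy checks on coarse metadata."""
    def __init__(self, enclave: Enclave):
        self._enclave = enclave

    def serve(self, ciphertext: str, declared_use: str) -> str:
        if declared_use not in {"inference", "fine-tuning"}:         # terms-of-service style check
            raise PermissionError("outside terms of service")
        return self._enclave.run(ciphertext)

# Customer side: encrypt, send, decrypt -- never seeing the weights.
host = Host(Enclave())
reply = host.serve(toy_cipher("summarize my data", "customer-secret"), "inference")
print(toy_cipher(reply, "customer-secret"))
```

In a real system the "can't" would have to come from attested secure enclaves or homomorphic encryption rather than from Python object boundaries; the sketch only shows which party is supposed to see what.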
MiguelDev · 7h
OpenAI is not solely focused on alignment; they allocate only 20% of their resources to superalignment research. What about the remaining 80%? This often goes undiscussed, yet it arguably represents the largest budget allocated to improving capabilities. Why not devote 100% of resources to building an aligned AGI? Such an approach would likely yield the best economic returns, as people are more likely to use a trustworthy AI, and governments would also be more inclined to promote its adoption.
Thane Ruthenis · 8h
One addition I'd make here is: I think what people imagine, when they imagine "us" noticing an AGI going rogue and "fighting back", is movie scenarios where the obviously evil AGI becomes obviously evil in a way that's obvious to everyone, and then it's a neatly arranged black-and-white humanity vs. machines all-out fight.

But in real life, such unambiguousness is rare. The monsters don't look obviously evil; the signs of fatal issues are rarely blatant. Is this whiff of smoke a sign of fire, or just someone nearby being bad at cooking? Is this creepy guy actually planning to assault you, or are you just being paranoid? Is this weird feeling in your chest a sign of an impending heart attack, or just some biological noise? Is this epidemic truly following an exponential curve, or is it going to peter out somehow? Are you really, really sure the threat is so major? So sure you'd actually take those drastic actions — call emergency services, throw a fit, declare a quarantine — and risk wasting resources and doing harm and looking foolish for overreacting? Nah, wouldn't do to panic, that's not socially appropriate at all. Better act very concerned, but in a calm, high-status fashion. Maybe it'll all work itself out on its own!

And the AGI, if it's worth the name, would not fail to exploit this. It may start clearly acting to amass power, but there would always be a prosocial, plausible-sounding justification for why it's doing that; it'd never stop making pleasant noises about having people's best interests at heart; it'd never stop being genuinely useful to someone, such that there'd be no clear harm in shutting it down. The doubt would never go away.

Much like there's no fire alarm for AGI, there would be no fire alarm for the treacherous turn. There would never be a moment, except maybe right before the end, where "we must stop the malign AGI from killing us all!" would sound obviously right to everyone. There would always be ambiguity; this sort of message would al
Noosphere89 · 6h
While I definitely agree that a fight between humanity and AGI will never look like humanity vs. AGI, due to the issues with the abstraction of humanity, one key disagreement I have with this comment is that I don't think there is no fire alarm for AGI; in general, my model is that, if anything, a lot of people will support very severe restrictions on AI and AI progress for safety. I think this already happened several months ago, when people got freaked out about AI, and that was merely GPT-4. We will get a lot of fire alarms, especially via safety incidents. A lot of people are already primed for apocalyptic narratives, and if AI progresses in a big way, this will fan the flames into a potential AI-killer, supported by politicians. It's not impossible for tech companies to defuse this, but damn is it hard to defuse. I worry about the opposite problem: even if existential risk concerns come to look less and less likely, AI regulation may nonetheless become quite severe, and the AI organizations built by LessWrongers have systematic biases that will prevent them from updating to this position.

I am experimenting with pulling more social media content directly into these digests, in part to rely less on social media sites long-term (since content might be deleted, blocked, paywalled, etc.). That makes these digests longer, but it means there is less need to click on links.

I will still link back to original social media posts in order to give credit and make sharing easier. As always, let me know your feedback.

Opportunities

...

Patrick Collison has a fantastic list of examples of people quickly accomplishing ambitious things together since the 19th Century. It does make you yearn for a time that feels... different, when the lethargic behemoths of government departments could move at the speed of a racing startup:  

[...] last century, [the Department of Defense] innovated at a speed that puts modern Silicon Valley startups to shame: the Pentagon was built in only 16 months (1941–1943), the Manhattan Project ran for just over 3 years (1942–1946), and the Apollo Program put a man on the moon in under a decade (1961–1969). In the 1950s alone, the United States built five generations of fighter jets, three generations of manned bombers, two classes of aircraft carriers, submarine-launched ballistic missiles, and nuclear-powered

...

I think probably not.  When a dog is asked whether a Human is "conscious", he might mention things like:

  • Humans seem to be somewhat aware of their surroundings, but they are not really conscious beings.
  • They seem to be up and awake so often, but spend almost all their time blankly staring into objects in their hands or on table tops, or randomly moving objects around in the den.
  • They have very little understanding about the importance of the pack, hierarchy, safety in numbers, etc.
  • They have nearly no understanding of the most important dangers in life. Like strange smells and sounds, or uniformed strangers approaching the den.

In the same way, many (perhaps most) AI experts might never agree that LLMs or AGI systems have achieved "consciousness", as "consciousness" is just...

kuira · 9h
I'm going to experiment with, as I aesthetically call it internally, 'entering the wired'. By which I mean, a more mundane thing: replacing my environment with my computer screen via VR. I don't want to bother myself with the things around me, with the physical world, with that antiquated level of physics on which we still concern ourselves with matter rather than information. I want my whole perception to be information; text, webpages, articles here. I'm hoping this will help me focus on it more. More ambitiously, I hope it might allow my mind to overgeneralize to that information environment and forego, even further than it has already, processing related to the physical world, in order to fully dedicate itself to information processing and idea generation. I hope this language doesn't sound mystical or ambiguous, I'm just describing a mundane thing (wearing a VR headset for most of the day to read things) in some exciting language/concepts I use for it internally. If anyone's interested in hearing about how this goes, let me know now. :)

I'd enjoy seeing a post or two about your setup and initial experiences, and after some time, about your discoveries and remaining uncertainties. I'm excited about the upcoming tech for this, but I'm not convinced it's quite good enough for me yet – having two large screens and a good keyboard and mouse is pretty good for my work style.

This is a linkpost to a recent blogpost from Michael Nielsen, who has previously written on EA among many other topics. This blogpost is adapted from a talk Nielsen gave to an audience working on AI before a screening of Oppenheimer. I think the full post is worth a read, but I've pulled out some quotes I find especially interesting (bolding my own).

I was at a party recently, and happened to meet a senior person at a well-known AI startup in the Bay Area. They volunteered that they thought "humanity had about a 50% chance of extinction" caused by artificial intelligence. I asked why they were working at an AI startup if they believed that to be true. They told me that while they thought it was

...
DanielFilan · 10h
This strikes me as the sort of thing one would say without quite meaning it. Like, I'm sure this person could get other jobs that also support a nice house and car. And if they thought about it, they could probably also figure this out. I'm tempted to chalk the true decision up to conformity / lack of confidence in one's ability to originate and execute consequentialist plans, but that's just a guess and I'm not particularly well-informed about this person.

To paraphrase Von Neumann, sometimes we confess to a selfish motive that we may not be suspected of an unselfish one, or to one sin to avoid being accused of another.

[Of] the splendid technical work of the [atomic] bomb there can be no question. I can see no evidence of a similar high quality of work in policy-making which...accompanied this...Behind all this I sensed the desires of the gadgeteer to see the wheels go round.

dr_s · 14h
I agree with that principle, but how is that relevant here? The Manhattan Project's effects weren't on long timelines.
DanielFilan · 10h
The Manhattan Project brought us nuclear weapons, whose existence affects the world to this day, 79 years after its founding – I would call that a long timeline. And we might not have seen all the relevant effects! But yeah, I think we have enough info to make tentative judgements of at least Klaus Fuchs' espionage, and maybe Joseph Rotblat's quitting.