ozziegooen

I'm currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.

Sequences

Beyond Questions & Answers
Squiggle
Prediction-Driven Collaborative Reasoning Systems

Comments

Caroline Jeanmaire


This seems like an error. From their page, it seems like the CEO is Sanjana Kashyap.

https://www.intelligencerising.org/about

Minor point, but I'd be happy if LessWrong/Lightcone offered various (popular) subscription tiers with perks, similar to Patreon.

Some potential perks:

  • A username with a certain color
  • A flag on your profile
  • Some experimental feature access
  • "We promise to consider your feature requests a bit more"
  • Some monthly (or less frequent) cheap event at Lighthaven
  • Get your name in the LessWrong books
  • "Supporters" part of Lighthaven. Maybe different supporters could sponsor various Lighthaven nooks - there are a bunch of these.
  • Secret Discord channel
  • Invites to some otherwise private Lighthaven things

I realize these can be a pain to set up though. 

(I'd only want this if it helped Lightcone's total profit.)

Donated $300 now, intend to donate more (after more thinking).

My impression is that if you read LessWrong regularly, it could easily be worth $10-$30/month to you. If you've attended Lighthaven, there's an extra benefit there, which could be worth much more. So I think it's very reasonable for people in our community to donate somewhere between $100 (roughly a $10/month Substack subscription) and $10k (a fancy club membership) per person, depending on the person, just from the standpoint of treating it as communal/local infrastructure.

One potential point of contention is that I believe some of the team is working on future, more experimental projects beyond the LessWrong/Lighthaven basics. But I feel pretty good about this in general; it's just higher-risk and more difficult to recommend.

I also think it's just good practice for community-focused projects to get donations from the community; this helps keep incentives more aligned. And on this point, the Lighthaven team is about as relevant an example as exists right now.

I am not particularly sympathetic to your argument, which amounts to 'the public might pressure them to train away the inconvenient thoughts, so they shouldn't let the public see the inconvenient thoughts in the first place.'

I was attempting to make a descriptive claim about the challenges they would face, not a normative claim that it would be better if they didn't expose this information.

From a stance of global morality, it seems quite scary for one company to oversee and then hide all the epistemic reasoning of their tools.

I'd also guess that the issue I raised should rarely be the main problem with o1. I think there is some limit to the epistemic quality you can reach without offending users, but that mainly applies to questions like "How likely are different religions to be true?", not "What is the best way of coding this algorithm?", which is what o1 seems more targeted towards now.

So I'd imagine that most cases in which the reasoning steps of o1 would look objectionable would involve straightforward technical problems, like the system lying in some steps or reasoning in weird ways.

Also, knowledge of these steps might just make it easier to crack/hack o1. 

If I were a serious API user of an o1-type system, I'd very much want to see the reasoning steps, at the very least. I imagine that over time, API users will be able to get a lot of this from these sorts of systems.

If we do hit a frontier where the vast majority of objectionable-looking steps are due to true epistemic disagreements, then I think there's a different discussion to be had. It seems very safe to me to at least ensure that the middle steps are exposed to academic and government researchers. I'm less sure about the implications of revealing this data to the public; that seems like a genuinely hard question to me. While I'm generally pro-transparency, if I were convinced that fully transparent reasoning would force these models to hold incorrect beliefs at a deeper level, I'd be worried.

I'm in support of this sort of work. Generally, I like the idea of dividing up LLM architectures into many separate components that could be individually overseen / aligned.

Separating "Shaggoth" / "Face" seems like a pretty reasonable division to me. 

At the same time, there are definitely a lot of significant social+political challenges here. 

I suspect that one reason OpenAI doesn't expose all of o1's thinking is that this thinking would upset some users, especially journalists and such. It's hard enough making sure that the final outputs are sufficiently unobjectionable to go public at a large scale; it seems harder to make sure the full set of reasoning steps is also unobjectionable.

One important thing that smart intellectuals do is to have objectionable/unpopular beliefs, but still present unobjectionable/popular outputs. For example, I'm sure many of us have beliefs that could get us cancelled by some group or other. 

If the entire reasoning process is exposed, this might pressure it to be unobjectionable, even if that trades off against accuracy. 

In general, I'm personally very much in favor of transparent thinking and argumentation. It's just that I've noticed this as one fundamental challenge with intellectual activity, and I also expect to see it here. 

One other challenge to flag: I imagine that the Shoggoth and Face layers would require (or at least greatly benefit from) some communication back and forth. An intellectual analysis could vary heavily depending on the audience; it's not enough to do all the intellectual work in one pass and then match it to the audience afterward.

For example, if an AI were tasked with designing changes to New York City, it might matter a lot if the audience is a religious zealot.

One last tiny point: in future work, I'd lean against using the term "Shoggoth" for the reasoning step. It sort of makes sense to this audience, but I think it's a mediocre fit here, partly because I assume the "Face" could have some Shoggoth-like properties.

I think that xAI's contributions have been minimal so far, but that could shift. Apparently they have a very ambitious data center coming up, and are scaling up research efforts quickly. Seems very accelerate-y.

Seeking feedback on this AI Safety proposal:
(I don't have experience in AI experimentation)

I'm interested in the question, "How can we use smart AIs to help humans with strategic reasoning?"

We don't want the solution to be, "AIs just tell humans exactly what to do without explaining themselves." We'd prefer situations where smart AIs can explain to humans how to think about strategy, and this information makes humans much better at doing strategy. 

One proposal to make progress on this is to set a benchmark for having smart AIs help out dumb AIs by providing them with strategic information. 

More specifically: we find methods of having GPT-4 give human-understandable prompts to GPT-2 that allow GPT-2 to do as well as possible on specific games, like chess.
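
To make this concrete, here's a rough sketch of how a single strategy hint could be scored. It assumes the python-chess package and a local Stockfish binary for move evaluation; weak_model_move is a hypothetical stand-in for the GPT-2 call, not a real API. This is just the shape of the loop I have in mind, not a finished benchmark.

```python
import chess
import chess.engine


def weak_model_move(fen: str, hint: str) -> str:
    """Hypothetical stand-in for a GPT-2-level call: return a move in UCI
    notation for the given position, conditioned on a strategy hint
    (pass '' for the no-hint baseline). A real harness would also need
    to handle illegal or malformed moves."""
    raise NotImplementedError


def centipawn_loss(board: chess.Board, move_uci: str,
                   engine: chess.engine.SimpleEngine) -> float:
    """How much worse `move_uci` is than the engine's best move,
    in centipawns (0 = the engine's best move)."""
    best = engine.analyse(board, chess.engine.Limit(depth=12))
    best_cp = best["score"].relative.score(mate_score=10_000)

    board.push(chess.Move.from_uci(move_uci))
    after = engine.analyse(board, chess.engine.Limit(depth=12))
    # The score is now from the opponent's perspective, so negate it.
    after_cp = -after["score"].relative.score(mate_score=10_000)
    board.pop()

    return float(best_cp - after_cp)


def hint_value(positions: list[str], hint: str) -> float:
    """Average centipawn improvement from giving the weak model the hint,
    over a set of test positions (FEN strings)."""
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    try:
        deltas = []
        for fen in positions:
            board = chess.Board(fen)
            baseline = centipawn_loss(board, weak_model_move(fen, ""), engine)
            with_hint = centipawn_loss(board, weak_model_move(fen, hint), engine)
            deltas.append(baseline - with_hint)
        return sum(deltas) / len(deltas)
    finally:
        engine.quit()
```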

Some improvements/changes would include: 

  1. Try to expand the games to include simulations of high-level human problems, like simplified versions of Civilization.
  2. We could also replace GPT-2 with a different LLM that better represents how a human with some amount of specialized knowledge (for example, being strong at probability) would perform.
  3. There could be a strong penalty for prompts that aren't human-understandable.
  4. Use actual humans in some experiments. See how much they improve at specific [chess | civilization] moves, with specific help text.
  5. Instead of using GPT-2, you could likely just use GPT-4. My impression is that GPT-4 is a fair bit worse than SOTA chess engines. So you'd use some amplified GPT-4 procedure to figure out the best human-understandable chess prompts, then give those to GPT-4 without the amplification.
  6. You set certain information limits. For example, you see how well an LLM could do with "100 bits" of strategic information (one rough way to measure this is sketched just after this list).
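
For point 6, here's one crude way to operationalize the bit budget. The compressed length is just a stand-in; a language model's log-probability of the hint would be a more principled measure.

```python
import zlib


def hint_bits(hint: str) -> int:
    """Crude estimate of a hint's information content: bits after zlib
    compression. For very short hints the compression header dominates,
    so a language-model log-probability would be a better estimator."""
    return 8 * len(zlib.compress(hint.encode("utf-8")))
```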


A solution would likely involve search processes where GPT-4 experiments with a large space of potential English prompts and tests them over the space of potential chess positions. I assume that reinforcement learning could be done here, but perhaps some LLM-heavy mechanism could work better. I'd assume that good strategies would look something like, "In this cluster of situations, you need to focus on optimizing Y." So the "smart agent" would need to be able to cluster different situations and solve for a narrow prompt for many of them.
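
A minimal version of that search might look like the following, reusing hint_value and hint_bits from the sketches above. Here propose_hints is another hypothetical GPT-4 call; in practice you'd want something smarter than best-of-N, such as clustering positions and iteratively rewriting hints.

```python
def propose_hints(positions: list[str], n: int) -> list[str]:
    """Hypothetical stand-in for a GPT-4 call: ask the strong model for n
    candidate English strategy hints tailored to this cluster of positions."""
    raise NotImplementedError


def best_hint(positions: list[str], n_candidates: int, bit_budget: int) -> str:
    """Naive best-of-N search: sample candidate hints, filter by the bit
    budget, and keep whichever most improves the weak model's play."""
    candidates = [h for h in propose_hints(positions, n_candidates)
                  if hint_bits(h) <= bit_budget]
    return max(candidates, key=lambda h: hint_value(positions, h))
```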

It's possible that the best "strategies" would be things like long decision-trees. One of the key things to learn about is what sorts/representations of information wind up being the densest and most useful. 

Zooming out: if we had AIs that we knew could give AIs and humans strong and robust strategic advice in test cases, I imagine we could use some of this for real-life cases, perhaps most importantly to strategize about AI safety.

I was pretty surprised to see so little engagement / net-0 upvotes (besides mine). Any feedback is appreciated; I'm curious how to do better going forward.

I spent a while on this and think there's a fair bit here that would be useful to some community members, though perhaps the jargon or some key aspects weren't well received.

I'd flag that I think it's very possible TSMC would be badly hurt or destroyed if China took control. There's been a bit of discussion of this.

I suspect China might fix this after some years, but I'd expect it to be tough for a while.

https://news.ycombinator.com/item?id=40426843
