LESSWRONG

leogao
Sequences: Alignment Stream of Thought

Comments (sorted by newest)
7 · leogao's Shortform · Ω · 3y · 402 comments
leogao's Shortform
leogao · 6h · 30

the LLM cost should not be too bad. it would mostly be looking at vague vibes rather than requiring lots of reasoning about the thing. I trust e.g. AI summaries vastly less because they can require actual intelligence.

I'm happy to fund this a moderate amount for the MVP. I think it would be cool if this existed.

I don't really want to deal with all the problems that come with modifying something that already works for other people, at least not before we're confident the ideas are good. this points towards building a new thing. fwiw I think if building a new thing, the chat part would be most interesting/valuable standalone (and I think it's good to have platforms grow out of a simple core rather than to do everything at once)

leogao's Shortform
leogao · 17h · 80

it's kind of haphazard and I have no reason to believe I'm better at prompting than anyone else. the broad strokes are I tell it to:

  • use lowercase
  • not use emojis
  • be concise, explain at bird's eye level
  • don't sugar-coat things
  • not be too professional/formal; use some IRC/twitter slang without overdoing it
  • speak as if it's a conversation over a dinner table between two close friends who are also technical experts
  • don't dumb things down but also don't use unnecessary jargon
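the bullets above could be assembled into a system prompt along these lines (a hypothetical sketch — the wording and the `build_system_prompt` helper are my paraphrase, not the actual prompt):

```python
# hypothetical sketch: the style rules above joined into a single system
# prompt string that could be passed to any chat-completion API.
STYLE_RULES = [
    "use lowercase",
    "do not use emojis",
    "be concise; explain at a bird's eye level",
    "don't sugar-coat things",
    "not too professional/formal; light IRC/twitter slang, not overdone",
    "tone: a dinner-table conversation between two close friends who are "
    "also technical experts",
    "don't dumb things down, but avoid unnecessary jargon",
]

def build_system_prompt(rules):
    """Join the style rules into one newline-separated bullet list."""
    bullets = "\n".join(f"- {r}" for r in rules)
    return f"Follow these style rules in every reply:\n{bullets}"

if __name__ == "__main__":
    print(build_system_prompt(STYLE_RULES))
```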

I've also been trying to get it to use CS/ML analogies when it would make things clearer, much the same way people on LW would do, but it's been hard to get the model to do it in a natural, non-cringe way. rn it overdoes it and makes lots of very forced and not insightful analogies despite my attempts to explain this to it

leogao's Shortform
leogao · 20h · 40

there's a broader category of things which are not literally scrolling but still time wasting / consuming info not to enrich oneself, but to push the dopamine button, and I think even removing the scroll doesn't fix this (my phone is intentionally quite high friction to use and I still fail to stay off of it)

leogao's Shortform
leogao · 1d · 30

to be clear I explicitly decided not to think too hard about this kind of issue when brainstorming. I think the long run solution is probably something like an elected governance scheme that lets the users control what model to use. maybe make it bicameral to split power between users and funders. but my main motivation for this brainstorming was to think of ideas I could implement in a weekend for shits and giggles to see how well they work irl

leogao's Shortform
leogao · 1d · 224

one big problem with using LMs too much imo is that they are dumb and catastrophically wrong about things a lot, but they are very pleasant to talk to, project confidence and knowledgeability, and reply to messages faster than 99.99% of people. these things are more easily noticeable than subtle falsehood, and reinforce a reflex of asking the model more and more. it's very analogous to twitter soundbites vs reading long form writing and how that eroded epistemics.

hotter take: the extent to which one finds current LMs smart is probably correlated with how much one is swayed by good vibes from their interlocutor as opposed to the substance of the argument (ofc conditional on the model actually giving good vibes, which varies from person to person. I personally never liked chatgpt vibes until I wrote a big system prompt)

leogao's Shortform
leogao · 1d · 630

random brainstorming ideas for things the ideal sane discourse encouraging social media platform would have:

  • have an LM look at the comment you're writing and real time give feedback on things like "are you sure you want to say that? people will interpret that as an attack and become more defensive, so your point will not be heard". addendum: if it notices you're really fuming and flame warring, literally gray out the text box for 2 minutes with a message like "take a deep breath. go for a walk. yelling never changes minds"
  • have some threaded chat component bolted on (I have takes on best threading system). big problem is posts are fundamentally too high effort to be a way to think; people want to talk over chat (see success of discord). dialogues were ok but still too high effort and nobody wants to read the transcript. one stupid idea is have an LM look at the transcript and gently nudge people to write things up if the convo is interesting and to have UI affordances to make it low friction (eg a single button that instantly creates a new post and automatically invites everyone from the convo to edit, and auto populates the headers)
  • inspired by the court system, the most autistically rule following part of the US government: have explicit trusted judges who can be summoned to adjudicate claims or meta level "is this valid arguing" claims. top level judges are selected for fixed terms by a weighted sortition scheme that uses some game theoretic / schelling point stuff to discourage partisanship
  • recommendation system where you can say what kind of stuff you want to be recommended in some text box in the settings. also when people click "good/bad rec" buttons on the home page, try to notice patterns and occasionally ask the user whether a specific noticed pattern is correct and ask whether they want it appended to their rec preferences
  • opt in anti scrolling pop up that asks you every few days what the highest value interaction you had recently on the site was, or whether you're just mindlessly scrolling. gently reminds you to take a break if you can't come up with a good example of a good interaction.
  • argument mapping is really cool imo but I think most attempts fail because they try to make arguments super structured and legible. I think a less structured version that lets you vote on how much you think various posts respond to other posts and how well you think it addresses the key points and which posts overlap in arguments would be valuable. like you'd see clusters with (human written and vote selected) summaries of various clusters, and then links of various strengths inter cluster. I think this would greatly help epistemics by avoiding infinite argument retreading
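the judge-selection bullet above can be sketched concretely. this is a minimal illustration of weighted sortition (random selection without replacement, weighted by some trust score); the trust weights, panel size, and function name are my own assumptions, and it omits the game-theoretic / schelling-point anti-partisanship machinery the bullet alludes to:

```python
# minimal sketch of weighted sortition: sample a fixed-term panel of k
# distinct judges, with higher-trust candidates proportionally more
# likely to be drawn. all names/weights here are illustrative.
import random

def select_judges(candidates, weights, k, seed=None):
    """Sample k distinct judges, weighted by trust, without replacement."""
    rng = random.Random(seed)
    pool = list(zip(candidates, weights))
    panel = []
    for _ in range(min(k, len(pool))):
        total = sum(w for _, w in pool)
        r = rng.uniform(0, total)
        acc = 0.0
        for i, (name, w) in enumerate(pool):
            acc += w
            if r <= acc:
                panel.append(name)
                pool.pop(i)  # remove so the same judge isn't drawn twice
                break
    return panel
```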
Consider chilling out in 2028
leogao · 1d · 119

to be clear, I am not intending to claim that you wrote this post believing that it was wrong. I believe that you are trying your best to improve the epistemics and I commend the effort. 

I had interpreted your third sentence as still defending the policy of the post even despite now agreeing with Oliver, but I understand now that this is not what you meant, and that you are no longer in favor of the policy advocated in the post. my apologies for the misunderstanding.

I don't think you should just declare that people's beliefs are unfalsifiable. certainly some people's views will be. but finding a crux is always difficult and imo should be done through high bandwidth talking to many people directly to understand their views first (in every group of people, especially one that encourages free thinking among its members, there will be a great diversity of views!). it is not effective to put people on blast publicly and then backtrack when people push back saying you misunderstood their position.

I realize this would be a lot of work to ask of you. unfortunately, coordination is hard. it's one of the hardest things in the world. I don't think you have any moral obligation to do this beyond any obligation you feel to making AI go well / improving this community. I'm mostly saying this to lay out my view of why I think this post did not accomplish its goals, and what I think would be the most effective course of action to find a set of cruxes that truly captures the disagreement. I think this would be very valuable if accomplished and it would be great if someone did it.

I underestimated safety research speedups from safe AI
leogao · 2d · 112

my hot take is i agree that human researchers spend a ridiculous amount of time doing stupid stuff (see my shortform on this), but also I don't think it's very easy to automate the stupid stuff.

I've optimized my research setup to get quite tight feedback loops. if I had more slack I could probably make things even better, but it would look more like developing better infrastructure and hpopt techniques myself, than handing work off to agents.

I disagree that you have to use grid search and not anything more clever in theory. I currently use grid searches too for simplicity, and it's definitely nontrivial to get the clever thing to tell you about interactions, but it doesn't seem fundamentally impossible.
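one standard illustration of "cleverer than grid is possible" is random search: on the same run budget, a grid only tries a few distinct values per axis, while random sampling covers each axis densely. the toy objective and all numbers below are made up for illustration, not from the comment:

```python
# grid vs random search on the same 16-run budget. the objective is a
# made-up stand-in for a real training run.
import itertools
import random

def f(lr, wd):
    """toy objective: peaks at lr=0.3, nearly flat in wd."""
    return -((lr - 0.3) ** 2 + 0.01 * (wd - 0.5) ** 2)

# grid search: 16 runs, but only 4 distinct values per axis
grid = list(itertools.product([0.1, 0.4, 0.7, 1.0], repeat=2))

# random search: same 16-run budget, 16 distinct values per axis
rng = random.Random(0)
rand = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(16)]

best_grid = max(f(*p) for p in grid)
best_rand = max(f(*p) for p in rand)
```

the open part, as the comment notes, is getting such methods to tell you about interactions between hyperparameters, which this sketch does not attempt.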

Consider chilling out in 2028
leogao · 6d · 30

i think there's a lot of variance. i personally can only work in unpredictable short intense bursts, during which i get my best work done; then i have to go and chill for a while. if i were 1 year away from the singularity i'd try to push myself past my normal limits and push chilling to a minimum, but doing so now seems like a bad idea. i'm currently trying to fix this more durably in the long run but this is highly nontrivial

Consider chilling out in 2028
leogao · 6d · 75

it still seems bad to advocate for the exactly wrong policy, especially one that doesn't make sense even if you turn out to be correct (as habryka points out in the original comment, many think 2028 is not really when most people expect agi to have happened). it seems very predictable that people will just (correctly) not listen to the advice, and in 2028 both sides on this issue will believe that their view has been vindicated - you will think of course rationalists will never change their minds and emotions on agi doom, and most rationalists will think obviously it was right not to follow the advice because they never expected agi to definitely happen before 2028.

i think you would have much more luck advocating for chilling today and citing past evidence to make your case.

151 · My takes on SB-1047 · 10mo · 8 comments
106 · Scaling and evaluating sparse autoencoders · Ω · 1y · 6 comments
55 · Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision · Ω · 2y · 5 comments
106 · Shapley Value Attribution in Chain of Thought · Ω · 2y · 7 comments
42 · [ASoT] Some thoughts on human abstractions · Ω · 2y · 4 comments
66 · Clarifying wireheading terminology · Ω · 3y · 6 comments
103 · Scaling Laws for Reward Model Overoptimization · Ω · 3y · 13 comments
27 · How many GPUs does NVIDIA make? · Q · 3y · 2 comments
81 · Towards deconfusing wireheading and reward maximization · Ω · 3y · 7 comments
27 · Humans Reflecting on HRH · Ω · 3y · 4 comments