Cédric — LessWrong

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

This is alignment's Attention Is All You Need moment

You could get your framework by adapting existing frameworks to fit your meta-agent utility function. Examples:

The utilitarian framework, which seeks to maximize the sum of utility over all agents.
The Rawlsian maximin framework, which seeks to maximize the utility of the worst-off agent.
The Nozickian entitlement framework, which seeks to give each agent the maximal entitlement they could have, given the constraints of the system.
The Nussbaumian capability approach, which seeks to give each agent the maximal capability they could have, given the constraints of the system.

I think in the end you would get stuck on the unsolved problem of balancing the needs of individuals and the collective.
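As a rough illustration of that tension, here is a minimal Python sketch of the first two frameworks only (the entitlement and capability approaches do not reduce to a one-line aggregate, so they are left out). The function names, the `options` table, and the utility numbers are all made up for the example; none of them come from an existing library.

```python
from typing import Callable, Dict, List

# One utility value per agent. The numbers below are hypothetical.
Allocation = List[float]


def utilitarian(utilities: Allocation) -> float:
    """Utilitarian welfare: the sum of utility over all agents."""
    return sum(utilities)


def maximin(utilities: Allocation) -> float:
    """Rawlsian maximin welfare: the utility of the worst-off agent."""
    return min(utilities)


def best_option(
    options: Dict[str, Allocation],
    welfare: Callable[[Allocation], float],
) -> str:
    """Return the name of the option that scores highest under `welfare`."""
    return max(options, key=lambda name: welfare(options[name]))


options = {
    "A": [10.0, 10.0, 1.0],  # higher total, but one agent is badly off
    "B": [6.0, 6.0, 6.0],    # lower total, nobody is badly off
}

print(best_option(options, utilitarian))  # prints "A" (sum 21 vs 18)
print(best_option(options, maximin))      # prints "B" (minimum 1 vs 6)
```

The same two options get ranked in opposite order depending on which aggregate you plug in, which is the individual-versus-collective problem in miniature.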

The Parable of the Boy Who Cried 5% Chance of Wolf

Cédric3y98

Explaining more to align understanding.

Just translate this:

"I didn't say there was a wolf!" cried the boy. "I was estimating the probability at low, but high enough. A false alarm is much less costly than a missed detection when it comes to dying! The expected value is good!"

into normie language and it should be fine.

However, it's very hard to communicate nuance at scale. I have no idea how to solve that.
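For what it's worth, the boy's expected-value argument can be sketched in a few lines of Python. This is only an illustration: the 5% comes from the parable's title, while the two cost figures and the function names are assumptions picked to show the asymmetry.

```python
from typing import Tuple


def expected_costs(
    p_wolf: float, cost_false_alarm: float, cost_missed: float
) -> Tuple[float, float]:
    """Return (expected cost of raising the alarm, expected cost of staying silent)."""
    # Raising the alarm only costs something when there turns out to be no wolf.
    alarm = (1.0 - p_wolf) * cost_false_alarm
    # Staying silent only costs something when there is a wolf.
    silent = p_wolf * cost_missed
    return alarm, silent


def should_alarm(p_wolf: float, cost_false_alarm: float, cost_missed: float) -> bool:
    """Raise the alarm when doing so has the lower expected cost."""
    alarm, silent = expected_costs(p_wolf, cost_false_alarm, cost_missed)
    return alarm < silent


# Hypothetical costs: a false alarm costs 1 unit, a missed wolf costs 1000 units.
alarm, silent = expected_costs(p_wolf=0.05, cost_false_alarm=1.0, cost_missed=1000.0)
print(f"alarm: {alarm:.2f}, silent: {silent:.2f}")
print(should_alarm(0.05, 1.0, 1000.0))
```

Under these made-up costs, crying wolf at 5% is the cheaper policy by a wide margin, and it stays cheaper until the probability drops below cost_false_alarm / (cost_false_alarm + cost_missed), which is about 0.1% here.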

Theses on Sleep

Cédric4y70

Follow up:

I didn't try the experiment for very long but here are my observations when sleeping fewer hours:

Lower self-control (more Uber Eats, more checking socials during work)
When I was able to exert enough control to concentrate on something, my concentration was deeper
Felt more tense/stressed? I guess it must be due to more adrenaline and cortisol
More creative (increased frequency of random ideas while doing chores/mundane tasks, more lost in interesting thoughts - when a thought was particularly interesting, I'd deeply and automatically follow the thoughts for multiple minutes)

So basically I felt tired, energetic, more disinhibited, creative, unable to stay concentrated but when concentrated, the concentration was deeper. It was a very oxymoronic experience with the tired + energised and contradictory effects on concentration.

Overall, there are pros and cons, but if I can overcome the self-control problem it will be a net positive productivity-wise. However, the stress is another point against the healthiness of sleeping fewer hours (not to mention the other potential issues unrelated to stress).

The main thing that this has unlocked for me is that it has freed me from the fear of missing out on sleep. I sleep my 8 hours but don't sweat it if I don't sleep well or if I stay up late.

I also think it's a good idea to try random things, even if they fly in the face of conventional wisdom. The obvious caveat is that you might cause irreparable damage to yourself, but as long as you're careful not to take self-experimentation too far, there might be some benefit to walking off the beaten track.

AGI Ruin: A List of Lethalities

Cédric4y21

Well then let's use hyperbolic discounting to our advantage. If we make paddling sufficiently taboo, the social punishment of paddling will outweigh the rewards of potentially building AGI in the minds of the selfish researchers.
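To make the mechanism concrete, here is a minimal sketch using the standard hyperbolic discounting form, value = amount / (1 + k * delay). The discount rate `k`, the reward and punishment sizes, and the delays are all invented for illustration; the only point is that an immediate social cost can outweigh a much larger but distant payoff.

```python
def hyperbolic_value(amount: float, delay_years: float, k: float) -> float:
    """Perceived present value under hyperbolic discounting."""
    return amount / (1.0 + k * delay_years)


def perceived_net(
    reward: float,
    reward_delay: float,
    punishment: float,
    punishment_delay: float,
    k: float,
) -> float:
    """Discounted reward minus discounted punishment, as felt right now."""
    return hyperbolic_value(reward, reward_delay, k) - hyperbolic_value(
        punishment, punishment_delay, k
    )


# Hypothetical numbers: a reward of 100 arriving in 10 years,
# against a social punishment of 10 that lands immediately.
net = perceived_net(
    reward=100.0, reward_delay=10.0, punishment=10.0, punishment_delay=0.0, k=2.0
)
print(net < 0)  # prints True: the immediate taboo dominates
```

With these assumed values the distant reward shrinks to under 5 while the punishment keeps its full weight of 10, so paddling looks like a bad deal to a heavily discounting researcher.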

AGI Ruin: A List of Lethalities

Cédric4y50

What I'm doing is trying to help with the wings by throwing some money at MIRI. I am also helping with the stopping/slowing of paddling by sharing my very simple reasoning about why that's the most sensible course of action. Hopefully the simple idea will spread and have some influence.

To be honest, I am not willing to invest that much into this as I have other things I am working on (it sounds insane to type that I am not willing to invest much into preventing the doom of everyone and everything). Anyway, there are many like me who are willing to help, but only if the cost is low, so if you have any ideas about what people like me could do to shift the probabilities a bit, let me know.

AGI Ruin: A List of Lethalities

Cédric4y82

Imagine we're all in a paddleboat paddling towards a waterfall. Everyone is inside the paddleboat, but only a relatively small number of people are doing the paddling. Of those paddling, most are aware of the waterfall ahead but, for reasons beyond my comprehension, decide to paddle on anyway. A smaller group of paddlers has realised their predicament and has decided to stop paddling and start building wings onto the paddleboat so that when the paddleboat inevitably hurtles off the waterfall, it might fly.

It seems to me like the most sensible course of action is to stop paddling until the wings are built and we know for sure they're going to work. So why isn't the main strategy definitively proving that we're heading towards the waterfall and raising awareness until the culture has shifted enough that paddling is taboo? With this strategy, even if the paddling doesn't stop, at least it buys time for the wings to be constructed. Trying to get people to stop paddling seems to have a higher probability of success than wing building, and it also increases the probability of success of wing building because it buys time.

I suspect that part of the reason for just focusing on the wings is the desire to reap the rewards of aligned AGI within our lifetimes: the clout of being the ones who did the final work, the immortality, the benefits that we can't yet imagine, and so on. Maybe infinite rewards justify infinite risk, but that does not apply in this case because we can still get the infinite rewards without so much risk if we just wait until the risks are eliminated.

I read Einstein's biography. Here are 15 quotes that reveal his philosophy on life.

Cédric4y30

Well, I hope you're right because I'd feel bad if someone tried to write something useful for us and it was so bad the comments are just speculation about whether the person is a spammer.

I'll keep on assuming people are actually trying though and try to provide constructive feedback and encouragement because the demand for LW posts outstrips supply. Even if the person is a spammer, perhaps being more encouraging and constructive will make others who are hesitant to post more comfortable. And if the person is not a spammer, they can use the feedback to improve on their next post and hopefully iterate until their posts become really good.

I read Einstein's biography. Here are 15 quotes that reveal his philosophy on life.

Cédric4y10

What would be the purpose of doing such a thing? There is no link in the writeup that would indicate farming backlinks for SEO.

I read Einstein's biography. Here are 15 quotes that reveal his philosophy on life.

Cédric4y20

You're getting downvoted without feedback so I'll try to provide some.

The post does not provide any particular insight; it's a disparate collection of quotes. The same and more can be found by googling 'Einstein quotes'. Some ideas for making the post more insightful:

Connect the quotes to uncover Einstein's worldview and approach to his work and life
Give your personal thoughts about the quotes, their caveats and nuances
Explain what Einstein means a little more, as some quotes don't really mean much without context/explanation

Also, Einstein's brilliance was in his physics, and none of those quotes really touch on that. In fact, his worldview outside of physics is not that sophisticated, especially relative to his work in physics, which is as sophisticated as it gets. Some of those quotes are also quite obviously wrong, e.g. the one about having no special talents would be wrong even if it weren't self-contradictory. Perhaps focusing on his work and how he did it would provide more value, as that's what he was best at.
