Independent AI safety researcher

Wiki Contributions


More generally, science is about identifying the structure and patterns in the world; the task taxonomy learned by powerful language models may be very convergent and could be a useful map for understanding the territory of the world we are in. What’s more, such a decomposition would itself be of scientifico-philosophical interest — it would tell us something about thinking.

I would love to see someone expand on the ways we could use interpretability to learn about the world, or the structure of tasks (or perhaps examples of how we've already done this?). Aside from being interesting scientifically, maybe this could also help us build economically valuable systems which are more explicit and predictable?

It's been a while since I read about this, but I think your slavery example might be a bit misleading. If I'm not mistaken, the movement to abolish slavery initially only gained serious steam in the United Kingdom. Adam Hochschild tells a story in Bury the Chains that makes the abolition of slavery look extremely contingent on the role activists played in shaping the UK political climate. A big piece of this story is how the UK used their might as a global superpower to help force an end to the transatlantic slave trade (as well as precedent setting). 

What about leaning into the word-of-mouth sharing instead, and support that with features? For example, being able to as effortlessly as possible recommend posts to people you know from within LW?

I think I must be missing something. As the number of traders increases, each trader can be less risk averse as their personal wealth is now a much smaller fraction of the whole, and this changes their strategy. In what way are these individuals now not EU-maximizing?

I like this thought experiment, but I feel like this points out a flaw in the concept of CEV in general, not SCEV in particular. 

If the entire future is determined by a singular set of values derived from an aggregation/extrapolation of the values of a group, then you would always run the risk of a "tyranny of the mob" kind of situation. 

If in CEV that group is specifically humans, it feels like all the author is calling for is expanding the franchise/inclusion to non-humans as well. 

@janus wrote a little bit about this in the final section here, particularly referencing the detection of situational awareness as a thing cyborgs might contribute to. It seems like a fairly straightforward thing to say that you would want the people overseeing AI systems to also be the ones who have the most direct experience interacting with them, especially for noticing anomalous behavior.

This post feels to me like it doesn't take seriously the default problems with living in our particular epistemic environment. The meat and dairy industries have historically, and continue to have, a massive influence on our culture through advertisements and lobbying governments. We live in a culture where we now eat more meat than ever. What would this conversation be like if it were happening in a society where eating meat was as rare as being vegan now?

It feels like this is preaching to the choir, and picking on a very small group of people who are not as well resourced (financially or otherwise). The idea that people should be vegan by default is an extremely minority view, even in EA, and so anyone holding this position really has everything stacked against them. 

This avoids spending lots of time getting confused about concepts that are confusing because they were the wrong thing to think about all along, such as "what is the shape of human values?" or "what does GPT4 want?"

These sound like exactly the sort of questions I'm most interested in answering. We live in a world of minds that have values and want things, and we are trying to prevent the creation of a mind that would be extremely dangerous to that world. These kind of questions feel to me like they tend to ground us to reality.

Try out The Most Dangerous Writing App if you are looking for ways to improve your babble. It forces you to keep writing continuously for a set amount of time, or else the text will fade and you will lose everything. 

Load More