JuliaHP — LessWrong

Extracting and playing with "evil" features seems like literally one of the worst and most irresponsible things you could be doing when working on AI-related things. I don't care if it leads to a good method or whatever; it's too close to really bad things. They claim to be adding an evil vector temporarily during fine-tuning. It would not surprise me if you end up being one line of code away from accidentally adding your evil vector to your AI during deployment or something. Or what if your AI ends up going rogue and breaking out of containment during this period?

Responsible AI development involves among other things having zero evil vectors stored in your data&code-base.

Related https://arbital.greaterwrong.com/p/hyperexistential_separation

JuliaHP's Shortform

JuliaHP3mo190

Over the course of my life I've started to see the world as a more and more horrible place, in various ways. I've noticed that this seems to make it harder to be excited about things in general, although I'm not confident that these are related. This seems kind of bad, as being excited about things is, among other things, important for learning things and doing things.

Imagine a robot which serves coffee and does a back-flip. Wouldn't that be awesome cool? A healthy kid would probably be excited about making such a thing. This kind of feels like a values thing in some sense. The world containing an awesome robot sure seems nice.

But now the world happens to suck. The awesomeness of the existence of the robot feels diminished. The world with billions of tortured sentient beings, vs the world with billions of tortured sentient beings but it has a back-flipping coffee robot.

Perhaps the world feels smaller as a kid and the robot feels more meaningful. Maybe if the world is less sucky it's easier to ignore the larger world and live in a smaller, more local world where the existence of the robot matters more in a relative sense?

I don't feel like I have a great understanding of what's going on here though, or if my hunch is even in the right direction. I'm curious if others have similar experiences or clarifying thoughts.

How to Make Superbabies

JuliaHP5mo10

I expect the main cost to be regulatory rather than technical; this seems to be a trend across various areas of medicine. These costs might scale with the richest people's ability to pay.

examples-ish;
- Needing expensive studies to get FDA (or other regulatory framework) approval, (and thus needing to sell at a premium to make up the loss).
- Regulations which make market entry expensive (and favor the market leader by requiring bio-equivalence studies) which promote monopolies.
- Need for expensive (time, money & training-capacity) general certifications for people to be allowed to administer narrow treatments.

I don't have any domain knowledge or analysis to illustrate my point, but I am curious to what extent you (or someone else working on accelerating this technology) have thought about this.

(metanote: this isn't meant to be discouraging of your work direction. To me it seems that work on the tech itself, thinking about how to encourage favorable regulation and public opinion, and thinking about cultural downstream effects of the technology are all extremely neglected.)

Sinclair Chen's Shortform

JuliaHP7mo31

>though maintenance might suck idk

Yeah, and I'm guessing very expensive. If something is being given away for cheap/free the true market value of the good is likely negative. It probably makes sense to think more about that bit before concluding that obtaining a castle is a good idea.

Can we ever ensure AI alignment if we can only test AI personas?

JuliaHP7mo20

This to me seems to be akin to "sponge-alignment", i.e. not building a powerful AI.

We understand personas because they are simulating human behavior, which we understand. But that human behavior is mostly limited to human capabilities (except for maybe speed-up possibilities).

Building truly powerful AIs will probably involve systems that do something different from human brains, or at least do not grow with the human learning biases that cause them to learn the human behaviors we are familiar with.

If the "power" of the AI comes through something else than the persona, then trusting the persona won't do you much good.

Altman blog on post-AGI world

JuliaHP8mo3013

I do believe that if Altman does manage to create his superAIs, the first such eats Altman and makes squiggles. But if I were to engage in the hypothetical where nice corrigible superassistants are just magically created, Altman does not appear to treat this future he claims to be steering towards seriously.

The world where "everyone has a superassistant" is inherently incredibly volatile/unstable/dangerous due to an incredibly large offence-defence asymmetry of superassistants attacking fragile-fleshbags (with optimized viruses, bacteria, molecules, nanobots, etc.) or hijacking fragile minds with supermemes.

Avoiding this kind of outcome to me seems difficult. Nonsystematic "patches" are always workaroundable.
If OpenAI's superassistant refuses your request to destroy the world, use it to build your own superassistant, or use it for subtasks, etc. Humans are fragile-fleshbags, and if strong optimization is ever pointed in their direction, they die.

There are ways to make such a world stable, but all of them that I can see look incredibly authoritarian, something Altman says he's not aiming for. But Altman does not appear to be proposing any alternatives as to how this will turn out fine, and I am not aware of any research agenda at OpenAI trying to figure out how "giving everyone a superoptimizer" will result in a stable world with humans doing human things.

I know only three coherent ways to interpret what Altman is saying, and none of them take the object of writing seriously:
1) I wanted to have the stock go up and wrote words which do that
2) I didn't really think about it, oops
3) I'm actually gonna keep the superassistants all to myself and rule, and this nicecore writing will make people support me as I approach the finish line

This is less meant to be critical of the writing, and more me asking for help with how to actually make sense of what Altman says.

The Field of AI Alignment: A Postmortem, and What To Do About It

JuliaHP10mo91

(That broad technical knowledge is the main thing (as opposed to tacit skills) why you value a physics PhD is a really surprising response to me, and seems like an important part of the model that didn't come across from the post.)

The Field of AI Alignment: A Postmortem, and What To Do About It

JuliaHP10mo72

Curious about what it would look like to pick up the relevant skills, especially the subtle/vague/tacit skills, in an independent-study setting rather than in academia. As well as the value of doing this, i.e. maybe it's just a stupid idea and it's better to just go do a PhD. Is the purpose of a PhD to learn the relevant skills, or to filter for them? (If you have already written stuff which suffices as a response, I'd be happy to be pointed to the relevant bits rather than having them restated.)

"Broad technical knowledge" should be in some sense the "easiest" to pick up (not in terms of time-investment, but in terms of predictable outcomes), by reading lots of textbooks (using similar material as your study guide).

Writing/communication, while more vague, should also be learnable by just writing a lot of things, publishing them on the internet for feedback, reflecting on your process etc.

Something like "solving novel problems" seems like a much "harder" one. I don't know if this is a skill with a simple "core" or a grab-bag of tactics. Textbook problems take on a "meant-to-be-solved" flavor, and I find one can be very good at solving these without being good at tackling novel problems. Another thing I notice is that when some people (myself included) try solving novel problems, we can end up on a path which gets there eventually, but which, given "correct" feedback integration, would go OOM faster.

I'm sure there are other vague-skills which one ends up picking up from a physics PhD. Can you name others, and how one picks them up intentionally? Am I asking the wrong question?

Considerations on orca intelligence

JuliaHP10mo103

(warning: armchair evolutionary biology)

Another consideration for orca intelligence: they dodge the Fermi paradox by not having arms.

Assume the main driver of genetic selection for intelligence is the social arms-race. As soon as a species gets intelligent enough (see humans) from this arms-race, they start using their intelligence for manipulating the environment, and start civilization. But orcas mostly lack the external organs for manipulating the environment, so they can keep social-arms-racing-boosting-intelligence way past the point of "criticality".

This should be checkable, i.e. how long have orcas (or orca-forefathers) been socially-arms-racing? I tried asking Claude to no avail, and I lack the domain knowledge to quickly look it up myself. Perhaps one could also check genetic change over time; perhaps a social arms race is something you can see in this data? Do we know what this looks like in humans and orcas?

jacquesthibs's Shortform

JuliaHP10mo64

"As a result, we can make progress toward automating interpretability research by coming up with experimental setups that allow AIs to iterate."
This sounds exactly like the kind of progress which is needed in order to get closer to game-over-AGI. Applying current methods of automation to alignment seems fine, but if you are trying to push the frontier of what intellectual progress can be achieved using AIs, I fail to see your comparative advantage relative to pure capabilities researchers.

I do buy that there might be credit to the idea of developing the infrastructure/ability to do a lot of automated alignment research, which gets cashed out when we are very close to game-over-AGI, even if it comes at the cost of pushing the frontier some.
