marc/er

Contact me via LessWrong or at m[dot]arc[dot]er[underscore]contact[at]proton[dot]me

Comments

I'm in high school myself and am quite invested in AI safety. I'm not sure whether you're requesting advice for high school as someone interested in LW, or for LW and associated topics as someone attending high school. I will try to assemble a response to accommodate both possibilities.

Absorbing yourself in topics like x-risk can make school feel like a waste of time. That seems to be because school mostly is a waste of time (a position I held before becoming interested in AI safety), but disengaging from it entirely also feels incorrect. I use school mostly as a place to relax. Those eight hours are time I usually have to write off in terms of producing technical work, but that I value immensely as a source of enjoyment, socializing, and relaxation. It's hard for me to overstate just how pleasant attending school can be when you optimize for enjoyment; and, if your school's environment permits it, school can also serve as a place for autodidactic intellectual progress, presuming you aren't getting that in the classroom. If you do feel that the classroom is an optimal learning environment for you, I don't see why you shouldn't just maximize knowledge extraction.

For many of my peers, school is practically their life. I think that this is a shame, but social pressures don't let them see otherwise, even when their actions are clearly value-negative. Making school just one part of your life instead of letting it consume you is probably the most critical thing to take from this response. The next is to use its resources to your advantage. If you can network with driven friends, or find staff willing to push you and point you toward interesting opportunities, you absolutely should. I would be shocked if there weren't at least one staff member at your school passionate about something you are too. Just asking can get you a long way, and shutting yourself off from that is another mistake I made in my first few years of high school, falsely assuming that school simply had nothing to offer me.

In terms of getting involved with LW/AI safety, the biggest mistake I made was being insular, assuming my age would get in the way of networking. There are hundreds of people available at any given time who probably share your interests but possess an entirely different perspective. Most people do not care about my age, and I find that especially true in the rationality community. Just talk to people. Discord and Slack host the two biggest clusters of online spaces, and if you're interested I can message you invites.

Another important point, particularly as a high school student, is not falling victim to groupthink. It's easy to be vulnerable to this failure mode in your formative years, and it can massively skew your perspective even when your thinking seems unaffected. Don't let LessWrong memetics propagate through your brain too strongly without good reason.

I expect agentic simulacra to arise without anyone intentionally simulating them: agents are just generally useful for solving prediction problems, and over the course of millions of predictions (as would be expected of a product on the order of ChatGPT, or future successors) it's probable that agentic simulacra will occur. Even if these agents are just approximations, in predicting the behaviors of approximated agents their preferences could still be satisfied in the real world (as described in the Hubinger post).

The problem I'm interested in is how you ensure that all subsequent agentic simulacra (whether they arise intentionally or otherwise) are safe, which seems difficult to verify formally due to the Löbian Obstacle.
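(For reference, the formal core of that obstacle is Löb's theorem; the sketch below is my own gloss of why it blocks the verification step, not text from the linked post.)

$$\text{If } T \vdash \big(\Box_T P \rightarrow P\big), \text{ then } T \vdash P.$$

So an agent reasoning in a theory $T$ cannot nontrivially adopt the reflection schema $\Box_T(\mathrm{Safe}(a)) \rightarrow \mathrm{Safe}(a)$ for the actions $a$ of a successor that also proves things in $T$: for any instance it proves, Löb's theorem makes $T$ prove $\mathrm{Safe}(a)$ outright, so "check the successor's proof" can't supply the guarantee it was meant to.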

Which part specifically are you referring to as being overly complicated? What I take to be the primary assertions of the post are:

  • Simulacra may themselves conduct simulation, and advanced simulators could produce vast webs of simulacra organized as a hierarchy.
  • Simulating an agent is not fundamentally different to creating one in the real world.
  • Due to instrumental convergence, agentic simulacra might be expected to engage in resource acquisition. This could take the shape of 'complexity theft' as described in the post.[1]
  • The Löbian Obstacle accurately describes why an agent cannot obtain a formal guarantee via design-inspection of its subsequent agent.
  • For a simulator to be safe, all simulacra need to be aligned, unless we can establish some threshold of the form "programs below this complexity are too simple to be dangerous," at which point we would only need to consider simulacra above that complexity.

I'll try to justify my approach with respect to one or more of these claims, and if I can't, I suppose that would give me strong reason to believe the method is overly complicated.

  1. ^

    This doesn't have to be resource acquisition, just any negative action that we could reasonably expect a rational agent to pursue.

The issue I have with pivotal act models is that they presume an aligned superintelligence would be capable of bootstrapping its capabilities such that it could perform that act before the creation of the next superintelligence. Soft takeoff now seems to be a popular view, and it isn't conducive to this kind of scheme.

Also, if a large org were planning a pivotal act, I highly doubt they would do so publicly. I imagine subtly modifying every GPU on the planet, melting them, or doing anything else pivotal on a planetary scale, such that the resulting world contains only one or a select few superintelligences (at least until a better solution exists), would be very unpopular with the public and with any government.

I don't think the post explicitly argues against either of these points, and I agree with what you have written; these are, however, useful things to bring up in such a discussion.

I have enjoyed your writings both on LessWrong and on your personal blog. I share your lack of engagement with EA and with Hanson (although I find Yudkowsky's writing very elegant and so felt drawn to LW as a result). If not the above, which intellectuals do you find compelling, and what makes them so compared to Hanson and Yudkowsky?

In (P2) you talk about a roadblock for RSI, but in (C) you talk about RSI as a roadblock; is that intentional?

This was a typo. 

By "difficult", do you mean something like, many hours of human work or many dollars spent?  If so, then I don't see why the current investment level in AI is relevant.  The investment level partially determines how quickly it will arrive, but not how difficult it is to produce.

In a safety context, the primary implication of a capability problem's difficulty is, in most cases, when that capability will arrive. I didn't mean to imply that the investment level determines the difficulty of the problem, but that if you invest additional resources into a problem, it is more likely to be solved sooner than if you didn't. As a result, the desired effect of RSI being a difficult hurdle to overcome (increasing the window to AGI) wouldn't be realized.

More like: (P1) There is currently a lot of investment in AI. (P2) I cannot currently imagine a good roadblock for RSI. (C) Therefore, I have more reason to believe RSI will not entail atypically difficult roadblocks than I do to believe it will.

This is obviously a high-level overview, and a more in-depth response might cite claims like RSI likely being an effective strategy for achieving most goals, or mention counterarguments like Robin Hanson's, which asserts that RSI is unlikely given the observed behavior of existing greater-than-human systems (e.g. corporations).

"But what if [it's hard]/[it doesn't]"-style arguments are very unpersuasive to me. What if it's easy? What if it does? We ought to prefer evidence to clinging to an unknown and saying "it could go our way." For a risk analysis post to cause me to update I would need to see "RSI might be really hard because..." and find the supporting reasoning robust.

Given current investment in AI and the fact that I can't conjure a good roadblock for RSI, I am erring on the side of it being easier rather than harder, but I'm open to updating in light of strong counter-reasoning.

See:

Defining fascism in this way makes me worry that future fascist figures can hide behind the veil of "But we aren't doing x specific thing (e.g. minority persecution) and therefore are not fascist!"

And:

Is a country that exhibits all symptoms of fascism except for minority group hostility still fascist? 

Agreed. I have edited that excerpt to be:

It's not obvious to me that selection for loyalty over competence is necessarily more likely under fascism, or necessarily bad. A competent figure who is opposed to democracy would be a considerably more concerning electoral candidate than a less competent one who is loyal to democracy, assuming that democracy is your optimization target.
