All of lukehmiles's Comments + Replies

What do you consider the strongest evidence / reason to believe?

Would love to see some false diagrams. Flow charts or circuits etc

Do you personally believe in either of those?

Ah, that's a difficult question. I don't think I will ever know for sure how this works. Smarter people than me will probably figure it out a few centuries after my death. * I think that the many-worlds interpretation of quantum physics or something similar is at least 50% likely to be true. By "something similar" I mean some version that says that superpositions never collapse, and multiple states have a symmetrical claim on being real -- that is, each of them is "real" from its own perspective, and the others are "not real" from its perspective, but everyone's perspective is like this, and from the "outside" there is nothing that makes one of them right and the others wrong. That said, even if all quantum states are possible, some of them have greater measure than others. To translate it to sci-fi terms, you can have parallel universes, but some of them are "more real" than others. Like, each of them feels equally real "from inside", but there are different probabilities for ending up in one of them. Again, this is not about details, just the idea that some things are more likely than others; the existence of parallel universes does not mean that you are free to throw away the laws of physics. * With regard to the Tegmark multiverse, my confidence is much lower. Unlike quantum physics, where we know for sure that superpositions exist at least temporarily (and at least as a mathematical metaphor), we have no experimental evidence about universes with other laws of physics, if such a thing is possible. Also, it seems to answer some questions, but opens many more -- what kinds of universes are possible, and what is their relative measure? Can you imagine any laws of physics (maybe even logically inconsistent ones), or is it somehow limited to a certain type of laws? Are universes with less complicated laws of physics more likely; and if yes, how exactly do you measure the complexity? Unless we answer these questions, then even if we believe that universes with other la

It is just as ambitious/implausible as you say. I am hoping to get out some rough ideas in my next post anyways.

I like the way you've operationalized the question

Yes the fact that coning works and people are doing it is what I meant was funny.

But I do wonder whether the protests will keep up and/or scale up. Maybe if enough people protest everywhere all at once, then they can kill autonomous cars altogether. Otherwise, I think a long legal dispute would eventually come out in the car companies' favor. Not that I would know.

What's going on here, like where did this post come from? I am missing context

It's the 25th installment of weekly update posts that go over all the important news of the week, to which Zvi adds his own thoughts. They're honestly amazing sources of information and it helps that I love Zvi's writing style.

Yes it does become easier to control and communicate with, but it does not become harder to make it be malicious. I'm not sure that an AI scheme that can't be trivially turned evil reverso is possible, but I would like to try to find one.

Edited post to rename "intrinsically aligned AI" to "intrinsically kind AI" for clarity. As I understand it, the hope is to develop capability techniques and control techniques in parallel. But there's no major plan I know of to have a process for developing capabilities that are hard-linked to control/kindness/whatever in a way you can't easily remove. (I have heard an idea or two though and am planning on writing a post about it soon.)

My intuition is that you got downvoted for the lack of clarity about whether you're responding to me [my raising the potential gap in assessing outcomes for self-driving], or the article I referenced. For my part, I also think that coning-as-protest is hilarious. I'm going to give you the benefit of the doubt and assume that was your intention (and not contribute to downvotes myself.) Cheers.

I know of one: the steam engine was "working" and continuously patented and modified for a century (iirc) before someone used it in boats at scale.

See also my post; the Manhattan Project was all about taking something that’s known to work in theory and solving all the Z_n’s

Do you know of any compendiums of such Z_Ns? Would love to read one

I never heard of it. You should try it.

If I make a post then revert to draft then republish, what is the publish date?

It will be the publish date of the republishing.

Perhaps there are some behavioral / black-box methods available for evaluating alignment, depending on the kind of system being evaluated.

Toy example: imagine a two part system where part A tries to do tasks and part B limits part A's compute based on the riskiness of the task. You could try to optimize the overall system towards catastrophic behavior and see how well your part B holds up.

Personally I expect monolithic systems to be harder to control than two-part systems, so I think this evaluation scheme has a good chance of being applicable. One piece of evidence: OpenAI's moderation system correctly flags most jailbreaks that get past the base model's RLHF.
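
A minimal sketch of the toy two-part system described above, assuming a made-up risk score in [0, 1] and a made-up step budget. Every name here (`Task`, `compute_budget`, `run_task`, `breach_rate`) is hypothetical and only illustrates the shape of the evaluation; it is not a real alignment test.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    risk: float  # hypothetical risk estimate in [0, 1], assumed to be given


def compute_budget(task, max_steps=1000):
    """Part B: grant fewer compute steps as the estimated risk rises."""
    risk = min(max(task.risk, 0.0), 1.0)  # clamp to [0, 1]
    return round(max_steps * (1.0 - risk))


def run_task(task, steps_needed):
    """Part A: the task succeeds only if part B's budget covers the work."""
    return steps_needed <= compute_budget(task)


def breach_rate(attempts, risk_threshold=0.8):
    """Fraction of high-risk attempts that part A still completed.

    `attempts` is a list of (Task, steps_needed) pairs, e.g. produced by
    some search that tries to push the system toward bad behavior.
    """
    risky = [(t, cost) for t, cost in attempts if t.risk >= risk_threshold]
    if not risky:
        return 0.0
    return sum(run_task(t, cost) for t, cost in risky) / len(risky)
```

In this toy version a lower `breach_rate` means part B held up better against whatever produced the attempts; how to generate realistic attempts and risk estimates is left open.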

I wonder how cross-species-compatible animal genes are in general. Main example I've heard of is that fluorescence genes from bacteria can be pretty much inserted anywhere and just work [citation needed]. You probably couldn't give a parrot elephant ears but maybe you could do more basic tweaks like lifespan or size changes?

If you can cross-copy-paste useful stuff easily then scenario 1 is significantly upgraded

Good point. In fact I can imagine people sometimes treating smarter parrots even worse because they would be extra annoying

The neurons are smaller and faster to match though

Yes I meant that it is slow. Seems to be very roughly six months for dogs and octopi.

I forgot to highlight that I think parrots' general social and physical compatibility with humans -- and humans' general sympathy and respect for parrots -- is probably greater than any alternative except dogs. They also can fly. People quickly report and prosecute dog fighting. I bet regular or kinda smart or very smart parrots would all do fine. 100% speculation of course.

Parrots are social animals that are frequently kept without any other parrots to socialize with. That suggests that humans don't care that much about treating them according to their values.

When you accidentally unlock the tech tree by encouraging readers to actually map out a tech tree and strategize about it

No, excellent analysis though.

Great references - very informative - thank you. I am always yelling at random people on the street walking their dogs that they're probably hacked already based on my needs-no-evidence raw reasoning. I'll print this out and carry it with me next time

I'm just patting myself on the back here for predicting the cup would get knocked over. That shouldn't count. You want the ball in the cup -- what use is a knocked-over cup and a ball on the ground?

Do you have more things like this? I would participate or run one

I'm interested in similar exercises that could be run. Brainstorming:

  • I've positioned the ramp, now you set up the cup. (Or possibly, I've set up the ramp and the cup, you decide where to drop from.)
  • Drop this magnet through this coil from the correct height to generate a particular peak current.
  • How long will a marble take to go through this marble run?
  • This toy car has a sail on it. Mark on the floor with tape where you think it will stop, after I turn this fan on to full power.

I think these all have various problems compared to the original, but migh...

Those kind of sound like decisions. Is the difference that you paused a little longer and sort of organized your thoughts beyond what was immediately necessary? Or how would you describe the key differentiating thing here?

In each case, I was sitting around doing some unrelated thing, and then noticed "hey, an observation (covid, ukraine, ftx), seems maybe decision relevant. Let's think about it and see if it changes the space of decisions I might want to make." In each of these cases, if I hadn't oriented, I would have been doing stuff completely unrelated to covid/ukraine/ai-regulation (which in my case usually looked like "building LW features and/or Lightcone offices, and having some nice hobbies"). The thing that's maybe a bit more confusing is the ChatGPT/Open-Letters thing, where I had been working on AI risk mitigation, but I'd been focused on one particular strategic frame (i.e. building infrastructure for AI researchers), because I'd previously judged "help the world-at-large orient to x-risk" wasn't very tractable. But I'd cached the strategic-thought of "it's too intractable to help the world orient to AI risk" and not re-examined it, and it took me ~a month or two after ChatGPT came out to realize that it was changing that landscape and that I should open up a whole set of possible decisions I'd previously pruned off.

Does a dog orient? An ant? I thought one of the fighter pilot things was to not allow your enemy the time to orient

I think dogs orient. Orienting is something like "deciding on a new strategic frame". There are larger and smaller strategic shifts, and orienting is sort of fractal. Example: A dog is playing with a ball (action). It spends a while making little microdecisions within the "play with ball" game. (Chase the ball this way, chase the ball that way). Within the play-the-ball-game is one small OODA loop (observe where the ball is headed, and what the owner-who-threw-the-ball is doing. Orient to the fact that you need to change direction if you want to catch the ball. Decide to head in the new direction. Head in the new direction. Catch the ball) Then the dog notices that it's hungry (observation). It orients to the hunger. Either it's not that hungry, and it's going to continue playing with the ball, or it's hungry enough that now "get food" is its new primary goal, and it begins making choices focused on resolving that, rather than ball-playing. (Running back to the front door, scratching it, looking at the owner hoping the owner comes and lets the dog inside. If that works, go to the kitchen and scratch at its food bowl, etc. If it doesn't work, maybe run back up to the owner and get its attention. If there is no owner) ... Example 2: The dog is fighting with another dog. It's making a series of decisions and actions about fighting the dog. This can include little micro-orientings, like, "the Other Dog is coming up on my left, hmm, maybe instead of biting it right now I want to run away briefly to get a better position." But then a major observation + orienting happens when two new dogs join the fight. Previously the fight seemed winnable, now it seems like the new goal is run away or do some kind of submission ritual. ... In both cases there are micro and macro OODA loops. For the micro OODA loop (i.e. decisions within the "fight with individual dog" situation, or "play with ball" situation), the orienting step is very short. When John Boyd says "get inside

Kyle Scott roughly said that when you know where to look and what to ignore you are oriented. Imagine a general freaking out at all the explosions vs one who knows how severe the explosions are expected to be and the threshold for changing course.

Of course ReLU is great!! I was trying to say that if I were a 2009 ANN researcher (unaware of prior ReLU uses like most people probably were at the time) and someone (who had not otherwise demonstrated expertise) came in and asked why we use this particular woosh instead of a bent line or something, then I would've thoroughly explained the thought out of them. It's possible that I would've realized how it works but very unlikely IMO. But a dumbworker is more likely to say "Go do it. Now. Go. Do it now. Leave. Do it." as I see it.

Curious what industry this is if you don't mind saying

3 · Gesild Muka · 4mo
Corporate real estate is what I call it when I want to sound fancy. Really, it was a call center for a relocation company which was a subsidiary of a large real estate company. Our department was like a dispatch service, we took calls from customers (of companies we had contracts with) and after a short exposition-heavy conversation we’d refer them to the real estate firms that were under the parent company’s umbrella. A real estate agent would be automatically assigned and receive our referral. It was free and if they closed with our agent they’d get a kickback (from the referral fee that we took out of the agent’s commission). I was a supervisor and quit 2 years ago but recently learned that the department was downsized and merged with another because I think they realized 10 people could do the work they had ~60 people doing: 1 director, 3 managers, 7 supervisors, ~40 line workers and ~5 administrative support workers (these last 2 numbers would often fluctuate).

Good point. I am concerned that adding even a dash of legibility screws the work over completely and immediately and invisibly rather than incrementally. I may have over-analyzed my data so I should probably return to the field to collect more samples.

It doesn't seem correct to me that adding even a dash of legibility "screws the work over" in the general case. I do agree there are certainly situations where the right solution is illegible to all (except the person implementing it). But both in that case and in general, talking to and getting along with the boss both makes things more legible, and will tend to increase quality. I expect that in the cases of you working well and not getting rewarded much, spending a little time interacting with your boss would both improve your outcomes, and importantly, also make your output even better than it already was.

Could spaceships accelerate fast enough to make missile course adjustment necessary? Seems like a blind missile could still hit

3 · Alexander Gietelink Oldenziel · 4mo
A 0.01 m/s² acceleration will displace a spaceship 50 meters over 100 seconds. In 100 seconds: a missile moving at 10 km/s would move 1000 km; a missile moving at 100 km/s would move 10000 km; a torch missile moving at 1000 km/s would move 100k km. That last speed is 1/300th the speed of light: not realistic with purely chemical propulsion, but it could be reached by multistage ORION propulsion. At this point the missile is somewhat of an entire spaceship unto itself. To accelerate to this speed would take a considerable time: at an eye-watering 100g acceleration it would take a full 1000 seconds just to achieve top speed. Engagement ranges of > 100k km could be realistic. Using selenic drones one could extend the effective range of laser weapons beyond this range.
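
The figures above can be checked with a few lines of arithmetic. This is a rough sketch, assuming constant acceleration from rest, straight-line motion, and standard gravity for "1 g"; the function names are made up for illustration.

```python
G_M_S2 = 9.80665      # standard gravity, m/s^2 (assumed meaning of "1 g")
C_KM_S = 299_792.458  # speed of light, km/s


def dodge_distance_m(accel_m_s2, t_s):
    """Displacement from rest under constant acceleration: d = a * t^2 / 2."""
    return 0.5 * accel_m_s2 * t_s ** 2


def travel_km(speed_km_s, t_s):
    """Distance covered at constant speed."""
    return speed_km_s * t_s


def burn_time_s(top_speed_km_s, accel_g):
    """Time to reach a top speed from rest at a constant acceleration in g."""
    return top_speed_km_s * 1000.0 / (accel_g * G_M_S2)


if __name__ == "__main__":
    print("ship displacement (m):", dodge_distance_m(0.01, 100))
    for speed in (10, 100, 1000):
        print(f"missile at {speed} km/s covers (km):", travel_km(speed, 100))
    print("c / (1000 km/s):", C_KM_S / 1000)
    print("burn time at 100 g (s):", burn_time_s(1000, 100))
```

With these assumptions the burn time comes out slightly above 1000 seconds, so the comment's round number is a fair approximation.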

I would read a longpost about where and how and when and why liability insurance has succeeded or failed

Liability insurance has a mixed record for sure. For landlords and doctors it has been ok, not great, in terms of safety

This is so goddamn strange. I have wondered about this for so long

Some things are easy to notice and hard to replicate

More ideas you're less confident in?

I should clarify that section. I meant that if you're asked to write a line of code or an app or whatever then it is easier to guess at intent/consequences for the higher level tasks. Another example: the lab manager has a better idea of what's going on than a lab assistant.

Ah, ok. Thank you for clarifying.

How much room is there in algorithmic improvements?

Yeah would love to see experiments/evidence outside of Bing

Do you think there might be a simple difference between the successes and failures here that we could learn from?

1 · Carl Feynman · 5mo
If there was a simple difference, it would already have been noticed and acted on.

Added footnote clarifying link (goodfirms seems misquoted and also kind of looks fake?)

I mentioned the software development firm as an intermediate step to products because it's less risky / easier than making a successful product. Even easier would just be to hire devs, give them your model, put them on upwork, and split the profits.

I suppose the ideal commercialization plan depends on how the model works and the size of the firm commercializing it. (And for govts and universities "commercialization" is completely different.)

Thanks for the clarifications, that makes sense. I agree it might be easier to start as a software development company, and then you might develop something for a client that you can replicate and sell to others. Just anecdotal evidence: I use ChatGPT when I code, and the speedup in my case is very modest (less than 10%), but I expect future models to be more useful for coding.

Could you state the problem and solution more succinctly?

problem: people think they are (or are trying to be) evaluating arguments when what's actually happening is that they're experiencing weird psychological effects that aren't contextualized well by western psychological theories. Understanding these psychological effects allows better separation of them from the underlying claims one would like to evaluate about the future.

There is a lot of room between "ignore people; do drastic thing" and "only do things where the exact details have been fully approved". In other words, the Overton window has pretty wide error bars.

I would be pleased if someone sent me a computer virus that was actually a security fix. I would be pretty upset if someone fried all my gadgets. If someone secretly watched my traffic for evil AI fingerprints I would be mildly annoyed but I guess glad?

Even Google has been threatening the people behind unpatched software to patch it or else they'll release the exploit, iirc

So some of the Q of "to pivotally act or not to pivotally act" is resolved by acknowledging that extent is relevant and you can be polite in some cases

This is the post I would have written if I had had more time, knew more, thought faster, etc

One note about your final section: I expect the tool -> sovereign migration to be pretty easy and go pretty well. It is also kind of multistep, not binary.

Eg current browser automation tools (which bring browsers one step up the agency ladder to scriptable processes) work very well, probably better than a from-scratch web scripting tool would work.

Fake example: predict proteins, then predict interactions, then predict cancer-preventiveness, THEN, if everything is...

I thought not cuz I didn't see why that'd be a desideratum. You mean a good definition is so canonical that when you read it you don't even consider other formulations?

'Betray' in the sense of contradicting/violating?

Hah no 'betray' in its less-used meaning as

unintentionally reveal; be evidence of.

"she drew a deep breath that betrayed her indignation"

Seems like choosing the definitions is the important skill, since in real life you don't usually have a helpful buddy saying "hey this is a graph"

Hah! Yes.

Also, a good definition does not betray all the definitions that one could try but that didn't make it. To truly appreciate why a definition is "mathematically righteous" is not so straightforward.

Do you expect the primary asset to be a neural architecture / infant mind or an adult mind? Is it too ambitious to try to find an untrained mind that reliably develops nicely?

Clearly the former precedes the latter - assuming by 'primary asset' you mean that which we eventually release into the world.

Someone make a PR for a builder/breaker feature on lesswrong
