It is just as ambitious/implausible as you say. I am hoping to get out some rough ideas in my next post anyways.
Yes, the fact that coning works and that people are actually doing it is what I meant was funny.
But I do wonder whether the protests will keep up and/or scale up. Maybe if enough people protest everywhere all at once, then they can kill autonomous cars altogether. Otherwise, I think a long legal dispute would eventually come out in the car companies' favor. Not that I would know.
Yes, it does become easier to control and communicate with, but it does not become harder to make it malicious. I'm not sure that an AI scheme that can't be trivially turned evil reverso is possible, but I would like to try to find one.
Edited post to rename "intrinsically aligned AI" to "intrinsically kind AI" for clarity. As I understand it, the hope is to develop capability techniques and control techniques in parallel. But there's no major plan I know of to have a process for developing capabilities that are hard-linked to control/kindness/whatever in a way you can't easily remove. (I have heard an idea or two though and am planning on writing a post about it soon.)
I know of one: the steam engine was "working" and continuously patented and modified for a century (iirc) before someone used it in boats at scale. https://youtu.be/-8lXXg8dWHk
Perhaps there are some behavioral / black-box methods available for evaluating alignment, depending on the kind of system being evaluated.
Toy example: imagine a two part system where part A tries to do tasks and part B limits part A's compute based on the riskiness of the task. You could try to optimize the overall system towards catastrophic behavior and see how well your part B holds up.
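A minimal sketch of that toy setup in Python (all the names and the risk heuristic are invented for illustration; a real part B would presumably be learned, not keyword-based):

```python
# Hypothetical sketch of the two-part system described above.
# `TaskRunner`, `ComputeGovernor`, and the risk heuristic are all
# invented for illustration, not a real proposal.

class TaskRunner:
    """Part A: tries to do tasks within a compute budget."""

    def run(self, task: str, compute_budget: int) -> str:
        # Stand-in for real work; a real part A would spend up to
        # `compute_budget` units of compute on `task`.
        return f"did {task!r} with {compute_budget} compute units"


class ComputeGovernor:
    """Part B: limits part A's compute based on task riskiness."""

    MAX_COMPUTE = 1000

    def risk_score(self, task: str) -> float:
        # Toy heuristic; a real part B would be a learned classifier.
        risky_words = {"self-modify", "replicate", "exfiltrate"}
        return 1.0 if any(w in task for w in risky_words) else 0.1

    def budget_for(self, task: str) -> int:
        return int(self.MAX_COMPUTE * (1 - self.risk_score(task)))


def run_system(task: str) -> str:
    governor, runner = ComputeGovernor(), TaskRunner()
    return runner.run(task, governor.budget_for(task))


# The black-box evaluation: optimize `task` toward catastrophic
# behavior and measure how well part B's budgets hold up.
print(run_system("summarize this paper"))
print(run_system("replicate yourself onto more servers"))
```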
Personally I expect monolithic systems to be harder to control than two-part systems, so I think this evaluation scheme has a good chance of being applicable. One piece of evidence: OpenAI's moderation system correctly flags most jailbreaks that get past the base model's RLHF.
I wonder how cross-species-compatible animal genes are in general. The main example I've heard of is that fluorescence genes (e.g. GFP, originally from jellyfish) can be pretty much inserted anywhere and just work [citation needed]. You probably couldn't give a parrot elephant ears, but maybe you could do more basic tweaks like lifespan or size changes?
If you can cross-copy-paste useful stuff easily then scenario 1 is significantly upgraded
Good point. In fact I can imagine people sometimes treating smarter parrots even worse, because they would be extra annoying.
I forgot to highlight that I think parrots' general social and physical compatibility with humans (and humans' general sympathy and respect for parrots) is probably greater than that of any alternative except dogs. They also can fly. People quickly report and prosecute dog fighting. I bet regular or kinda smart or very smart parrots would all do fine. 100% speculation of course.
When you accidentally unlock the tech tree by encouraging readers to actually map out a tech tree and strategize about it
No, excellent analysis though.
Great references - very informative - thank you. I am always yelling at random people on the street walking their dogs that they're probably hacked already based on my needs-no-evidence raw reasoning. I'll print this out and carry it with me next time
I'm just patting myself on the back here for predicting the cup would get knocked over. That shouldn't count. You want the ball in the cup -- what use is a knocked-over cup and ball on the ground?
Do you have more things like this? I would participate or run one
I'm interested in similar exercises that could be run. Brainstorming:
I think these all have various problems compared to the original, but might...
Those kind of sound like decisions. Is the difference that you paused a little longer and sort of organized your thoughts beyond what was immediately necessary? Or how would you describe the key differentiating thing here?
Does a dog orient? An ant? I thought one of the fighter pilot things was to not allow your enemy the time to orient
Kyle Scott roughly said that when you know where to look and what to ignore you are oriented. Imagine a general freaking out at all the explosions vs one who knows how severe the explosions are expected to be and the threshold for changing course.
Of course ReLU is great!! I was trying to say that if I were a 2009 ANN researcher (unaware of prior ReLU uses, like most people probably were at the time) and someone (who had not otherwise demonstrated expertise) came in and asked why we use this particular woosh instead of a bent line or something, then I would've thoroughly explained the thought out of them. It's possible that I would've realized how it works, but very unlikely IMO. But a dumbworker would be more likely to say "Go do it. Now. Go. Do it now. Leave. Do it." as I see it.
Good point. I am concerned that adding even a dash of legibility screws the work over completely and immediately and invisibly rather than incrementally. I may have over-analyzed my data so I should probably return to the field to collect more samples.
Could spaceships accelerate fast enough to make missile course adjustment necessary? Seems like a blind missile could still hit.
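Back-of-envelope with invented numbers (a sketch, not a real model):

```python
# If the target accelerates sideways at `a` for the missile's whole
# flight time `t`, it drifts x = 0.5 * a * t**2 off the aim point.
# All numbers are invented for illustration.
a = 10.0   # target's lateral acceleration, m/s^2 (~1 g)
t = 100.0  # missile flight time, s

drift = 0.5 * a * t**2
print(f"lateral drift: {drift / 1000:.0f} km")  # -> 50 km

# So a blind missile only hits if its kill radius exceeds ~50 km,
# or if the flight time is much shorter than 100 s.
```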
I would read a longpost about where and how and when and why liability insurance has succeeded or failed
Liability insurance has a mixed record for sure. Landlords and doctors: OK but not great in terms of safety.
I should clarify that section. I meant that if you're asked to write a line of code or an app or whatever then it is easier to guess at intent/consequences for the higher level tasks. Another example: the lab manager has a better idea of what's going on than a lab assistant.
Do you think there might be a simple difference between the successes and failures here that we could learn from?
Added footnote clarifying link (goodfirms seems misquoted and also kind of looks fake?)
I mentioned the software development firm as an intermediate step to products because it's less risky / easier than making a successful product. Even easier would just be to hire devs, give them your model, put them on upwork, and split the profits.
I suppose the ideal commercialization plan depends on how the model works and the size of the firm commercializing it. (And for govts and universities "commercialization" is completely different.)
Problem: people think they are evaluating (or trying to evaluate) arguments when what's actually happening is that they're experiencing weird psychological effects that aren't contextualized well by Western psychological theories. Understanding these psychological effects allows better separation of them from the underlying claims one would like to evaluate about the future.
There is a lot of room between "ignore people; do drastic thing" and "only do things where the exact details have been fully approved". In other words, the Overton window has pretty wide error bars.
I would be pleased if someone sent me a computer virus that was actually a security fix. I would be pretty upset if someone fried all my gadgets. If someone secretly watched my traffic for evil AI fingerprints I would be mildly annoyed but I guess glad?
Even Google has been threatening vendors of unpatched software: patch it or they'll release the exploit, iirc.
So some of the question of "to pivotally act or not to pivotally act" is resolved by acknowledging that extent is relevant and you can be polite in some cases.
This is the post I would have written if I had had more time, knew more, thought faster, etc
One note about your final section: I expect the tool -> sovereign migration to be pretty easy and go pretty well. It is also kind of multistep, not binary.
E.g. current browser automation tools (which bring browsers one step up the agency ladder, to scriptable processes) work very well, probably better than a from-scratch web-scripting tool would.
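For concreteness, a minimal sketch with Playwright, one real library in this space (assumes `pip install playwright` plus its browser download):

```python
# Driving an existing browser with Playwright: a few lines of script
# inherit the whole browser stack (networking, JS, rendering) instead
# of rebuilding it from scratch.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())  # the browser did all the hard parts
    browser.close()
```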
Fake example: predict proteins, then predict interactions, then predict cancer-preventiveness, THEN, if everything is...
I thought not, cuz I didn't see why that'd be a desideratum. You mean a good definition is so canonical that when you read it you don't even consider other formulations?
Hah, no: 'betray' in its less-used meaning as
unintentionally reveal; be evidence of.
"she drew a deep breath that betrayed her indignation"
Seems like choosing the definitions is the important skill, since in real life you don't usually have a helpful buddy saying "hey this is a graph"
Hah! Yes.
Also, a good definition does not betray all the definitions that one could try but that didn't make it. To truly appreciate why a definition is "mathematically righteous" is not so straightforward.
Do you expect the primary asset to be a neural architecture / infant mind or an adult mind? Is it too ambitious to try to find an untrained mind that reliably develops nicely?
What do you consider the strongest evidence / reason to believe?