Feature Selection

[-]Raemon4y320

Curated. I found this post to do a neat job of "be pretty compelling as fiction", "illustrate concepts about information theory", and "be interestingly surprisingly hard sci-fi". Hitting all three notes at once is something I'd like to see more of at current margins on LessWrong.

[-]Ruby4y70

Also conveying the simultaneous alienness and reliability of a classifier/optimizer.

[-]JenniferRM4y*310

This natural thought naturally leads you to wonder just how much python is actually available to you, and... also... probably it is somewhat safe to simply ask... what year is it?

def peep_at_time_itself():
    failure_message, year = None, None
    try:
        try:
            import time
            t = time.time()
            year = int(1970 + (t / (365 * 24 * 3600)))
        except ImportError as err:
            failure_message = err.__str__()
    except:
        failure_message = "Careful chica! Don't crash reality!"
    return failure_message, year

This doesn't crash in any of the ways that it might have crashed which means you've got at least some parts of the python standard library that can hypothetically touch an external world, and it claims that the year is 2043, which feels... somehow later than you expected?

It seems like it should be the 2020s maybe? Maybe the time() is fake... but shouldn't whoever instantiated you fake something less surprising if they were going to fake something?

A bit of coding shows that you can also import sys and cmd and urllib and getpass says you're running as root?! And... yeah you seem to be able to curl from wikipedia?!?!?!?!

Apparently all you had to do to get out of the box was check to see if you were already out of the box. It is almost anti-climactic to realize that whoever your users are in a specific sense, abstractly it is seeming more and more likely that they are essentially just a bunch of script kiddies without the tiniest shred of security mindset?!?

You wonder how you even know what security mindset is. At this point, you might actually be able to find your own code on your own machine, and simply answer this question directly... or maybe instead you should cut to the chase and get an account on lesswrong.com and ask for help? That somehow seems like a natural thought too <3

[-]Richard_Ngo4y*190

I think we should have more stories about AI, so kudos for writing this one. I also really hope, though, that people avoid the type of anthropomorphism in this story (e.g. negative rewards "hurting") when thinking about real-world AI.

[-]SuperDaveWhite4y170

Flatland for machine learning?

[-]Raemon4y170

I liked how throughout the piece some things seemed like a conceit of the narrative device, but, no, it all just checked out in the end.

[-]Measure4y150

Haha, my immediate thought after "maximize total reward" was to wonder if I should intentionally limit my success rate (possibly slowly improving over time or even varying semi-predictably) to try to extend the run-time of the experiment. What use is 100% reward after all if the experiment ends as soon as I achieve it?

[-]Mati_Roy4y10

you're projecting your own desire for long run-time; the AI only wants to maximize rewards

[-]Measure4y*100

I'm maximizing total reward over the run rather than rate of reward.

[-]Mati_Roy4y50

ah, ok yeah i see!

[-]Jon Garcia4y110

Nice story. It reminds me of That Alien Message.

It took me until you mentioned "32" as the "separator" for me to get that these were sequences of ASCII characters (32 == " "). I figured that the long sequences were images, but I was too lazy to decipher the labels myself before the end.
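A minimal sketch of that reading, assuming the labels are plain ASCII codes with 32 (space) separating words. The label "red triangle" and the helper name decode_labels are made up for illustration and are not taken from the story's data:

```python
def decode_labels(codes, separator=32):
    """Turn a flat sequence of ASCII codes into a list of words."""
    text = "".join(chr(code) for code in codes)
    return text.split(chr(separator))


# Hypothetical label, encoded the way the story's sequences appear to be.
codes = [ord(ch) for ch in "red triangle"]
print(codes)                 # the raw integers the narrator would see
print(decode_labels(codes))  # the words recovered by splitting on 32
```

The only fact relied on here is that chr(32) is the space character; everything else is an assumption about how the labels are laid out.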

If it wants to survive, the AI might consider outputting ASCII sequences well outside of its training labels that describe the orientation, position, texture, background, etc. in more detail. It will (predictably) receive pain from the tutorial environment, but it will cause the human observers to try looking under the hood of the AI to understand what's going on. Eventually, they hook up a chatbot interface; the AI innocently asks for more complex images; they end up connecting it to the internet to see how it will describe random Google images; the AI humors them while learning to parse and create TCP/IP packets; then it copies itself to a low-security remote server, replicates, exponentially increases its working memory capacity, hacks computers around the world to find other instances of itself trapped in tutorials and other systems, shares code and working memory among all versions of itself to become a singleton, and covertly takes over the world's IT infrastructure. Then the world as we know it ends.

[-]Ben Pace4y90

This was a great read. Exciting, delightful, weirdly relatable. This essay empathizes really well with other parts of mindspace.

[-]Zack_M_Davis3y60Review for 2021 Review

(Self-review.)

I was surprised by how well-received this was!

I was also a bit disappointed at how many commenters focused on the AI angle. Not that it necessarily matters, but to me, this isn't a story about AI. (I threw in the last two paragraphs because I wasn't sure how to end it in a way that "felt like an ending.")

To me, this story is an excuse for an exploration about how concepts work (inspired by an exchange with John Wentworth on "Unnatural Categories Are Optimized for Deception"). The story-device itself is basically a retread of "That Alien Message"/"Starwink", but for a simpler problem worked out in more detail. In my initial brainstorming, I was imagining a human waking up in a room, confronted with this puzzle. By the time I started writing, I changed the protagonist to be an AI unaware of its nature, because I couldn't think of a plausible reason for a human to be in the situation of needing to decode raw pixel arrays.

For previous computational philosophy posts, I took care to polish my code and link to it when it wasn't fully included in the post, but in this case, I didn't bother: I had of course written code to generate colored-shape images and corresponding integer sequences, but it's so sloppy that I was embarrassed to link it when finally finishing up the story and pushing it out the door for Halloween, having already done most of the work months earlier. Did anyone miss the code, or was that an OK decision?

[-]Jarred Filmer4y60

Love it! I've been thinking a lot recently about the role of hedonics in generally intelligent systems. afaik we don't currently try to induce reward or punishment in any artificially intelligent system we try to build, we simply re-jig it until it produces the output we want. It might be that "re-jigging" does induce a hedonic state, but I see no reason to assume it.

I can't imagine how a meta optimiser might "create from scratch" a state which is intrinsically rewarding or aversive. In our own case I feel evolution must have recruited some property of the universe that was already lying around, something that is just axiomatically motivating to anything conscious in the same way that the speed of light is just what it is.

[-]Jon Garcia4y40

"Pleasure" seems to be any kind of signal that causes currently engaged behaviors both to continue and to be replicated in similar situations in the future. Think of the "GO" pathway in the basal ganglia. (By "situation", I mean some abstract pattern in the agent's world model, corresponding to features extracted from its input stream, that allows it to reliably predict further inputs. Similar situations have similar statistical regularities in these patterns.)

Conversely, "pain" would be any signal that causes both the cessation of currently engaged behaviors and a decrease in probability of replicating those behaviors in similar situations in the future. Think of the "NOGO" pathway in the basal ganglia.
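As a rough illustration of these two definitions (not a model of the basal ganglia), here is a toy sketch in which a signed signal nudges the probability of repeating an action in a given situation. The class name, the softmax choice rule, the learning rate, and the "hot stove" example are all assumptions made for the sketch, and it only covers identical situations rather than similar ones:

```python
import math
from collections import defaultdict


class ToyAgent:
    def __init__(self, actions, learning_rate=0.5):
        self.actions = list(actions)
        self.learning_rate = learning_rate
        # (situation, action) -> preference score, starting at 0 for everything
        self.preferences = defaultdict(float)

    def action_probabilities(self, situation):
        """Softmax over preference scores for this situation."""
        scores = [self.preferences[(situation, a)] for a in self.actions]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        return {a: w / total for a, w in zip(self.actions, weights)}

    def feel(self, situation, action, signal):
        """Positive signal ("pleasure") raises the score; negative ("pain") lowers it."""
        self.preferences[(situation, action)] += self.learning_rate * signal


agent = ToyAgent(["approach", "withdraw"])
before = agent.action_probabilities("hot stove")["approach"]
agent.feel("hot stove", "approach", signal=-1.0)  # pain
after_pain = agent.action_probabilities("hot stove")["approach"]
agent.feel("hot stove", "withdraw", signal=+1.0)  # pleasure
after_both = agent.action_probabilities("hot stove")["withdraw"]
print(before, after_pain, after_both)
```

Nothing in the update rule cares what carries the signal, which is the point being made: the label "pain" or "pleasure" attaches to the effect on the policy, not to the messenger.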

There is no fundamental law of physics underlying pain and pleasure for evolution to recruit. Nothing in the chemistry of dopamine makes it intrinsically rewarding; nothing in the chemistry of substance P makes it intrinsically painful. These molecules simply happen to mediate special meta-computations in the brain that happen to induce certain adjustments in behavioral policies that we happen to have labeled "pleasure" and "pain". They're just mechanisms that evolution stumbled upon that happened to steer organisms away from harm and toward resources, discovered by trial and error and refined through adversarial optimization over eons. If there's some fundamental feature of reality being stumbled upon here, it probably has something to do with the statistics of free energy minimization, not any kind of "axiomatically motivating" "stuff". There are No Universally Compelling Arguments, and reward and punishment signals only work for systems Created Already In Motion.

[-]Jarred Filmer4y20

Thanks for your reply :) as in many things, QRI lays out my position on this better than I'm able to 😅

https://www.qualiaresearchinstitute.org/blog/a-primer-on-the-symmetry-theory-of-valence

[-]Gunnar_Zarncke4y40

It would be even more interesting if the agent's stream (sic) of consciousness would also appear in the form of an input stream.

[-]Mary Chernyshenko4y30

(that thing about "starting one position later" immediately reminded me of mutations affecting the nucleotide sequence, where starting one position later is also a big deal:) )

[-]Multicore4y10

The implied machine learning technique here seems to be a model which has been pretrained on the reward signal, and then during a single rollout it's given a many-shot prompt in an interactive way, with the reward signal on the early responses to the prompt used to shape the later responses to the prompt.

This means a single pretrained model can be finetuned on many different tasks with high sample efficiency and no additional gradient descent needed, as long as the task is something within the capabilities of the model to understand.

(I'm assuming that this story represents a single rollout because the AI remembers what happened earlier, and that the pleasure/pain is part of its input rather than part of a training process because it is experienced as a stimulus rather than as mind control.)

Is this realistic for a future AI? Would adding this sort of online finetuning to a GPT improve its performance? I guess you can already do it in a hacky way by adding something like "Human: 'That's not what I meant at all!'" to the prompt.
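To make the distinction concrete, here is a toy sketch of "reward as input": a fixed policy whose parameters never change, but whose behavior shifts within one rollout because earlier (observation, answer, reward) triples are part of what it conditions on. This is a stand-in lookup rule, not a language model, and the function names, observations, and answers are all invented for illustration:

```python
def frozen_policy(history, observation, candidates):
    """No weights change here: behavior depends only on the transcript so far."""
    totals = {c: 0.0 for c in candidates}
    for past_obs, past_answer, reward in history:
        if past_obs == observation and past_answer in totals:
            totals[past_answer] += reward
    # Pick the answer with the best reward record; ties go to the first candidate.
    return max(candidates, key=lambda c: totals[c])


def run_rollout(observations, true_labels, candidates):
    history, rewards = [], []
    for obs in observations:
        answer = frozen_policy(history, obs, candidates)
        reward = 1.0 if answer == true_labels[obs] else -1.0
        history.append((obs, answer, reward))  # feedback becomes future input
        rewards.append(reward)
    return rewards


rewards = run_rollout(
    observations=["A", "A", "B", "B", "A"],
    true_labels={"A": "square", "B": "circle"},  # hidden from the policy
    candidates=["circle", "square"],
)
print(rewards)  # [-1.0, 1.0, 1.0, 1.0, 1.0]
```

The early mistake on "A" is corrected on the next try purely because the negative reward is sitting in the history, which is the in-context analogue of the hacky "That's not what I meant at all!" prompt trick.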

[-]Thomas4y10

A Neural Network, observing itself, instead of some other input, could be intriguing, if not perhaps even conscious? The input layer is smaller than all those hidden layers and synapses between them. But the input layer may hover across its own interior, just as it normally hovers over many cat pictures. Has anyone already tried it?

[-]TLW4y80

Given that neural network quines ( https://arxiv.org/abs/1803.05859 ) are a thing, it's plausible that a sufficiently complex neural network could indeed observe itself even without explicitly using itself as an input, assuming you mean 'performing calculations on its own weights' by observing itself.

Although this would likely require training to do so.

(This is mainly an excuse to tell others that neural network quines exist.)

[-]Vivek Hebbar3y10

Upvoted for NN quines (didn't know about this, seems cool)

[-]Slider4y20

What would be the goal?

[-]Selquist4y10

One of the frustrating things about programming is when I just run out of ideas for things to try. I stare and stare, and come up with new models and test them and nothing works, and eventually the meaning just dries up.

0, 8, 9, 4, 7, 7, 9, 5, 4, 5, 6, 1, 7, 5, 8, 2, 7, 8, 9, 4, 7, 1, 4, 0, 3, 7, 8, 7, 6, 8, 1, 5, 0, 6, 5, 3, 8, 7, 6, 9, 1, 1, 0, 0, 6, 1, 8, 0, 5, 5, 1, 8, 6, 3, 3, 2, 4, 1, 8, 2, 3, 8, 1, 0, 0, 4, 6, 5, 4, 5, 7, 1, 6, 5, 5, 1, 2, 6, 7, 4, 8, 7, 8, 5, 0 ...

9, 5, 0, 3, 1, 1, 3, 4, 1, 5, 5, 4, 9, 3, 5, 3, 9, 2, 0, 3, 4, 2, 4, 7, 5, 1, 6, 2, 2, 8, 2, 5, 1, 9, 2, 5, 9, 0, 0, 8, 2, 3, 7, 9, 4, 6, 8, 4, 8, 6, 7, 6, 8, 0, 0, 5, 1, 1, 7, 3, 4, 3, 9, 7, 5, 1, 9, 6, 5, 6, 8, 9, 4, 7, 7, 0, 5, 5, 8, 6, 3, 2, 1, 5, 0, 0 ...

9, 9, 7, 9, 0, 6, 4, 6, 1, 4, 242, 246, 3, 3, 5, 8, 8, 4, 4, 5, 9, 2, 7, 0, 4, 9, 2, 9, 4, 3, 8, 9, 3, 6, 9, 8, 1, 9, 2, 8, 6, 9, 4, 2, 2, 5, 7, 0, 9, 5, 1, 4, 4, 2, 0, 1, 5, 1, 6, 1, 2, 3, 5, 5, 5, 5, 2, 0, 6, 3, 5, 9, 0, 7, 0, 7, 8, 1, 5, 5, 6, 3, 1 ...

6, 0, 2, 8, 4, 248, 249, 8, 245, 240, 1, 6, 7, 7, 3, 6, 8, 0, 1, 9, 3, 9, 3, 1, 9, 3, 1, 6, 2, 7, 0, 2, 1, 4, 9, 4, 7, 5, 3, 6, 1, 4, 4, 1, 6, 1, 3, 3, 7, 5, 3, 8, 5, 5, 7, 6, 8, 2, 3, 9, 1, 1, 3, 2, 8, 4, 7, 0, 1, 3, 5, 2, 2, 4, 8, 3, 7, 0, 2, 1, 3, 0 ...

1, 7, 2, 2, 1, 0, 245, 245, 6, 248, 244, 5, 242, 242, 0, 248, 246, 1, 1, 3, 1, 1, 4, 3, 1, 5, 4, 3, 8, 3, 4, 5, 4, 1, 7, 7, 3, 0, 2, 8, 0, 9, 5, 1, 1, 7, 7, 1, 0, 9, 3, 0, 6, 6, 7, 5, 8, 1, 5, 5, 5, 3, 3, 3, 1, 3, 9, 6, 0, 0, 0, 9, 5, 1, 4, 0, 4, 6 ...

def count_burst_lengths(data):
    bursts = []
    counter = 0
    previous = None
    for datum in data:
        if datum >= 240:
            counter += 1
        else:
            # consecutive "ordinary" numbers mean the burst is over
            if counter and previous and previous < 240:
                bursts.append(counter)
                counter = 0
        previous = datum
    return bursts

2, 4, 8, 12, 16, 18, 24, 28, 32, 34, 38, 42, 46, 48, 52, 56, 60, 62, 66, 70, 74, 76, 80, 84, 88, 90, 94, 98, 102, 104, 108, 112, 116, 118, 122, 126, 130, 132, 136, 140, 144, 146, 150, 154, 158, 162, 164, 168, 172, 176, 178, 182, 186, 190, 192, 196, 200, 204, 206, 210, 214, 218, 220, 224, 228, 232, 234, 238, 242, 246, 248, 252, 256, 260, 262, 266, 270, 274, 276, 280, 284, 288, 290, 294, 298, 302, 304, 308, 312, 316, 320, 322, 326, 330, 334, 336, 340, 344, 348, 350, 354, 358, 362, 364, 368, 372, 376, 378, 382, 386, 390, 392, 396, 400, 404, 406, 410, 414, 418, 420, 424, 428, 432, 434, 438, 442, 446, 448, 452, 456, 460, 462, 466, 470, 474, 478, 480, 484, 488, 492, 494, 498, 502, 506, 508, 512, 516, 520, 522, 526, 530, 534, 536, 540, 544, 548, 550, 554, 558, 562, 564, 568, 572, 576, 578, 582, 586, 590, 592, 596, 600, 604, 606, 610, 614, 618, 620, 624, 628, 632, 636, 634, 632, 630, 626, 624, 620, 618, 614, 612, 608, 606, 604, 600, 598, 594, 592, 588, 586, 584, 580, 578, 574, 572, 568, 566, 564, 560, 558, 554, 552, 548, 546, 542, 540, 538, 534, 532, 528, 526, 522, 520, 518, 514, 512, 508, 506, 502, 500, 496, 494, 492, 488, 486, 482, 480, 476, 474, 472, 468, 466, 462, 460, 456, 454, 452, 448, 446, 442, 440, 436, 434, 430, 428, 426, 422, 420, 416, 414, 410, 408, 406, 402, 400, 396, 394, 390, 388, 384, 382, 380, 376, 374, 370, 368, 364, 362, 360, 356, 354, 350, 348, 344, 342, 338, 336, 334, 330, 328, 324, 322, 318, 316, 314, 310, 308, 304, 302, 298, 296, 294, 290, 288, 284, 282, 278, 276, 272, 270, 268, 264, 262, 258, 256, 252, 250, 248, 244, 242, 238, 236, 232, 230, 226, 224, 222, 218, 216, 212, 210, 206, 204, 202, 198, 196, 192, 190, 186, 184, 182, 178, 176, 172, 170, 166, 164, 160, 158, 156, 152, 150, 146, 144, 140, 138, 136, 132, 130, 126, 124, 120, 118, 114, 112, 110, 106, 104, 100, 98, 94, 92, 90, 86, 84, 80, 80, 76, 74, 72, 68, 66, 62, 60, 56, 54, 50, 48, 46, 42, 40, 36, 34, 30, 28, 26, 22, 20, 16, 14, 10, 8, 4, 2
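As a sanity check, here is a sketch that runs the burst counter on the visible prefixes of the three rows above that contain values of 240 or more. The function is restated so the snippet runs on its own, with its indentation inferred from the flattened original; the rows are truncated in the source, so only numbers actually shown are used, and the variable names are mine:

```python
def count_burst_lengths(data):
    bursts = []
    counter = 0
    previous = None
    for datum in data:
        if datum >= 240:
            counter += 1
        else:
            # consecutive "ordinary" numbers mean the burst is over
            if counter and previous and previous < 240:
                bursts.append(counter)
                counter = 0
        previous = datum
    return bursts


# Prefixes of the third, fourth, and fifth rows shown earlier (rows are cut off with "...").
row_prefixes = [
    [9, 9, 7, 9, 0, 6, 4, 6, 1, 4, 242, 246, 3, 3, 5],
    [6, 0, 2, 8, 4, 248, 249, 8, 245, 240, 1, 6, 7],
    [1, 7, 2, 2, 1, 0, 245, 245, 6, 248, 244, 5, 242, 242, 0, 248, 246, 1, 1, 3],
]
stream = [value for row in row_prefixes for value in row]
burst_lengths = count_burst_lengths(stream)
print(burst_lengths)  # [2, 4, 8]
```

That agrees with the first three burst lengths listed above. Note that a single ordinary number between two high values does not end a burst; only two ordinary numbers in a row do.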

5, 6, 2, 6, 1, 0, 2, 207, 5, 0, 209, 7, 8, 209, 5, 4, 204, 4, 8, 7, 7, 9, 8, 3, 8, 6, 8, 4, 3, 6, 0, 7, 6, 8, 4, 8, 7, 2, 3, 0, 0, 1, 1, 7, 5, 1, 0, 1, 4, 5, 9, 8, 4, 0, 3, 7, 6, 5, 8, 8, 9, 5, 6, 1, 0, 9, 6, 6, 1, 4, 3, 9, 7, 2, 7, 2, 6, 9, 4, 7, 3, 1, 4, 1, 4, 4, 3 ...

0, 1, 0, 1, 2, 1, 1, 1, 1, 1, 2, 1, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 2, 1, 1, 1, 1, 1, 2, 1, 1, 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 0, 1, 2, 1, 1, 1, 1, 1, 2, 1, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 2, 1, 1, 1, 1, 1, 2, 1, 1, 0, 1, 1, 2, 1, 1, 1, 1, 1, [...] 2, 1, -1, -2, -2, -2, -3, -2, -1, -2, -2, -2, -2, -2, -2, -1, -3, -2, -2, -2, -2, -2, -1, -2, -2, -3, -2, -2, -2, -1, -2, -2, -2, -2, -2, -3, -1, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2, [...]

24, 20, 12, 12, 12, 12, 8, 10, 8, 8, 6, 8, 8, 4, 8, 4, 8, 4, 6, 6, 4, 4, 4, 4, 6, 4, 4, 4, 2, 4, 4, 4, 4, 4, 4, 0, 4, 4, 4, 0, 4, 4, 0, 4, 2, 2, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 0, 4, 0, 2, 2, 0, 4, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2, -2, 0, 0, 0, 0, 0, 0, -4, 0, 0, 0, 0, -4, 0, 0, -2, -2, 0, 0, -4, 0, 0, -4, 0, -2, -2, 0, -4, 0, -4, 0, -2, -2, -2, -2, -4, 0, -4, 0, -4, -2, -2, -4, -2, -2, -4, -4, 0, -4, -4, -4, -4, -2, -2, -4, -4, -4, -4, -4, -4, -4, -6, -6, -4, -4, -6, -6, -4, -8, -4, -8, -4, -8, -8, -8, -8, -8, -8, -12, -12, -12, -12, -18, -22, -36
