I realize that my position might seem increasingly flippant, but I really think it is necessary to acknowledge that you've stated a core assumption as a fact.
Alignment doesn't run on some nega-math that can't be cast as an optimization problem.
I am not saying that the concept of "alignment" is some bizarre metaphysical idea that cannot be approximated by a computer because something something human souls, or some other nonsense.
However, the assumption that "alignment is representable in math" directly implies "alignment is representable as an optimization problem" seems potentially false to me, and I'm not sure why you're certain it is true.
There exist systems that 1.) can be represented mathematically, 2.) perform computations, and 3.) do not correspond to some type of min/max optimization, e.g. various analog computers or cellular automata.
I don't think it is ridiculous to suggest that what the human brain does is 1.) representable in math, 2.) representable in a way that we could actually understand and re-implement on hardware / software systems, and 3.) not an optimization problem where there exists some reward function to maximize or some loss function to minimize.
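To make the third point concrete, here is a minimal sketch of an elementary cellular automaton — Rule 110, which is known to be Turing-complete — performing computation with no reward or loss function anywhere in the update loop (the initial tape is an arbitrary choice for illustration):

```python
# Rule 110: a Turing-complete elementary cellular automaton.
# Each step is a pure lookup on local neighborhoods -- there is no
# objective being maximized or minimized anywhere in the update.

RULE = 110

def step(cells):
    """Apply one Rule 110 update to a list of 0/1 cells (fixed 0 boundary)."""
    padded = [0] + cells + [0]
    out = []
    for i in range(len(cells)):
        neighborhood = (padded[i] << 2) | (padded[i + 1] << 1) | padded[i + 2]
        out.append((RULE >> neighborhood) & 1)
    return out

tape = [0] * 20 + [1]  # arbitrary initial condition
for _ in range(10):
    tape = step(tape)
```

The system computes (in the strongest possible sense), yet describing it as minimizing some loss adds nothing to the description.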
I wasn't intending for a metaphor of "biomimicry" vs "modernist".
(Claim 1) Wings can't work in space because there's no air. The lack of air is a fundamental reason for why no wing design, no matter how clever it is, will ever solve space travel.
If TurnTrout is right, then the equivalent statement is something like (Claim 2) "reward functions can't solve alignment because alignment isn't maximizing a mathematical function."
The difference between Claim 1 and Claim 2 is that we have a proof of Claim 1, and therefore don't bother debating it anymore. With Claim 2, we only have an arbitrarily long list of examples of reward functions being gamed, exploited, or otherwise failing in spectacular ways, but no general proof that reward functions will never work. So we keep arguing about a Sufficiently Smart Reward Function That Definitely Won't Blow Up as if that is a thing that can be found if we try hard enough.
As of right now, I view "shard theory" sort of like a high-level discussion of chemical propulsion without the designs for a rocket or a gun. I see the novelty of it, but I don't understand how you would build a device that can use it. Until someone can propose actual designs for hardware or software that would implement "shard theory" concepts without just becoming an obfuscated reward function prone to the same failure modes as everything else, it's not incredibly useful to me. However, I think it's worth engaging with the idea because if correct then other research directions might be a dead-end.
Does that help explain what I was trying to do with the metaphor?
To some extent, I think it's easy to pooh-pooh finding a flapping wing design (not maximally flappy, merely way better than the best birds) when you're not proposing a specific design for building a flying machine that can go to space. Not in the tone of "how dare you not talk about specifics," but more like "I bet this chemical propulsion direction would have to look more like birds when you get down to brass tacks."
(1) The first thing I did when approaching this was think about how the message is actually transmitted: the preamble at the start of the transmission to synchronize clocks, the headers for source & destination, the parity bits after each byte, an inverted parity on the header so that a true header can be distinguished from bytes within a message that happen to look like a header, and optional checksum calculations.
(2) I then thought about how I would actually represent the data so it wasn't just traditional 8-bit bytes -- I created encoders & decoders for 36/24/12/6 bit unsigned and signed ints, and 30 / 60 bit non-traditional floating point, etc.
Finally, I created a mock telemetry stream that consisted of a bunch of time-series data from many different sensors, with all of the sensor values packed into a single frame with all of the data types from (2), and repeatedly transmitted that frame over the varying time series, using (1), until I had >1 MB.
And then I didn't submit that, and instead swapped to a single message using the transmission protocol that I designed first, and shoved an image into that message instead of the telemetry stream.
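As a sketch of what (1) and (2) looked like in practice — the function names, field widths, and framing here are illustrative stand-ins, not my actual implementation — packing a value into a non-8-bit field with a parity bit might look like:

```python
def to_bits(value, width):
    """Big-endian bit list for an unsigned int of the given width."""
    assert 0 <= value < (1 << width)
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]

def with_parity(bits, invert=False):
    """Append an even-parity bit; invert=True gives the 'inverted header
    parity' trick for distinguishing real headers from header-like data."""
    p = sum(bits) % 2
    return bits + [p ^ 1 if invert else p]

def encode_frame(header, payload_values, width=12):
    """Frame = 8-bit header (inverted parity) + fixed-width fields (normal parity)."""
    bits = with_parity(to_bits(header, 8), invert=True)
    for v in payload_values:
        bits += with_parity(to_bits(v, width))
    return bits
```

A receiver that guesses the wrong field width will see parity failures almost immediately, which is exactly the property that makes the framing deducible.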
[0, +N], when I should have allowed the noise to be positive or negative. I am less convinced that the uniform distribution is incorrect. In my experience, ADC noise is almost always uniform (and only present in a few LSB), unless there's a problem with the HW design, in which case you'll get dramatic non-uniform "spikes". I was assuming that the alien HW is not so poorly designed that they are railing their ADC channels with noise of that magnitude.
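The corrected noise model I'm describing — uniform over a few LSB and centered on zero, rather than over [0, +N] — can be sketched as follows (the 12-bit channel width and 2-LSB magnitude are made-up parameters for illustration):

```python
import random

def noisy_adc_reading(true_value, n_lsb=2, full_scale=4095):
    """Simulate a well-behaved 12-bit ADC: uniform noise in [-n_lsb, +n_lsb]
    (signed, unlike the original [0, +N] mistake), clamped to the channel range."""
    noise = random.randint(-n_lsb, n_lsb)
    return min(full_scale, max(0, true_value + noise))
```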
My understanding of faul_sname's claim is that for the purpose of this challenge we should treat the alien sensor data output as an original piece of data.
In reality, yes, there is a source image that was used to create the raw data that was then encoded and transmitted. But in the context of the fiction, the raw data is supposed to represent the output of the alien sensor, and the claim is that the decompressor + payload is less than the size of just an ad-hoc gzipping of the output by itself. It's that latter part of the claim that I'm skeptical towards. There is so much noise in real sensors -- almost always the first part of any sensor processing pipeline is some type of smoothing, median filtering, or other type of noise reduction. If a solution for a decompressor involves saving space on encoding that noise by breaking a PRNG, it's not clear to me how that would apply to a world in which this data has no noise-less representation available. However, a technique of measuring & subtracting noise so that you can compress a representation that is more uniform and then applying the noise as a post-processing op during decoding is definitely doable.
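That measure-subtract-reapply pipeline can be sketched in a few lines. Here I use 1-D samples, a moving average as the smoother, and zlib standing in for whatever real compressor you'd use — all illustrative choices, not a claim about what the winning entry would do:

```python
import zlib

def smooth(samples, k=3):
    """Simple moving average as a stand-in for a real denoising filter."""
    return [round(sum(samples[max(0, i - k):i + k + 1]) /
                  len(samples[max(0, i - k):i + k + 1]))
            for i in range(len(samples))]

def encode(samples):
    """Compress the smooth part and the (small, regular) residuals separately."""
    base = smooth(samples)
    residual = [s - b for s, b in zip(samples, base)]
    return (zlib.compress(bytes(base)),
            zlib.compress(bytes((r & 0xFF) for r in residual)))

def decode(base_blob, residual_blob):
    """Reconstruct exactly: base + residual, residual re-signed from a byte."""
    base = list(zlib.decompress(base_blob))
    residual = [b - 256 if b > 127 else b
                for b in zlib.decompress(residual_blob)]
    return [b + r for b, r in zip(base, residual)]
```

The reconstruction is lossless; the win (if any) comes from the residual stream being lower-entropy than the raw samples.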
Assuming that you use the payload of size 741809 bytes, and are able to write a decompressor + "transmitter" for that in the remaining ~400 KB (which should be possible, given that 7z is ~450 KB, zip is 349 KB, other compressors are in similar size ranges, and you'd be saving space since you just need the decoder portion of the code), how would we rate that against the claims?
- It would be possible for me, given some time to examine the data, to create a decompressor and a payload such that running the decompressor on the payload yields the original file, and the decompressor program + the payload have a total size of less than the original gzipped file
- The decompressor would legibly contain a substantial amount of information about the structure of the data.
(1) seems obviously met, but (2) is less clear to me. Going back to the original claim, faul_sname said 'we would see that the winning programs would look more like "generate a model and use that model and a similar rendering process to what was used to original file, plus an error correction table" and less like a general-purpose compressor'.
So far though, this solution does use a general purpose compressor. My understanding of (2) is that I was supposed to be looking for solutions like "create a 3D model of the surface of the object being detected and then run lighting calculations to reproduce the scene that the camera is measuring", etc. Other posts from faul_sname in the thread, e.g. here, seem to indicate that this was their thinking as well, since they suggested using ray tracing as a method to describe the data in a more compressed manner.
What are your thoughts?
Regarding the sensor data itself
I alluded to this in my post here, but I was waffling and backpedaling a lot on what would be "fair" in this challenge. I gave a bunch of examples in the thread of what would make a binary file difficult to decode -- e.g. non-uniform channel lengths, an irregular data structure, multiple types of sensor data interwoven into the same file -- and then did basically none of that, because I kept feeling like the file was unapproachable. Anything that was >1 MB of binary data but not a 2D image (or series of images) seemed impossible. For example, the first thing I suggested in the other thread was a stream of telemetry from some alien system.
I thought this file would strike a good balance, but I now see that I made a crucial mistake: I didn't expect that you'd be able to view it with the wrong number of bits per byte (7 instead of 6) and then skip almost every byte and still find a discernible image in the grayscale data. Once you can "see" what the image is supposed to be, the hard part is done.
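The trick being described — reslicing a raw bitstream at an arbitrary symbol width and stride to see whether an image falls out — is only a short loop. The width and skip values below are placeholders, not the challenge file's actual parameters:

```python
def resymbolize(raw, bits_per_symbol=7, skip=0):
    """Reslice a byte string into symbols of an arbitrary bit width,
    optionally keeping only every (skip+1)-th symbol -- enough to make a
    hidden image 'pop' when the values are rendered as grayscale rows."""
    bitstream = ''.join(f'{byte:08b}' for byte in raw)
    symbols = [int(bitstream[i:i + bits_per_symbol], 2)
               for i in range(0, len(bitstream) - bits_per_symbol + 1,
                              bits_per_symbol)]
    return symbols[::skip + 1]
```

Sweeping `bits_per_symbol` and `skip` while rendering the output as grayscale is cheap, which is why even the "wrong" parameters can reveal a discernible picture.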
I was assuming that more work would be needed for understanding the transmission itself (e.g. deducing the parity bits by looking at the bit patterns), and then only after that would it be possible to look at the raw data by itself.
I had a similar issue when I was playing with LIDAR data as an alternative to a 2D image. I found that a LIDAR point cloud is eerily similar enough to image data that you can stumble upon a depth map representation of the data almost by accident.
Which question are we trying to answer?
I don't think (1) is a particularly interesting question, because last weekend I convinced myself that the answer is yes, you can transfer data in a way that it can be decoded, with very few assumptions on the part of the receiver. I do have a file I created for this purpose. If you want, I'll send it to you.
I started creating a file for (2), but I'm not really sure how to gauge what is "fair" vs "deliberately obfuscated" in terms of encoding. I am conflicted. Even if I stick to encoding techniques I've seen in the real world, I feel like I can make choices on this file encoding that make the likelihood of others decoding it very low. That's exactly what we're arguing about on (2). However, I don't think it will be particularly interesting or fun for people trying to decode it. Maybe that's ok?
What are your thoughts?
It depends on what you mean by "didn't work". The study described is published in a paper only 16 pages long. We can just read it: http://web.mit.edu/curhan/www/docs/Articles/biases/67_J_Personality_and_Social_Psychology_366,_1994.pdf
First, consider the question of, "are these predictions totally useless?" This is an important question because I stand by my claim that the answer of "never" is actually totally useless due to how trivial it is.
Despite the optimistic bias, respondents' best estimates were by no means devoid of information: The predicted completion times were highly correlated with actual completion times (r = .77, p < .001). Compared with others in the sample, respondents who predicted that they would take more time to finish actually did take more time. Predictions can be informative even in the presence of a marked prediction bias.
Respondents' optimistic and pessimistic predictions were both strongly correlated with their actual completion times (rs = .73 and .72, respectively; ps < .01).
Yep. Matches my experience.
We know that only 11% of students met their optimistic targets, and only 30% of students met their "best guess" targets. What about the pessimistic target? It turns out, 50% of the students did finish by that target. That's not just a quirk, because it's actually related to the distribution itself.
However, the distribution of difference scores from the best-guess predictions were markedly skewed, with a long tail on the optimistic side of zero, a cluster of scores within 5 or 10 days of zero, and virtually no scores on the pessimistic side of zero. In contrast, the differences from the worst-case predictions were noticeably more symmetric around zero, with the number of markedly pessimistic predictions balancing the number of extremely optimistic predictions.
In other words, asking people for a best guess or an optimistic prediction yields a biased prediction that is almost always earlier than the real delivery date. The pessimistic question is not more accurate (it has the same absolute error margins), but it is unbiased: per the study, people asked the pessimistic question were equally likely to over-estimate their completion time as to under-estimate it. If you don't think a question that gives you a distribution centered on the right answer is useful, I'm not sure what to tell you.
The paper actually did a number of experiments. That was just the first.
In the third experiment, the study tried to understand what people are thinking about when estimating.
Proportionally more responses concerned future scenarios (M = .74) than relevant past experiences (M = .07), t(66) = 13.80, p < .001. Furthermore, a much higher proportion of subjects' thoughts involved planning for a project and imagining its likely progress (M = .71) rather than considering potential impediments (M = .03), t(66) = 18.03, p < .001.
This seems relevant considering that the idea of premortems or "worst case" questioning is to elicit impediments, and the project managers / engineering leads doing that questioning are intending to hear about impediments and will continue their questioning until they've been satisfied that the group is actually discussing that.
In the fourth experiment, the study tries to understand why it is that people don't think about their past experiences. They discovered that just prompting people to consider past experiences was insufficient; people actually needed additional prompting to make their past experience "relevant" to their current task.
Subsequent comparisons revealed that subjects in the recall-relevant condition predicted they would finish the assignment later than subjects in either the recall condition, t(79) = 1.99, p < .05, or the control condition, t(80) = 2.14, p < .04, which did not differ significantly from each other, t(81) < 1.
Further analyses were performed on the difference between subjects' predicted and actual completion times. Subjects underestimated their completion times significantly in the control (M = -1.3 days), t(40) = 3.03, p < .01, and recall conditions (M = -1.0 day), t(41) = 2.10, p < .05, but not in the recall-relevant condition (M = -0.1 days), t(39) < 1. Moreover, a higher percentage of subjects finished the assignments in the predicted time in the recall-relevant condition (60.0%) than in the recall and control conditions (38.1% and 29.3%, respectively), χ²(2, N = 123) = 7.63, p < .01. The latter two conditions did not differ significantly from each other.
The absence of an effect in the recall condition is rather remarkable. In this condition, subjects first described their past performance with projects similar to the computer assignment and acknowledged that they typically finish only 1 day before deadlines. Following a suggestion to "keep in mind previous experiences with assignments," they then predicted when they would finish the computer assignment. Despite this seemingly powerful manipulation, subjects continued to make overly optimistic forecasts. Apparently, subjects were able to acknowledge their past experiences but disassociate those episodes from their present predictions.
In contrast, the impact of the recall-relevant procedure was sufficiently robust to eliminate the optimistic bias in both deadline conditions.
How does this compare to the first experiment?
Interestingly, although the completion estimates were less biased in the recall-relevant condition than in the other conditions, they were not more strongly correlated with actual completion times, nor was the absolute prediction error any smaller. The optimistic bias was eliminated in the recall-relevant condition because subjects' predictions were as likely to be too long as they were to be too short. The effects of this manipulation mirror those obtained with the instruction to provide pessimistic predictions in the first study: When students predicted the completion date for their honor's thesis on the assumption that "everything went as poorly as it possibly could" they produced unbiased but no more accurate predictions than when they made their "best guesses."
It's common in engineering to perform group estimates. Does the study look at that? Yep, the fifth and last experiment asks individuals to estimate the performance of others.
As hypothesized, observers seemed more attuned to the actors' base rates than did the actors themselves. Observers spontaneously used the past as a basis for predicting actors' task completion times and produced estimates that were later than both the actors' estimates and their completion times.
So observers are more pessimistic. In fact, observers are so pessimistic that you have to average their estimates with the actors' optimistic estimates to get an unbiased estimate.
One of the most consistent findings throughout our investigation was that manipulations that reduced the directional (optimistic) bias in completion estimates were ineffective in increasing absolute accuracy. This implies that our manipulations did not give subjects any greater insight into the particular predictions they were making, nor did they cause all subjects to become more pessimistic (see Footnote 2), but instead caused enough subjects to become overly pessimistic to counterbalance the subjects who remained overly optimistic. It remains for future research to identify those factors that lead people to make more accurate, as well as unbiased, predictions. In the real world, absolute accuracy is sometimes not as important as (a) the proportion of times that the task is completed by the "best-guess" date and (b) the proportion of dramatically optimistic, and therefore memorable, prediction failures. By both of these criteria, factors that decrease the optimistic bias "improve" the quality of intuitive prediction.
At the end of the day, there are certain things that are known about scheduling / prediction.
(Gestures vaguely at all of the projects that overrun their schedule.) There are many interesting reasons why that happens, and why I don't think it's a massive failure of rationality, but I'm not sure this comment is a good place to go into detail on that. The quick answer is that comical overruns of a schedule have less to do with an inability to create correct schedules from an engineering / evidence-based perspective, and much more to do with a bureaucratic or organizational refusal to accept an evidence-based schedule when a totally false but politically palatable "optimistic" schedule is preferred.
I'm reminded of this thread from 2022: https://www.lesswrong.com/posts/27EznPncmCtnpSojH/link-post-on-deference-and-yudkowsky-s-ai-risk-estimates?commentId=SLjkYtCfddvH9j38T#SLjkYtCfddvH9j38T