faul_sname


Comments

65% confidence: reverse short strings of ascii English text. Here's GPT-3 (text-davinci-002 temp=0.7, bold is the prompt, non-bold is the response)

Reverse the following strings

Input: The quick brown fox jumped over the lazy dog Output: god yzal eht revo depmuj xof nworb kciuq ehT

Input: I think that GPT-3 is pretty cool Output: looc ytterp si 3-TPG taht kniht I

Input: I have heard that GPT-4 is coming out in August, and that it will be even cooler Output:

Attempt 1:

Output: reool evah dna 4-TPG htiw tuo gnimoc ,tiuqila hguone eht ti sah dneirf ev'I

Attempt 2:

Output: reool evah dna tuo august uoy ot gniog 4-TGP taht drah I

Attempt 3:

Output: reool evah dna tuo gniog si 4-TPG taht drah I

I do expect that GPT-4 will do slightly better than GPT-3, but I expect it to still be worse at this than an elementary school student.
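For anyone who wants to score attempts like the ones above, here is a minimal sketch. The helper names `reverse_string` and `similarity` are made up for illustration, and using `difflib`'s ratio as the closeness measure is my assumption, not something used in the original test.

```python
from difflib import SequenceMatcher


def reverse_string(text: str) -> str:
    """Return the character-by-character reversal of text."""
    return text[::-1]


def similarity(attempt: str, text: str) -> float:
    """Score an attempted reversal against the true reversal (1.0 = exact match)."""
    return SequenceMatcher(None, attempt, reverse_string(text)).ratio()


if __name__ == "__main__":
    prompt = ("I have heard that GPT-4 is coming out in August, "
              "and that it will be even cooler")
    attempt_3 = "reool evah dna tuo gnimoc si 4-TPG taht drah I"
    print(reverse_string(prompt))
    print(round(similarity(attempt_3, prompt), 3))
```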

Alright, I managed to figure out what the message is and where it came from. I have not managed to write a better compressor than gzip yet, and expect that doing so will require picking up some computer skills that I don't currently have. But I think it's something modern software and hardware is capable of.

Update on progress: https://imgur.com/a/6r7VcDV

Time to see how far behind Mateon1 I was.

Edit: Answer: extremely far behind. I am very impressed. Also I take back my statement that I could probably beat the best general compression algos here given a reasonable amount of time, because

it's a blurry picture of something that had lots of pretty diffraction patterns, which was then converted to jpeg. Deconvolution is a thing, and can sometimes recover information from blurry pictures, but the conversion to jpeg destroys some data and I don't think deconvolution algos are robust to lost data. And also the data that looked random actually was random.

I probably could tell you some details about the camera used, focal distance, etc though after some playing around with Blender. And someone who knows a lot more physics than I do could maybe tell you interesting things about the light source by looking at the interference patterns in the layer of bismuth oxide in the parts of the photo that aren't blurry.

Interesting choice of message to encode -- it's one that (aside from the lossy compression aspect) would actually be quite a bit more informative about the laws of physics in the universe it came from than a picture of a falling apple would be about the laws of physics in our universe.

Thanks for running this challenge.

Oh nice! Sorry for the slow reply. It looks like Mateon1 might have already solved it, but I'm going to take an independent crack at this before looking at the solution they came up with.

So far I've established that it appears to be

a pattern of bits that is 686 bits of header, followed by 500x600x56 bits of message, followed by 2 0's. Along the sides of length 500 and 600, neighboring cells appear to have similar values / behavior, and the side of length 56 maybe represents channels. Every 7th channel seems to be different from the other 6. There is some easily visible structure in channels 1-4, 10-13, 29-32, and 38-41.
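A rough sketch of how that layout could be indexed. The sizes come from the description above; the axis order (row-major, channel varying fastest), which side counts as rows, and MSB-first bit order within each byte are all assumptions, and the function names are hypothetical.

```python
HEADER_BITS = 686
ROWS, COLS, CHANNELS = 500, 600, 56  # which side is "rows" is an assumption
TRAILER_BITS = 2


def total_bits() -> int:
    """Total length implied by header + message + trailing zeros."""
    return HEADER_BITS + ROWS * COLS * CHANNELS + TRAILER_BITS


def bit_offset(row: int, col: int, channel: int) -> int:
    """Offset of one message bit, ASSUMING row-major order, channel fastest."""
    if not (0 <= row < ROWS and 0 <= col < COLS and 0 <= channel < CHANNELS):
        raise IndexError("cell outside the 500x600x56 message")
    return HEADER_BITS + (row * COLS + col) * CHANNELS + channel


def get_bit(data: bytes, offset: int) -> int:
    """Read one bit, ASSUMING most-significant-bit-first within each byte."""
    return (data[offset // 8] >> (7 - offset % 8)) & 1


if __name__ == "__main__":
    print(total_bits(), total_bits() // 8)  # bit count and whole-byte count
```

The total is a whole number of bytes, which is at least consistent with the two trailing zeros being padding.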

My progress so far is at https://imgur.com/a/SM0gY1o

I'm not sure either one quite captures exactly what I mean, but I think (1) is probably closer than (2), with the caveat that I don't think the file necessarily has to be deliberately constructed to be decoded without a-priori knowledge, but it should be constructed to have as close as possible to a 1:1 mapping between the structure of the process used to capture the data and the structure of the underlying data stream.

I notice I am somewhat confused by the inclusion of camera raw formats in (2) rather than in (1) though -- I would expect that moving from a file in camera raw format to a jpeg would move you substantially in the direction from (1) to (2).

It sounds like maybe you have something resembling "some sensor data of something unusual in an unconventional but not intentionally obfuscated format"? If so, that sounds pretty much exactly like what I'm looking for.

However, I don't think it will be particularly interesting or fun for people trying to decode it. Maybe that's ok?

I think it's fine if it's not interesting or fun to decode because nobody can get a handle on the structure -- if that's the case, it will be interesting to see why we are not able to do that, and especially interesting if the file ends up looking like one of the things we would have predicted ahead of time would be decodable.

BTW I see that someone did a very minimal version (with an array of floating point numbers generated by a pretty simple process) of this test, but I'm still up for doing a fuller test -- my schedule now is a lot more clear than it was last month.

For me at least, I mentally model the right column as something "a better GPT" could probably describe how to do, if given the correct prompt. For example, let's say that I have just sequenced a sample of Equine herpesvirus 1, and I want to know what the protein encoded by ORF-46 does. I feed GPT the following

Our team identified the likely function of the protein encoded by ORF46 of the Equine herpesvirus 1 (EHV-1) through the use of machine-learning based tools. Specifically, we

and then GPT spits out

used the protein sequence of ORF46 to predict its function using the programs ProtParam and PFAM. ProtParam predicted that ORF46 is a protein with a molecular weight of 9.5 kDa and an isoelectric point of 5.5. PFAM predicted that ORF46 is a protein domain with the PFAM ID PFam:PF02336. This PFAM ID is associated with the family of proteins known as the "G-protein coupled receptors."

G-protein coupled receptors are a large family of proteins that play a role in many cellular processes, including signal transduction, cell proliferation,

ProtParam and PFAM are in fact existing tools used for determining the structure and function of polypeptides. Now obviously GPT does not actually use these tools, so we would need to identify when GPT is confabulating and then actually do the things and substitute in the results of doing the thing. However, I think only the step of "actually use the tool" is a thing that GPT is flatly unable to do at all rather than simply bad at doing. For example, it knows how to

  1. Identify which tools are being used
  2. Figure out what google search you would use to find the documentation of that tool.
  3. Say how one would invoke a given tool on the command line to accomplish a task, given some examples

Now this certainly is not a very satisfying general AI architecture, but I personally would not be all that surprised if "GPT but bigger and with more training specifically around how to use tools, and some clever prompt structures that only need to be discovered once" does squeak over the threshold of "being general".

Basically my mental model is that if "general intelligence" is something possessed by an unmotivated undergrad who just wants to finish their project with minimal effort, who will try to guess the teacher's password without having to actually understand anything if that's possible, it's something that a future GPT could also have with no further major advances.

Honestly, I kind of wonder if the crux of disagreement comes from some people who have and successfully use problem-solving methods that don't look like "take a method you've seen used successfully on a similar problem, and try to apply it to this problem, and see if that works, and if not repeat". That would also explain all of the talk about the expectation that an AI will, at some point, be able to generalize outside the training distribution. That does not sound like a thing I can do with very much success -- when I need to do something that is outside of what I've seen in my training data, my strategy is to obtain some training data, train on it, and then try to do the thing (and "able to notice I need more training data and then obtain that training data" is, I think, the only mechanism by which I even am a general intelligence). But maybe it is just a skill I don't have but some people do, and the ones who don't have it are imagining AIs that also don't have it, and the ones who do have the skill are imagining a "general" AI that can actually do the thing, and then the two groups are talking past each other.

And if that's the case, the whole "some people are able to generalize far from the training distribution, and we should figure out what's going on with them" might be the load-bearing thing to communicate.

I don't think this algorithm could decode arbitrary data in a reasonable amount of time. I think it could decode some particularly structured types of data, and I think "fairly unprocessed sensor data from a very large number of nearly identical sensors" is one of those types of data.

I actually don't know if my proposed method would work with jpgs - the whole discrete cosine transform thing destroys data in a way that might not leave you with enough information to make much progress. In general if there's lossy compression I expect that to make things much harder, and lossless compression I'd expect either makes it impossible (if you don't figure out the encoding) or not meaningfully different than uncompressed (if you do).
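To make the "destroys data" part concrete, here is a toy 1-D sketch. Strictly speaking the cosine transform itself is invertible; the loss comes from rounding the coefficients afterwards. This is not the actual JPEG pipeline (no 8x8 blocks, no quantization tables, no entropy coding), and the step size and sample block are made-up values.

```python
import math


def dct(block):
    """Unnormalized DCT-II of a list of samples."""
    n = len(block)
    return [sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(block))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [(coeffs[0] / 2
             + sum(coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
                   for k in range(1, n))) * 2 / n
            for i in range(n)]


def quantize(coeffs, step):
    """Round each coefficient to a multiple of step; this is the lossy part."""
    return [round(c / step) * step for c in coeffs]


if __name__ == "__main__":
    block = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up sample values
    print(idct(dct(block)))                   # matches block up to float error
    print(idct(quantize(dct(block), 50)))     # visibly different from block
```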

RAW I'd expect is better than a bitmap, assuming no encryption step, on the hypothesis that more data is better.

Also I kind of doubt that the situation where some AI with no priors encounters a single frame of video but somehow has access to 1000 GPU years to analyze that frame would come up IRL. The point as I see it is more about whether it's possible with a halfway reasonable amount of compute or whether That Alien Message was completely off-base.

Here is an example of a large file that:

I don't see the file, but again if you want to run this the mentioned dates from the other thread work best.

I am not exactly sure what you mean by the phrase "derive structure but not meaning" - a couple possibilities come to mind.

Let's say I have a file that is composed of 1,209,600 bytes of 0s. That file was generated by taking pressure readings, in kPa, once per second on the Hubble Space Telescope for a two week period. If I said "this file is a sequence of 2^11 x 3^3 x 5^2 x 7 zero bits", would that be a minimal example of "deriving the structure but not the meaning"? If so I expect that situation to be pretty common.
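A quick check of the arithmetic, assuming one byte per reading: two weeks of once-per-second readings is 1,209,600 bytes, and the 2^11 factorization corresponds to that length counted in bits.

```python
def prime_factors(n: int) -> dict:
    """Return {prime: exponent} for n >= 2 using trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


SECONDS_IN_TWO_WEEKS = 14 * 24 * 60 * 60   # one reading per second
BITS = SECONDS_IN_TWO_WEEKS * 8            # assuming one byte per reading

if __name__ == "__main__":
    print(SECONDS_IN_TWO_WEEKS, prime_factors(SECONDS_IN_TWO_WEEKS))
    print(BITS, prime_factors(BITS))
```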

If not, some further questions:

  1. Is it coherent to "understand the meaning but not the structure"?
  2. Would it be possible to go from "understands the structure but not the meaning" to "understands both" purely through receiving more data?
  3. If not, what about through interaction?
  4. If not, what differences would you expect to observe between a world where I understood the structure but not the meaning of something and a world where I understood both?

You will not do search over all 64 byte programs

I was not imagining you would, no. I was imagining something more along the lines of "come up with a million different hypotheses for what a 2-d grid could encode", which would be tedious for a human but would not really require extreme intelligence so much as extreme patience, and then for each of those million hypotheses try to build a model, and iteratively optimize for programs within that model for closeness to the output.

I expect, though I cannot prove, that "a 2d projection of shapes in a 3d space" is a pretty significant chunk of the hypothesis space, and that all of the following hypotheses would make it into the top million

  1. The 2-d points represent a rectangular grid oriented on a plane within that 3-d space. The values at each point are determined by what the plane intersects.
  2. The 2-d points represent a grid oriented on a plane within that 3-d space. The values at each point are determined by drawing lines orthogonal to the plane and seeing what they intersect and where. The values represent distance.
  3. The 2-d points represent a grid oriented on a plane within that 3-d space. The values at each point are determined by drawing lines orthogonal to the plane and seeing what they intersect and where. The values represent something else about what the lines intersect.
  4-6. Same as 1-3, but with a cylinder.
  7-9. Same as 1-3, but with a sphere.
  10-12. Same as 1-3, but with a torus.
  13-21. Same as 4-12, but the rectangular grid is projected onto only part of the cylinder/sphere/torus instead of onto the whole thing.

I am not a superintelligence though, nor do I have any special insight into what the universal prior looks like, so I don't know if that's actually a reasonable assumption, or whether I, as an entity embedded in a space that detects other things within that space using signals, am just privileging the hypothesis that "a space where it's possible to detect other stuff in that space using signals" is a simple construct.

as well as an actually quite detailed specification of the "scene" you input to that ray tracer (which is probably way more bits than the original png).

If the size-optimal scene such that "ray-trace this scene and then apply corrections from this table" is larger than a naively compressed version of the png, my whole line of thought does indeed fall apart. I don't expect that to be the case, because ray tracers are small and pngs are large, but this is a testable hypothesis (and not just "in principle it could be tested with a sufficiently powerful AI" but rather "an actual human could test it in a reasonable amount of time and effort using known techniques"). I don't at this moment have time to test it, but maybe it's worth testing at some point?
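The shape of that test can be sketched as follows. This is only the "apply corrections from this table" half: `predicted` stands in for whatever a ray tracer (or any other model) would render, and the function names are hypothetical. A real comparison would also need to count the size of the scene description and the renderer itself.

```python
import zlib


def encode_with_model(pixels: bytes, predicted: bytes) -> bytes:
    """Compress the per-byte corrections needed to turn predicted into pixels."""
    if len(pixels) != len(predicted):
        raise ValueError("prediction must be the same length as the data")
    residual = bytes((p - q) % 256 for p, q in zip(pixels, predicted))
    return zlib.compress(residual, 9)


def decode_with_model(payload: bytes, predicted: bytes) -> bytes:
    """Rebuild the original bytes from the prediction plus the corrections."""
    residual = zlib.decompress(payload)
    return bytes((r + q) % 256 for r, q in zip(residual, predicted))


def naive_size(pixels: bytes) -> int:
    """Baseline: compress the raw bytes with no model at all."""
    return len(zlib.compress(pixels, 9))
```

The better the prediction, the closer the residual is to all zeros and the smaller the payload. The hypothesis holds if model size plus payload size comes in under `naive_size`.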

Extremely confident you could construct a file such that I could not even start giving you the provenance of that information. Trivially, you could do something like "aes256 encrypt 2^20 bytes of zeros with the passphrase ec738958c3e7c2eeb3b4".

My contention is not so much "intelligence is extremely powerful" as it is "an image taken by a modern camera contains a surprisingly large amount of easy-to-decode information about the world, in the information-theoretic sense of the word information".

If you were to give me a binary file with no extension and no metadata that is

  1. Above 1,000,000 bytes in size
  2. Able to be compressed to under 50% of its uncompressed size with some simple tool like gzip (to ensure that there is actually some discoverable structure)
  3. Not able to be compressed under 10% of its uncompressed size by any well-known existing tools (to ensure that there is actually a meaningful amount of information in the file)
  4. Not generated by some tricky gotcha process (e.g. a file that is 250,000 bytes from /dev/random followed by 750,000 bytes from /dev/zero)

then I'd expect that

  1. It would be possible for me, given some time to examine the data, to create a decompressor and a payload such that running the decompressor on the payload yields the original file, and the decompressor program + the payload have a total size of less than the original gzipped file
  2. The decompressor would legibly contain a substantial amount of information about the structure of the data.
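The first three file criteria are mechanical enough to check with a script. A minimal sketch, where treating gzip, bz2, and lzma as the "well-known existing tools" is my assumption and `check_candidate` is a made-up name; criterion 4 (no gotcha process) can't be checked automatically.

```python
import bz2
import gzip
import lzma


def check_candidate(data: bytes) -> dict:
    """Check the size and compressibility criteria for a candidate file."""
    n = len(data)
    sizes = {
        "gzip": len(gzip.compress(data)),
        "bz2": len(bz2.compress(data)),
        "lzma": len(lzma.compress(data)),
    }
    return {
        # 1. Above 1,000,000 bytes in size
        "big_enough": n > 1_000_000,
        # 2. gzip gets it under 50% (some discoverable structure)
        "has_structure": sizes["gzip"] < 0.5 * n,
        # 3. no listed tool gets it under 10% (meaningful information)
        "has_information": min(sizes.values()) >= 0.1 * n,
        "sizes": sizes,
    }
```

A file of all zeros, for example, passes the first two checks and fails the third.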

I would not expect that I would be able to tell you the entire provenance of the information, even if it were possible in principle to deduce from the file, since I am not a superintelligence or even particularly smart by human standards.

I notice that this is actually something that could be empirically tested -- if you wanted we could actually run this experiment at some point. I anticipate being pretty busy for the next few weeks, but on the weekend of July 16/17 I will be tied to my computer for the duration of the weekend (on-call for work), so if you have significantly different expectations about what would happen, and want to actually run this experiment, that would be a good time to do it for me.


P.S. A lot of my expectations here come from the Hutter Prize, a substantial cash prize for generating good compressions of the first gigabyte of Wikipedia. The hypothesis behind the prize is that, in order to compress data, it is helpful to understand that data. The current highest-performing algorithm does not have source code available, but one of the best runners-up that actually gives source code works using a process that looks like

The transform done by paq8hp1 through paq8hp5 is based on WRT by Przemyslaw Skibinski, which first appeared in PAsQDa and paqar, and later in paq8g and xml-wrt. The steps are as follows:

  • The input is parsed into sequences of all uppercase letters or all lowercase letters, or one uppercase letter followed by lowercase letters, e.g. "THE", "the", or "The".
  • All uppercase words are prefixed by a special symbol (0E hex in paq8hp3, paq8hp4, paq8hp5). If a lowercase letter follows with no intervening characters (e.g. "THEre"), then a special symbol (0C hex) marks the end (e.g. 0E "the" 0C "re").
  • Capitalized words are prefixed with 7F hex (paq8hp3) or 40 hex (paq8hp4, paq8hp5) (e.g. "The" -> 40 "the").
  • All letters are converted to lower case.
  • Words are looked up in the dictionary. The first 80 words in the dictionary are coded with 1 byte: 80, 81, ... CF (hex).
  • The next 2560 words (paq8hp1-4) or 3840 words (paq8hp5) are coded with 2 bytes: D080, D081, ... EFCF (paq8hp1-4), or D080, ... FFCF (paq8hp5).
  • The last 40960 words are coded with 3 bytes: F0D080, F0D081, ... FFEFCF.
  • If a word does not match, then the longest matching prefix with length at least 6 is coded and the rest of the word is spelled.
  • If there is no matching prefix, then the longest matching suffix with length at least 6 is coded after spelling the preceding letters.
  • If no matching word, prefix, or suffix is found, the word is spelled. Capitalization coding occurs regardless.
  • Any input bytes with special meaning are escaped by prefixing with 06: 06, 0C, 0E, 40 or 7F, 80-FF.

WRT has additional capabilities depending on input, such as skipping encoding if little or no text is detected. The dictionary format is one word per line (linefeed only) with a 13 line header.
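To illustrate how much structure is baked into just the dictionary step, here is a sketch of the index-to-code mapping for the paq8hp1-4 variant, reconstructed from the byte ranges quoted above. It assumes the last byte varies fastest (as the listing "D080, D081, ..." suggests) and is not taken from the paq8hp source; `wrt_code` is a hypothetical name.

```python
def wrt_code(index: int) -> bytes:
    """Map a 0-based dictionary index to its 1-, 2-, or 3-byte code."""
    if index < 0:
        raise ValueError("index must be non-negative")
    if index < 80:
        # First 80 words: one byte, 80..CF hex
        return bytes([0x80 + index])
    index -= 80
    if index < 2560:
        # Next 2560 words: two bytes, D080..EFCF
        return bytes([0xD0 + index // 80, 0x80 + index % 80])
    index -= 2560
    if index < 40960:
        # Last 40960 words: three bytes, F0D080..FFEFCF
        return bytes([0xF0 + index // 2560,
                      0xD0 + (index % 2560) // 80,
                      0x80 + index % 80])
    raise ValueError("index beyond the 43600-word dictionary")


if __name__ == "__main__":
    for i in (0, 79, 80, 2639, 2640, 43599):
        print(i, wrt_code(i).hex().upper())
```

The three ranges add up to 80 + 2560 + 40960 = 43,600 dictionary slots, with the shortest codes going to the words listed first in the dictionary.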

If there was a future entry into the Hutter Prize which substantially cut down the compressed size, and that future entry contained fewer rather than more rules about how the English language worked, I would be very surprised and consider that to go a long way towards invalidating my model of what it means to "understand" something.
