So do you think this is part of how it generates images - i.e. having used depth estimation and much else to infer 3D objects/scenes and the wider workings of the 3D world from its training photographs, it turns a new description into an internal representation of a 3D scene and then renders it to photorealistic 2D using, inter alia, a kind of reversal of its 2D->3D inference algorithm?
Which seems miraculous to me - not so much that this is possible, but that a neural network figured all this out by itself, rather than the complex algorithms required having to be elaborately hand-coded.
I would have assumed that at best a neural network would infer a big pile of kludges that would produce poor results, like a human trying to Photoshop a bunch of different photographs together.
Indeed, airplanes are edible and delicious. Frenchman Michel Lotito (aka Monsieur Mangetout) ate a Cessna 150. Yes, he actually did!
Why are you so sure DALL-E knows what an image is or what it is doing? Why do you think it knows there is a person looking at its output vs an electric field pulsing in a non-random way?
I don’t think it does (and didn’t say this!)
My question is how it manages to produce, almost all the time, such convincing 3D images without knowing about the 3D world and everything else normally required to create a realistic 3D image - as you can’t do it just by fitting together loosely suitable existing photographs (from its training data) and tweaking the results.
I don’t deny that you can catch it out by asking for weird things very different from its training data (though it often makes a good attempt). However, that doesn’t explain how it does so well at creating images that are broadly within the range of its training data, yet different enough from the photographs it’s seen that a skilled human with Photoshop couldn’t do what it does.
Back in 2021, when image generators like DALL-E arrived, I was more astonished than most that they could turn descriptions into realistic pictures. Anyone who’s tried creating a convincing fake photograph themselves, by piecing together and adjusting existing photographs using software like Adobe Photoshop (without AI tools), will know it’s virtually impossible to do. Merely producing convincing lighting (with plausible colour temperature and illumination/shadows) is very difficult, let alone the ray-tracing required to create a photorealistic 3D view from an arbitrary angle, which almost no human can do manually with such software; at best a skilled artist could take hours to produce a semi-realistic picture of a simple scene.
For AI to be able to do this convincingly, it surely needs a sophisticated understanding of the 3D world (rotation, perspective, lighting, etc - ideally full ray-tracing, plus other physics and pragmatics), which is (almost?) impossible to infer just from a load of photographs (ie 2D images) with captions/descriptions.
Indeed, I heard that neural networks intrinsically can’t infer these kinds of complex mathematical transformations; also that, as their output is a superposition of their training data (existing photographs), they can’t create arbitrary scenes either.
So how do they manage to do all of this so well, or at all? (Quite aside from the near-miracle of inferring the very oblique relationship between training images and text descriptions of them.)
Oddly I haven’t seen anyone raise this question, let alone provide a satisfactory answer.
Indeed, for decimal numbers, big-endian and little-endian parsing are equally inefficient: you need to locate the decimal point before you can identify what power of 10 any of the digits represents.
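A toy sketch of the point (illustrative Python, not anyone’s actual parser): whichever direction you scan the digits, each one’s place value is only known once you’ve found the decimal point, so that step has to come first either way.

    # Toy decimal parser (illustrative only). Whether you scan the string
    # left-to-right ("big-endian") or right-to-left ("little-endian"), you
    # must first locate the decimal point to know each digit's power of 10.
    def parse_decimal(s: str) -> float:
        point = s.index('.') if '.' in s else len(s)  # the unavoidable first step
        value = 0.0
        for i, ch in enumerate(s):
            if ch == '.':
                continue
            # Digits left of the point have powers point-1 .. 0;
            # digits right of it have powers -1, -2, ...
            power = (point - 1 - i) if i < point else (point - i)
            value += int(ch) * 10.0 ** power
        return value

    assert abs(parse_decimal("1234.56") - 1234.56) < 1e-9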
Good post. On a detail, I’d use the word ‘opportunities’ rather than ‘bids’, which sounds like ‘offers’ - whereas in several of these examples you’re not being explicitly offered a social opportunity by someone. But the situation contains an opportunity.
Surprisingly, glass bottles are even worse for microplastics than plastic bottles! Apparently due to the plastic coating on the metal caps:
https://www.sciencedirect.com/science/article/pii/S0889157525005344
Kids physically can’t understand negative numbers until a certain age. The abstract thinking parts of their brain aren’t done cooking yet. Do you know what age that is?
JFC. When I was 12, I was coding in assembly language. I couldn’t understand negative numbers, only two’s complement representation.
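(For the curious, a minimal Python illustration of the two’s complement representation in question: a negative number is stored as its value modulo 2^width, i.e. masked to the word size.)

    # Minimal illustration of 8-bit two's complement (illustrative only).
    def to_twos_complement_8bit(n: int) -> int:
        return n & 0xFF  # keep the low 8 bits; negatives wrap around

    assert to_twos_complement_8bit(-1) == 0b11111111    # -1   -> 0xFF
    assert to_twos_complement_8bit(-128) == 0b10000000  # -128 -> 0x80
    assert to_twos_complement_8bit(5) == 0b00000101     #  5   -> 0x05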
Sure, lots of frivolous consumer goods have gotten cheaper, but healthcare, housing, childcare, and education - all the important stuff - have exploded in price.
Eh? No, the chart says housing has got cheaper, relative to wages. And childcare hasn’t gone up much.
I’ve heard it said that you should have a pair of scissors in every room in your house. (The scissors incidentally tend to differ depending on location, eg kitchen scissors vs nail scissors, but that’s a detail.)