Sometimes you'll come across a picture that looks all skewed, usually where someone cropped a small piece out of the corner of something larger:

Projecting a 3D scene onto a 2D picture unavoidably introduces distortion, and it gets worse the farther you are from the center, so if you crop to just the corner of a picture you're getting the most distorted part.
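
To put a rough number on how quickly that distortion grows (a back-of-the-envelope sketch, assuming an ideal rectilinear pinhole projection rather than anything specific to the phone): a small object at an angle off the lens axis gets projected taller radially than tangentially by roughly one over the cosine of that angle, so a face in the corner of a typical wide phone shot ends up stretched by a quarter or more:

    import math

    def corner_stretch(diagonal_fov_deg):
        """How much a small object at the very corner of the frame gets
        elongated radially, for an ideal rectilinear (pinhole) projection."""
        theta = math.radians(diagonal_fov_deg / 2)  # angle off-axis at the corner
        # Radial size grows like sec^2(theta), tangential size like sec(theta),
        # so the apparent elongation is sec(theta).
        return 1 / math.cos(theta)

    for fov in (60, 77, 100):
        print(f"{fov} deg diagonal field of view -> ~{corner_stretch(fov):.2f}x corner stretch")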

It would be possible to make this better: your phone knows the distortions its lens makes, and every time you crop something it could automatically reproject the image. I can approximate the correction by hand with a vertical shear:

Followed by a horizontal scale:
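
If you want to try the same kind of manual fix yourself, something like this works with Pillow; the shear and scale factors below are arbitrary placeholders to tune by eye for a particular crop, not the exact values used for the images above:

    from PIL import Image

    def shear_then_scale(path, shear=0.15, x_scale=0.85):
        """Roughly undo corner distortion: vertical shear, then horizontal scale.

        Pillow's AFFINE coefficients map each *output* pixel back to the
        input pixel it should sample from.
        """
        im = Image.open(path)
        w, h = im.size

        # Vertical shear: output row y samples input row y - shear * x,
        # so content slides downward as x increases.
        im = im.transform((w, h), Image.Transform.AFFINE,
                          (1, 0, 0,
                           -shear, 1, 0),
                          resample=Image.Resampling.BICUBIC)

        # Horizontal scale: squeeze the width to x_scale of the original.
        im = im.transform((int(w * x_scale), h), Image.Transform.AFFINE,
                          (1 / x_scale, 0, 0,
                           0, 1, 0),
                          resample=Image.Resampling.BICUBIC)
        return im

    shear_then_scale("corner-crop.jpg").save("corner-crop-fixed.jpg")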

In writing this up, however, I realized that the camera app on my phone (a Pixel 5a) takes a different approach. Instead of waiting for you to crop an image, it automatically fixes faces itself. The pictures above were taken with Open Camera, which doesn't have these smarts; here's what I get with the default camera app:

It applied the fix when the picture was taken, not on cropping; the full-size image also minimizes distortion around my face:

Since this is only applied to my face, though, and not the rest of my body, it does give a slightly uncanny effect. Compare to the full-size version of the Open Camera image:

You can see what the Pixel camera app did around my face by looking at the line of the bookcase:

In real life and in a raw photo that line would be straight:

And in my manually corrected version from above the line is straight:

But because the Pixel camera is doing its correction on the whole image, instead of waiting for you to crop, it isn't able to preserve all the straight lines in the original image.

I think the Pixel edit is probably still worth doing, since we care a lot about faces and they do look more natural projected as if you were looking straight at them. Reprojecting the whole image to minimize distortion when you crop would also be a pretty nifty feature, and doesn't introduce the kind of distortions that fixing one part of a larger image does.

Comments:

I think the Pixel edit is probably still worth doing, since we care a lot about faces

Destroying information automatically does not seem like a good thing to ever do.

If I take a picture, I want a sane, consistent, predictable mapping between the pixels and the light coming from whatever the camera was pointed at. If I want to distort the image in an unpredictable way, I can always do that in post.

Turns out this is not what most people want. You take the picture, the software automatically does its smart things, and you have a good-looking picture you can upload to social media. Most people don't want to do anything in 'post'; the camera should just handle it.

I'm sympathetic -- most of the time when I take a picture that is in fact what I want: something that looks good without my having to fuss with it. On the other hand, there are times when I and others want a picture as a predictable record of something, for measurements, etc., and not knowing what tricks the phone is applying isn't good there. For now I'd just use two camera apps, depending on my purpose?

I work on the same floor as someone working on Google's AI camera features. They said that the raw data they work with every time you press the shutter is a series of absolutely horrible-looking photos which nobody would ever want to look at. They then combine those into a single good-looking shot using a variety of classical and ML techniques.

My guess is that this is true for most cameras and camera apps: they always destroy some information in order to give you a picture you like at first sight.

I think the objection is mostly around predictability? Combining several images into one more accurate image isn't the issue; the issue is logic that does different things based on a more detailed model of the world. Things like recognizing common objects (the moon, etc.) and using what you know they should look like as a prior for interpreting fuzzy images, picking which of several sequential shots has the best smile from each person in a group, or the automatic facial unstretching here.

Yeah, that. Simple stacking is one thing; it "concentrates" real information, so that the final image contains more "reality per bit". It's not worse than image compression, or choosing a focal point or aperture or exposure. Making things up is different, and making things up in a way that destroys real information is different yet again.

... and frankly picking the best smiles individually from a stack, or substituting your moon for mine, rises to the point of trying to rewrite my memories, and doing it without a specific user request is NOT OK (TM).
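
(To make the stacking/making-things-up distinction concrete: simple stacking is presumably something like averaging several already-aligned exposures, which cuts random sensor noise by roughly the square root of the number of frames without inventing any detail. A minimal sketch, not from the commenters:)

    import numpy as np

    def stack_frames(frames):
        """Average a list of already-aligned exposures (H x W x 3 uint8 arrays).

        Averaging N frames reduces random sensor noise by roughly sqrt(N)
        while adding no information that wasn't in the originals.
        """
        mean = np.mean([f.astype(np.float32) for f in frames], axis=0)
        return np.clip(mean, 0, 255).astype(np.uint8)

    # Toy demo: a flat gray scene with independent noise in each frame.
    rng = np.random.default_rng(0)
    scene = np.full((8, 8, 3), 128.0)
    frames = [np.clip(scene + rng.normal(0, 20, scene.shape), 0, 255).astype(np.uint8)
              for _ in range(16)]
    print(frames[0].std(), stack_frames(frames).std())  # noise drops ~4x with 16 frames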

It's also worth mentioning that the reason they do a lot of that stuff is that the camera hardware is fundamentally very bad, with really noisy sensors and distorted optics. I know you can't really do better in a tiny camera, but there's an almost fraudulent attempt to hide the hardware limitations from the consumer.