ESRI claims: "Super Resolution is an image transformation technique with the help of which we can improve the quality of image and recover high resolution image from a given low resolution image."
I would argue this is not true - rather, super resolution generates a plausible high resolution image that would look like a given low resolution image if it were downscaled (i.e., it's not going to recover real details, it's just going to sharpen lines and potentially show details that look real but might not be).
Edit: As an example, in the lower half of figure 5, the algorithm displays circular white dots on the roof, when in reality they are rectangles. An image analyst using this tool might, for example, incorrectly geolocate an image taken on the ground. This tool probably needs a warning label on it.
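To make the downscaling ambiguity concrete, here's a toy sketch (plain NumPy, assuming simple box-filter downscaling; the numbers are made up):

    import numpy as np

    # Two very different 2x2 high-res patches...
    flat = np.array([[0.5, 0.5],
                     [0.5, 0.5]])
    edge = np.array([[1.0, 0.0],
                     [1.0, 0.0]])

    # ...both downscale (2x box filter) to the identical low-res pixel.
    print(flat.mean(), edge.mean())  # 0.5 0.5

    # Given only that 0.5 pixel, a super resolution model cannot know which
    # patch was really there; it can only guess a plausible one.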
That's exactly true. It worries me a lot that ESRI (who really should know better) is using the term "resolution" for this. This sort of superresolution is just interpolation. By definition, you're not adding more information density. It cannot increase resolution. It can increase edge response, but that's critically not resolution.
There are superresolution methods that _can_ increase resolution, but they work by combining multiple closely spaced captures that are slightly offset from one another. "Drizzle" was the original method in astrophysics, and while that particular method is long gone, this kind of reconstruction is common in remote sensing imagery for many instrument types. Similarly, pansharpening (which interpolates multispectral information using a higher resolution black and white image captured at the same time) can actually improve resolution and is commonly used.
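For reference, the simplest flavor of pansharpening is just a ratio transform (a Brovey-style sketch in NumPy; the 2x resolution ratio and random arrays are purely illustrative, and real products use more careful methods):

    import numpy as np
    from scipy.ndimage import zoom

    # Low-res multispectral bands (H, W, 3) and a co-registered high-res
    # panchromatic band (2H, 2W) captured at the same time.
    ms = np.random.rand(64, 64, 3)
    pan = np.random.rand(128, 128)

    # Upsample the multispectral bands to the pan grid (adds no detail by itself).
    ms_up = zoom(ms, (2, 2, 1), order=1)

    # Rescale each band so its local intensity matches the pan band: the spatial
    # detail now comes from a real high-res measurement, not a model's guess.
    intensity = ms_up.mean(axis=2) + 1e-6
    sharpened = ms_up * (pan / intensity)[..., None]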
This is not improving resolution in any way. Two objects that blur together into one will still appear as one object.
This is absolutely true. A very dangerous use case of DL upscaling. Even without claiming it could "recover" imagery it is a dangerous feature, as users could easily be misled, but ArcGIS explicitly claiming that? Outrageous.
To make absolutely clear: any details revealed by this upscaling do not exist. They are guesses based upon other imagery
Totally agreed. I take some issue with the verbiage, particularly:
> "Figure 1: Recovering high resolution image from low resolution"
This is terribly misleading. There is no data being "recovered" here. A ML model is guessing at the result based on other training data. It may (and in fact is likely to) make stuff up entirely based on what it thinks should be there.
I'm generally pretty live and let live when it comes to ML-based upscaling, because if some drawing or personal photograph has some artifacting it's pretty harmless. But when you're doing it in a tool whose data will be relied upon for Real Stuff, one needs to be painfully accurate when it comes to what the system does and its limitations.
> it's not going to recover real details, it's just going to sharpen lines and potentially show details that look real but might not be
Super resolution as a name covers a number of techniques, but some absolutely can recover real details. E.g. superresolution from video will integrate over time, allowing sub-pixel accuracy. Imagine a grid of pixels, each of which covers a defined area of the source. Now, if that grid moves, the area covered by each pixel will be slightly different. The differences between frames can then be used to accurately reconstruct a higher resolution output image.
Indeed, in this case ESRI is talking about deep-learning based single image superresolution. With multiple images, even something as simple as shift-and-add can recover details (and lower the noise in the picture), but you do require having multiple images. Video, being a sequence of images with large or small movement in between frames, can be an ideal source for robust SR algorithms. To complicate things further, there are deep-learning based multi-image SR algorithms too.
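For anyone curious, here is a naive shift-and-add sketch in NumPy (sub-pixel shifts are assumed known; real pipelines estimate them by registration and add kernel weighting, as Drizzle did, plus deconvolution):

    import numpy as np

    def shift_and_add(frames, shifts, scale=2):
        """Combine low-res frames with known sub-pixel offsets onto a finer grid."""
        h, w = frames[0].shape
        acc = np.zeros((h * scale, w * scale))
        cnt = np.zeros_like(acc)
        ys, xs = np.mgrid[0:h, 0:w]
        for frame, (dy, dx) in zip(frames, shifts):
            # Map each low-res sample to its position on the fine grid.
            yy = np.clip(np.round((ys + dy) * scale).astype(int), 0, h * scale - 1)
            xx = np.clip(np.round((xs + dx) * scale).astype(int), 0, w * scale - 1)
            np.add.at(acc, (yy, xx), frame)
            np.add.at(cnt, (yy, xx), 1)
        # Average where samples landed; un-hit cells stay zero in this toy version.
        return acc / np.maximum(cnt, 1)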
What is it about tech communities that this sort of clarification needs to be made? It's like if someone were to say "okay, I'll just hang out here, I'm not going anywhere" and someone were to respond "technically, that depends on your frame of reference. we're all orbiting the sun and therefore we are going somewhere".
At this point everyone knows the memes about CSI Enhance, the Xerox compression image hallucinations, blah blah. Must we constantly revisit subjects at a grade-school level?
It's because it's a frequent source of confusion in the actual industry.
I _constantly_ hear people claim that this method really does add information. My own company constantly makes similar claims, and internally, even for people who actually do work in the field, they honestly do think it adds resolution.
I'd be concerned it might remove data too. It's making a very statistically probable image given the input, so if there is an emergency, which by its nature is uncommon, might it edit that out?
It is perfectly possible that the interpolated image is a better statistical fit to reality than the source image.
This is more intuitive with height data. For example, a 100 m elevation grid could have a cell valued at 10 m right beside a cell at 20 m. A point on the boundary between the two cells is more likely to be at 15 m than at either 10 m or 20 m. And you can improve that estimate using other nuances. Picking a predicted value can be more truthful.
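A trivial sketch of that height example (NumPy; treating the two cell values as samples at the cell centers):

    import numpy as np

    # Two adjacent coarse cells with center values of 10 m and 20 m elevation.
    coarse = np.array([10.0, 20.0])

    # Linear interpolation puts the shared boundary at 15 m, which for smoothly
    # varying terrain is usually a better estimate than either cell value.
    boundary = np.interp(0.5, [0.0, 1.0], coarse)
    print(boundary)  # 15.0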
I agree that caution is important but that is true with any imagery analysis. A human analysing imagery will already be using a lot of intuition as it is.
But without this, all you have is a blurry image, in which you cannot accurately make out the details. You may wait until you get a better image due to different atmospheric conditions, to determine what they actually were. With this, you may get something hallucinated which looks very unexpected. Or in military contexts, maybe looks like a threat.
Someone with a little training and practice can get lots of information out of these kinds of "blurry" images. You can make educated guesses based on an understanding of the real world.
Imagine you are looking for swimming pools to find water to fight a wild fire. All you have is a "blurry" image. That blue shape on the image could be a weirdly shaped tent or patio. And that would be very obvious on high resolution drone imagery. But 99% of the time it will be a pool and that is good enough.
All data has limitations and this is no different.
Unfortunately it's an abuse of terminology that "stuck" a long time ago, and I think the ship has sailed for fixing that (at least in the image processing community). These sort of techniques were named in reference to more principled techniques involving multiple measurements and precise control of imaging systems, allowing you to estimate higher resolutions than your sensor could achieve.
I've seen this exact same problem apply to the war in Ukraine for armored vehicle identification. There's a blurred still of a video showing an abandoned, unrecognizable tank, and some guy arrives and says: “look, I enhanced the image quality and now we can clearly see that it's a T-80BVM because we can see [insert the appropriate detail here]”, where the “details” are by design completely made up by the neural net.
This kind of image transformation is cool when all you care about is esthetic quality of the image, but if you want to see details, then Mark I Eyeball is the best tool you can hope for, because if the thing is unrecognizable, you'll know it and won't make things up to pretend it is not.
This whole concept is so reckless in realms where the image content actually matters and people keep doing it anyways. You cannot CREATE information. You can infer it in certain situations, but if you infer the information and then analyze it you are setting yourself up to make mistakes by overextrapolating a bias/trend in your data to images where you have no idea if that inference is valid.
This was a big thing in the medical imaging community (where I did my stint as a CV researcher), folks were hallucinating microscope images and CT scans with no information theory justification as to why it worked.
Super resolution IS possible, but it must be done by synthesizing new pieces of information, not by inferring based on what other similar looking objects looked like. A cool technique by my former advisor does this with microscopes [1].
Deep learning has a place here, just not as a "let's create information" step, but as a way to learn how to synthesize additional information about images from more sources (i.e. more similar to how Google does Night Sight [2]).
Edit: if you want to see (an attempt) at using deep learning in this field you can checkout one of my papers [3].
This is all very sensible criticism but a bit generic.
Sometimes the accuracy of a detail doesn't matter, but its presence does.
Just about every image you ever view has had some manipulation applied. Sometimes that results in a "better" image.
Consider all astronomical images produced for human consumption; even smartphones now adapt to skin tone.
I'm playing Hogwarts Legacy, a recent AAA game which is very demanding and where aesthetics are very important, on a mediocre PC, precisely because of AMD's FSR (and, if I had an Nvidia GPU, DLSS and DLAA).
I should have been more nuanced, I suppose. There is a time and place for these kinds of image "enhancements"; they just don't belong in ESRI's scientific GIS platform. Folks don't view these images for pleasure (or at least very few do); they are typically used to analyze the satellite data or georeference other imagery.
Deep learning image enhancement is totally appropriate in your smartphone, as there the goal is not accuracy but perceived quality. Doing this to satellite imagery where the primary consumer cares about accuracy is what I call "reckless".
Just wait for the 'real time enhancement' of drone cameras for critical security applications. Snark aside, it is very reasonable and competitive to want to do color correction, focus, and shadow adjustment on the fly; secondly, raw sensing data is very large but on-the-fly capture is bandwidth-sensitive, so very clever compression and band reduction are also desirable and indeed competitive.
What happens if there are multi-million-dollar economic outcomes depending on the details of the remote sensing content, as in disaster response?
Agree, you can even see on their input/predicted/target examples that the created/invented data is off enough from ground truth to be in some cases unsuitable for photo interpretation.
A lot of the replies here are based on out-of-date information and misunderstandings about deep learning super resolution (DLSR).
Firstly, this isn't just doing edge detection etc as happened 20 years ago. It's creating a deep learning model which fills in information based on extrapolating from looking at lots of other images and making an educated guess as to what is most likely to be present. It's a fairly new approach and works much better than previous methods, given enough training data.
This is, of course, imperfect, but claims that it's just to "look good" and is of no practical benefit are incorrect. For instance, we published a paper using DLSR in microscopy, to help experts identify synaptic vesicles. In the original images the experts had a 3x higher false negative rate compared to the DLSR images.
Finally, claims that this can't be called "super resolution" are ignoring years of peer reviewed published research in which this is exactly the name used for this approach. Yes, super resolution can also be achieved using other methods which take advantage of additional data (such as multiple images), but that does not mean DLSR has to be called something else.
(I'm not involved in ArcGIS Pro, but am the lead author of the fastai framework which underpins ArcGIS training behind the scenes, and have published papers and tutorials on super resolution using deep learning.)
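For anyone who wants to see what single-image DLSR boils down to in code, here is a minimal PyTorch sketch of the common ESPCN-style pattern (to be clear, this is not the ArcGIS or fastai implementation; sizes and hyperparameters are arbitrary):

    import torch
    import torch.nn as nn

    class TinySISR(nn.Module):
        """Tiny ESPCN-style single-image super resolution network.

        Trained on (downscaled, original) image pairs, it learns to predict
        plausible high-res detail from what similar training images looked
        like; nothing in the low-res input guarantees that detail is real.
        """
        def __init__(self, scale=4, channels=3):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, 64, 5, padding=2), nn.ReLU(),
                nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, channels * scale ** 2, 3, padding=1),
                nn.PixelShuffle(scale),  # rearranges channels into a scale-x larger image
            )

        def forward(self, x):
            return self.body(x)

    # One illustrative training step: minimize pixel loss between the network's
    # guess and the true high-res patch, over (low-res, high-res) pairs.
    model = TinySISR()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    lr_batch = torch.rand(8, 3, 32, 32)    # stand-in low-res patches
    hr_batch = torch.rand(8, 3, 128, 128)  # corresponding high-res patches
    opt.zero_grad()
    loss = nn.functional.l1_loss(model(lr_batch), hr_batch)
    loss.backward()
    opt.step()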
I thought "super resolution" used to mean: construct a high res image from several low res images of the same thing. Which means it isn't making up information (I don't think).
ML-based "super resolution" is more trying to "guess" what the extra pixel data is based on images of other things. I thought that was (should be) called "upscaling".
Are we changing the meaning of this term? Or do I have it wrong? (Or do they have it wrong?)
There are a number of different algorithms that people use. You're right about the algorithms that use several images. They may not be making up anything. But there are others that do their best guess and they might be said to be making things up.
> Super Resolution is an image transformation technique with the help of which we can improve the quality of image and recover high resolution image from a given low resolution image
Saying you can “recover” high resolution seems like a bit of a stretch.
Uh… I mean, it looks pretty but it’s basically just invented a bunch of random probabilistic crap on your map right?
Is this a thing? People actually want their maps to be pretty and wrong?
O_o
Strange times.
Particularly blurry blocks that are "maybe cars?" in a backyard being turned into high resolution cars. Or a blue splat being turned into "100% a swimming pool" seems… pretty dubious.
I'd rather look at something sharp that's potentially likely to be there than look at a blurry mess and have no idea what it could even be at all.
A lot of things on maps repeat themselves a lot. Most roof HVACs look the same, road lines, cars, trees, etc. How likely is it that a car shaped blob that's the same as the other 10 million car shaped blobs in the training data isn't a car? Not very.
Well, if a blue splat is (just an example) 93% statistically close to being a swimming pool, there is nothing wrong with upscaling it in the many cases where absolute accuracy is not required. Larger and better quality datasets will decrease the number of possible failure modes after a few development/training iterations of this algorithm.
Of course if outliers / anomalies are very important for persons business or use-case they shouldn't use this feature.
Sorry, but no map is true and all maps are wrong. What you see is an approximation of the underlying data, and if you want to find truth in it, you should use R or SpatialSQL to do inference.
Bing uses similar treatment on its satellite imagery, and OSM mappers (who are permitted to use it) have been complaining that it looks more like a painting than actual imagery.
Satellite images have 0.4 m resolution or better these days; what matters most is good color quality (e.g. hyperspherical pansharpening), sensor dynamic range, and good lighting conditions (sun elevation in particular).
"Satellite" images on maps are often not from satellites but rather planes or drones.
I guess with this technique you could do ML image synthesis guided by SAR satellites - that way you could sort-of look through clouds from space, as long as you don't mind the image being largely fictional.
Every smartphone camera applies ML post-processing to the image, and has for multiple years now. The newest phones even have purpose-built tensor chips. It's why we can have crisp 100x zoom, night images in near total darkness, and daylight photos that look like they were taken with a much more expensive DSLR.
Phone manufacturers haven't been making revolutionary advances in image sensors or the centuries long science of camera optics — they've learned how to take a few noisy signals and extrapolate them into plausible high fidelity data.
Not exactly the same because the phone camera can take several sensor scans, bracketed or otherwise, to create one image that the user sees. Not surprised ML is involved, but it's more akin to a photographer taking several images and combining them in lightroom or photoshop than it is to this situation where the model only has one low-grade image to work with.
> Super Resolution is used in many fields like surveillance, medical industry, forensic and satellite imaging applications.
Computer, generate a list of applications where technology like this should absolutely not be used.
This seems terrifying — wouldn't this, for example, synthesize identifying details about motor vehicles not present in the sample, drawn from training data? About people? Etc etc
This sounds awesome for making a consumer mapping product feel higher quality. Hell, it could even serve an anonymizing function that is pro-privacy ("a roof" not "your roof"). But it feels incredibly reckless to direct this toward those listed industries.
People in surveillance and forensics etc. should be confronted with the limits of the quality of the data they are using, we should not try to synthesize extra confidence in their analysis by making the images seem higher quality than they are.
ArcGIS is used in a lot of life-critical applications, including law enforcement, emergency services, and national defense. Even for less serious applications such as land management, I'm not sure that adding LSD mode is ever appropriate for these use cases.
This is great! It would be immediately useful to me if the trained dataset were uploaded to ArcGIS Online so I could access it as a data layer through the API.