Stable Diffusion and DALL·E have made it simpler to create new images. But getting photorealistic results is still a challenge…
This is a continuation of our work from 3 months ago on “This Food Does Not Exist” (https://news.ycombinator.com/item?id=32167704). We are now using both Stable Diffusion and GANs depending on the subjects we want to render.
These are AI-generated 'photorealistic' images, not 'AI-generated photography'. Photography is a particular process that isn't involved anywhere in these AI activities, as far as I can tell. If you somehow had machine learning algorithms operating cameras, the term would be correct.
You’re right; then again, my money’s on “ai photo” becoming common enough to acquire meaning on its own. We already have a glow-worm (not a worm), funny bone (not a bone), Baby Yoda (not the Yoda), and many more.
That other people are incorrect is no reason to volunteer to be incorrect too. In fact I'd say it's the opposite of good to use the wrong terms when you know better.
I suppose 'AI Generated Photography' sounds better than 'yet another stable diffusion model' but one is correct and the other isn't. This has nothing to do with photography.
I find Stable Diffusion is pretty good at generating single-subject images like this. It's really mind blowing and the novelty hasn't worn off for me yet.
But after dozens of attempts I still haven't managed to get it to show me a photograph of a duck eating a hoagie at Niagara Falls. I think it would be really interesting to try to find the simplest query that these tools cannot produce.
I had considered this, but I'm not sure it's the issue, since Niagara Falls is pretty iconic. I'll give it a shot with surrogates in place of Niagara Falls and see what happens. The bigger difficulty, it seems, is getting a duck to eat a hoagie.
I can get a hoagie at Niagara Falls, I can get a duck at Niagara Falls, I can even get a hoagie and a duck together near some water, I can almost get a duck eating a hoagie (I've gotten the duck near the hoagie with its mouth open), but I can't get a duck eating a hoagie at Niagara Falls.
The stock photo companies will simply pivot to AI; see companies like StockAI popping up [0]. Getty should also get into this business, but it seems they're going in the opposite direction, banning AI-generated images.
>> From FAQs: “we are using both diffusion models and GANs in combination with an extensive filtering and quality assessment pipeline that allows us to generate photorealistic images at scale.”
For every image that reaches the site, how many were generated but filtered out by the pipeline? For example, maybe for every photo that reaches the site, 1,000 were generated that did not pass the quality assessment.
It really depends on the subjects. For example, cookies are quite forgiving - up to 30% of results are good. For things like coral or sushi we keep less than 1 in 10 (which is already much better than where we were at a few months ago!)
For now we still keep a close manual look at the output, but the goal as we scale up is to fully automate this selection process. Right now we have a pipeline that ranks the outputs and we select from the top results.
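The rank-and-select step they describe can be sketched in a few lines. This is a hypothetical illustration, not their actual pipeline: `quality_score` is a stand-in for whatever assessment model they use, and the keep fraction mirrors the "1 in 10 for hard subjects" figure mentioned above.

```python
# Hypothetical sketch of a rank-and-keep selection step.
# quality_score stands in for a real quality-assessment model
# (e.g. an aesthetic or artifact-detection scorer), which the
# thread does not describe in detail.

def quality_score(image):
    # Placeholder: just read a precomputed score.
    return image["score"]

def select_top(candidates, keep_fraction=0.1):
    """Rank generated candidates by quality and keep the top
    fraction, e.g. ~1 in 10 for hard subjects like coral or sushi."""
    ranked = sorted(candidates, key=quality_score, reverse=True)
    n_keep = max(1, int(len(candidates) * keep_fraction))
    return ranked[:n_keep]

# Toy batch of five generated candidates with made-up scores.
batch = [{"id": i, "score": s} for i, s in enumerate([0.2, 0.9, 0.5, 0.7, 0.1])]
kept = select_top(batch, keep_fraction=0.4)
print([img["id"] for img in kept])  # → [1, 3], the two highest-scoring candidates
```

Fully automating the selection then comes down to trusting `quality_score` enough to drop the manual review pass.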
As a meatspace photographer, I take some comfort that the photograph in column 3, row 25 has an issue. The mountain peak has abundant snow. Its reflection in the lake doesn't. There are similar snow reflection disparities in several of the mountains-reflected-in-water pix.
Current AI models frequently fail at any hard expectation of structure, like when they try to generate human bodies and don't grasp at all how bones work, bending limbs in unreal ways or adding extra ones, or creating heads with wrong proportions/spacing. They are pretty good at grasping how to fade textures/colors, but have little grasp of structure and "the big picture".
One thing I think would be super fascinating to have is a system that can take AI image training sets and reverse engineer which pictures were used to make the output. Take the bunny. I bet there were a lot of bunny pictures in the training set that looked very similar to the generated one. It would be interesting to have a system pick the one that is closest and display it next to it. It would show how original (or unoriginal) these images are.
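A simple version of that idea is nearest-neighbor search in an embedding space: embed every training image and the generated image with the same model (CLIP is a common choice), then return the training image with the highest cosine similarity. The sketch below uses made-up 4-d vectors and hypothetical image ids in place of real CLIP embeddings.

```python
import numpy as np

def nearest_neighbor(query, training, ids):
    """Return the id of the training embedding most similar to the
    query embedding, by cosine similarity."""
    q = query / np.linalg.norm(query)
    t = training / np.linalg.norm(training, axis=1, keepdims=True)
    sims = t @ q  # cosine similarity of each training row vs. query
    return ids[int(np.argmax(sims))]

# Toy 4-d "embeddings" standing in for real CLIP vectors;
# the ids are hypothetical.
train = np.array([
    [1.0, 0.0, 0.0, 0.0],  # bunny_001
    [0.7, 0.7, 0.0, 0.0],  # bunny_002
    [0.0, 0.0, 1.0, 0.0],  # cookie_017
])
ids = ["bunny_001", "bunny_002", "cookie_017"]
generated = np.array([0.9, 0.1, 0.0, 0.0])
print(nearest_neighbor(generated, train, ids))  # → bunny_001
```

Whether the nearest training image actually "explains" the output is a separate question, but displaying it side by side would at least make the comparison concrete.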
It's interesting to see what types of features these models don't distinguish well. For example, I've noticed that a lot of models have trouble giving ladybugs distinct spots. Instead they usually end up with a big black splotch.