Can someone explain what is going on here?

aidanns · on July 6, 2012

The file has been created in such a way that the web browser is ignoring the non-html parts of the document, while the image renderer is ignoring the parts that make up the html page.

The first part probably isn't too hard, since most web browsers go to great lengths to render non-standard HTML in a sensible way. I'm not too sure about the second part. I'm guessing the JPEG spec has some variable-length space in some kind of file header that the HTML for the page can be put into.

I read something similar a while back (I think it was called a Jafar attack) where a clever person worked out how to create a file that was both a valid .gif image and .jar java executable.

tedunangst · on July 6, 2012

JAR files are just ZIP files, which put their index (the central directory) at the end of the file, making it very easy to construct a JAR/ZIP that also has a different file header at the front. Bad news for web apps that allow such files to be uploaded without inspecting them. It's not a terrible idea to always transcode all uploaded images/videos to prevent that.
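A rough sketch of that trick, using only Python's standard `zipfile` module. The `gif_prefix` bytes are a hypothetical stand-in (a GIF signature plus a screen descriptor, not a displayable image), and the file name `hello.txt` is made up for illustration:

```python
import io
import zipfile

# Build an ordinary ZIP archive in memory (a JAR is a ZIP with a manifest).
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("hello.txt", "hi from the archive")

# Hypothetical stand-in for real image data: the GIF signature and a
# 1x1 logical screen descriptor, not a complete GIF.
gif_prefix = b"GIF89a" + b"\x01\x00\x01\x00\x00\x00\x00"

polyglot = gif_prefix + zip_buf.getvalue()

# A check of the leading bytes sees a GIF...
looks_like_gif = polyglot.startswith(b"GIF89a")

# ...while a ZIP reader locates the central directory from the end of the file.
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()
    content = zf.read("hello.txt")
```

Here `zipfile` tolerates the prepended bytes because it finds the archive from the end of the data, which is why checking only the first few bytes of an upload is not enough.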

duskwuff · on July 6, 2012

> I think it was called a Jafar attack

GIFAR: http://en.wikipedia.org/wiki/GIFAR

justinschuh · on July 6, 2012

Web browsers need to be very loose in how they interpret data for historical reasons. A lot of this is even codified in the current HTML standard, like always content-sniffing images and identifying data that can be ignored during parsing. You also have HTML comments, which is where most of the JPEG data is packed in this example. Combine that with the fact that image formats generally allow you to pack comments or other arbitrary metadata into fields, and you end up with a file that can be read as either a JPEG or HTML. Also, Michal has a weird thing for squirrels.
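To illustrate the content-sniffing part, here is a minimal sketch of signature-based type detection. `sniff_image_type` and the `SIGNATURES` table are hypothetical names, and the table covers only a few formats; what browsers actually do is more involved than this:

```python
# Leading-byte signatures for a few image types (a small subset, for illustration).
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_image_type(data: bytes):
    """Return a MIME type guessed from the first bytes, or None."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


# A file that begins like a JPEG is classified as one, whatever follows.
guess = sniff_image_type(b"\xff\xd8\xff\xe0" + b"<html>not really an image</html>")
```

Only the leading bytes are examined, so anything after the signature, including a whole HTML document, does not change the guess.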

younata · on July 6, 2012

view source shows that he has an html document embedded in the jpeg.

Apparently, the jpeg format allows this.

agwa · on July 6, 2012

The HTML document is in the "comment" field of the JPEG, which is perfectly reasonable.

What is surprising is that web browsers just ignore the 24 bytes of binary data between the start of the file and the start of the HTML.
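For anyone wondering where 24 comes from, here is a sketch that assembles the start of such a file by hand. The JFIF field values (version 1.1, density 300) and the `html` placeholder are assumptions for illustration, not the actual bytes of the linked file, and the result is only a file prefix, not a decodable image:

```python
import struct

html = b"<html><body>hello</body></html>"  # placeholder, not the page's real markup

soi = b"\xff\xd8"  # start-of-image marker
# APP0/JFIF payload: identifier, version, density unit, X/Y density, no thumbnail.
jfif = b"JFIF\x00" + struct.pack(">BBBHHBB", 1, 1, 1, 300, 300, 0, 0)
# The 2-byte segment length counts itself plus the payload.
app0 = b"\xff\xe0" + struct.pack(">H", len(jfif) + 2) + jfif
# COM (comment) segment: marker, length, then the comment text.
com = b"\xff\xfe" + struct.pack(">H", len(html) + 2) + html

prefix = soi + app0 + com
html_offset = prefix.find(b"<html>")  # 2 (SOI) + 18 (APP0) + 4 (COM marker and length) = 24
```

Under these assumptions the markup starts at byte 24: two bytes of SOI, an 18-byte APP0 segment, and the four bytes of COM marker and length.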

dunham · on July 6, 2012

It's interpreting that data as "text" and sticking it at the beginning of the <body> of the document. The css makes the body invisible so you don't see it on the screen (unless you disable that rule) - take a look in the DOM inspector.

jack-r-abbit · on July 6, 2012

As pointed out by someone else, the browser has been instructed via CSS to hide the body. If you inspect the page and manipulate the CSS to show the body, those odd bytes (ÿØÿàJFIF,,ÿþr) do get rendered.

ejdyksen · on July 6, 2012

That extra binary data starts with an HTML comment tag (<!--) which is never closed, so it makes sense that it is ignored.

Edit: Misread your comment...the bytes at the beginning of the file are hidden by CSS (as pointed out by others).

shuzchen · on July 6, 2012

Commenter is referring to the binary data at the beginning of the file, which makes up the file header for the JPEG. It is before the <html> tag and is neither commented out nor actually ignored.

The browser actually picks that "text" up and shows it on the page. It's just that the HTML itself contains a CSS rule to make that text not visible.

pgeorgi · on July 6, 2012

JPEG allows for additional data chunks (that's how thumbnails, EXIF data, ... are added). The HTML uses CSS to hide the "body" (since that would include the JPEG header), putting the real content in a container element that poses as the new root.
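Those chunks can be listed with a few lines of code. This is a simplified sketch: `list_segments` is a hypothetical helper that assumes every marker before the scan data carries a 2-byte length, so it does not handle standalone markers or fill bytes, and `sample` is hand-made test data rather than a real image:

```python
import struct


def list_segments(data: bytes):
    """Yield (marker, payload) for the header segments of a JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        # The length field counts its own two bytes plus the payload.
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length


comment = b"<p>hi</p>"
sample = (
    b"\xff\xd8"
    + b"\xff\xe0\x00\x10" + b"JFIF\x00\x01\x01\x01\x01\x2c\x01\x2c\x00\x00"
    + b"\xff\xfe" + struct.pack(">H", len(comment) + 2) + comment
    + b"\xff\xda" + b"\x00" * 4  # fake scan header so the walk stops here
)
segments = list(list_segments(sample))
```

On this sample the walk yields an APP0 segment (marker 0xE0) followed by a COM segment (marker 0xFE) whose payload is the markup.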

Neat hack.

leeoniya · on July 6, 2012

W3C squirrelpocalypse?