Hysterical: "A bot attempting to brute force a solution to the above example will need to work its way through (26)(25)(24) = 15,600 possible combinations. Asking for the identification of four unique features gives 358,800 possible combinations while 5 unique features will render 7,893,600 possible combinations"
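For what it's worth, the figures in that quote check out: they're the counts of ordered selections of k distinct features from 26, i.e. 26 x 25 x ... x (26-k+1). A one-liner in Python (3.8+) confirms:

```python
# Number of ordered, non-repeating picks of k features out of 26.
import math

for k in (3, 4, 5):
    print(k, math.perm(26, k))
# 3 15600
# 4 358800
# 5 7893600
```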
Funnily enough, this situation reminds me of a Simpsons quote... let me see if I can dig it up.
Lisa: What have you done with my report?
Bart: I've hidden it. To find it you'll need to decipher a series of clues, each more fiendish than...
Lisa: Got it!
Bart: D'oh!
These numbers don't mean much, but it's "hilarious" because you could simply generate every image and compare them to the one on the site in about a second of CPU time.
This is actually not a meaningful way to attack current CAPTCHAs, so now that I think about it... this 3D CAPTCHA would probably be less secure than the current text-based ones that bots attack with OCR.
The final image is rendered based on random variables /each time/ - Even just moving the light source would result in an entirely (from a bitmap point of view) different image.
So even if you could get your hands on the 3D source file used for rendering, generating all possible images is impossible.
The numbers don't refer to brute force since the answer changes on each try.
If you used the same answer 'ABC' each time, you'd need (assuming a perfectly uniform random distribution) an expected 15,600 tries before getting it right.
I don't know about you, but after getting 15 thousand failed requests in a row from the same IP, I'd assume they were a bot ;)
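To put a rough number on that: if the answer is re-randomised on every attempt, a bot repeating one fixed guess succeeds with probability p = 1/15,600 per try, so the number of tries is geometrically distributed with mean 1/p. A quick simulation in Python (purely illustrative, nothing from the article):

```python
# Each attempt succeeds independently with probability p, so the number of
# attempts until the first success is geometric with mean 1/p = 15,600.
import random

random.seed(1)  # fixed seed so the run is repeatable
p = 1 / 15600

def tries_until_hit():
    """Count attempts until a single random trial succeeds."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

trials = [tries_until_hit() for _ in range(500)]
print(sum(trials) / len(trials))  # sample mean hovers around 15,600
```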
While moving the light source results in different pixels, the object silhouettes don't change, and even internal edges will remain reasonably consistent under different lighting.
Given an object under different lighting and vantage points, the captcha breaker can build a similar object and automatically generate a database of silhouettes from a sparsely sampled set of vantage points. Then, given a captcha image, he can search the database for an approximate silhouette match, then iteratively improve the vantage point by matching the silhouettes of nearby views. Since the vantage point and the labeled object entirely determines the captcha answer, this approach may be good enough to break the captcha.
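A minimal sketch of that pipeline, with tiny hand-made binary masks standing in for real renders (the object names, the masks, and the intersection-over-union scoring are all my own illustrative assumptions, not anything from the article):

```python
# Toy version of the silhouette strategy: precompute a database of
# (object, vantage point) -> silhouette masks, then match a query
# silhouette against it by intersection-over-union (IoU).
import numpy as np

def iou(a, b):
    """Overlap score between two binary masks, in [0, 1]."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

# Fake database of 8x4 masks; a real breaker would rasterise 3D models
# from a sparse grid of vantage points.
db = {
    ("chair", "front"): np.array([[0, 1, 1, 0]] * 4 + [[0, 1, 0, 0]] * 4, bool),
    ("chair", "side"):  np.array([[1, 1, 1, 1]] * 2 + [[0, 0, 1, 0]] * 6, bool),
    ("mug", "front"):   np.array([[1, 1, 1, 0]] * 8, bool),
}

def best_match(query):
    """Return the (object, vantage) key whose silhouette overlaps most."""
    return max(db, key=lambda k: iou(db[k], query))

query = np.array([[0, 1, 1, 0]] * 4 + [[0, 1, 0, 0]] * 4, bool)
print(best_match(query))  # -> ('chair', 'front')
```

In the full attack the nearest-neighbour hit would only seed the search; the vantage point would then be refined by comparing against renders of nearby views, as described above.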
A more dynamic scene would be more challenging for this approach, but it would also be more difficult for the server to come up with human-solvable scenes.
You are describing object recognition, which even on a static 2D image is insanely hard to get working correctly (I have experience: it was my university project).
However, in a 3D context there is no way a computer can infer what an object would look like from a different vantage point, since not even a human can do this.
For example, looking at a CRT and an LCD head-on would give you the same image, but no information about the depth of the monitor. Multiple viewpoints would help the computer figure out the full three-dimensional object, but then object recognition comes into play again: which object is which?
This system works with humans because we have good 3D object recognition and a huge database of experience with which to compare it against, all of which is calculated in an instant.
Replicating that behaviour in a computer is still a long way away.
I realize that the general problem of object recognition can be arbitrarily difficult, but so is the general problem of text recognition: How can a computer determine if a downward stroke is a one, or a lowercase L, or an uppercase I? And yet the text-recognition captchas have been broken -- not because the problem is easy but because captcha breakers have exploited artifacts of individual captchas to get a correct answer a modest percentage of the time. The 3D captcha (as the article author described it) is highly constrained -- a small library of objects in static poses -- so it has similarly exploitable artifacts.
The captcha-breaking computer has no need to infer what an object would look like from another view if someone has already manually reproduced the library of models; in that case the problem reduces to identifying which models from the library are in the picture and what angle they are being viewed from. Although the problem is no doubt difficult, the silhouette strategy I described is similar to other published object recognition approaches known to work, e.g.:
And the approach doesn't need to work perfectly: the captcha breaker is only interested in improving his chances of guessing correctly. If an automated approach guesses correctly even just 20% of the time, the captcha is effectively broken.
In the case where two images of objects are very similar -- like your CRT vs. LCD example -- even a human would have difficulty differentiating. By definition that makes these objects bad for the captcha, so the captcha author would either leave them out of the library of objects, or he would need to make the captcha more tolerant of human error, which makes things easier for the captcha-breaker.
Agreed, however the article already acknowledges that approach (read near the end about the flower).
Luckily, text follows a very constrained set of rules (e.g. an X will always be two criss-crossed lines), but the same doesn't apply to 3D objects: two images of the same kind of object can look entirely different. A simple example is the chair, which comes in all varieties of shapes but is still easily identifiable to a human.
So this would automatically require human input to identify the object; you can't create a program that would 'learn' new objects, at least not yet.
Also, the silhouette strategy can only be applied when a shape remains relatively constant; moving the camera a little to the left would render a completely new silhouette.
Add that the bot would still need to be told how to answer the arbitrary 'How many legs does the chair that the man is sitting on have?' questions.
The fact that so much human input is required just to identify /one/ object in the captcha, plus the fact that once an object has been compromised it is trivial to swap in another one (impossible in a text captcha, because there are only 26+10 characters that the whole world knows), means that this is a damn effective captcha.
It is a meaningful way to attack many of the current captchas, if the alphabet and the space of transformations applied to it are sufficiently small. It is being done right now, in fact.
Re #1, I presume you mean "switched off" to conserve battery. The thing is, a backlight wouldn't really work with the eInk screen. eInk is an ink technology designed to mimic ink on paper. Like paper, you read the screen by light bouncing off it, not through it. Think of reading a backlit sheet of paper, like a transparency on a projector -- more pleasant to read the reflection than have it lit from behind.
I have a Motorola MOTOFONE F3, which also uses an eInk display. It has a light source -- an LED between the display and outer plastic. Works ok, but direct reflected light (like a flashlight) works better in low light conditions.
I think you can just use a normal booklight with the Kindle.
Friendster -> MySpace : Poor Technical Execution -> Good Technical Execution, Less Features -> More Features, Niche/Techie -> Mainstream.
I would have to say that FriendFeed may take over in the Techie crowd, but it's not the mainstream equivalent of Twitter because it is too reliant on existing usage of other techie sites.
Facebook, with a little tweaking, is well positioned to be the mainstream Twitter. It already has a 'Feed', Mobile Updates, IM client. You may argue that the essence of Twitter is more than the sum of those parts, but for the Mainstream, a tweaked Mobile Personal Facebook Messaging would probably make an acceptable Twitter.
Alternatively, Plurk has a very 'teen high school girls would love this' feel to it. Very interested to see how it develops.