IMAGINATION: A Robust Image-based CAPTCHA Generation System
Ritendra Datta, Jia Li, and James Z. Wang
The Pennsylvania State University – University Park
ACM International Conference on Multimedia, November 2005
What are CAPTCHAs? [1,2]
  Completely Automated Public Turing test to tell Computers and Humans Apart.
  Web-based protection mechanisms
    Only humans are allowed to perform certain tasks
      Opening e-mail accounts
      Voting online, etc.
  Prevent automated attacks by bots
    To avoid eating up resources
    To avoid biasing results, etc.
  Most current systems are text-based.
Text-based CAPTCHAs
1. L. von Ahn et al., CACM, 2004.
2. The CAPTCHA Project – http://www.captcha.net
Why image-based CAPTCHAs?
  Computer vision techniques [1,2,3] have broken text-based CAPTCHAs
    Over 90% accuracy
    Makes these systems vulnerable
  Solution
    More noise – harder for humans too
    Natural image-based CAPTCHAs
      Present an image to the user
      User labels content
  Hard to attack
    Image recognition is a hard problem
    Hence more secure CAPTCHAs!
1. G. Mori et al., CVPR, 2003.
2. A. Thayananthan et al., CVPR, 2004.
3. G. Moy et al., CVPR, 2004.
Image-based CAPTCHAs
(Courtesy: The Captcha Project, CMU)
What’s the problem?
  CBIR (e.g. SIMPLIcity) and automated annotation systems (e.g. ALIP) may be used as attacks
  Solution: Generate CAPTCHA images that
    Humans can easily label
    Automated systems fail on in most cases
How
  Use systematic distortions on images
    Dithering, noise, quantizing, etc.
    Maintain low perceptual degradation
  Test using state-of-the-art automated systems
    Optimize attack rate & perceptual quality
  Generate word choices systematically to reduce ambiguity and attack chance
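As a rough illustration of the distortions named above, the following sketch quantizes a grayscale image and then adds noise. It is not the IMAGINATION implementation: the function names, the list-of-rows image representation, and the parameter defaults (`levels`, `amplitude`, `seed`) are all assumptions made for this example, and dithering is omitted.

```python
import random

def quantize(pixels, levels):
    """Map each 0-255 intensity to the midpoint of one of `levels` equal bins."""
    step = 256 / levels
    return [[int((int(p // step) + 0.5) * step) for p in row] for row in pixels]

def add_noise(pixels, amplitude, rng):
    """Add uniform integer noise in [-amplitude, amplitude], clamped to 0-255."""
    return [[max(0, min(255, p + rng.randint(-amplitude, amplitude))) for p in row]
            for row in pixels]

def distort(pixels, levels=4, amplitude=20, seed=0):
    """Hypothetical composite distortion: quantize first, then add noise."""
    rng = random.Random(seed)
    return add_noise(quantize(pixels, levels), amplitude, rng)

# A tiny 2x2 "image"; a real system would operate on full photographs.
distorted = distort([[0, 128], [255, 40]])
```

Fewer levels and a larger amplitude make the image harder for an automated system but also degrade it for humans, which is the trade-off the slide asks to optimize.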
SIMPLIcity and ALIP (Pictures courtesy Corel)
The IMAGINATION System
  Image Generation for Internet Authentication.
  Exploits the difference between human perception and the current level of machine perception.
  Generates a CAPTCHA based on a hard AI problem.
  Breaking IMAGINATION, though highly unlikely, would in turn advance the state of the art in AI.
  Uses a two-phase click-and-annotate process to achieve a very low chance of attack.
Click Phase – Select center of an image
Annotate Phase – Select best label from list
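The two phases can be pictured as a pair of server-side checks. The sketch below is one way this might look, not the system's actual code: the rectangle layout, the `tolerance` radius, and the function names are hypothetical.

```python
import math

def clicked_image(click, rects, tolerance):
    """Click phase: return the index of the sub-image whose geometric center
    lies within `tolerance` pixels of the click, or None if there is none."""
    cx, cy = click
    for i, (x, y, w, h) in enumerate(rects):
        center_x, center_y = x + w / 2, y + h / 2
        if math.hypot(cx - center_x, cy - center_y) <= tolerance:
            return i
    return None

def verify(click, chosen_label, rects, labels, tolerance=15):
    """Annotate phase: the label picked from the list must match the clicked image."""
    idx = clicked_image(click, rects, tolerance)
    return idx is not None and labels[idx] == chosen_label

# Two side-by-side 100x100 sub-images, given as (x, y, width, height).
rects = [(0, 0, 100, 100), (100, 0, 100, 100)]
labels = ["tiger", "car"]
ok = verify((52, 48), "tiger", rects, labels)
```

An attacker has to pass both checks, so a random guess must land near a center and also pick the right word, which is what drives the chance of attack down.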
The IMAGINATION System: Architecture
Composite Image Generation
Composite image generation by re-partitioning and dithering using different randomly chosen base colors
Composite Distortion Selection
How to smartly choose distortions that can be applied to the images?
  Use state-of-the-art CBIR and related systems that are potential attack weapons
  Enforce probabilistic constraints on what is a good distortion
    Make some realistic assumptions
    Generate many distortions
    Choose a subset that satisfies these constraints
    Include in the IMAGINATION system
A tiger image distorted by four acceptable composite distortions
Composite Distortions: Probabilistic Constraints
An image distortion is considered acceptable if, probabilistically, potential attack algorithms are unable to significantly reduce the uncertainty associated with the labeling of those images.
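The slides do not state the constraints themselves. One possible reading, sketched here purely as an assumption, measures uncertainty as Shannon entropy: a distortion passes if an attacker's label distributions on distorted test images stay close to the entropy of a uniform guess. The `max_reduction` threshold and the function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def is_acceptable(posteriors, max_reduction=0.2):
    """posteriors: one attacker label distribution per distorted test image,
    all over the same label set (at least two labels).
    Accept if the mean entropy drops by at most `max_reduction` (as a
    fraction) relative to a uniform prior over the labels."""
    n_labels = len(posteriors[0])
    prior = math.log2(n_labels)
    mean_posterior = sum(entropy(p) for p in posteriors) / len(posteriors)
    return (prior - mean_posterior) / prior <= max_reduction

# Attacker learns nothing: distributions stay uniform over 4 labels.
robust = is_acceptable([[0.25] * 4, [0.25] * 4])
# Attacker is certain of the label: the distortion would be rejected.
weak = is_acceptable([[1.0, 0.0, 0.0, 0.0]])
```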
Composite Distortions in IMAGINATION
Schematic view of the four composite distortions satisfying the probabilistic constraints and hence chosen for the IMAGINATION system
Word Choice Generation
  User chooses instead of types:
    Avoids spelling mistakes, polysemy, etc.
    More user-friendly (critical)
    But leads to higher attack chance!
  Three issues with choice list generation
    Ambiguity (e.g. Dog and Wolf)
    Attack using word choices themselves (odd-one-out)
    Multiple valid labels
  Solution
    Use the WordNet ontology
    Solve heuristically by constructing a word hyper-tetrahedron
[Figure: a tetrahedron with vertices W_1, W_2, W_3, W_4; each edge is labeled with the distance d_{i,j} between its endpoints]
A word hyper-tetrahedron (K=4)
W_k = word choice, k ∈ {1, …, K}
d_{i,j} = WordNet distance between W_i and W_j
Constraint: d_{i,j} ≈ δ, for all (i, j)
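A greedy heuristic for the constraint above might look like the following sketch. The slides say only that the problem is solved heuristically, so the greedy strategy, the tolerance `eps`, and the toy distance table standing in for a WordNet distance are all assumptions.

```python
def build_choices(correct, candidates, dist, delta, eps, k):
    """Greedily pick k words (including the correct label) whose pairwise
    distances all lie within eps of delta. Returns None if it cannot."""
    chosen = [correct]
    for word in candidates:
        if len(chosen) == k:
            break
        if word not in chosen and all(abs(dist(word, c) - delta) <= eps for c in chosen):
            chosen.append(word)
    return chosen if len(chosen) == k else None

# Toy symmetric distances (hypothetical numbers, not real WordNet values).
TOY = {
    frozenset(("tiger", "lion")): 1, frozenset(("tiger", "car")): 5,
    frozenset(("tiger", "boat")): 5, frozenset(("tiger", "tree")): 5,
    frozenset(("lion", "car")): 5, frozenset(("lion", "boat")): 5,
    frozenset(("lion", "tree")): 5, frozenset(("car", "boat")): 1,
    frozenset(("car", "tree")): 5, frozenset(("boat", "tree")): 5,
}

def toy_dist(a, b):
    return 0 if a == b else TOY[frozenset((a, b))]

choices = build_choices("tiger", ["lion", "car", "boat", "tree"],
                        toy_dist, delta=5, eps=1, k=3)
```

Here "lion" is rejected for being too close to "tiger" (ambiguity) and "boat" for being too close to "car", so no word stands out as the odd one out.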
Conclusions
  New form of CAPTCHA
    Likely to be more robust against attacks
  Some issues
    Need more rigorous testing against many attack scenarios
    User-friendliness is critical – needs large-scale testing
  Given these issues are somewhat addressed
    Promise of a more secure Internet
    Web servers more reliable
    Potential for commercialization