IMAGINATION: A Robust Image-based CAPTCHA Generation System

Ritendra Datta, Jia Li, and James Z. Wang
The Pennsylvania State University – University Park
ACM International Conference on Multimedia, November 2005


Page 1: IMAGINATION: A Robust Image-based CAPTCHA Generation System

IMAGINATION: A Robust Image-based CAPTCHA Generation System

Ritendra Datta, Jia Li, and James Z. Wang
The Pennsylvania State University – University Park

ACM International Conference on Multimedia, November 2005

Page 2: IMAGINATION: A Robust Image-based CAPTCHA Generation System

What are CAPTCHAs [1,2]?
- Completely Automated Public Turing test to tell Computers and Humans Apart.
- Web-based protection mechanisms
- Only humans allowed to perform certain tasks
  - Opening e-mail accounts
  - Voting online, etc.
- Prevent automated attacks by bots
  - To avoid eating up resources
  - To avoid biasing results, etc.
- Most current systems are text-based.

Text-based CAPTCHAs

1. L. von Ahn et al., CACM, 2004.
2. The CAPTCHA Project – http://www.captcha.net

Page 3: IMAGINATION: A Robust Image-based CAPTCHA Generation System

Why image-based CAPTCHAs?
- Computer vision techniques [1,2,3] have broken text-based CAPTCHAs
  - Over 90% accuracy
  - Makes these systems vulnerable
- Solution
  - More noise: harder for humans too
  - Natural image based CAPTCHAs
    - Present an image to the user
    - User labels content
- Hard to attack
  - Image recognition is a hard problem
  - Hence more secure CAPTCHAs!

1. G. Mori et al., CVPR, 2003.
2. A. Thayananthan et al., CVPR, 2004.
3. G. Moy et al., CVPR, 2004.

Image-based CAPTCHAs

(Courtesy: The Captcha Project, CMU)

Page 4: IMAGINATION: A Robust Image-based CAPTCHA Generation System

What’s the problem?
- CBIR (e.g. SIMPLIcity) and automated annotation systems (e.g. ALIP) may attack
- Solution: Generate CAPTCHA images that
  - Humans can easily label
  - Automated systems fail in most cases
- How
  - Use systematic distortions on images: dithering, noise, quantizing, etc.
  - Maintain low perceptual degradation
  - Test using state-of-the-art automated systems
  - Optimize attack rate and perceptual quality
  - Generate word choices systematically to reduce ambiguity and attack chance

SIMPLIcity and ALIP (Pictures courtesy Corel)
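
The distortions named on this slide (quantizing and noise in particular) can be illustrated with a minimal sketch. It is not the IMAGINATION implementation: the function names, the list-of-rows RGB image representation, and the use of mean absolute pixel change as a crude stand-in for perceptual degradation are all assumptions made for illustration.

```python
import random

def quantize(pixels, levels):
    """Snap every 0-255 channel value to one of `levels` evenly spaced values."""
    step = 255 / (levels - 1)
    return [[tuple(round(round(c / step) * step) for c in px) for px in row]
            for row in pixels]

def add_noise(pixels, amplitude, rng):
    """Add uniform integer noise in [-amplitude, amplitude], clamped to 0-255."""
    return [[tuple(min(255, max(0, c + rng.randint(-amplitude, amplitude)))
                   for c in px) for px in row] for row in pixels]

def mean_abs_change(before, after):
    """Average per-channel change; a hypothetical proxy for perceptual loss."""
    total = count = 0
    for row_b, row_a in zip(before, after):
        for px_b, px_a in zip(row_b, row_a):
            for b, a in zip(px_b, px_a):
                total += abs(b - a)
                count += 1
    return total / count

# Usage: distort a tiny 1x2 RGB image and measure how far it moved.
image = [[(100, 200, 128), (0, 255, 40)]]
distorted = add_noise(quantize(image, 4), 20, random.Random(0))
change = mean_abs_change(image, distorted)
```

A real system would replace `mean_abs_change` with a perceptual quality measure and would also run the distorted image through the attacking systems, as the slide describes.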

Page 5: IMAGINATION: A Robust Image-based CAPTCHA Generation System

The IMAGINATION System
- Image Generation for Internet Authentication.
- Exploits the difference between human perception and current level of machine perception.
- Generates a CAPTCHA based on a hard AI problem.
- Breaking IMAGINATION, though highly unlikely, would in turn advance the state of the art in AI.
- Uses a two-phase click-and-annotate process to achieve very low chance of attack.
  - Click Phase – Select center of an image
  - Annotate Phase – Select best label from list
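
The two-phase check can be sketched as follows. This is a hedged illustration only: the `Placed` record, the square tolerance region around each image center, and all numeric values are assumptions, not details taken from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placed:
    """One image inside the composite: bounding box plus its true label."""
    x: int
    y: int
    w: int
    h: int
    label: str

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)

def clicked_image(images, click, tol):
    """Click phase: return the image whose center lies within `tol` pixels
    of the click on both axes, or None if the click misses every center."""
    cx, cy = click
    for img in images:
        mx, my = img.center
        if abs(cx - mx) <= tol and abs(cy - my) <= tol:
            return img
    return None

def verify_round(images, click, chosen_label, tol):
    """Annotate phase: the chosen label must match the clicked image."""
    img = clicked_image(images, click, tol)
    return img is not None and chosen_label == img.label

def random_guess_chance(n_images, canvas_w, canvas_h, tol, n_choices):
    """Chance that a uniformly random click and a uniformly random label both
    succeed, assuming the tolerance squares do not overlap and lie fully
    inside the canvas."""
    hit_area = n_images * (2 * tol) ** 2
    return (hit_area / (canvas_w * canvas_h)) / n_choices

# Usage with hypothetical numbers: 8 images on an 800x600 composite,
# a 15-pixel tolerance and 15 word choices.
chance = random_guess_chance(8, 800, 600, 15, 15)
```

Under these assumed numbers a blind guess passes one round with probability of about 0.001, which shows why chaining a click with a label choice lowers the attack chance compared with either phase alone.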

Page 6: IMAGINATION: A Robust Image-based CAPTCHA Generation System

The IMAGINATION System: Architecture

Page 7: IMAGINATION: A Robust Image-based CAPTCHA Generation System

Composite Image Generation

Composite image generation by re-partitioning and dithering using different randomly chosen base colors
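
One possible reading of this caption is sketched below: tile several images into a composite, cut the composite into new rectangular blocks that ignore the original image boundaries, and dither each block against its own randomly chosen base colors. The block layout, the use of Floyd-Steinberg error diffusion, and every function name here are assumptions; the slides do not specify the actual procedure.

```python
import random

def nearest(color, palette):
    """Palette entry closest to `color` in squared RGB distance."""
    return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))

def dither_block(img, x0, y0, x1, y1, palette):
    """Floyd-Steinberg dithering of one block, in place. Quantization error
    is diffused only to pixels inside the same block."""
    buf = [[list(img[y][x]) for x in range(x0, x1)] for y in range(y0, y1)]
    h, w = y1 - y0, x1 - x0
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = nearest(old, palette)
            err = [o - n for o, n in zip(old, new)]
            img[y0 + y][x0 + x] = new
            for dx, dy, wgt in ((1, 0, 7 / 16), (-1, 1, 3 / 16),
                                (0, 1, 5 / 16), (1, 1, 1 / 16)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and ny < h:
                    for c in range(3):
                        buf[ny][nx][c] += err[c] * wgt

def compose(tiles, cols):
    """Lay equally sized RGB tiles out on a grid, `cols` tiles per row."""
    th, tw = len(tiles[0]), len(tiles[0][0])
    rows = (len(tiles) + cols - 1) // cols
    canvas = [[(0, 0, 0)] * (cols * tw) for _ in range(rows * th)]
    for i, tile in enumerate(tiles):
        oy, ox = (i // cols) * th, (i % cols) * tw
        for y in range(th):
            for x in range(tw):
                canvas[oy + y][ox + x] = tile[y][x]
    return canvas

def repartition_and_dither(canvas, n_cuts, n_colors, rng):
    """Cut the canvas at random x and y positions and dither every resulting
    block with its own random palette of `n_colors` base colors."""
    h, w = len(canvas), len(canvas[0])
    xs = [0] + sorted(rng.sample(range(1, w), n_cuts)) + [w]
    ys = [0] + sorted(rng.sample(range(1, h), n_cuts)) + [h]
    for y0, y1 in zip(ys, ys[1:]):
        for x0, x1 in zip(xs, xs[1:]):
            palette = [tuple(rng.randrange(256) for _ in range(3))
                       for _ in range(n_colors)]
            dither_block(canvas, x0, y0, x1, y1, palette)
    return canvas

# Usage: four flat 4x4 tiles become one 8x8 dithered composite.
tiles = [[[(60 * i, 128, 255 - 60 * i)] * 4 for _ in range(4)] for i in range(4)]
composite = repartition_and_dither(compose(tiles, 2), 2, 3, random.Random(1))
```

Because the block boundaries no longer coincide with the image boundaries, color discontinuities alone do not reveal where one image ends and the next begins, which is the apparent purpose of re-partitioning.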

Page 8: IMAGINATION: A Robust Image-based CAPTCHA Generation System

Composite Distortion Selection

- How to smartly choose distortions that can be applied to the images?
- Use state-of-the-art CBIR/related systems that can be potential attack weapons
- Enforce probabilistic constraints on what is a good distortion
  - Make some realistic assumptions
  - Generate many distortions
  - Choose a subset that satisfies these constraints
  - Include in the IMAGINATION system

A tiger image distorted by four acceptable composite distortions

Page 9: IMAGINATION: A Robust Image-based CAPTCHA Generation System

Composite Distortions: Probabilistic Constraints

An image distortion is considered acceptable if, probabilistically, potential attack algorithms are unable to significantly reduce the uncertainty associated with the labeling of the distorted images.
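
One way to make "reduce the uncertainty" concrete is Shannon entropy over the candidate labels. The sketch below is an assumed formalization, not the constraint used in the paper: it compares the entropy of a uniform prior over the labels with the average entropy of an attack system's posterior label distributions on distorted images, and accepts the distortion when the relative reduction stays under a hypothetical threshold `max_reduction`.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def uncertainty_reduction(posteriors, n_labels):
    """Fraction of the prior uncertainty (uniform over n_labels) that the
    attacker removes, averaged over its posteriors on distorted images."""
    prior = math.log2(n_labels)
    avg_posterior = sum(entropy(p) for p in posteriors) / len(posteriors)
    return (prior - avg_posterior) / prior

def acceptable(posteriors, n_labels, max_reduction):
    """A distortion passes if the attacker learns at most `max_reduction`."""
    return uncertainty_reduction(posteriors, n_labels) <= max_reduction

# Usage: an attacker that narrows 4 labels down to 2 removes half the
# uncertainty, so the distortion fails a 10% threshold.
narrowed = [[0.5, 0.5, 0.0, 0.0]]
passes = acceptable(narrowed, 4, 0.10)
```

In this reading, a distortion under which the attacker's output stays close to uniform is kept, while one that still lets the attacker concentrate probability on a few labels is discarded.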

Page 10: IMAGINATION: A Robust Image-based CAPTCHA Generation System

Composite Distortions in IMAGINATION

Schematic view of the four composite distortions satisfying the probabilistic constraints and hence chosen for the IMAGINATION system

Page 11: IMAGINATION: A Robust Image-based CAPTCHA Generation System

Word Choice Generation
- User chooses a label instead of typing one:
  - Avoids spelling mistakes, polysemy, etc.
  - More user-friendly (critical)
  - But leads to a higher attack chance!
- Three issues with choice list generation
  - Ambiguity (e.g. Dog and Wolf)
  - Attack using the word choices themselves (odd-one-out)
  - Multiple valid labels
- Solution
  - Use the WordNet ontology
  - Solve heuristically by constructing a word hyper-tetrahedron

[Figure: word choices W1, W2, W3, W4 drawn as vertices, with edges labeled by the pairwise distances di,j]

A word hyper-tetrahedron (K=4)

Wk = k-th word choice, k ∈ {1, …, K}

di,j = WordNet distance between Wi & Wj

Constraint: di,j ≈ δ, for all (i,j)
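
The constraint above can be illustrated with a small greedy sketch. Everything in it is an assumption made for illustration: the toy hypernym tree `PARENT` stands in for WordNet, path length through the lowest common ancestor stands in for the WordNet distance, and the tolerance `eps` makes "≈ δ" concrete. The slides only say the problem is solved heuristically; this is one simple heuristic, not necessarily the authors'.

```python
# Toy hypernym tree (child -> parent); a hypothetical stand-in for WordNet.
PARENT = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "tiger": "feline", "feline": "mammal",
    "mammal": "animal", "eagle": "bird", "sparrow": "bird", "bird": "animal",
    "car": "vehicle", "boat": "vehicle", "vehicle": "artifact",
    "animal": "entity", "artifact": "entity",
}

def ancestors(word):
    """The word followed by its chain of hypernyms up to the root."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def distance(a, b):
    """Number of edges between a and b through their lowest common ancestor."""
    depth_from_b = {w: i for i, w in enumerate(ancestors(b))}
    for i, w in enumerate(ancestors(a)):
        if w in depth_from_b:
            return i + depth_from_b[w]
    raise ValueError("words share no ancestor")

def build_choices(correct, vocabulary, k, delta, eps):
    """Greedily collect k words, starting from the correct label, such that
    every pairwise distance is within eps of delta. Returns None on failure."""
    chosen = [correct]
    for word in vocabulary:
        if len(chosen) == k:
            break
        if word in chosen:
            continue
        if all(abs(distance(word, c) - delta) <= eps for c in chosen):
            chosen.append(word)
    return chosen if len(chosen) == k else None

# Usage: the correct label is "dog"; "wolf" is too close to it (ambiguity)
# and "car" is too far from it (odd-one-out), so both are rejected.
vocab = ["wolf", "cat", "tiger", "eagle", "sparrow", "car", "boat"]
choices = build_choices("dog", vocab, 3, 5, 1)
```

Keeping all pairwise distances near the same δ addresses two of the listed issues at once: near-synonyms such as Dog and Wolf cannot appear together, and no single word stands out as unrelated to the rest.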

Page 12: IMAGINATION: A Robust Image-based CAPTCHA Generation System

Conclusions
- New form of CAPTCHA
  - Likely to be more robust against attacks
- Some issues
  - Need more rigorous testing against many attack scenarios
  - User-friendliness is critical – needs large-scale testing
- Given these issues are somewhat addressed
  - Promise of a more secure Internet
  - Web servers more reliable
  - Potential for commercialization