Spatial Business Detection and Recognition from Images
Alexander Darino
Weeks 10 & 11

STR Implementation: Automatic Detection and Recognition of Signs From Natural Scenes
- Multiresolution-based potential characters detection
- Character/layout geometry and color properties analysis
- Local affine rectification
- Refined detection

Training Process
- One font per classifier, a-z A-Z
- Generate alphabet templates
- Resize and center templates; divide into a 7x7 grid
- Apply several 2D Gabor filters to each grid patch
  - Different orientations, frequencies, and variances
  - For each pixel, yields the real and imaginary components of the transformation
- Feed data into Linear Discriminant Analysis (LDA)
  - Reduces features and forms the classifier at the same time
- 2D Gabor filter: convolution with a kernel formed by a Gaussian multiplied by a sine wave
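
The Gabor filtering step in the training process can be sketched as follows. This is a minimal illustration, not the presenter's code: the function names, the kernel size, and the parameter values are assumptions, and the LDA step is not shown. Each kernel is a Gaussian envelope multiplied by a complex sinusoid, stored as separate real and imaginary parts.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Return (real, imag) kernels of shape size x size.

    Each entry is a Gaussian envelope times a sinusoid that travels
    along direction theta (radians). All parameters are illustrative.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    real, imag = [], []
    for y in range(-half, half + 1):
        real_row, imag_row = [], []
        for x in range(-half, half + 1):
            # Project (x, y) onto the direction of the sinusoid.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            phase = 2.0 * math.pi * xr / wavelength
            real_row.append(envelope * math.cos(phase))
            imag_row.append(envelope * math.sin(phase))
        real.append(real_row)
        imag.append(imag_row)
    return real, imag


def gabor_bank(size, wavelengths, sigmas, n_orientations):
    """Build kernels for every combination of frequency, variance and orientation."""
    return [
        gabor_kernel(size, w, k * math.pi / n_orientations, s)
        for w in wavelengths
        for s in sigmas
        for k in range(n_orientations)
    ]
```

Convolving a grid patch with every kernel in such a bank gives, per pixel, one real and one imaginary value per filter; those values would form the feature vector handed to LDA.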
Character Determination
- Each grid patch has its own LDA classifier; the classifier returns a vector of probabilities, one per symbol
- To classify the overall character, recursively consider all 9-neighborhoods and multiply the corresponding probabilities together
- When only one grid patch remains, the highest probability wins

Recognition Process
- Color properties analysis: choose the channel with the highest confidence of best distinguishing foreground from background
- Binarization threshold: 50% of the threshold given by Otsu's method
- Intermediate representation: trim, resize, and center the binary image
- Perform OCR on variations of the intermediate representation: stretched, eroded (Gaussian-based), dilated
- Aggregate and return votes
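
The combination rule on the Character Determination slide can be read in more than one way. The sketch below assumes one reading: every 3x3 neighborhood of grid patches is collapsed into a single cell by multiplying the per-symbol probabilities, so a 7x7 grid shrinks to 5x5, then 3x3, then 1x1. The function name and the `eps` floor are hypothetical, and logarithms are summed instead of multiplying raw probabilities to avoid underflow.

```python
import math


def fuse_patch_probabilities(grid, eps=1e-12):
    """Return the index of the winning symbol.

    grid: square list of lists with an odd side length; each cell is a
    list of per-symbol probabilities from that patch's classifier.
    """
    size = len(grid)
    if size % 2 == 0 or any(len(row) != size for row in grid):
        raise ValueError("expected a square grid with an odd side length")
    n_symbols = len(grid[0][0])
    # Work in log space: a product of probabilities becomes a sum of logs.
    logs = [[[math.log(max(p, eps)) for p in cell] for cell in row] for row in grid]
    while len(logs) > 1:
        inner = len(logs) - 2  # each pass shrinks the grid by two per side
        logs = [
            [
                [
                    sum(logs[r + dr][c + dc][s] for dr in range(3) for dc in range(3))
                    for s in range(n_symbols)
                ]
                for c in range(inner)
            ]
            for r in range(inner)
        ]
    scores = logs[0][0]
    return max(range(n_symbols), key=lambda s: scores[s])
```

Under this reading, patches near the center of the grid take part in more neighborhoods than patches at the border, so they weigh more heavily in the final decision.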
Recognition Process Example: G using Trebuchet-MS Classifier
Query Character (Actual Size)
Intermediate Representation (Actual Size)
Alphabet templates: abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

Variations (Actual Size), identified characters per slide:
- g, s, G
- g, g, B
- G, G, B
- B, B, G
- G, B, a

Final Results:
- B: 5/15
- G: 5/15
- g: 3/15
- a: 1/15 (6.7%)
- s: 1/15 (6.7%)
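
The vote aggregation behind these final results can be reproduced with a short tally. This is a sketch only; `tally_votes` is a hypothetical name, and the input list is the fifteen identified characters from the example slides, in slide order.

```python
from collections import Counter


def tally_votes(identified):
    """Count how often each character was identified across all variations.

    Returns (character, votes, share) tuples, most-voted first.
    """
    counts = Counter(identified)
    total = len(identified)
    return [(ch, n, n / total) for ch, n in counts.most_common()]


# Identified characters for the 15 variations of the query character G.
variations = ["g", "s", "G", "g", "g", "B", "G", "G", "B",
              "B", "B", "G", "G", "B", "a"]

for ch, n, share in tally_votes(variations):
    print(f"{ch}: {n}/{len(variations)} ({share:.1%})")
```

G and B tie at 5 votes each, so the top vote alone cannot separate them.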
GEORGE (Trebuchet-MS)
Votes, one line per character slide:
- E: 14/15, t: 1/15
- j: 13/15, i: 2/15 (j is the default when unable to decide; should invert during preprocessing)
- j: 13/15, i: 1/15, M: 1/15 (j is the default when unable to decide; should invert during preprocessing)
- B: 5/15, G: 5/15, g: 3/15, a: 1/15, s: 1/15
- j: 12/15, Y: 2/15, X: 1/15 (j is the default when unable to decide; should invert during preprocessing or training)
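
The slides do not say how inversion would be detected. One possible preprocessing check, offered purely as an assumption, is to run recognition on both the binary image and its inverse and keep whichever polarity produces fewer fallback votes. Here `classify_variations` is a hypothetical callable standing in for the OCR step, and `FALLBACK` encodes the slides' remark that j is returned when the classifier cannot decide.

```python
from collections import Counter

FALLBACK = "j"  # per the slides: the default answer when unable to decide


def invert(binary):
    """Swap foreground and background in a 0/1 image (list of rows)."""
    return [[1 - px for px in row] for row in binary]


def vote_with_polarity_check(binary, classify_variations):
    """Return the vote counts from the polarity with the most non-fallback votes.

    classify_variations: hypothetical function mapping a binary image to
    the list of characters identified across its variations.
    """
    best_votes, best_score = None, -1
    for image in (binary, invert(binary)):
        votes = Counter(classify_variations(image))
        score = sum(n for ch, n in votes.items() if ch != FALLBACK)
        if score > best_score:  # ties keep the original polarity
            best_votes, best_score = votes, score
    return best_votes
```

A drawback of this heuristic is that a genuine letter j scores poorly in both polarities, so it would need a separate rule.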
Note on the Inversion Problem
- Easy to fix; a common problem in OCR systems
- Will likely detect and correct during the preprocessing stage as opposed to training
  - More training data: slower, less reliable
  - Preprocessing: like trying many different lenses at the eye doctor and taking your best guess with each lens

BAKERY (Actual: Tw-Cen-MT, Used: Arial)
Alphabet templates: abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
Votes, one line per character slide:
- B: 9/15, j: 3/15, H: 2/15, F: 1/15
- A: 9/15, j: 5/15, n: 1/15
- K: 12/15, j: 2/15, H: 1/15
- E: 5/15, j: 3/15, L: 3/15, r: 2/15, F: 2/15
- p: 12/15, j: 3/15 (slide also shows: P R)
- Y: 12/15, j: 3/15
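
Reading off the top vote from each BAKERY slide gives the raw word that later stages have to match against business names. The sketch below uses the vote counts from the slides above; `top_voted_word` is a hypothetical helper name.

```python
def top_voted_word(vote_tables):
    """Join the highest-voted character from each per-character vote table."""
    return "".join(max(votes, key=votes.get) for votes in vote_tables)


# Votes out of 15 for each character of BAKERY, in slide order.
bakery_votes = [
    {"B": 9, "j": 3, "H": 2, "F": 1},
    {"A": 9, "j": 5, "n": 1},
    {"K": 12, "j": 2, "H": 1},
    {"E": 5, "j": 3, "L": 3, "r": 2, "F": 2},
    {"p": 12, "j": 3},
    {"Y": 12, "j": 3},
]

print(top_voted_word(bakery_votes))  # BAKEpY
```

Only the fifth character is wrong: the classifier prefers a lowercase p where the sign has an R.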
UNIVERSITY (Used: Times New Roman)
Alphabet templates: abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
Votes, one line per character slide:
- U: 8/15, C: 3/15, j: 2/15, s: 1/15, O: 1/15
- N: 12/15, j: 3/15
- l (el): 9/15, I (eye): 6/15
- v: 9/15, j: 3/15, V: 3/15
- F: 9/15, L: 5/15, l (el): 1/15
- G: 9/15, j: 6/15
- j: 12/15, x: 2/15, w: 1/15
- j: 5/15, C: 4/15, O: 4/15, x: 2/15
- T: 9/15, l: 3/15, i: 1/15, j: 1/15, L: 1/15
- Y: 10/15, j: 3/15, i: 2/15
Evaluation
- Biggest weaknesses are in the preprocessing stage
  - OCR is sensitive to thresholding and color inversion
  - Occasionally color modeling chooses a bad channel to use for OCR; this happens more often on low-resolution images
- Works surprisingly well for low-resolution images
- Font does not need to be exact, but proportions need to be roughly the same
- How do I use this information?

The Big Picture
Diagram labels: Latitude, Longitude, Geocoding, Reverse Geocoding, Nearby Businesses, Image, STR, Detected Text, Business Name Matching, Business Identification, Business Spatial Detection

Old Approach
- Form words from the highest-voted characters
- Compare to the lexicon using Levenshtein distance
- Use the existing ranking system afterwards
- Examples:
  - BOKFRY > BAKERY (L-DIST = 2)
  - GFQRGF > GEORGE (L-DIST = 3)

New Approach (Lexicon-assisted STR)
- Minimize Levenshtein distance with the best permutation of voted characters
- Use the existing ranking system afterwards
- Example, voted characters for each position (one column per position):
    B O K F P Y
    G U H E R I    >>> BAKERY (L-DIST = 0)
    J A j L I l
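
Both approaches can be expressed with one dynamic-programming routine. In the sketch below, a substitution is free whenever the lexicon character appears among the voted characters for that position. Passing a plain string (one candidate per position) reduces it to ordinary Levenshtein distance, which covers the old approach. The function names and the three-word lexicon are assumptions made for illustration, and the comparison is case-sensitive.

```python
def lexicon_distance(candidates, word):
    """Smallest Levenshtein distance between `word` and any string that
    picks one character per position from `candidates`.

    candidates: sequence with one entry per detected position; each entry
    is a string (or set) of the characters voted for that position.
    """
    prev = list(range(len(word) + 1))
    for i, options in enumerate(candidates, start=1):
        curr = [i]
        for j, ch in enumerate(word, start=1):
            cost = 0 if ch in options else 1
            curr.append(min(prev[j] + 1,          # drop a detected position
                            curr[j - 1] + 1,      # insert a lexicon character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def best_match(candidates, lexicon):
    """Return the lexicon word closest to the voted characters."""
    return min(lexicon, key=lambda word: lexicon_distance(candidates, word))


# Old approach: one top-voted character per position.
print(lexicon_distance("BOKFRY", "BAKERY"))  # 2
print(lexicon_distance("GFQRGF", "GEORGE"))  # 3

# New approach: all voted characters per position, read column by column.
voted = ["BGJ", "OUA", "KHj", "FEL", "PRI", "YIl"]
print(lexicon_distance(voted, "BAKERY"))  # 0
print(best_match(voted, ["GEORGE", "BAKERY", "UNIVERSITY"]))  # BAKERY
```

Because each detected position aligns with at most one lexicon character, choosing the best candidate independently inside the recurrence gives the same result as trying every combination of voted characters.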
The End Result
Bruegger's Bagels
Category: Bagels
Address: Market Sq, Pittsburgh, PA 15222
Phone: (412) 281-2515
Rating: Not Rated

Next Steps
- Fix STR preprocessing
  - Bug in the color modeling code found online
  - Inversion determination
  - Multiple thresholds
- Word matching: generate templates of words/logos instead of letters
- Text detector: fix character/word fragmentation by reading papers that address the issue

Next Steps (continued)
- Test more images; fix problems as they arise
- Ideas to consider:
  - Feed grid-patch probability vectors into an SVM instead of smoothing
  - Generate disambiguation classifiers to differentiate:
    - Between top contending votes. Remember how G and B got confused? Dynamically create a classifier to tell them apart
    - Between commonly confused letters, e.g. E/F, l/i/j, o/c, etc.
  - Don't consider statistically insignificant confidences

Next Steps: Text Detection
- Look into after more work has been done on STR
- Need to address issues:
  - Intracharacter segmentation
  - Intercharacter segmentation
  - Word segmentation
- Needed to make the STR system automated like before

Thank You