IRJET-Text Detection in Multi-Oriented Natural Scene Images

International Research Journal of Engineering and Technology (IRJET), e-ISSN: 2395-0056, p-ISSN: 2395-0072, Volume: 02, Issue: 04, July 2015, www.irjet.net

Text Detection in Multi-Oriented Natural Scene Images

M. Fouzia 1, C. Shoba Bindu 2
1 P.G. Student, Department of CSE, JNTU College of Engineering, Anantapur, Andhra Pradesh, India
2 Associate Professor, Department of CSE, JNTU College of Engineering, Anantapur, Andhra Pradesh, India

Abstract - With the growing number of digital multimedia libraries, the need to efficiently index, browse and retrieve multimedia information has increased. Text embedded in images and video frames can help to identify image content (e.g. somebody's name appearing in an image) or to display information that is independent of the image (e.g. important news during the transmission of a movie). In general, text in images can be categorized into two groups: artificial text and scene text. Scene text is part of the image and appears unintentionally, as in traffic signs, whereas artificial text is created separately from the image and overlaid on it at a later stage, such as the name of a journalist during a news program. Artificial text is usually a very good key for indexing image or video databases. This paper aims to detect text in multi-oriented scene images, i.e., images containing multiple text lines as well as skewed and distorted images. The proposed algorithm applies skew correction and image enhancement to pre-process the images for better recognition accuracy; every image is pre-processed before the subsequent stages. The pre-processing technique is also evaluated for accuracy and for the time taken to perform detection.
Key Words: Text Detection, Multi-orientation, scene images, OCR, Skew-correction, Cropping, Enhancement.

1. INTRODUCTION

Methods for scene text localization and recognition aim to find all areas in an image (or a video) that a human would consider text, mark the boundaries of those areas (usually with rectangular bounding boxes) and output the sequence of (Unicode) characters associated with their content. A new MSER-based scene text detection method is used to detect MSER regions, which in turn are used to detect the text in the image [1]. Fig. 1 shows the difference between printed-document and scene text localization and recognition.

Fig -1: (a) A scanned book page. (b) A sample image from the dataset [2].

Scene text localization and recognition (also known as text recognition and localization in real-world images, environment scene OCR or the text-in-the-wild problem) is an open problem, unlike printed document recognition, where widely used systems are capable of recognizing more than 99% of characters correctly [3]. Text localization techniques can be classified into two important groups: techniques based on a sliding window and techniques based on grouping regions (characters). Techniques in the first category [4] move a window over the image and estimate the presence of text from local image features.

2. LITERATURE SURVEY

Text detection and recognition in natural scene images is, however, a challenging, unsolved computer vision problem. Scene text exhibits complex backgrounds, image blur, partially occluded text, variations in font styles, image noise and varying illumination, as illustrated in Figure 1 (b). The end-to-end scene text recognition problem is divided into a text-detection and a text-recognition task.
Text detection is a preprocessing step for the text-recognition task. The text detector has to locate words in natural scene images; the text recognizer predicts the word shown in a cropped image patch retrieved by the detector. Several competitions within the scope of ICDAR have been organized to assess the state of the art. Hence, there are publicly available datasets such as the ICDAR 2003 dataset [7] or the Street View Text (SVT) dataset [8], on which objective comparisons of state-of-the-art methods can be made.

2.1 Text Detection Methods

Text-detection methods can be divided into region-based and texture-based methods [9, 10]. Region-based methods rely on image segmentation: pixels are grouped into connected components (CCs), i.e., character candidates, which are further grouped into candidate words and text lines based on geometric features. Texture-based methods create text confidence maps, which are post-processed and converted into word-level bounding boxes. Furthermore, hybrid approaches exist that group connected components into word candidates and use a texture-based classifier to validate these candidates. Since region-based approaches rely on image segmentation, they are subject to segmentation errors. Texture-based methods use sliding windows; if the window step size or the image scales are not estimated correctly with respect to the text size, detection errors can occur. Candidate text lines are generated by an MSER segmentation and grouped into text-line hypotheses, which are verified by a texture-based method. Another entry is the text detector proposed by Yi et al. [11] (F-score: 62.32%). This method generates candidate patches with the segmentation method presented in [11]: based on Canny edges and k-means color clustering, text components are segmented. Candidate patches are classified by an AdaBoost texture classifier based on gradient features in block patterns proposed by Chen et al. [12].
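The sliding-window idea described above can be sketched as follows. This is a toy illustration: the local-variance threshold here is a hypothetical stand-in for the AdaBoost texture classifier of [12], used only to keep the example self-contained.

```python
# Toy sketch of sliding-window text detection. The "classifier" is a
# placeholder threshold on local intensity variance, not the AdaBoost
# model from [12]: text windows tend to have high local contrast.

def window_variance(img, r, c, size):
    """Variance of pixel intensities inside a size x size window at (r, c)."""
    vals = [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def detect_text_windows(img, size=2, threshold=100.0):
    """Return top-left corners of windows whose variance exceeds threshold."""
    hits = []
    for r in range(len(img) - size + 1):
        for c in range(len(img[0]) - size + 1):
            if window_variance(img, r, c, size) > threshold:
                hits.append((r, c))
    return hits

# A flat background region yields no hits; a high-contrast "stroke" does.
flat = [[128] * 4 for _ in range(4)]
stroke = [[0, 255, 0, 255] for _ in range(4)]
print(detect_text_windows(flat))              # []
print(len(detect_text_windows(stroke)) > 0)   # True
```

A real detector would replace the variance test with a trained classifier and run the scan at multiple image scales, which is exactly where the step-size and scale-estimation errors mentioned above come from.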
A) Region-Based Methods:

The SWT is an image operator which assigns a stroke width to each pixel of an image (see Figure 2). The stroke width is determined by shooting rays from edges in the direction of the gradient. If the ray hits an edge with the same gradient direction modulo 180 degrees, the length of the ray determines the stroke width of the underlying pixels. Pixels which are adjacent to each other belong to the same character if their stroke widths are similar. Hence, CCs are formed by segmenting the stroke-width map. Adjacent CCs are grouped into words if the median stroke-width ratio of the two CCs does not exceed 2.0, the height ratio of the two components is smaller than 2.0, and distance and average color difference are within thresholds learned from the training set.

Fig 2: The SWT for a stroke shown in (a) is computed by shooting a ray from the edges in the gradient direction (b). If another edge with the same gradient direction is found along the ray, the width between the start and end point is assigned as stroke width to the underlying pixels (c).

Chen et al. [12] propose a text detection method using MSER. The outlines of MSERs are enhanced by Canny edges, which makes MSER less sensitive to blur. Based on geometric cues, these candidate character regions are then grouped into words and text lines. Neumann et al. [13] propose ERs for segmenting regions. ERs are extracted on the RGB, HSI and gradient images to retrieve character candidate regions. Instead of using heuristics for labeling text, as Epshtein et al. [9] do, an AdaBoost classifier based on geometric features is used. Text CCs are then grouped into words. Yao et al. [14] generalize the SWT for arbitrarily oriented text. Camshift is used to find the character orientation, the lengths of the major and minor axes and the barycenter. CCs are filtered and grouped into chains. False-positive chains are eliminated by a Random Forest (RF) classifier.
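The stroke-width idea can be illustrated in a drastically simplified one-dimensional form, where the "stroke width" of a foreground pixel is simply the length of the run of foreground pixels it belongs to; the real SWT [9] shoots rays along 2-D gradient directions, which this sketch does not attempt.

```python
# 1-D illustration of the stroke-width principle: every foreground (1)
# pixel is assigned the length of its run, background (0) pixels get None.
# Pixels with similar widths would then be grouped into one character.

def stroke_widths_1d(row):
    """Assign each foreground pixel the length of its run of 1s."""
    widths = [None] * len(row)
    i = 0
    while i < len(row):
        if row[i] == 1:
            j = i
            while j < len(row) and row[j] == 1:
                j += 1
            for k in range(i, j):
                widths[k] = j - i     # run length = stroke width
            i = j
        else:
            i += 1
    return widths

# Two strokes of width 2 and one noticeably wider run of width 5:
row = [0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
print(stroke_widths_1d(row))
# [None, 2, 2, None, 2, 2, None, 5, 5, 5, 5, 5]
```

The grouping rules quoted above (median stroke-width ratio below 2.0, height ratio below 2.0) operate on exactly this kind of per-pixel width map, only computed in 2-D.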
B) Texture-Based Methods:

Chen et al. [12] propose an AdaBoost classifier based on x- and y-derivative features, intensity features and edge-linking features computed in block patterns of the detector window (see Figure 3). Text patches have low entropy in these block patterns.

Fig 3: Block patterns proposed by Chen et al. [12].

C) Hybrid Approaches:

Minetto et al. [15] propose a hybrid approach where CCs are labeled as text or non-text regions using shape descriptors such as Zernike moments. CCs are then grouped into text-line candidates. These candidates are then verified by discriminative classifiers using a novel Fuzzy HOG (F-HOG) feature set.

Gonzalez et al. [16] propose a hybrid localization method based on a combination of MSER and local adaptive thresholding for segmentation. CCs are filtered based on geometric features such as aspect ratio or occupancy ratio. Hypotheses are created, which are verified by a texture-based classifier.

2.2 Text Recognition

Similar to text-detection systems, recognition systems can be divided into region-based and texture-based approaches. Texture-based approaches use low-level vision features to assign candidate labels to text windows. In contrast, region-based methods assign a candidate label to segmented CCs. As in texture-based methods, a language model is then used to predict the final word.

A) Region-Based Methods

Neumann et al. [13] propose a recognition system which classifies the MSER regions detected by a text-detection system as text components. Boundaries of normalized MSER regions are inserted into separate images based on their orientation; in total, 8 orientations are used. Each orientation image is filtered with a Gaussian and sub-sampled into a 5 x 5 image. Hence, a 5 x 5 x 8 = 200-dimensional feature vector is used for classifying MSER regions. To improve the prediction, a language model based on bigrams is used.
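The 5 x 5 x 8 = 200-dimensional boundary feature described above can be sketched as follows; plain accumulation into the grid stands in for the Gaussian filtering and sub-sampling of [13], so this is an approximation of the construction, not a reimplementation.

```python
import numpy as np

# Sketch of the 200-dimensional boundary feature: boundary pixels of a
# normalized region are split into 8 orientation channels, each channel
# accumulated into a 5 x 5 grid, and the result flattened to 5*5*8 = 200.
# (Simple binning replaces the Gaussian filtering of [13].)

def boundary_feature(points, orientations, grid=5, n_orient=8):
    """points: (x, y) in [0, 1)^2; orientations: angles in [0, 2*pi)."""
    feat = np.zeros((n_orient, grid, grid))
    for (x, y), theta in zip(points, orientations):
        o = int(theta / (2 * np.pi) * n_orient) % n_orient   # orientation bin
        gx, gy = int(x * grid), int(y * grid)                # spatial cell
        feat[o, gy, gx] += 1.0
    return feat.reshape(-1)   # 200-dimensional vector

pts = [(0.1, 0.1), (0.9, 0.9)]
angles = [0.0, np.pi]         # two opposite boundary orientations
v = boundary_feature(pts, angles)
print(v.shape)    # (200,)
print(v.sum())    # 2.0
```

A classifier (with the bigram language model on top) would then consume this fixed-length vector regardless of the original region's size.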
B) Texture-Based Methods

Wang et al. [8] propose HOG features with a Random Ferns classifier to detect and classify text in an end-to-end setting. The multiclass detector is trained on cropped synthetic and real-world letters. Non-maxima of the detector responses are suppressed. The remaining letters are then combined in a Pictorial Structures framework, where letters are parts of words. For each word in a dictionary, the most plausible character responses are found in the image. Detected words are then rescored based on geometric information, and non-maximum suppression is applied to remove overlapping word responses.

Minetto et al. [15] propose, similarly to Wang et al. [8], HOG features with SVMs to detect and classify text. Instead of using Pictorial Structures, CRFs are used to combine the geometric and classifier information and predict a candidate word. The final word prediction is done by retrieving the word in a dictionary with the shortest edit distance.

Yildirim et al. [11] propose Hough Forests (HF) for text recognition. Cross-scale binary features, which are thresholded differences of regions across different scales, are proposed for word recognition. To predict the final word from candidate responses, a CRF energy function incorporating geometric information and lexicon priors is minimized.

3. PRE-PROCESSING TECHNIQUE

The pre-processing phase is not defined in any of the techniques described in the sections above. The basic idea is to enhance the input image so that the accuracy of text localization and recognition can be increased. Figure 4 represents the pre-processing flow.

Fig 4: Pre-Processing Steps

Fig 5: Input distorted image with skew and noise

Skew Correction:

An important part of any document recognition system is the detection and correction of skew in the image of a page.
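Skew detection can be sketched, for example, by least-squares fitting a line through baseline points of a text line and taking the angle of its slope; this is one simple way to estimate the angle of skew, not necessarily the method used in this paper.

```python
import math

# Sketch of skew estimation: fit a line through (x, y) baseline points of a
# text line by least squares; the arctangent of the slope is the skew angle
# that a rotation step would then undo.

def estimate_skew_degrees(points):
    """Least-squares slope through baseline points, as an angle in degrees."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return math.degrees(math.atan2(num, den))

# Baseline points of a line skewed with slope 0.1 (about 5.71 degrees):
pts = [(x, 0.1 * x) for x in range(10)]
print(round(estimate_skew_degrees(pts), 2))   # 5.71
```

Once the angle is known, rotating the image by its negative produces the upright image that the projection-profile analysis mentioned below requires.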
Page layout analysis and the preprocessing operations used for character recognition depend on an upright image or, at least, on knowledge of the angle of skew. One example of a process which is spoilt by skew is the use of horizontal and vertical projection profiles.

Fig 6: Image after Skew Correction

Noise Removal:

Digital images are prone to a variety of types of noise. Noise is the result of errors in the image acquisition process that produce pixel values that do not reflect the true intensities of the real scene. There are several ways that noise can be introduced into an image, depending on how the image is created. For example:

- If the image is scanned from a photograph made on film, the film grain is a source of noise. Noise can also be the result of damage to the film, or be introduced by the scanner itself.
- If the image is acquired directly in a digital format, the mechanism for gathering the data (such as a CCD detector) can introduce noise.
- Electronic transmission of image data can introduce noise.

After pre-processing is completed, the enhanced image is fed as input to the next processes. The subsequent stages operate on the enhanced image, so that the recognition accuracy can be improved compared to the existing techniques. Figure 7 shows the enhanced image.

Fig 7: Enhanced Image after Pre-Processing

The image is then processed in various stages to obtain the desired output; all these steps are depicted in Fig 8.

Fig 8: Flow of the Detection & Recognition Procedure

3.1 MSER Region Detection

As the intensity difference between text and its background is naturally considerable, and a uniform intensity or color within every letter can be assumed, MSER is a natural choice for text detection. While MSER has been recognized as one of the finest region detectors because of its robustness against scale, viewpoint and lighting changes, it is sensitive to image blur.
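The stability criterion behind MSER can be illustrated on a 1-D intensity profile; real MSER detectors track 2-D connected components efficiently (e.g. with a union-find structure), which this sketch deliberately omits.

```python
# 1-D illustration of the MSER principle: as the threshold t grows, the
# region {i : profile[i] <= t} grows. Thresholds where the relative area
# change across t +/- delta is smallest are "maximally stable".

def region_area(profile, t):
    """Number of pixels at or below threshold t."""
    return sum(1 for v in profile if v <= t)

def stability(profile, t, delta=5):
    """Relative area change across t-delta .. t+delta (lower = more stable)."""
    a = region_area(profile, t)
    if a == 0:
        return float("inf")
    return (region_area(profile, t + delta) - region_area(profile, t - delta)) / a

# A dark "letter" (values near 30) on a bright background:
profile = [200, 30, 32, 31, 30, 198, 205]
scores = {t: stability(profile, t) for t in range(25, 60, 5)}
best_t = min(scores, key=scores.get)
print(best_t, region_area(profile, best_t))   # 40 4
```

Blur spreads intensities across this step edge, which flattens the stability minimum and explains why small blurred letters may be missed, as noted below.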
Thus, small letters may not be distinguished or detected in cases of defocus or motion blur when plain MSER is applied to images of limited resolution. Fig 9 shows the MSER regions.

3.2 Character Candidates Extraction:

Fortunately, rather than identifying the character, we can simply choose the one that is more likely to be a character in a parent-children relationship. It has been claimed that such pairwise relationships may not be sufficient to eliminate non-character MSERs, and that pruning should exploit more complicated higher-order properties of text. Alternatively, this probability can be quickly estimated with reasonable accuracy using our regularized variation scheme. Figure 10 depicts the output of this stage.

Fig 10: Character Candidates Extraction

3.3 Connected Component Analysis:

CCA is a widely used algorithm in image processing that scans an image and groups its pixels into labeled components based on pixel connectivity [16]. An eight-point CCA step is used to find all the objects inside the binary image produced by the earlier stage. The output of this step is an array of N objects. Fig 11 depicts the output of the CCA stage.

3.4 Edge Detection:

As handwritten text is typically placed on a white background, it tends to give a high response to edge detection. Additionally, intersecting the edges with the MSER regions yields regions that are even more likely to contain text. Figure 12 depicts the edge detection.

Fig 12: Edge Detection

3.5 Text Candidates Classification:

As it is hard to train an effective text classifier using such an unbalanced database, most of the non-text candidates need to be removed before training the classifier.
The proposed method uses a character classifier to estimate the posterior probabilities of text candidates being non-text, and discards text candidates with high non-text probabilities. The following features are used to train the character classifier: text region height, width and aspect ratio; smoothness (defined as the average difference of the gradient directions of adjacent boundary pixels); and stroke-width features (including the mean and variance of character stroke widths). Characters with small aspect ratios, such as i, j and l, are labeled as negative samples, as it is very uncommon for words to consist of many small-aspect-ratio characters. Figure 13 depicts the extracted characters.
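The classifier features listed above (except smoothness, which needs boundary gradients and is omitted for brevity) can be sketched for a single candidate as:

```python
# Sketch of the per-candidate feature vector: height, width, aspect ratio,
# and mean/variance of the candidate's per-pixel stroke widths. The actual
# classifier trained on these features is not reproduced here.

def candidate_features(bbox, stroke_widths):
    """bbox = (x, y, w, h); stroke_widths = per-pixel widths of the candidate."""
    x, y, w, h = bbox
    n = len(stroke_widths)
    mean = sum(stroke_widths) / n
    var = sum((s - mean) ** 2 for s in stroke_widths) / n
    return {
        "height": h,
        "width": w,
        "aspect_ratio": w / h,
        "stroke_mean": mean,
        "stroke_var": var,
    }

# A tall, thin candidate such as 'i' or 'l' has a small aspect ratio and,
# per the heuristic above, would be labeled a negative sample:
f = candidate_features((0, 0, 4, 20), [2, 2, 3, 2, 3])
print(f["aspect_ratio"])        # 0.2
print(f["aspect_ratio"] < 0.5)  # True
```

A low stroke-width variance additionally signals the near-constant stroke thickness that printed characters exhibit.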

Fig 13: Classifying only Text

3.6 Text Region Extraction:

To calculate a bounding box of the text area, this paper first merges the single characters into one connected element. This is done using morphological closing. The region which contains the text is then outlined and extracted separately from the input image.

3.7 OCR:

Separating text from a cluttered scene can efficiently improve OCR accuracy. Since the previous stages of the algorithm have already extracted an appropriately segmented text region, this paper utilizes the binary text mask to further enhance the recognition accuracy. Figure 14 shows the extracted region with the corresponding OCR-recognized characters.

Fig 14: Extracted Text Region and the Recognized Text

4. EXPERIMENTAL RESULTS

The pre-processing technique is tested on two metrics: the recognition accuracy and the time taken to recognize the text. The testing is done entirely on images from the ICDAR 2003 dataset. For comparison, we considered the traditional techniques. Table 1 shows the number of input images and the correctly recognized versus the faultily recognized inputs. The results are shown in both Table 1 and Figure 13.

TABLE -1: Comparison of Recognition Accuracies between Various Techniques

Technique Used        | No. of Input Images | Recognized | Faults
MSER Detection        | 100                 | 90         | 10
SWT                   | 100                 | 92         | 8
Hybrid Approach       | 100                 | 95         | 5
SVMs                  | 100                 | 88         | 12
Hough Forest          | 100                 | 87         | 13
Pre-processed method  | 100                 | 97         | 3

Fig 13: Faults vs. Recognition Accuracies

Due to the image pre-processing stage, the image is enhanced to an extent where the later processing stages achieve improved accuracy, whereas the existing techniques lack this pre-processing, and processing the distorted images reduces their recognition accuracy.
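The merging step of Section 3.6 can be sketched with a toy one-row image; a hypothetical 1 x 3 structuring element is used here, whereas the element size in practice would depend on character spacing.

```python
import numpy as np

# Sketch of morphological closing (dilation then erosion) bridging the gap
# between characters, so that one bounding box outlines the whole text
# region handed to the OCR stage.

def shift(img, dx):
    """Shift columns by dx, filling the border with zeros."""
    out = np.zeros_like(img)
    if dx > 0:
        out[:, dx:] = img[:, :-dx]
    elif dx < 0:
        out[:, :dx] = img[:, -dx:]
    else:
        out[:] = img
    return out

def closing_1x3(img):
    d = shift(img, -1) | img | shift(img, 1)   # dilation with a 1 x 3 element
    return shift(d, -1) & d & shift(d, 1)      # erosion with the same element

def bounding_box(img):
    ys, xs = np.nonzero(img)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Two "characters" separated by a one-pixel gap merge into one component:
chars = np.array([[0, 0, 1, 1, 0, 1, 1, 0, 0]], dtype=np.uint8)
merged = closing_1x3(chars)
print(merged.tolist())       # [[0, 0, 1, 1, 1, 1, 1, 0, 0]]
print(bounding_box(merged))  # (2, 0, 6, 0)
```

The resulting box is then cropped from the input image and passed, together with the binary text mask, to the OCR stage described in Section 3.7.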
5. CONCLUSION

This paper presents a new scene text detection method with several novel techniques. First, the technique uses pre-processing to enhance the input image for better accuracy, and then utilizes a fast and accurate MSER pruning algorithm that enables us to detect most characters even when the image is of low quality. Second, we utilize a traditional self-training distance metric learning method that can learn the clustering threshold and distance weights simultaneously; text candidates are constructed by clustering character candidates with the single-link algorithm using the learned parameters. Third, we propose using a character classifier to estimate the posterior probability of a text candidate being non-text and to eliminate text candidates with high non-text probability, which helps to build a more powerful text classifier. Finally, by integrating the above techniques, we build a robust scene text detection system that exhibits superior performance over state-of-the-art methods on a variety of public databases. The system is evaluated on the ICDAR datasets, outperforming state-of-the-art methods in text detection, recognition and end-to-end dictionary-driven scene text detection. Hence, the research question can be answered: it is possible to achieve competitive results with a combination of local features, region-based approaches and learned local features.

REFERENCES:

[1] Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao. Robust text detection in natural scene images. May 2014.
[2] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young. ICDAR 2003 robust reading competitions. In ICDAR 2003, page 682, 2003.
[3] X. Lin. Reliable OCR solution for digital content re-mastering. In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Dec. 2001.
[4] X. Chen and A. L. Yuille. Detecting and reading text in natural scenes. CVPR, 2:366-373, 2004.
[5] Y.-F. Pan, X. Hou, and C.-L. Liu. A robust system to detect and localize texts in natural scene images. In Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on, pages 35-42, Sept. 2008.
[6] B. Epshtein, E. Ofek, and Y. Wexler. Detecting text in natural scenes with stroke width transform. In CVPR 2010, pages 2963-2970, June 2010.
[7] L. P. Sosa, S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young. ICDAR 2003 Robust Reading Competitions. In Proceedings of the International Conference on Document Analysis and Recognition, pages 682-687. IEEE Press, 2003.
[8] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 1457-1464, Barcelona, Spain, 2011.
[9] B. Epshtein, E. Ofek, and Y. Wexler. Detecting Text in Natural Scenes with Stroke Width Transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2963-2970. IEEE, 2010.
[10] M. Anthimopoulos, B. Gatos, and I. Pratikakis. Detection of Artificial and Scene Text in Images and Video Frames. Pattern Analysis and Applications, 16(3):431-446, 2013.
[11] C. Yi and Y. Tian. Text String Detection from Natural Scenes by Structure-Based Partition and Grouping. IEEE Transactions on Image Processing, 20(9):2594-2605, 2011.
[12] H. Chen, S. S. Tsai, G. Schroth, D. M. Chen, R. Grzeszczuk, and B. Girod. Robust Text Detection in Natural Images with Edge-Enhanced Maximally Stable Extremal Regions. In International Conference on Image Processing, pages 2609-2612. IEEE, 2011.
[13] L. Neumann and J. Matas. A Method for Text Localization and Recognition in Real-World Images. In Proceedings of the Asian Conference on Computer Vision, pages 770-783. Springer, 2011.
[14] C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu. Detecting Texts of Arbitrary Orientations in Natural Images. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1083-1090, 2012.
[15] R. Minetto, N. Thome, M. Cord, J. Stolfi, F. Precioso, J. Guyomard, and N. J. Leite. Text Detection and Recognition in Urban Scenes. In International Conference on Computer Vision Workshops, pages 227-234. IEEE, 2011.
[16] Gonzalez, L. M. Bergasa, J. J. Yebes, and S. Bronte. Text Location in Complex Images. In International Conference on Pattern Recognition, pages 617-620, 2012.

Authors' Profiles:

M. Fouzia is currently pursuing her M.Tech degree in Computer Science and Engineering with specialization in Artificial Intelligence from Jawaharlal Nehru Technological University, Anantapur, India. She did her B.Tech degree in Information Technology from KSRM College of Engineering, Kadapa, India, in 2013.

C. Shoba Bindu is an Associate Professor of Computer Science and Engineering at Jawaharlal Nehru Technological University College of Engineering, Ananthapuramu. She obtained her Bachelor's degree in Electronics and Communication Engineering, her Master of Technology in Computer Science from Jawaharlal Nehru Technological University Hyderabad, and her Ph.D. in Computer Science and Engineering from Jawaharlal Nehru Technological University Anantapuramu. She has published several research papers in national/international conferences and journals. Her research interests include network security and wireless communication systems.