Modi script character recognition

“MODI SCRIPT CHARACTER RECOGNITION” BY- Neha Kulkarni PICT, Pune


Page 1: Modi script character recognition

“MODI SCRIPT CHARACTER RECOGNITION”

BY-Neha Kulkarni

PICT, Pune

Page 2: Modi script character recognition

CAN YOU READ THIS?

Page 3: Modi script character recognition

TABLE OF CONTENTS
• Introduction
• Aim
• Motivation
• Objectives
• Challenges
• Related work done
• Benefits
• Architecture
• UML diagrams
• Implementation details
• Demo of working modules
• Conclusion
• References

Page 4: Modi script character recognition

MODI SCRIPT

INTRODUCTION
• Modi is an ancient script
• Crores of Modi documents exist
• Origin: 12th century; used until the 20th century
• No machine transliterator is available
• The documents are deteriorating
• Recent OCR techniques are being used for its revival

Page 5: Modi script character recognition

CHARACTER SET OF MODI SCRIPT

Page 6: Modi script character recognition

STYLES OF MODI SCRIPT

BAHAMANIKALIN

CHITNISI

Page 7: Modi script character recognition

PESHVEKALIN

ANGLAKALIN

Page 8: Modi script character recognition

AIM
» The aim of this project is to recognize individual Modi characters in Modi documents.

Page 9: Modi script character recognition

MOTIVATION
» Modi is an ancient script (13th century to 1950).
» An article in the Sakal newspaper dated 9th July 2014 was the driving force behind this project.
» Because of the immense importance of historical research, there is a need to transliterate Modi script documents into Devanagari script.
» Manual transliteration is extremely time-consuming and costly (approx. Rs. 2500/- per page).

Page 10: Modi script character recognition

OBJECTIVES
» Study of existing systems
» Study of Modi script
» Taking sample inputs of Modi documents from various people and experts
» Processing these inputs with image processing algorithms and recognising the characters using neural networks

Page 11: Modi script character recognition

CHALLENGES
» Negligible research on Modi script in information technology
» No previous knowledge of Modi script
» Handwriting differs from person to person
» Modi, being a cursive script, is difficult to process with algorithms
» No punctuation in the script

Page 12: Modi script character recognition

RELATED WORK
» “An Approach for Recognizing Modi Lipi using Otsu’s Binarization Algorithm and Kohenen Neural Network” is a proposed system, still in the alpha stage, which claims an accuracy of 70%.
» A drawback of this system is that only 22 Modi characters were considered, and it proved less effective at recognising similar-looking characters.
» No commercially viable Modi script recognition system is available.

Page 13: Modi script character recognition

BENEFITS
» The “7/12 cha utara” (land records), which are mostly in Modi script, could be transliterated.
» Many long-standing legal disputes could be settled as a result.
» Many historical secrets could be unearthed, because research in Modi script would become easier.
» New light could be shed on the governance, economy and rule of our ancestors, which would benefit everyone.

Page 14: Modi script character recognition

ARCHITECTURE

Page 15: Modi script character recognition

UML DIAGRAMS

» CLASS DIAGRAM

Page 16: Modi script character recognition

» STATE DIAGRAM

Page 17: Modi script character recognition

» USE-CASE DIAGRAM

Page 18: Modi script character recognition

» SEQUENCE DIAGRAM – FULL SYSTEM

Page 19: Modi script character recognition

» SEQUENCE DIAGRAM – FAILURE

Page 20: Modi script character recognition

» SEQUENCE DIAGRAM – HCR

Page 21: Modi script character recognition

» ACTIVITY DIAGRAM

Page 22: Modi script character recognition

IMPLEMENTATION DETAILS

INPUT → SYSTEM → GREY-SCALE → BINARIZE → CHAIN CODE FOR FEATURE EXTRACTION → KOHONEN NEURAL NETWORK → OUTPUT

Page 23: Modi script character recognition

PHASES OF MSCR (MODI SCRIPT CHARACTER RECOGNITION)

IMAGE ACQUISITION

GREYSCALING

OTSU THRESHOLDING

CHAIN CODE FEATURE EXTRACTION

KOHONEN NEURAL NETWORK

RECOGNITION

Page 24: Modi script character recognition


Page 25: Modi script character recognition

IMAGE ACQUISITION PHASE

• Acquire a scanned image
• Store it in a buffer
• Forward it to the preprocessing phase

Image acquired using a scanner

Page 26: Modi script character recognition

PREPROCESSING PHASE

PURPOSE:
• Suppress unwanted distortions
• Enhance image quality

In MSCR, preprocessing includes:
• Grayscale conversion
• Otsu’s binarization

Page 27: Modi script character recognition


Page 28: Modi script character recognition

GRAYSCALE CONVERSION

• Single intensity value for each pixel

Gray = 0.2126 * R + 0.7152 * G + 0.0722 * B
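The weights in this formula are the Rec. 709 luma coefficients, and they sum to 1, so white stays 255 and black stays 0. A minimal NumPy sketch of the conversion, assuming the scanned image is an H × W × 3 (or × 4) uint8 array; like the slide's formula, it applies the weights directly to the 8-bit values:

```python
import numpy as np

# Weights from the slide: Gray = 0.2126 R + 0.7152 G + 0.0722 B
WEIGHTS = np.array([0.2126, 0.7152, 0.0722])

def to_grayscale(rgb):
    """Convert an H x W x 3 (or x 4, alpha ignored) uint8 image to H x W uint8."""
    # matmul contracts the last (colour) axis with the weight vector
    gray = rgb[..., :3].astype(float) @ WEIGHTS
    # round to the nearest integer level and keep it in the 0..255 range
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```

For example, a pure red pixel (255, 0, 0) becomes level 54 and a pure green one becomes 182, which reflects how much brighter green appears to the eye.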

BINARIZATION
• Converts the grayscale image to a bi-level (black and white) image
• Each pixel takes one of two values, 0 or 1, so it fits in a single bit
• The performance of MSCR depends on the accuracy of this step
• Purpose: extract the text from the image, remove noise and reduce the size of the image
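The size reduction follows from storing one bit per pixel instead of one byte. A minimal sketch, assuming dark ink on light paper and an already chosen global threshold; the value 128 used below is only an illustrative number, since in MSCR the threshold comes from Otsu's method:

```python
import numpy as np

def binarize(gray, threshold):
    """Return a 0/1 uint8 image: 1 where the pixel is darker than threshold (ink)."""
    return (gray < threshold).astype(np.uint8)

def pack_bits(bits):
    """Store each 0/1 pixel in a single bit, 8 pixels per byte.

    np.packbits flattens the array (axis=None) and zero-pads the last byte.
    """
    return np.packbits(bits)

# A 100 x 100 page (10,000 bytes as grey levels) with one dark band of text
page = np.full((100, 100), 255, dtype=np.uint8)
page[40:60, :] = 30
bits = binarize(page, 128)
packed = pack_bits(bits)   # 10,000 pixels -> 1,250 bytes
```

Packing is only about storage; later steps such as chain code tracing would normally work on the unpacked 0/1 array.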

Page 29: Modi script character recognition


Page 30: Modi script character recognition

OTSU’S THRESHOLDING

• Converts a grayscale image to monochrome (binary)

• Algorithm:
o Iterate through all possible threshold values
o For each threshold, split the pixels into background and foreground, and calculate the weight (fraction of pixels) and the spread (variance) of each class
o Calculate the within-class variance: weight_background × variance_background + weight_foreground × variance_foreground
o Select as the final threshold the value for which the within-class variance is minimum
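The steps above can be sketched in NumPy as an exhaustive search over thresholds. This is an illustrative implementation of Otsu's criterion, not the project's code; the function names and the convention that levels below t form the dark (ink) class are assumptions:

```python
import numpy as np

def otsu_threshold(hist):
    """Return (t, within-class variance) for the threshold t that minimises it.

    hist[i] = number of pixels with grey level i. Levels below t form one
    class (dark), levels t and above form the other (light).
    """
    hist = np.asarray(hist, dtype=float)
    levels = np.arange(len(hist))
    total = hist.sum()
    best_t, best_var = None, np.inf
    for t in range(1, len(hist)):            # every possible split point
        lo_n, hi_n = hist[:t].sum(), hist[t:].sum()
        if lo_n == 0 or hi_n == 0:            # one class empty: not a real split
            continue
        lo_mean = (levels[:t] * hist[:t]).sum() / lo_n
        hi_mean = (levels[t:] * hist[t:]).sum() / hi_n
        lo_var = ((levels[:t] - lo_mean) ** 2 * hist[:t]).sum() / lo_n
        hi_var = ((levels[t:] - hi_mean) ** 2 * hist[t:]).sum() / hi_n
        # within-class variance = weighted sum of the two class variances
        within = (lo_n / total) * lo_var + (hi_n / total) * hi_var
        if within < best_var:
            best_t, best_var = t, within
    return best_t, best_var

def otsu_binarize(gray):
    """gray: uint8 image. Returns 1 for ink (dark) pixels, 0 for paper."""
    t, _ = otsu_threshold(np.bincount(gray.ravel(), minlength=256))
    if t is None:                             # uniform image: nothing to separate
        return np.zeros_like(gray, dtype=np.uint8)
    return (gray < t).astype(np.uint8)
```

For a hypothetical six-level histogram with counts [8, 7, 2, 6, 9, 4], `otsu_threshold` returns t = 3 with a within-class variance of about 0.491: levels 0 to 2 form one class and levels 3 to 5 the other. Minimising the within-class variance is equivalent to maximising the between-class variance, which is how faster implementations usually compute it.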

Page 31: Modi script character recognition

Histogram for a 6-level grey image

Page 32: Modi script character recognition

Result of Otsu’s method

Page 33: Modi script character recognition


Page 34: Modi script character recognition

CHAIN CODE ALGORITHM FOR FEATURE EXTRACTION

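A chain code represents the boundary of a character as a connected sequence of straight-line segments of specified length and direction. This representation is based on 4-connectivity or 8-connectivity of the segments.

The chain code is generated by following the boundary in a clockwise direction and assigning a direction to the segments connecting every pair of pixels.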
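A sketch of an 8-connectivity (Freeman) chain code in Python, assuming a binarized 0/1 character image with 1 for ink. Boundary following here uses Moore-neighbour tracing, and the 8-bin direction histogram used as the feature vector is one common choice in the chain-code literature; the tracing method, the feature and all function names are assumptions, not the project's confirmed method:

```python
import numpy as np

# Freeman 8-direction codes, counted counter-clockwise from east, as
# (row, col) offsets in image coordinates (row index grows downwards).
DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def trace_boundary(img):
    """Moore-neighbour tracing of the first ink component (img: 0/1 array).

    Returns boundary pixels in clockwise order, starting at the top-most,
    left-most ink pixel.
    """
    rows, cols = img.shape

    def is_ink(r, c):
        return 0 <= r < rows and 0 <= c < cols and img[r, c] == 1

    def step(cur, back):
        # scan the 8 neighbours clockwise, starting just after the backtrack pixel
        for k in range(1, 9):
            d = (back - k) % 8
            nxt = (cur[0] + DIRS[d][0], cur[1] + DIRS[d][1])
            if is_ink(*nxt):
                # new backtrack = the background neighbour examined just before nxt
                pr, pc = cur[0] + DIRS[(d + 1) % 8][0], cur[1] + DIRS[(d + 1) % 8][1]
                return nxt, DIRS.index((pr - nxt[0], pc - nxt[1]))
        return None, None  # isolated pixel

    ink = np.argwhere(img == 1)          # row-major order
    if len(ink) == 0:
        return []
    start = (int(ink[0][0]), int(ink[0][1]))
    second, back = step(start, 4)        # west of the start pixel is background
    if second is None:
        return [start]
    boundary, cur = [start], second
    for _ in range(8 * img.size):        # safety bound on the walk length
        nxt, nxt_back = step(cur, back)
        if cur == start and nxt == second:   # about to repeat the first move
            break
        boundary.append(cur)
        cur, back = nxt, nxt_back
    return boundary

def chain_code(boundary):
    """Freeman code of each move, including the closing move back to the start."""
    if len(boundary) < 2:
        return []
    codes = []
    for i, (r, c) in enumerate(boundary):
        nr, nc = boundary[(i + 1) % len(boundary)]
        codes.append(DIRS.index((nr - r, nc - c)))
    return codes

def direction_histogram(codes):
    """8-bin feature vector: fraction of moves in each direction."""
    hist = np.bincount(np.asarray(codes, dtype=int), minlength=8).astype(float)
    total = hist.sum()
    return hist / total if total else hist
```

For a 2 × 2 block of ink the traced boundary gives the codes [0, 6, 4, 2] (east, south, west, north), so each of those four directions gets a weight of 0.25 in the feature vector.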

Page 35: Modi script character recognition
Page 36: Modi script character recognition


Page 37: Modi script character recognition

RECOGNITION PHASE
• Process of matching the segmented characters with the data set used to train the network
• When a character image matches the data set, recognition is successful
• Recognition is done using a Kohonen neural network, trained on actual hand-drawn letters, to recognise Modi characters from the input characters
• For each input pattern presented to the input neurons, only one output neuron (the winning neuron) is activated
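The winner-take-all idea can be illustrated with a small hypothetical Kohonen classifier in NumPy. The inputs would be feature vectors such as the 8-bin chain code histograms. The 1-D neuron layout, Gaussian neighbourhood, linearly decaying learning rate and majority-vote labelling of neurons are all assumptions for this sketch, not the project's settings:

```python
import numpy as np

class KohonenClassifier:
    """Hypothetical 1-D Kohonen layer: the closest weight vector wins."""

    def __init__(self, n_neurons, n_features, seed=0):
        self.rng = np.random.default_rng(seed)
        self.weights = self.rng.random((n_neurons, n_features))
        self.labels = [None] * n_neurons

    def winner(self, x):
        """Index of the single output neuron whose weights are closest to x."""
        return int(np.argmin(np.linalg.norm(self.weights - x, axis=1)))

    def fit(self, X, y, epochs=100, lr0=0.5):
        X = np.asarray(X, dtype=float)
        n = len(self.weights)
        positions = np.arange(n)
        radius0 = n / 2
        for epoch in range(epochs):
            frac = epoch / epochs
            lr = lr0 * (1 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks
            for i in self.rng.permutation(len(X)):
                w = self.winner(X[i])
                # Gaussian neighbourhood: the winner moves most, its neighbours less
                h = np.exp(-((positions - w) ** 2) / (2 * radius ** 2))
                self.weights += lr * h[:, None] * (X[i] - self.weights)
        # label each neuron with the majority class of the samples it wins
        votes = [{} for _ in range(n)]
        for x, label in zip(X, y):
            v = votes[self.winner(x)]
            v[label] = v.get(label, 0) + 1
        self.labels = [max(v, key=v.get) if v else None for v in votes]
        return self

    def predict(self, x):
        """Label of the closest neuron that won at least one training sample."""
        d = np.linalg.norm(self.weights - np.asarray(x, dtype=float), axis=1)
        d[np.array([lab is None for lab in self.labels], dtype=bool)] = np.inf
        return self.labels[int(np.argmin(d))]
```

Training is unsupported by labels; the class names are only attached to neurons afterwards, which is why neurons that never win a training sample are skipped at prediction time.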

Page 38: Modi script character recognition

DEMO OF WORKING MODULES

» Home Page

Page 39: Modi script character recognition

» Handwritten Character Recognition Page

Page 40: Modi script character recognition

» Input image recognition page

Page 41: Modi script character recognition

» Text editor

Page 42: Modi script character recognition

» Help page

Page 43: Modi script character recognition

CONCLUSION
» The system achieves an overall recognition rate of 85%, compared with 72% for the previously proposed Modi character recognition system based on Otsu binarization and a Kohonen neural network.

» This improvement is attributed to the additional use of the chain code algorithm for feature extraction.

» The system has wide applications for historians, farmers, research enthusiasts and the common man alike.

» While recognition of handwritten characters is an important task, it is not the final stage in linguistic research. Transliteration of the recognized Modi characters into the common, easily readable Devanagari script is the next logical step. We have begun work on this using a SAX parser and XML, and the future of this field looks bright.
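One possible shape for the SAX and XML approach, sketched with Python's standard `xml.sax` module: a hypothetical XML file lists one `<map>` element per character, the SAX handler builds a lookup table while streaming the file, and transliteration is a per-character replacement. The file format, element and attribute names are invented for illustration, and the two entries (U+11600 MODI LETTER A to अ U+0905, U+11601 MODI LETTER AA to आ U+0906) are only examples:

```python
import xml.sax

# Hypothetical mapping file: <map modi="..." devanagari="..."/> per character
MAPPING_XML = """<?xml version="1.0" encoding="UTF-8"?>
<mapping>
  <map modi="&#x11600;" devanagari="&#x0905;"/>
  <map modi="&#x11601;" devanagari="&#x0906;"/>
</mapping>"""

class MappingHandler(xml.sax.ContentHandler):
    """Collects Modi -> Devanagari pairs as the parser reports <map> elements."""

    def __init__(self):
        super().__init__()
        self.table = {}

    def startElement(self, name, attrs):
        if name == "map":
            self.table[attrs["modi"]] = attrs["devanagari"]

def load_mapping(xml_bytes):
    handler = MappingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.table

def transliterate(text, table):
    # characters without an entry (spaces, unknown glyphs) are copied unchanged
    return "".join(table.get(ch, ch) for ch in text)

table = load_mapping(MAPPING_XML.encode("utf-8"))
```

A one-to-one lookup like this is only a starting point; vowel signs, conjuncts and scribal abbreviations would likely need additional rules.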

Page 44: Modi script character recognition

REFERENCES» Sidra Anam, Saurabh Gupta, “An Approach for Recognizing Modi Lipi using Ostu’s Binarization Algorithm and

Kohenen Neural Network”, International Journal of Computer Applications (0975 – 8887) ,Volume 111 – No 2, February

» Gupta, A., Srivastava, M., Mahanta, C., “Offline handwritten character recognition using neural network”, 2011 IEEE International Conference on Computer Applications and Industrial Electronics (ICCAIE), 4-7 Dec. 2011, Print ISBN: 978-1-4577-2058-1

» Snehal R. Rathi, Rohini H. Jadhav, Rushikesh A. Ambildhok, “Recognition and Conversion of Handwritten Modi Characters”, International Journal of Technical Research and Applications, e-ISSN: 2320-8163, Volume 3, Issue 1 (Jan-Feb 2015), pp. 128-131, www.ijtra.com

» D. N. Besekar, R. J. Ramteke, “A Chain Code Approach for Recognizing Modi Script Numerals”, Research Paper, ISSN 2249-555X

» Amritha Sampath, C. Tripti, V. Govindaru, “Online Handwritten Character Recognition for Malayalam”, CCSEIT '12 Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology, Pages 661-664, ACM New York, NY, USA ©2012, ISBN: 978-1-4503-1310-0

» Bindu S. Moni, G. Raju, “Handwritten Character Recognition System using a Simple Feature”, ICACCI '12 Proceedings of the International Conference on Advances in Computing, Communications and Informatics, Pages 728-734, ACM New York, NY, USA ©2012, ISBN: 978-1-4503-1196-0

Page 45: Modi script character recognition

» Cinthia O. de A. Freitas, Luiz S. Oliveira, Simone B. K. Aires, Flávio Bortolozzi, “Zoning and Metaclasses for Character Recognition”, SAC '07 Proceedings of the 2007 ACM symposium on Applied computing, Pages 632-636, ACM New York, NY, USA ©2007, ISBN:1-59593-480-4

» Samit Kumar Pradhan, Atul Negi, “A syntactic PR approach to Telugu handwritten character recognition”, DAR '12 Proceeding of the workshop on Document Analysis and Recognition, Pages 147-153, ACM New York, NY, USA ©2012, ISBN: 978-1-4503-1797-9

» Dayashankar Singh, Maitrayee Dutta, Sarvpal H. Singh, “Neural network based handwritten Hindi character recognition system”, COMPUTE '09 Proceedings of the 2nd Bangalore Annual Compute Conference, Article No. 15, ACM New York, NY, USA ©2009, ISBN: 978-1-60558-476-8

» Manisha S. Deshmukh, Manoj P. Patil, Satish R. Kolhe, “Off-line Handwritten Modi Numerals Recognition using Chain Code”, WCI '15 Proceedings of the Third International Symposium on Women in Computing and Informatics, Pages 388-393, ACM New York, NY, USA ©2015, ISBN: 978-1-4503-3361-0

» The Times of India, Pune Edition, “Band of researchers, enthusiasts strive to keep Modi script alive”, TNN | Feb 21,2014, 05.48 AM IST, “timesofindia.indiatimes.com/city/pune/Band-of-researchers-enthusiasts-strive-to-keep-Modi-script-alive/articleshow/30761335.cms”, Accessed 8 March 2015

» Sakal Newspaper (9th July 2014), Accessed 8 March 2015

» Lulu C. Munggaran, Suryarini Widodo, Cipta A.M and Nuryuliani, “Handwritten Pattern Recognition Using Kohonen Neural Network Based on Pixel Character”, (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 5, No. 11, 2014.

Page 46: Modi script character recognition

A MODI Document

Page 47: Modi script character recognition

Thank You!