Quranic Arabic Corpus: Data Mining & Text Analytics
By Ismail Teladia & Abdullah Alazwari
Introduction What is the Quran?
Holy book for Muslims. Revealed from 610 AD. 6,236 verses, 114 chapters.
Corpus Definition: A collection of written or spoken language
What is the Quranic Arabic Corpus? 77,430 words of Quranic Arabic. Researcher: Kais Dukes
Features of QAC: Morphological Annotation
Syntactic Treebank
Semantic Ontology
Morphological Annotation: Word by Word
Grammar, Syntax, Morphology
Part-of-speech tagging
Natural Language Computing Technology
Details of a Word’s Grammar. Clicking a word gives more detail:
Type of Word, Translation, Gender, Case, Root
In addition, it shows the verse in which the word appears and an audio recitation of the verse.
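The per-word details listed above could be modelled as a simple record. This is a minimal sketch, not the QAC's actual data format: the class name `WordAnnotation`, its field names, and the sample entry are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordAnnotation:
    """One word's grammatical details (field names are illustrative assumptions)."""
    form: str          # transliterated surface form
    word_type: str     # part of speech, e.g. "noun" or "verb"
    translation: str   # English gloss
    gender: str        # "masculine" or "feminine"
    case: str          # "nominative", "accusative" or "genitive"
    root: str          # consonantal root


def describe(word: WordAnnotation) -> str:
    """Render the details on one line, as a word's detail view might."""
    return (f"{word.form}: {word.gender} {word.case} {word.word_type} "
            f"meaning '{word.translation}', root {word.root}")


# Hypothetical sample entry, not copied from the corpus.
sample = WordAnnotation(
    form="kitab", word_type="noun", translation="book",
    gender="masculine", case="nominative", root="k-t-b",
)
print(describe(sample))
```

Keeping the record immutable (`frozen=True`) suits annotation data, which is looked up far more often than it is changed.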
Syntactic Treebank: Verse-by-verse dependency graphs
Meaning of verse (broken down). Sentence structure (dependencies). Case.
Mathematical graph theory
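A dependency graph of the kind described above can be sketched as a set of labelled edges from head words to their dependents. The tokens, relation labels, and function names below are hypothetical placeholders chosen for illustration; they are not taken from the treebank.

```python
from collections import defaultdict

# Each edge is (head, dependent, relation). The English tokens and the
# relation labels are hypothetical placeholders, not QAC annotations.
edges = [
    ("read", "student", "subject"),
    ("read", "book", "object"),
    ("book", "the", "determiner"),
]


def build_graph(edge_list):
    """Map each head word to its list of (dependent, relation) pairs."""
    graph = defaultdict(list)
    for head, dependent, relation in edge_list:
        graph[head].append((dependent, relation))
    return graph


def find_root(edge_list):
    """The root is the one word that is never a dependent."""
    heads = {head for head, _, _ in edge_list}
    dependents = {dep for _, dep, _ in edge_list}
    roots = heads - dependents
    if len(roots) != 1:
        raise ValueError("a dependency tree must have exactly one root")
    return roots.pop()


def subtree(graph, word):
    """Collect a word and everything that depends on it, depth first."""
    words = [word]
    for dependent, _ in graph.get(word, []):
        words.extend(subtree(graph, dependent))
    return words


graph = build_graph(edges)
print(find_root(edges))
print(subtree(graph, "book"))
```

Treating the sentence as a tree is what lets graph-theory operations, such as finding the root or extracting a subtree, answer questions about sentence structure.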
Ontology of Concepts: Knowledge representation. Relationship between concepts. Historic places and people. Named entity tagging. E.g. Sun, Moon, Star, Earth classified under “Astronomical Body”. Uses predicate logic.
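The classification example above can be written as predicate-style facts and queried. This is only a sketch of the general idea: the predicate name `instance_of` and the helper functions are assumptions, and the ontology's actual representation may differ.

```python
# Facts as (predicate, subject, object) triples. The four facts come from
# the example above; the predicate name "instance_of" is an assumption.
facts = {
    ("instance_of", "Sun", "Astronomical Body"),
    ("instance_of", "Moon", "Astronomical Body"),
    ("instance_of", "Star", "Astronomical Body"),
    ("instance_of", "Earth", "Astronomical Body"),
}


def holds(predicate, subject, obj):
    """Return True if the fact predicate(subject, obj) is recorded."""
    return (predicate, subject, obj) in facts


def instances_of(concept):
    """List every subject classified under the given concept, sorted by name."""
    return sorted(s for p, s, o in facts if p == "instance_of" and o == concept)


print(holds("instance_of", "Moon", "Astronomical Body"))
print(instances_of("Astronomical Body"))
```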
Visual Representation of Ontology: 300 linked concepts with 350 relations
Conclusion: Uses of the QAC:
Analysing Arabic text of each verse. Linking Arabic words through dependencies. Finding relationships between concepts.
Website used daily by 2,500 people from 165 countries
Map Showing Usage of QAC
Bibliography http://corpus.quran.com
Thank you for listening!