Post on 24-Dec-2015
1
NLP in Thailand
by Asanee Kawtrakul
Kasetsart University
2
Outline
Thai Language Features
What we do and the problems
The main actors
Research Model and Infrastructure
What do we need more?
3
Thai Language Characteristics
Dialects and Tone
Isolating language
Uninflected
Monosyllabic
No word delimiters
Same form but several functions
Same form but several meanings
4
Thai Language Characteristics
Grammar coverage
Word formation/Recognition
Compound words vs Sentences
Proper name vs Common noun
Loan word (transliterated foreign words) without special orthography
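To make the "no word delimiters" point concrete, here is a minimal sketch of dictionary-based longest-matching segmentation. The slides do not name a segmentation method, so the technique, the function name `segment_longest_match`, and the four-word toy lexicon are all assumptions chosen for illustration only.

```python
# Illustrative sketch: greedy longest-matching word segmentation.
# The lexicon and function name are hypothetical, not from the slides.

def segment_longest_match(text, lexicon):
    """Split text left to right, always taking the longest lexicon entry."""
    max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback
            # so that unknown input cannot stall the loop.
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Toy lexicon: "ตา" (eye), "กลม" (round), "ตาก" (to expose), "ลม" (wind).
TOY_LEXICON = {"ตา", "กลม", "ตาก", "ลม"}

# The unspaced string "ตากลม" can be read as "ตา|กลม" (round eyes)
# or "ตาก|ลม" (exposed to the wind); greedy matching commits to one reading.
print(segment_longest_match("ตากลม", TOY_LEXICON))
```

With this toy lexicon the greedy strategy picks "ตาก" followed by "ลม", even where the other reading is intended. The same written form can support several segmentations, which is why a dictionary lookup alone is not sufficient for word recognition.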
5
Dialects
North 18.8%
North-East 34.2%
Central 33.7%
South 13.3%
6
What do we do?
TEXT PROCESSING
SPEECH PROCESSING
IMAGE PROCESSING
7
Text Processing
Theoretical: Morphological Level, Sentence Level, Lexical database, Corpus
Application: Automatic Indexing, Automatic Clustering, Machine Translation, Data Mining, Information Extraction, Information Retrieval
8
Text Processing and Problems
Statistical Based Approach
Knowledge Based Approaches
Lack of Legal Corpus
Small Corpus
Lack of Standard (POS, Semantic Concepts)
Redundant work
9
Speech Processing
Speech recognition
Speech generation
10
Speech Processing and Problems
Recognition
Generation
Not Only Dialect but Tone
Isolated word not Continuous speech
Word Boundary detection
11
Image Processing
Thai optical character recognition
Handwritten recognition
12
OCR and Problems
Isolated Characters
13
The Main Actors
Universities
NECTEC (National Electronics and Computer Technology Center), Ministry of Science, Technology and Environment
SIGNLP
14
The Main Actors
More than 50 experienced researchers (minimum 5 years of research)
More than 100 young researchers
15
Financial Supporters
National Electronics and Computer Technology Center (NECTEC)
National Research Council of Thailand (NRCT)
Kasetsart University Research and Development Institute (KURDI)
Thailand Research Fund (TRF)
etc.
16
Research Model and Infrastructure
Short Term
Simple but works
Collaboration between end users, universities and funding agencies (including the private sector)
Long Term
Robust and very large scale
Enlarge the number of researchers
17
What do we need more?
Share resources (Corpus, Dictionary, Tools, etc.)
Share Experiences and Knowledge
Set Big Umbrella and distribute workload
Establish research network
Partnership
18
Conclusion
Most Thais use the Thai language
Thai language processing has a good future in the market IF...
19
We have more collaborative work
NLP market for 1/2 of 60 million