NLP in Thailand by Asanee Kawtrakul, Kasetsart University

Post on 24-Dec-2015


1

NLP in Thailand by Asanee Kawtrakul

Kasetsart University

2

Outline

Thailand Language Features
What we do and the problems
The main actors
Research Model and Infrastructure
What do we need more?

3

Thai Language Characteristics

Dialects and Tone
Isolating language
Uninflected
Monosyllabic
No word delimiters
Same form but several functions
Same form but several meanings
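
Because Thai text has no word delimiters, word recognition usually starts with a segmentation step. The following is a minimal sketch of dictionary-based longest matching, one common baseline, and not necessarily the method used by the groups described in these slides. The function name and the four-word lexicon are hypothetical, chosen only for illustration.

```python
def longest_match_segment(text, lexicon):
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon entry that matches; fall back to a single character otherwise."""
    max_len = max(len(word) for word in lexicon)
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                words.append(candidate)
                i += size
                break
        else:
            # No lexicon entry starts here: emit one character as unknown.
            words.append(text[i])
            i += 1
    return words


# Hypothetical toy lexicon (assumption, not from the slides).
lexicon = {"ตา", "กลม", "ตาก", "ลม"}
print(longest_match_segment("ตากลม", lexicon))  # ['ตาก', 'ลม']
```

The same string could also be read as ตา + กลม, which the greedy strategy never considers. This is the kind of ambiguity that the "same form but several functions/meanings" points above refer to, and it is why a lexicon alone is not enough.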

4

Thai Language Characteristics

Grammar coverage
Word formation/Recognition
Compound words vs Sentences
Proper name vs Common noun
Loan words (transliterated foreign words) without special orthography

5

Dialects

North 18.8%
North-East 34.2%
Central 33.7%
South 13.3%

6

What do we do?

TEXT PROCESSING

SPEECH PROCESSING

IMAGE PROCESSING

7

Text Processing

Theoretical: Morphological Level, Sentence Level, Lexical database, Corpus

Application: Automatic Indexing, Automatic Clustering, Machine Translation, Data Mining, Information Extraction, Information Retrieval

8

Text Processing and Problems

Statistical Based Approach

Knowledge Based Approaches

Lack of Legal Corpus
Small Corpus

Lack of Standards (POS, Semantic Concepts)

Redundant work

9

Speech Processing

Speech recognition
Speech generation

10

Speech Processing and Problems

Recognition

Generation

Not Only Dialect but Tone

Isolated words, not continuous speech

Word Boundary detection

11

Image Processing

Thai optical character recognition
Handwriting recognition

12

OCR and Problems

Isolated Characters

13

The Main Actors

Universities
NECTEC (National Electronics and Computer Technology Center), Ministry of Science, Technology and Environment
SIGNLP

14

The Main Actors

More than 50 experienced researchers (minimum 5 years of research)
More than 100 young researchers

15

Financial Supporters

National Electronics and Computer Technology Center (NECTEC)
National Research Council of Thailand (NRCT)
Kasetsart University Research and Development Institute (KURDI)
Thai Research Foundation (TRF)
etc.

16

Research Model and Infrastructure

Short Term
Simple but works
Collaboration between end users, universities and funding agencies (including the private sector)

Long Term
Robust and very large scale
Enlarge the number of researchers

17

What do we need more?

Share resources (Corpus, Dictionary, Tools, etc.)
Share Experiences and Knowledge
Set Big Umbrella and distribute workload
Establish research network
Partnership

18

Conclusion

Most Thais use the Thai language
Thai language processing has a good future in the market IF...

19

We have more collaborative work
NLP market for 1/2 of 60 million