NLP in Thailand by Asanee Kawtrakul, Kasetsart University

Outline

- Thai language features
- What we do and the problems
- The main actors
- Research model and infrastructure
- What more do we need?

Thai Language Characteristics

- Dialects and tone
- Isolating language
- Uninflected
- Monosyllabic
- No word delimiters
- Same form but several functions
- Same form but several meanings
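Because Thai text has no word delimiters, word segmentation comes before most other text processing. A minimal sketch of one classic baseline, dictionary-based longest matching, is shown here alongside an exhaustive enumeration of all possible splits. It assumes a hypothetical four-word lexicon (`LEXICON`); the slides do not say which segmentation method Thai groups used. The string ตากลม is a commonly cited ambiguity: it can be read as ตา|กลม ("round eyes") or ตาก|ลม ("exposed to the wind").

```python
"""Toy dictionary-based Thai word segmentation (illustrative sketch only)."""

# Hypothetical four-word lexicon; a real system would load a full Thai dictionary.
LEXICON = {"ตา", "ตาก", "กลม", "ลม"}


def longest_matching(text: str, lexicon: set[str] = LEXICON) -> list[str]:
    """Greedy left-to-right longest matching.

    At each position, take the longest lexicon entry that is a prefix of the
    remaining text; if nothing matches, emit one character as an unknown token.
    """
    max_len = max(len(w) for w in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: unknown single character
        # Try candidate substrings from longest to shortest.
        for end in range(min(len(text), i + max_len), i, -1):
            if text[i:end] in lexicon:
                match = text[i:end]
                break
        tokens.append(match)
        i += len(match)
    return tokens


def all_segmentations(text: str, lexicon: set[str] = LEXICON) -> list[list[str]]:
    """Enumerate every split of text into lexicon words (exponential in the worst case)."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for rest in all_segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    print(longest_matching("ตากลม"))   # ['ตาก', 'ลม']
    print(all_segmentations("ตากลม"))  # [['ตา', 'กลม'], ['ตาก', 'ลม']]
```

Greedy longest matching commits to ตาก|ลม, while the enumeration shows that both readings are valid; choosing between them needs context or corpus statistics.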

Thai Language Characteristics

- Grammar coverage
- Word formation/recognition:
  - Compound words vs. sentences
  - Proper names vs. common nouns
  - Loan words (transliterated foreign words) without special orthography

Dialects

- North: 18.8%
- North-East: 34.2%
- Central: 33.7%
- South: 13.3%

What do we do?

- Text processing
- Speech processing
- Image processing

Text Processing

Theoretical:
- Morphological level
- Sentence level
- Lexical database
- Corpus

Application:
- Automatic indexing
- Automatic clustering
- Machine translation
- Data mining
- Information extraction
- Information retrieval
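Automatic indexing and information retrieval on Thai text run on top of word segmentation. A minimal sketch of an inverted index with a Boolean AND query follows, assuming documents have already been segmented into word lists. The function names `build_inverted_index` and `search_all` and the sample documents are hypothetical and not taken from any system described in these slides.

```python
"""Minimal inverted index over pre-segmented documents (illustrative sketch only)."""
from collections import defaultdict


def build_inverted_index(docs: dict[int, list[str]]) -> dict[str, set[int]]:
    """Map each token to the set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: list[str]) -> set[int]:
    """Boolean AND query: IDs of documents that contain every query token."""
    if not query:
        return set()
    postings = [index.get(token, set()) for token in query]
    # set.intersection returns a new set, so the index itself is not modified.
    return set.intersection(*postings)


if __name__ == "__main__":
    # Hypothetical documents, already segmented into words
    # (ข้าว = rice, ราคา = price, สูง = high, ยาง = rubber, ตก = fall,
    #  หอม = fragrant, มะลิ = jasmine).
    docs = {
        1: ["ข้าว", "ราคา", "สูง"],
        2: ["ยาง", "ราคา", "ตก"],
        3: ["ข้าว", "หอม", "มะลิ"],
    }
    index = build_inverted_index(docs)
    print(search_all(index, ["ข้าว", "ราคา"]))  # {1}
```

Any segmentation error upstream changes which tokens are indexed, which is one reason segmentation quality matters for retrieval on Thai text.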

Text Processing and Problems

Approaches:
- Statistical-based approach
- Knowledge-based approaches

Problems:
- Lack of legal corpus; small corpus
- Lack of standards (POS, semantic concepts)
- Redundant work

Speech Processing

- Speech recognition
- Speech generation

Speech Processing and Problems

Areas:
- Recognition
- Generation

Problems:
- Not only dialects but also tone
- Isolated words, not continuous speech
- Word boundary detection
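Word boundary detection for isolated-word recognition is often introduced with a simple short-time energy threshold: frames louder than the threshold are treated as speech, quieter ones as silence. A minimal sketch of that textbook baseline follows. The slides do not say which method was used, and the frame length, threshold and synthetic signal are assumptions chosen for illustration. Real systems typically combine energy with zero-crossing rate and adaptive thresholds, and this approach does not address continuous speech, where words run together without pauses.

```python
"""Energy-based endpoint detection for isolated words (illustrative sketch only)."""


def frame_energies(samples: list[float], frame_len: int) -> list[float]:
    """Short-time energy: sum of squared samples in each non-overlapping frame.

    A trailing partial frame is dropped.
    """
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_word_boundaries(
    samples: list[float], frame_len: int = 4, threshold: float = 0.5
) -> list[tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, end exclusive, for runs of frames
    whose energy exceeds threshold. Both defaults are arbitrary assumptions."""
    energies = frame_energies(samples, frame_len)
    segments = []
    start = None
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx  # speech begins
        elif energy <= threshold and start is not None:
            segments.append((start, idx))  # speech ends
            start = None
    if start is not None:  # signal ended during speech
        segments.append((start, len(energies)))
    return segments


if __name__ == "__main__":
    # Synthetic signal: silence, a loud "word", silence, a quieter "word", silence.
    signal = [0.0] * 8 + [0.8, -0.8] * 4 + [0.0] * 8 + [0.5, -0.5] * 2 + [0.0] * 4
    print(detect_word_boundaries(signal))  # [(2, 4), (6, 7)]
```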

Image Processing

- Thai optical character recognition (OCR)
- Handwriting recognition

OCR and Problems

Isolated Characters

The Main Actors

- Universities
- NECTEC (National Electronics and Computer Technology Center), Ministry of Science, Technology and Environment
- SIGNLP

The Main Actors

- More than 50 experienced researchers (at least 5 years of research)
- More than 100 young researchers

Financial Supporters

- National Electronics and Computer Technology Center (NECTEC)
- National Research Council of Thailand (NRCT)
- Kasetsart University Research and Development Institute (KURDI)
- Thailand Research Fund (TRF)
- etc.

Research Model and Infrastructure

Short term:
- Simple but working systems
- Collaboration between end users, universities and funding agencies (including the private sector)

Long term:
- Robust, very-large-scale systems
- Enlarge the number of researchers

What more do we need?

- Share resources (corpus, dictionary, tools, etc.)
- Share experiences and knowledge
- Set up a big umbrella and distribute the workload
- Establish a research network
- Partnership

Conclusion

- Most Thais use the Thai language.
- Thai language processing has a good future in the market IF...

- We have more collaborative work
- NLP market for 1/2 of 60 million (i.e., 30 million)