Burmese Project - fbcinc.com
![Page 1: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/1.jpg)
Burmese Project
LEARN
April 23, 2018
Ye Min Tun
Burmese, LCI
![Page 2: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/2.jpg)
Post‐Visit Report:
Memo 1
![Page 3: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/3.jpg)
12-Week Workshop
- Challenges
- Prototyping
- Solutions
- Memo 2: A Project Proposal
Design Thinking
![Page 4: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/4.jpg)
Burmese Unicode Fonts and Lack of Available Keyboard
Is there any feasible Burmese font that we could use with any language tools available on the internet?
Do we have a Burmese keyboard for Unicode font?
The Analysis and Design Process Stage I: Technical Challenges for Burmese
![Page 5: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/5.jpg)
We were able to:
(1) Identify a Burmese Unicode font that can be used for vocabulary development with tools and technology available on the internet
(2) Design and create new Burmese Unicode keyboard software at FSI
(3) Find an appropriate tool for Burmese word segmentation
(4) Find appropriate tools to create a word frequency list
(5) Prototype a Burmese word frequency list
End of Stage II of Design Thinking
Major Challenges Identified and Solved
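As an illustration of item (5), a word frequency list can be prototyped from text that has already been segmented into space-separated words. The sketch below is not the tool used in the project; the function name `frequency_list` and the sample tokens are placeholders chosen for this example.

```python
from collections import Counter


def frequency_list(segmented_lines, top_n=None):
    """Count tokens in lines whose words are already separated by spaces."""
    counts = Counter()
    for line in segmented_lines:
        counts.update(line.split())
    # most_common sorts by descending count; ties keep first-seen order.
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Placeholder tokens, not real corpus data.
    sample = ["က ခ က", "ခ က ဂ"]
    for word, count in frequency_list(sample):
        print(word, count)
```

The counting step is only as good as the segmentation that precedes it, which is why the segmentation tool in item (3) comes first.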
![Page 6: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/6.jpg)
“Word lists lie at the heart of good vocabulary course design, the development of graded materials for extensive listening and extensive reading, research on vocabulary load, and vocabulary test development.”
Paul Nation, Making and Using Word Lists for Language Learning and Testing (2016)
Making Word Lists
![Page 7: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/7.jpg)
Phase I: Vocabulary frequency list (7 months)
(i) 1000 to 3000 high-frequency words
Phase II: Texts and prototype lessons (5 months)
(i) Reading texts
(ii) Lesson prototypes
(iii) New ways of teaching the Burmese script and sound system
(iv) Prototype lessons to be tested in class
Phase III: Curriculum (9 months)
(i) A curriculum based on the vocabulary frequency list
(ii) Job-related materials
(iii) An extensive reading program
(iv) Grammar instruction and materials
Scope and Sequence of the Burmese Curriculum Project
![Page 8: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/8.jpg)
Purpose for making a particular list has a strong effect on the decisions and procedures that need to be followed.
Paul Nation, Making and Using Word Lists for Language Learning and Testing (2016)
Phase I
![Page 9: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/9.jpg)
Phase I: Creation of Vocabulary Frequency List for the Burmese Project
[Figure: cone diagram; labels 200000000, 10000000, 1000000, and 1000, 2000, 3000 for each cone]
A. Making a general corpus
- Target tokens: 2.5 million
- A general frequency list
B. Making sub-corpora (BEA/CS/GC/PM/STA)
- Target tokens: 500 K (each)
- Frequency lists for each cone
C. Testing frequency lists
- Word list, keyword list, collocation, concordance
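One of the checks named under C, the concordance, can be sketched as a keyword-in-context (KWIC) listing over a tokenized corpus. This is a generic illustration rather than the behaviour of any specific tool; the function name `kwic` and the `window` parameter are assumptions made for the example.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


if __name__ == "__main__":
    # Placeholder tokens standing in for a segmented Burmese text.
    tokens = "w1 w2 w3 w1 w4".split()
    for left, key, right in kwic(tokens, "w1", window=2):
        print(f"{left:>10} [{key}] {right}")
```

Reading the contexts around a high-frequency item is a quick way to spot segmentation errors before the item is accepted into a list.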
![Page 10: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/10.jpg)
Step 1. Finding appropriate texts
http://myanmar.mmtimes.com/
Phase I: How and What
![Page 11: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/11.jpg)
Step 2. Convert text into Unicode if needed
http://burglish.my-mm.org/latest/trunk/web/fontconv.htm
Phase I: How and What
![Page 12: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/12.jpg)
Step 3. Segmentation
http://www.nlpresearch-ucsy.edu.mm/NLP_UCSY/wsandpos.html
Phase I: How and What
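Burmese is written without spaces between words, so segmentation has to insert the word boundaries. The sketch below shows one simple approach, greedy longest-match against a word list. It is a swapped-in illustration, not the method of the tool linked above, and the two-word lexicon is a hypothetical stand-in for a real dictionary.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation over a set of known words."""
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    lexicon = {"မြန်မာ", "စာ"}  # hypothetical two-word dictionary
    print(" ".join(segment("မြန်မာစာ", lexicon)))
```

Anything not found in the lexicon falls apart into single characters, which is one reason segmented output needs the cleaning and accuracy passes that follow.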
![Page 13: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/13.jpg)
Step 4A. Cleaning the noise
Phase I: How and What
![Page 14: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/14.jpg)
Step 4B. Cleaning the noise
Phase I: How and What
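The slides do not spell out what counts as noise. As a minimal sketch, assume it means URLs and any characters outside the Myanmar Unicode block (U+1000 to U+109F), such as Latin letters and Western digits. The pattern names and `clean_line` are hypothetical, introduced only for this example.

```python
import re

URL = re.compile(r"https?://\S+")
# Assumption: everything outside the Myanmar block and whitespace is noise.
NON_MYANMAR = re.compile(r"[^\u1000-\u109F\s]+")


def clean_line(line):
    """Drop URLs and non-Myanmar characters, then normalise spacing."""
    line = URL.sub(" ", line)
    line = NON_MYANMAR.sub(" ", line)
    return " ".join(line.split())


if __name__ == "__main__":
    print(clean_line("က ခ abc 123 http://example.com ဂ"))
```

The Myanmar block also contains Burmese digits and punctuation, so this sketch keeps them; a stricter filter would need a narrower character range.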
![Page 15: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/15.jpg)
Step 4C. Improving accuracy
Phase I: How and What
![Page 16: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/16.jpg)
Step 4C. Improving accuracy
Phase I: How and What
![Page 17: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/17.jpg)
Step 5. Save a text file
Phase I: How and What
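When the cleaned text is saved, the encoding matters: Burmese Unicode text should be written as UTF-8 so that corpus tools read it back unchanged. A minimal sketch follows, with hypothetical helper names and one segmented sentence per line as an assumed file layout.

```python
from pathlib import Path


def save_segmented(lines, path):
    """Write one segmented sentence per line as UTF-8 text."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_segmented(path):
    """Read the file back as a list of lines."""
    return Path(path).read_text(encoding="utf-8").splitlines()
```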
![Page 18: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/18.jpg)
Step 6A. Spreadsheet Record for General Corpus
Phase I: How and What
![Page 19: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/19.jpg)
Why do we need a record?
1. Future use of materials
2. Copyright issues
3. Sources of materials
4. Spoken vs. written
5. Range of the corpora (how many tokens acquired for each category)
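The reasons for keeping a record suggest what each row of it might hold. The sketch below is one possible layout, not the project's actual spreadsheet; every field name in `TextRecord` is an assumption derived from the reasons listed on this slide.

```python
import csv
from dataclasses import dataclass, asdict, fields


@dataclass
class TextRecord:
    title: str
    source: str          # where the text came from
    copyright_note: str  # permission or licence status
    mode: str            # "spoken" or "written"
    category: str        # corpus or sub-corpus label
    tokens: int          # token count after segmentation


def write_records(records, path):
    """Save the records as a UTF-8 CSV file that a spreadsheet can open."""
    names = [f.name for f in fields(TextRecord)]
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=names)
        writer.writeheader()
        writer.writerows(asdict(r) for r in records)


def tokens_by_category(records):
    """Total tokens acquired per category (the 'range' of the corpora)."""
    totals = {}
    for r in records:
        totals[r.category] = totals.get(r.category, 0) + r.tokens
    return totals
```

Keeping the token count on each row makes the per-category totals a by-product of the record rather than a separate tally.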
![Page 20: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/20.jpg)
Step 6A. Spreadsheet Record for General Corpus
Phase I: How and What
![Page 21: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/21.jpg)
Step 6A. Spreadsheet Record for Sub‐Corpora
Phase I: How and What
![Page 22: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/22.jpg)
Number of acquired tokens as of December 2016 = 2,451,217 (sub-corpora)
[Chart: tokens acquired per corpus]
PM: 516,946
BEA: 488,469
STA: 398,922
CS: 535,288
GC: 472,983
DS: 36,109
![Page 23: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/23.jpg)
Total number of acquired tokens = 2,731,762 (sub-corpora)
[Bar chart: tokens acquired per corpus]
PM: 516,946
BEA: 497,723
STA: 482,732
CS: 535,288
GC: 475,809
DS: 43,208
CON: 43,850
Total: 2,731,762
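The per-corpus counts can be compared with the 500 K target set for the five sub-corpora in Phase I. The sketch below assumes the chart values pair with the labels in the order shown (PM, BEA, STA, CS, GC); that pairing is read off the slide and is not confirmed by the source. The function name `shortfall` is hypothetical.

```python
TARGET = 500_000  # per-sub-corpus target from the Phase I plan

# Counts as read off the chart; the label-to-value pairing is an assumption.
acquired = {
    "PM": 516_946,
    "BEA": 497_723,
    "STA": 482_732,
    "CS": 535_288,
    "GC": 475_809,
}


def shortfall(counts, target=TARGET):
    """Tokens still needed per sub-corpus (0 when the target is met)."""
    return {name: max(0, target - n) for name, n in counts.items()}


if __name__ == "__main__":
    for name, missing in shortfall(acquired).items():
        status = "target met" if missing == 0 else f"{missing:,} tokens short"
        print(f"{name}: {status}")
```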
![Page 24: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/24.jpg)
Creating High Frequency Lists
Testing with Tools (AntWordProfiler, AntConc, TextFixer)
http://www.laurenceanthony.net/
![Page 25: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/25.jpg)
AntConc: A tool to do many things
![Page 26: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/26.jpg)
Creating High Frequency Lists
Testing with Tools (AntWordProfiler, AntConc, TextFixer)
![Page 27: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/27.jpg)
Creating High Frequency Lists
Testing with Tools (AntWordProfiler, AntConc, TextFixer)
![Page 28: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/28.jpg)
Creating High Frequency Lists
Testing with Tools (AntWordProfiler, AntConc, TextFixer)
![Page 29: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/29.jpg)
Creating High Frequency Lists
Testing with Tools (AntWordProfiler, AntConc, TextFixer)
![Page 30: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/30.jpg)
What I am currently doing …
Using AntConc to make a stable list of high-frequency words
Next …
Using AntWordProfiler to profile the texts
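Profiling a text against a word list comes down to asking what share of its running words the list covers. The sketch below shows that general idea under stated assumptions; it is not AntWordProfiler's implementation, and the function names and the default band size of 1000 are choices made for this example.

```python
def coverage(tokens, word_list):
    """Share of running tokens that appear in word_list (0.0 to 1.0)."""
    if not tokens:
        return 0.0
    known = set(word_list)
    return sum(1 for t in tokens if t in known) / len(tokens)


def band_coverage(tokens, ranked_words, band_size=1000, bands=3):
    """Cumulative coverage for the top 1000, 2000, 3000 ... ranked words."""
    return [
        coverage(tokens, ranked_words[:band_size * (k + 1)])
        for k in range(bands)
    ]
```

Here `ranked_words` is assumed to be a frequency list sorted from most to least frequent, so each band adds the next slice of the list to the previous one.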
![Page 31: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/31.jpg)
What does this project mean for similar languages?
![Page 32: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/32.jpg)
Discussion:
Burmese Project
![Page 33: Burmese Project - fbcinc.com](https://reader030.fdocuments.in/reader030/viewer/2022012409/616a521711a7b741a35137a1/html5/thumbnails/33.jpg)
One-Page Description
Road Map: Three Phases of Burmese Project
[Figure: cone diagram; labels 200000000, 10000000, and 1000, 2000, 3000]
Phase I: Creation of Vocabulary List
Phase II: Texts and prototype lessons
(i) Reading texts
(ii) Lesson prototypes
(iii) New ways of teaching the Burmese script and sound system
(iv) Prototype lessons to be tested in class
Phase III:
(i) Vocabulary-based curriculum with the “Four Strands” principle
(ii) Development of grammar instruction and its integration into the curriculum