A preliminary study on unknown word problem in Chinese word segmentation
Authors: Ming-Yu Lin, Tung-Hui Chiang, Keh-Yih Su
Speaker: Jbc
Abstract
Unknown words are the main factor affecting the performance of word segmentation (WS).
To handle unknown words, this paper proposes two approaches:
Morphological rules: handle the regular unknown words.
Statistical model: handles the irregular unknown words.
Outline
Introduction
System architecture
Overview of the baseline model
The morphological analysis
Tagging part of speech
Unknown word modeling
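Introduction-(1)
Word:
The basic unit of many Chinese language processing tasks.
Chinese text has no explicit word boundaries, which makes words hard to identify.
Unknown words have a large effect on WS.
Categories of unknown words:
Regular: e.g. times and dates (11:50, 11/12), reduplication.
Irregular: e.g. proper names, compound nouns.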
Introduction-(2)
Strategies for the different types of unknown words:
Regular: identified with morphological rules.
Irregular: identified with a statistical model.
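Regular unknown words such as times, dates, and reduplications follow surface patterns, so a rule can be as simple as a regular expression. The sketch below is only an illustration of that idea: the three patterns and the function name `find_regular_unknown_words` are assumptions made for this example and are not the rules used in the paper.

```python
import re

# Hypothetical patterns for "regular" unknown words. They are illustrative
# only and are NOT the morphological rules from the paper.
TIME_RE = re.compile(r"\d{1,2}:\d{2}")            # e.g. 11:50
DATE_RE = re.compile(r"\d{1,2}/\d{1,2}")          # e.g. 11/12
REDUP_RE = re.compile(r"([\u4e00-\u9fff])\1")     # AA reduplication of one CJK character

def find_regular_unknown_words(text):
    """Return (start, end, kind, surface) for every pattern match, sorted by start."""
    hits = []
    for kind, pattern in (("time", TIME_RE), ("date", DATE_RE), ("redup", REDUP_RE)):
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), kind, m.group()))
    return sorted(hits)

if __name__ == "__main__":
    for hit in find_regular_unknown_words("他11/12看看"):
        print(hit)
```

Restricting the reduplication pattern to CJK characters keeps it from firing on repeated digits such as the "11" in "11:50".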
System Architecture-(1)
System Architecture-(2)
Lexicon: 89,590 entries, 49 tags.

| Characters per word | Number of entries |
|---|---|
| 1 | 1,734 |
| 2 | 35,492 |
| 3 | 19,650 |
| 4 | 24,054 |
| 5 | 6,140 |
| 6 | 2,020 |
| >=7 | 500 |
| Total | 89,590 |
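As a quick check on the lexicon table, the per-length counts can be summed and converted to shares. This is a small illustrative sketch; the name `entries_by_length` is made up for the example, and the numbers are copied from the table.

```python
# Lexicon entry counts by word length (in characters), copied from the slide.
entries_by_length = {
    "1": 1734,
    "2": 35492,
    "3": 19650,
    "4": 24054,
    "5": 6140,
    "6": 2020,
    ">=7": 500,
}

total = sum(entries_by_length.values())

# Percentage of the lexicon taken up by each word length.
shares = {length: 100 * count / total for length, count in entries_by_length.items()}

for length, share in shares.items():
    print(f"{length:>3} chars: {share:5.1f}%")
print("total entries:", total)
```

The counts add up to the stated 89,590 entries, and two-character words form the largest group at roughly 40% of the lexicon.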
System Architecture-(3)
Morphological rules: 17 rules (listed in Appendix A at the end).
Corpus:
Morphological Rules
Statistics of Corpora
Overview of the Baseline Model-(1) The baseline model:
Overview of the Baseline Model-(2) Baseline vs. Max match:
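For readers unfamiliar with the comparison point, maximum matching is a greedy dictionary lookup. The sketch below assumes the forward variant (the slide does not say which direction is used), and the function name `max_match`, the `max_len` parameter, and the toy lexicon are all hypothetical.

```python
def max_match(text, lexicon, max_len=7):
    """Greedy forward maximum matching.

    At each position, take the longest substring found in the lexicon;
    if nothing matches, emit a single character and move on.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words

if __name__ == "__main__":
    toy_lexicon = {"一", "個", "人", "個人", "轉換", "器"}  # toy data, not the real lexicon
    print(max_match("一個人", toy_lexicon))
    print(max_match("轉換器", toy_lexicon))
```

Because the method only consults the dictionary, a word missing from the lexicon (here 轉換器) is broken into the known pieces that cover it, which is why unknown words hurt dictionary-based segmentation.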
Overview of the Baseline Model-(3)
Two error patterns:
s_ns (mis-combined error): e.g. | 一 | 個 | 人 | is output as | 一 | 個人 |
ns_s (over-segmentation error): e.g. | 轉換器 | is output as | 轉換 | 器 |
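One way to count the two error patterns is to compare word boundaries in the reference segmentation with those in the system output. The sketch below assumes that reading: s_ns is a reference boundary the output lacks, and ns_s is an output boundary the reference lacks. The function names are hypothetical, and the paper may count errors per word rather than per boundary.

```python
def boundaries(words):
    """Character offsets where one word ends and the next begins."""
    cuts, pos = set(), 0
    for w in words[:-1]:  # the end of the sentence is not an internal boundary
        pos += len(w)
        cuts.add(pos)
    return cuts

def count_errors(reference, output):
    """Count s_ns (mis-combined) and ns_s (over-segmented) boundary errors."""
    ref, out = boundaries(reference), boundaries(output)
    return {
        "s_ns": len(ref - out),  # should be separated, but was not
        "ns_s": len(out - ref),  # should not be separated, but was
    }

if __name__ == "__main__":
    print(count_errors(["一", "個", "人"], ["一", "個人"]))
    print(count_errors(["轉換器"], ["轉換", "器"]))
```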
Statistics of Error Patterns
The Morphological Analysis-(1)
This paper proposes using morphological rules to find the regular unknown words.
Rule ordering:
Using the SFS (sequential forward selection) procedure.
Cost = wr * (1-Pr) + wp * (1-Pp)
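The rule-ordering idea can be sketched as a greedy loop that repeatedly adds whichever rule gives the lowest cost. Several things here are assumptions rather than details from the slides: Pr and Pp are read as recall and precision, the weights default to 0.5 each, the loop stops when no remaining rule lowers the cost, and `evaluate`, `gold`, and `rule_hits` are toy stand-ins for running the rules on a real corpus.

```python
def cost(recall, precision, w_r=0.5, w_p=0.5):
    """Cost = wr * (1 - Pr) + wp * (1 - Pp), with Pr/Pp assumed to be recall/precision."""
    return w_r * (1 - recall) + w_p * (1 - precision)

def sfs_order(rules, evaluate, w_r=0.5, w_p=0.5):
    """Greedy sequential forward selection over rules.

    evaluate(selected) must return (recall, precision) for an ordered rule list.
    Stops when adding any remaining rule fails to lower the cost (an assumption).
    """
    selected, remaining = [], list(rules)
    best_cost = float("inf")
    while remaining:
        scored = []
        for rule in remaining:
            recall, precision = evaluate(selected + [rule])
            scored.append((cost(recall, precision, w_r, w_p), rule))
        c, rule = min(scored, key=lambda pair: pair[0])
        if c >= best_cost:
            break
        selected.append(rule)
        remaining.remove(rule)
        best_cost = c
    return selected, best_cost

# Toy data: IDs of unknown words in a made-up corpus and what each rule finds.
gold = {1, 2, 3, 4}
rule_hits = {"time": {1, 2}, "date": {3}, "noisy": {4, 5, 6, 7}}

def evaluate(selected):
    found = set().union(*(rule_hits[r] for r in selected))
    true_positives = len(found & gold)
    recall = true_positives / len(gold)
    precision = true_positives / len(found) if found else 0.0
    return recall, precision

if __name__ == "__main__":
    print(sfs_order(rule_hits, evaluate))
```

In this toy setup the "noisy" rule is left out because the recall it adds does not make up for the precision it costs under equal weights. Raising `w_r` relative to `w_p` would make the selection favor recall instead.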
The Morphological Analysis-(2)
The Morphological Analysis-(3)
Baseline model + morphological rules:
The Morphological Analysis-(4)
Improvement in s_ns and ns_s errors after applying the morphological rules:
Tagging part of speech-(1)
Tagging part of speech-(2)
Tagging part of speech-(3)
Tagging part of speech-(4)
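Unknown word modeling-(1)
5 unknown word categories:
Words that should be added to the dictionary. Ex: 爭議
Words that should be covered by morphological rules. Ex: 牛肝, 牛心.
Abbreviations. Ex: 國大.
Proper names. Ex: 胡適.
Others (e.g. misprinted words such as 吩付; words not found in the dictionary).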
Unknown word modeling-(2)
Use the unknown word model to find the irregular unknown words.
Determine whether an unknown word exists in the predicted region.
If so, find which segment is the unknown word.
Unknown word modeling-(3)
Determining whether an unknown word is present:
Unknown word modeling-(4)
Determining which segment it is:
Result-(1)
Result-(2)