Similarity metrics for Japanese kanji
![Page 1: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/1.jpg)
Similarity Metrics for Japanese Kanji
Lars Yencken / 99designs
Maths and Science Meetup, 30th Nov 2012
![Page 2: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/2.jpg)
Linguistics
Computer Science
Computational Linguistics
![Page 3: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/3.jpg)
Relative difficulty of languages
![Page 4: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/4.jpg)
DIFFICULTY OF LEARNING LANGUAGES
FOREIGN SERVICE INSTITUTE, US DEPARTMENT OF STATE
Closely related to English: 575-600 class hours
Afrikaans, Danish, Dutch, French, Italian, Norwegian, Portuguese, Romanian, Spanish, Swedish
Significant linguistic and/or cultural differences: 1100 class hours
Albanian, Amharic, Armenian, Azerbaijani, Bengali, Bosnian, Bulgarian, Burmese, Croatian, Czech, Estonian, Finnish, Georgian, Greek, Hebrew, Hindi, Hungarian, Icelandic, Khmer, Lao, Latvian, Lithuanian, Macedonian, Mongolian, Nepali, Pashto, Persian (Dari, Farsi, Tajik), Polish, Russian, Serbian, Sinhalese, Slovak, Slovenian, Tagalog, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Xhosa, Zulu
Exceptionally difficult for native English speakers: 2200 class hours
Arabic, Cantonese, Mandarin, Japanese, Korean
![Page 15: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/15.jpg)
持
![Page 16: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/16.jpg)
持 /mo(tsu)/ "to carry"
![Page 17: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/17.jpg)
持 挂 拝
![Page 18: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/18.jpg)
distance(持, 挂) = ???
![Page 19: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/19.jpg)
The space of kanji
![Page 20: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/20.jpg)
![Page 21: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/21.jpg)
dog
dough
log
![Page 22: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/22.jpg)
持 挂
拝 土
![Page 23: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/23.jpg)
Approaches
![Page 24: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/24.jpg)
Compare images
![Page 25: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/25.jpg)
持 挂
![Page 26: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/26.jpg)
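The talk does not show how the glyph images were compared. A minimal sketch of one plain approach, assuming each kanji has already been rendered to a binary bitmap of the same size (rendering is not shown, and the tiny 5x5 grids are hypothetical toy shapes, not real kanji), scores distance as the fraction of pixels that differ. The function name `pixel_distance` is illustrative only.

```python
# Pixel-based distance between two glyphs, assuming both have already been
# rasterised to binary bitmaps of identical size (1 = ink, 0 = background).

Bitmap = list[list[int]]


def pixel_distance(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that differ (a normalised Hamming distance)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0  # two empty bitmaps are trivially identical
    differing = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return differing / total


# Hypothetical 5x5 toy bitmaps (assumed data, not real kanji renderings).
glyph_a = [
    [0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 0],
]
glyph_b = [
    [0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
]

print(pixel_distance(glyph_a, glyph_b))  # 0.08 (2 of 25 pixels differ)
```

Raw pixel overlap like this tends to be sensitive to small shifts in position or stroke thickness, which is one reason to look at structural representations such as components, strokes, or trees.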
![Page 27: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/27.jpg)
Compare components
![Page 28: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/28.jpg)
持: 扌, 土, 寸
待: 彳, 土, 寸
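One simple way to turn the component lists above (扌, 土, 寸 for 持 and 彳, 土, 寸 for 待) into a distance is Jaccard distance over unordered sets. This is an illustrative choice, not necessarily the measure used in the talk; note that treating components as a set ignores their position and any repeats.

```python
# Component-set comparison using Jaccard distance.
# The component lists come from the slide; choosing Jaccard is an assumption.


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B|: 0.0 for identical sets, 1.0 for disjoint ones."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


components = {
    "持": {"扌", "土", "寸"},
    "待": {"彳", "土", "寸"},
}

# Shared: 土, 寸 (2); union: 扌, 彳, 土, 寸 (4); so distance = 1 - 2/4.
print(jaccard_distance(components["持"], components["待"]))  # 0.5
```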
![Page 29: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/29.jpg)
Compare strokes
![Page 30: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/30.jpg)
P R O S P E R I T Y
P R O P E R T I E S
distance: 6
![Page 32: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/32.jpg)
3, 11a, 2a, 2a
3, 11a, 2a, 2a, 2a
distance: 1
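The stroke comparison treats each kanji as a sequence of stroke labels and compares sequences the way edit distance compares spellings. A minimal sketch follows; the name `edit_distance` and the `sub_cost` parameter are illustrative assumptions. With the usual unit costs, PROSPERITY to PROPERTIES comes out at 4. The slide's figure of 6 matches a variant where a substitution costs 2 (in effect, only insertions and deletions are used), and it also equals the number of mismatched positions when the two words are aligned letter by letter; the talk does not say which variant it used. The two stroke sequences (labels copied from the slide) differ by one inserted `2a`, giving 1 under either cost setting.

```python
from collections.abc import Hashable, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable], sub_cost: int = 1) -> int:
    """Levenshtein-style edit distance over any two sequences.

    Insertions and deletions cost 1; substitutions cost `sub_cost`
    (with sub_cost=2 a substitution is never cheaper than delete + insert).
    """
    # prev[j] holds the distance between a[:i-1] and b[:j] (one DP row at a time)
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,  # delete x
                curr[j - 1] + 1,  # insert y
                prev[j - 1] + (0 if x == y else sub_cost),  # match or substitute
            )
        prev = curr
    return prev[-1]


print(edit_distance("PROSPERITY", "PROPERTIES"))              # 4
print(edit_distance("PROSPERITY", "PROPERTIES", sub_cost=2))  # 6

# Stroke-label sequences as shown on the slide.
strokes_a = ["3", "11a", "2a", "2a"]
strokes_b = ["3", "11a", "2a", "2a", "2a"]
print(edit_distance(strokes_a, strokes_b))  # 1
```

Because the function only needs equality on elements, the same code works for letters and for stroke labels.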
![Page 33: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/33.jpg)
Compare trees
![Page 34: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/34.jpg)
![Page 35: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/35.jpg)
tree edit distance
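Tree edit distance can be sketched with the classical recursive definition for ordered trees, where deleting, inserting, or relabelling a node each costs 1. This is a memoised illustration adequate for small trees, not an efficient algorithm such as Zhang-Shasha and not the method used in the talk. The helper names (`node`, `forest_distance`, `tree_distance`) and the decomposition trees, in which 寺 groups 土 and 寸, are assumptions for illustration.

```python
from functools import lru_cache

# A tree is (label, children) where children is a tuple of trees;
# a forest is a tuple of trees. Tuples keep everything hashable for caching.


def node(label: str, *children: tuple) -> tuple:
    return (label, children)


def size(forest: tuple) -> int:
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f: tuple, g: tuple) -> int:
    """Unit-cost ordered forest edit distance (delete, insert, relabel cost 1)."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    return min(
        # delete v, the rightmost root of f; its children take its place
        forest_distance(f[:-1] + v_kids, g) + 1,
        # insert w, the rightmost root of g
        forest_distance(f, g[:-1] + w_kids) + 1,
        # map v onto w: relabel if the labels differ, then match v's children
        # with w's children and the remaining forests with each other
        forest_distance(v_kids, w_kids)
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1),
    )


def tree_distance(a: tuple, b: tuple) -> int:
    return forest_distance((a,), (b,))


# Illustrative decomposition trees (assumed, not taken from the talk):
# 持 = 扌 + 寺 and 待 = 彳 + 寺, with 寺 = 土 + 寸.
tree_ji = node("持", node("扌"), node("寺", node("土"), node("寸")))
tree_tai = node("待", node("彳"), node("寺", node("土"), node("寸")))

print(tree_distance(tree_ji, tree_tai))  # 2: relabel the root and 扌 -> 彳
```

Unlike the flat component set, the tree keeps the grouping of 土 and 寸 under 寺, so a shared sub-component is matched as a whole subtree.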
![Page 36: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/36.jpg)
So what works?
![Page 37: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/37.jpg)
![Page 38: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/38.jpg)
![Page 39: Similarity metrics for Japanese kanji](https://reader034.fdocuments.in/reader034/viewer/2022042500/5553b14eb4c905d4448b4b43/html5/thumbnails/39.jpg)
Thanks!