![Page 1: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/1.jpg)
Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web
Rada Mihalcea
University of North Texas
![Page 2: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/2.jpg)
Facts
• Globalization
  – “Breaking down of political, cultural, and trade barriers” (Thomas Friedman)
  – Universal communication
• Dying languages
  – One language dying every other week
![Page 3: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/3.jpg)
Some Figures (set 1)
• 7,000 languages spoken worldwide
  – plus even more dialects
  – [http://ethnologue.com]
• Resources currently available for 15-20 languages (or fewer)
![Page 4: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/4.jpg)
Some Figures (set 2)
• Internet population

| Country | Web users / Total population |
|---|---|
| United States | 150 mil. / 290 mil. |
| China | 100 mil. / 1.2 bil. |
| Japan | 50 mil. / 125 mil. |
| India | 40 mil. / 1 bil. |
| Germany | 35 mil. / 80 mil. |
| United Kingdom | 25 mil. / 60 mil. |
| … | … |

• On average, an Internet user spends 11 h 24 min / month
• United States users: 25 h 25 min (home) + 74 h 26 min (work) / month
  – [Lyman & Varian 2003]
• 10,773,000,000 hours spent online every month
  – some 5 million man-years!
• 945,000,000 total Web users
  – [“The Main Thing”, June 2004] http://www.rebron.org/mozilla/archives/000085.html
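The monthly total follows from two of the figures above: 945 million users times 11 h 24 min each. A minimal Python check, working in whole minutes to avoid floating-point rounding; the conversion to man-years assumes a working year of about 2,000 hours, which is our assumption and is not stated on the slide.

```python
# Check the "hours online per month" figure from the two source numbers.
TOTAL_WEB_USERS = 945_000_000        # total Web users (June 2004 figure)
MINUTES_PER_USER = 11 * 60 + 24      # 11 h 24 min per user per month

# Assumption (not from the slide): one man-year is about 2,000 working hours.
HOURS_PER_MAN_YEAR = 2_000

def monthly_hours(users: int, minutes_per_user: int) -> int:
    """Total hours online per month, computed exactly in integer minutes."""
    total_minutes = users * minutes_per_user
    return total_minutes // 60

hours = monthly_hours(TOTAL_WEB_USERS, MINUTES_PER_USER)
man_years = hours / HOURS_PER_MAN_YEAR

print(f"{hours:,} hours per month")   # 10,773,000,000 hours per month
print(f"~{man_years:,.0f} man-years")  # ~5,386,500 man-years
```

Dividing by a full calendar year (8,760 hours) instead gives roughly 1.2 million person-years, so the "some 5 million" figure appears to count working hours.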
![Page 5: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/5.jpg)
Availability vs. Needs
• The Web as collective mind
• A different view of the Web:
  – WWW ≠ a large set of pages
  – WWW = a way to ask millions of people
• Availability: users spend 10,773,000,000 hours online per month (~5,000,000 man-years)
• Needs: resources required for 7,000 languages
![Page 6: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/6.jpg)
Outline
[building resources]
• I: Building multilingual WordNets
• II: Building a (crosslingual) pictorial WordNet
[using resources]
• III: Applications
![Page 7: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/7.jpg)
![Page 8: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/8.jpg)
Building WordNets
• Other WordNets:
  – Princeton WordNet
  – EuroWordNet
  – BalkaNet
  – …
• Methodology:
  – Manual
  – Lexicographers
  – Time-consuming and expensive
![Page 9: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/9.jpg)
Romanian Semantic Dictionary
• Distributed / Web-based
• Non-expert users / expert validators

[Diagram: WordNet, RSDnet; bilingual dictionary, monolingual dictionary, corpus; automatic / manual; non-expert / expert]
![Page 10: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/10.jpg)
Resources Used
• WordNet semantic network
• English-Romanian dictionary
• Romanian dictionary
• Romanian corpus
![Page 11: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/11.jpg)
Main Phases
[Diagram: choose → words → meaning → confirm → score!]
• Non-expert contributions
  – choose a WordNet synset
  – pick the correct translations (add other words to the synset)
  – choose a sentence from the corpus that displays the appropriate meaning
  – confirm the new synset
  – get points and rewards!
• Expert validation
  – correct errors / remove entries
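The two-tier workflow above can be sketched as a small state machine: a volunteer builds a Romanian synset for a chosen WordNet synset, confirms it and earns points, and an expert later removes erroneous words. This is a minimal Python sketch; the `Contribution` class, its fields, and the point value are hypothetical, not taken from the actual RSDnet system.

```python
from dataclasses import dataclass, field

POINTS_PER_CONFIRMED_SYNSET = 10  # hypothetical reward value

@dataclass
class Contribution:
    wordnet_synset_id: str                           # English WordNet synset chosen
    contributor: str
    words: list[str] = field(default_factory=list)   # picked/added translations
    example: str = ""                                # corpus sentence showing the meaning
    confirmed: bool = False
    validated: bool = False

def confirm(c: Contribution, scores: dict[str, int]) -> None:
    """Non-expert step: confirm the new synset and award points."""
    if not c.words or not c.example:
        raise ValueError("a synset needs translations and an example sentence")
    c.confirmed = True
    scores[c.contributor] = scores.get(c.contributor, 0) + POINTS_PER_CONFIRMED_SYNSET

def expert_validate(c: Contribution, wrong_words: set[str]) -> bool:
    """Expert step: drop erroneous words; the entry survives only if any remain."""
    c.words = [w for w in c.words if w not in wrong_words]
    c.validated = bool(c.words)
    return c.validated

scores: dict[str, int] = {}
entry = Contribution("dog.n.01", "ana", ["câine", "cățel"], "Câinele latră.")
confirm(entry, scores)
kept = expert_validate(entry, {"cățel"})
print(entry.words, kept, scores)  # ['câine'] True {'ana': 10}
```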
![Page 12: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/12.jpg)
Result
• roSynset: synsetID, definition, example, validated
• roWords: ID, word, synsetID, engMatch
• engSynset: synsetID, definition, example
• engWords: ID, word, synsetID, roMatch
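These four tables map naturally onto a relational schema. A minimal sketch using Python's built-in `sqlite3` module; the column types, and the assumption that a Romanian synset reuses the `synsetID` of the WordNet synset it was derived from, are ours and not documented on the slide. The meaning of `engMatch` / `roMatch` is not specified, so they are kept as free-text columns and left unused.

```python
import sqlite3

SCHEMA = """
CREATE TABLE engSynset (synsetID TEXT PRIMARY KEY, definition TEXT, example TEXT);
CREATE TABLE engWords  (ID INTEGER PRIMARY KEY, word TEXT,
                        synsetID TEXT REFERENCES engSynset(synsetID), roMatch TEXT);
CREATE TABLE roSynset  (synsetID TEXT PRIMARY KEY, definition TEXT, example TEXT,
                        validated INTEGER DEFAULT 0);
CREATE TABLE roWords   (ID INTEGER PRIMARY KEY, word TEXT,
                        synsetID TEXT REFERENCES roSynset(synsetID), engMatch TEXT);
"""

def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the four tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def english_equivalents(conn: sqlite3.Connection, ro_word: str) -> list[str]:
    """English words sharing a validated synset with a Romanian word
    (assumes Romanian and English synsets share synsetIDs)."""
    rows = conn.execute(
        """SELECT DISTINCT e.word
           FROM roWords r
           JOIN roSynset s ON s.synsetID = r.synsetID AND s.validated = 1
           JOIN engWords e ON e.synsetID = r.synsetID
           WHERE r.word = ?
           ORDER BY e.word""",
        (ro_word,),
    )
    return [w for (w,) in rows]

conn = build_db()
conn.execute("INSERT INTO engSynset VALUES ('s1', 'a tube with a small bowl at one end', '')")
conn.executemany("INSERT INTO engWords (word, synsetID) VALUES (?, 's1')",
                 [("pipe",), ("tobacco pipe",)])
conn.execute("INSERT INTO roSynset VALUES ('s1', 'pipă de fumat', '', 1)")
conn.execute("INSERT INTO roWords (word, synsetID) VALUES ('pipă', 's1')")
print(english_equivalents(conn, "pipă"))  # ['pipe', 'tobacco pipe']
```

Filtering on `validated = 1` mirrors the expert-validation step: unvalidated volunteer entries stay in the database but are not served.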
![Page 13: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/13.jpg)
Quantity
• Large number of contributions in a short amount of time
  – 6 months: more than 2,000 synsets from 150 contributors
![Page 14: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/14.jpg)
Quality
• Manual RoWN
  – manual
  – depends on experts
  – takes longer to build
• RSDnet
  – semi-automatic
  – depends on Web users
  – errors corrected by experts
• GenSynsets
  – automatic
  – depends on lexical resources
  – errors introduced

| | RSDnet noun | RSDnet adjective | GenSynsets noun | GenSynsets adjective |
|---|---|---|---|---|
| correct | 96.6% | 92.3% | 19% / 63% | 21% / 71% |
| partially correct | 3.3% | 3.8% | 2% / 2% | 0% / 0% |
| erroneous | 0% | 0% | 2% / 2% | 4% / 4% |
| missing | 0% | 0% | 77% / 33% | 75% / 25% |
| total | 59 | 26 | 54 | 24 |
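Because the RSDnet noun and adjective samples differ in size (59 vs. 26), a pooled accuracy can be recovered by converting each percentage back to a count. A short Python sketch of that arithmetic; the pooled figure is derived here and does not appear on the slide.

```python
# (percent correct, sample size) per part of speech: RSDnet columns of the table
rsdnet = {"noun": (96.6, 59), "adjective": (92.3, 26)}

def pooled_accuracy(samples: dict[str, tuple[float, int]]) -> float:
    """Convert each percentage back to a whole count, then pool over all items."""
    correct = sum(round(pct / 100 * n) for pct, n in samples.values())
    total = sum(n for _, n in samples.values())
    return correct / total

print(f"{pooled_accuracy(rsdnet):.1%}")  # 95.3%
```

The counts come out as 57 of 59 nouns and 24 of 26 adjectives, i.e. 81 of 85 synsets judged correct.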
![Page 15: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/15.jpg)
Pros / Cons
• Pros
  – faster than manual construction by experts
  – more accurate than automatic construction
  – derived from WordNet, so it inherits WordNet relations
• Limitations
  – requires bilingual users (English/Romanian)
  – capturing difficult concepts
![Page 16: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/16.jpg)
Outline
[building resources]
• I: Building multilingual WordNets
• II: Building a (crosslingual) pictorial WordNet
[using resources]
• III: Applications
![Page 17: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/17.jpg)
A Picture is Worth 7,000 Words
![Page 18: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/18.jpg)
An Image Dictionary
• Add image representations to concepts defined in WordNet
  – Encode word/image associations
  – Combine visual and linguistic representations of world concepts
![Page 19: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/19.jpg)
Typical entry in a dictionary
• pipe, tobacco pipe – a tube with a small bowl at one end; used for smoking tobacco
• pipe, pipage, piping – a long tube made of metal or plastic that is used to carry water or oil or gas etc.
• pipe, tabor pipe – a tubular wind instrument
+ pictorial representations
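In data terms, the entry above is one word form shared by several synsets, and each synset needs its own picture. A minimal Python sketch of such an illustrated entry; the `IllustratedSynset` type and its fields are hypothetical, not PicNet's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class IllustratedSynset:
    words: tuple[str, ...]                            # synonyms sharing one meaning
    gloss: str                                        # WordNet-style definition
    images: list[str] = field(default_factory=list)   # image URLs/paths, empty until mapped

PIPE_SENSES = [
    IllustratedSynset(("pipe", "tobacco pipe"),
                      "a tube with a small bowl at one end; used for smoking tobacco"),
    IllustratedSynset(("pipe", "pipage", "piping"),
                      "a long tube made of metal or plastic used to carry water, oil, or gas"),
    IllustratedSynset(("pipe", "tabor pipe"), "a tubular wind instrument"),
]

def senses_of(word: str, synsets: list[IllustratedSynset]) -> list[IllustratedSynset]:
    """All synsets containing the word; each sense carries its own image list."""
    return [s for s in synsets if word in s.words]

def unillustrated(synsets: list[IllustratedSynset]) -> list[IllustratedSynset]:
    """Synsets still waiting for a pictorial representation."""
    return [s for s in synsets if not s.images]

print(len(senses_of("pipe", PIPE_SENSES)))  # 3
```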
![Page 20: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/20.jpg)
What for?
• Language learning
  – Children
  – Second (foreign) language
  – People with language disorders
• International language-independent knowledge base
  – Pictures are transparent to languages
• Applications
  – Pictorial translations (“Letters to my cousin”)
• Bridge the gap between research in image and text processing
  – Image retrieval/classification, natural language
![Page 21: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/21.jpg)
Word/Image Associations
• Difficult
• First iteration:
  – Concrete nouns (flower, dog)
  – Concrete verbs (write, drink)
• Next:
  – Abstractions (friendship, love)
  – Object properties (red, large)
![Page 22: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/22.jpg)
Building PicNet
• An illustrated semantic dictionary
• Web users perform the mapping
• Resources
  – WordNet
    • 150,000+ words, grouped in synsets
    • 250,000+ semantic relations
  – Image search engines
    • PicSearch http://www.picsearch.com
    • AltaVista http://www.altavista.com/image
    • To date, 72,000 images automatically collected
![Page 23: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/23.jpg)
Activities in PicNet
• Administrator functions
• Word/image associations (Web users)
  – Free association
  – Competitive free association (tournament)
  – Image validation / scoring
  – Image donation
  – Word lookup (search)
![Page 24: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/24.jpg)
Administrator functions
• Validate uploaded images
  – Determine whether to allow the images into the system
  – Does not verify the mapping
  – Delete corporate, offensive, or unclear images
• Options
  – Ban user
    • can delete all activity by a particular user from the database
![Page 25: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/25.jpg)
• Word lookup
• User contribution
  – Contributing / validating images
  – Free association
  – Tournaments (competitive free association)
![Page 26: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/26.jpg)
Word Lookup (Search)
• Synsets with words matching the search phrase are displayed with their best image match.
• After finding the desired synset, a user may:
  – rate the validity of the current synset-image mapping
  – upload a new image to be attached to this synset.
Activity 1
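Lookup as described above can be sketched as: find every synset whose words match the query, then attach the highest-scoring image to each. A minimal Python sketch; the in-memory dictionaries, synset IDs, and file names are hypothetical stand-ins for PicNet's database, and matching is assumed to be case-insensitive on whole words.

```python
from typing import Optional

# Hypothetical data: synset ID -> words, and (synset ID, image) -> aggregate score
SYNSETS = {
    "pipe-smoking": ["pipe", "tobacco pipe"],
    "pipe-tube": ["pipe", "pipage", "piping"],
    "violet-flower": ["violet"],
}
PAIR_SCORES = {
    ("pipe-smoking", "briar.jpg"): 8,
    ("pipe-smoking", "blurry.jpg"): -2,
    ("pipe-tube", "plumbing.jpg"): 5,
}

def best_image(synset_id: str) -> Optional[str]:
    """Highest-scoring image for a synset, or None if nothing is mapped yet."""
    candidates = [(score, img) for (sid, img), score in PAIR_SCORES.items()
                  if sid == synset_id]
    return max(candidates)[1] if candidates else None

def lookup(phrase: str) -> list[tuple[str, Optional[str]]]:
    """Synsets containing the search phrase as one of their words, with best image."""
    query = phrase.strip().lower()
    return [(sid, best_image(sid)) for sid, words in SYNSETS.items()
            if query in (w.lower() for w in words)]

print(lookup("Pipe"))
# [('pipe-smoking', 'briar.jpg'), ('pipe-tube', 'plumbing.jpg')]
```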
![Page 27: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/27.jpg)
Image Validation (Scoring)
• The user is shown a synset-image pair and rates its appropriateness.
• Factors to consider:
  – fitness for the given synset
  – quality of the image (size, clarity)
Activity 2
![Page 28: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/28.jpg)
Scoring
• Score based on the user response
  – Not related (-5)
  – Loosely related (1)
  – Some similarity (2)
  – Well suited (3)
• Result:
  – Determine a score for each synset-image pair
  – Concept/image pairs that are not related are quickly discovered
    • Typically after a response from one or two users
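The rating scale is deliberately asymmetric: one "not related" response (-5) outweighs any single positive one, which is why bad pairs surface after one or two responses. A minimal Python sketch that sums ratings per synset-image pair; the summation rule and the flagging threshold of 0 are assumptions, since the slide gives only the per-response values.

```python
from collections import defaultdict

# Per-response values from the slide
RATING_VALUES = {
    "not related": -5,
    "loosely related": 1,
    "some similarity": 2,
    "well suited": 3,
}

def pair_scores(responses):
    """Sum rating values per (synset, image) pair; responses are (synset, image, label)."""
    totals = defaultdict(int)
    for synset, image, label in responses:
        totals[(synset, image)] += RATING_VALUES[label]
    return dict(totals)

def flagged(scores, threshold=0):
    """Pairs whose total has dropped below a (hypothetical) threshold."""
    return sorted(pair for pair, total in scores.items() if total < threshold)

responses = [
    ("dog", "dog.jpg", "well suited"),
    ("dog", "dog.jpg", "some similarity"),
    ("dog", "cat.jpg", "loosely related"),
    ("dog", "cat.jpg", "not related"),
]
scores = pair_scores(responses)
print(scores)           # {('dog', 'dog.jpg'): 5, ('dog', 'cat.jpg'): -4}
print(flagged(scores))  # [('dog', 'cat.jpg')]
```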
![Page 29: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/29.jpg)
Free Word Association
• Task: given an image, provide a word to match.
Activity 3
![Page 30: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/30.jpg)
Free Word Association – problems
• Difficult to identify images with optimal specificity
  – e.g. violet vs. flower
• Sometimes tedious to find the intended word from the synset list
• However, the user can often determine a hypernym (more general concept) – useful information
• [Scoring] A free word association is considered to be “well suited” and scores 3
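When a volunteer types a more general word than the intended one (flower for a picture of a violet), the answer is still informative if it lies on the intended concept's hypernym chain. A minimal Python sketch with a tiny hand-written, simplified hypernym table; the table, function names, and labels are hypothetical, and a real system would read hypernym links from WordNet instead.

```python
# Hypothetical, simplified hypernym fragment: concept -> more general concept
HYPERNYM = {
    "violet": "flower",
    "flower": "angiosperm",
    "angiosperm": "plant",
    "plant": "organism",
}

def hypernym_chain(concept: str) -> list[str]:
    """All ancestors of a concept, nearest first."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain

def classify_answer(intended: str, answer: str) -> str:
    """Label a free-association answer relative to the intended concept."""
    if answer == intended:
        return "exact"
    if answer in hypernym_chain(intended):
        return "hypernym"  # more general, but still useful information
    return "other"

print(hypernym_chain("violet"))              # ['flower', 'angiosperm', 'plant', 'organism']
print(classify_answer("violet", "flower"))   # hypernym
```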
![Page 31: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/31.jpg)
Image Upload
• Given a concept, upload a matching image
  – Search facilitated with shortcuts to three search engines (PicSearch, AltaVista, Google)
• Scoring for uploaded images
  – An image uploaded for a particular synset is considered “well suited” and scored at 5
  – Accounts for the extra effort required from the user
    • Possible indicator of a stronger correlation
Activity 4
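Uploads, free associations, and explicit ratings all contribute to the score of a synset-image pair, each with its own value. A minimal Python sketch that folds the three activity types into one running total per pair; the event format and the purely additive combination are assumptions, and only the values themselves (5 for an upload, 3 for a free association, and the rating scale) come from the slides.

```python
from collections import Counter

# Contribution values stated on the slides
UPLOAD_SCORE = 5            # image uploaded for a synset
FREE_ASSOCIATION_SCORE = 3  # free association counts as "well suited"
RATING_VALUES = {"not related": -5, "loosely related": 1,
                 "some similarity": 2, "well suited": 3}

def event_value(kind, label=None):
    """Score contributed by one user event of the given kind."""
    if kind == "upload":
        return UPLOAD_SCORE
    if kind == "free_association":
        return FREE_ASSOCIATION_SCORE
    if kind == "rating":
        return RATING_VALUES[label]
    raise ValueError(f"unknown activity: {kind}")

def total_scores(events):
    """events: (synset, image, kind, label-or-None) tuples -> score per pair."""
    totals = Counter()
    for synset, image, kind, label in events:
        totals[(synset, image)] += event_value(kind, label)
    return totals

events = [
    ("violet", "v1.jpg", "upload", None),
    ("violet", "v1.jpg", "rating", "not related"),
    ("violet", "v2.jpg", "free_association", None),
]
totals = total_scores(events)
print(totals[("violet", "v1.jpg")], totals[("violet", "v2.jpg")])
```

In this additive sketch a single "not related" rating exactly cancels an upload, so a wrong upload is neutralised after one negative response.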
![Page 32: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/32.jpg)
User Motivation
• Points for each activity
  – Leaderboard
• Competitive activity: the PicNet Game
  – Combine ideas into a competitive game
![Page 33: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/33.jpg)
The PicNet Game
• Phase 1: Each player is shown an image and asked to provide a matching synset (as in free word association)
Activity 5
![Page 34: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/34.jpg)
The PicNet Game
• Phase 2: Each player votes for the best match (cannot vote on her own entry).
![Page 35: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/35.jpg)
Scoring and Winning
• Each synset-image pair scores one point for being entered, and one point for each vote received.
• If multiple players enter the same synset-image pair, the score is 2 × the number of players entering that pair.
• Players also receive a “game score”, which counts towards winning the game
  – A player receives 100 points for winning the round
  – If multiple players entered the synset-image pair that wins best match, the score is split evenly
• A player reaching 300 points wins the tournament
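The rules above combine a per-pair score with a per-player game score. A minimal Python sketch of one round under our reading of the rules: a pair entered by one player earns 1 point, a pair entered by k > 1 players earns 2 × k, each vote adds 1 (assumed to apply in both cases), players cannot vote for their own entry, the best match is assumed to be the highest-scoring pair, and its entrants split the 100-point prize. Function names are hypothetical; tie-breaking is not specified on the slides, so here the first maximum wins.

```python
from collections import defaultdict
from typing import Optional

ROUND_PRIZE = 100    # game points for winning a round (from the slide)
WINNING_SCORE = 300  # game points that win the tournament (from the slide)

def pair_scores(entries: dict[str, str], votes: dict[str, str]):
    """entries: player -> synset entered for the shown image;
    votes: voter -> synset voted for as best match.
    Returns (synset -> pair score, synset -> list of entrants)."""
    entrants: dict[str, list[str]] = defaultdict(list)
    for player, synset in entries.items():
        entrants[synset].append(player)
    # Entry points: 1 for a single entrant, 2 * k when k > 1 players entered it.
    scores = {s: (1 if len(p) == 1 else 2 * len(p)) for s, p in entrants.items()}
    for voter, synset in votes.items():
        if synset not in scores:
            raise ValueError(f"{synset!r} was not entered this round")
        if entries.get(voter) == synset:
            raise ValueError(f"{voter} cannot vote for their own entry")
        scores[synset] += 1  # one point per vote received
    return scores, entrants

def play_round(entries, votes, game_scores: dict[str, float]) -> str:
    """Award ROUND_PRIZE, split evenly, to everyone who entered the best pair."""
    scores, entrants = pair_scores(entries, votes)
    best = max(scores, key=scores.get)  # ties: first maximum wins (unspecified)
    winners = entrants[best]
    for player in winners:
        game_scores[player] = game_scores.get(player, 0) + ROUND_PRIZE / len(winners)
    return best

def tournament_winner(game_scores: dict[str, float]) -> Optional[str]:
    """First player at or above WINNING_SCORE, or None."""
    for player, points in game_scores.items():
        if points >= WINNING_SCORE:
            return player
    return None

entries = {"ann": "violet", "bob": "violet", "cy": "flower"}
votes = {"ann": "flower", "bob": "flower", "cy": "violet"}
game: dict[str, float] = {}
best = play_round(entries, votes, game)
print(best, game)  # violet {'ann': 50.0, 'bob': 50.0}
```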
![Page 36: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/36.jpg)
Quality and Quantity
• [1 year] 6,200 concepts from 320 contributors
• Competitive free association
  – Number of users voting for the same synset suggestion in each round
  – User concurrence: 43% (consistent agreement)
• Random sample of 100 images
  – 85% correct associations
![Page 37: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/37.jpg)
Sample Word/Image Associations
exodus, hegira, hejira – a journey by a large group to escape from a hostile environment
![Page 38: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/38.jpg)
Sample Word/Image Associations
humerus – bone extending from the shoulder to the elbow
![Page 39: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/39.jpg)
Sample Word/Image Associations
Castro, Fidel Castro – Cuban socialist leader who overthrew a dictator in 1959 and established a socialist state in Cuba (born in 1927)
![Page 40: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/40.jpg)
Outline
[building resources]
• I: Building multilingual WordNets
• II: Building a (crosslingual) pictorial WordNet
[using resources]
• III: Applications
![Page 41: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/41.jpg)
Translation with Pictures
• What do you understand by the following?
The house has four bedrooms and one kitchen.
![Page 42: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/42.jpg)
Understanding with Pictures: Pros
• Universal
• Requires minimal learning
• Intuitive
• Cheap (free contributions by PicNet users)
• Proven success (iconic languages for augmentative communication)
![Page 43: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/43.jpg)
Understanding with Pictures: Cons
• Complex information cannot be conveyed through pictures
  – e.g. “An inhaled form of insulin won federal approval yesterday”
• Many concepts have a level of abstraction that prohibits a visual representation
  – e.g. politics, paradigm, regenerate
• Cultural differences
  – e.g. some Latin American tribes do not understand the concept of coffee
![Page 44: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/44.jpg)
A First Cut
• Simple sentences
  – no complex states or events (e.g. emotional states, temporal markers, change) or their attributes (adjectives, adverbs)
  – no linguistic structure (e.g. complex noun phrases, prepositional attachments, lexical order, certainty)
  – basic concrete nouns and verbs translated “as is”
• Evaluate the amount of understanding achieved through pictures as opposed to words
![Page 45: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/45.jpg)
Does It Work?
• Experiments carried out within a translation framework with simple sentences
• A communication process
  – a speaker of an “unknown” language
  – a listener of a “known” language
  – Chinese (unknown) to English (known)
• Three translation scenarios
  – fully pictorial representations (PicNet)
  – mixed pictorial/linguistic representations
  – fully linguistic representations
![Page 46: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/46.jpg)
Sample Pictorial and Linguistic Translations
![Page 47: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/47.jpg)
Evaluation Study
• Interpretations
  – Users asked to provide an interpretation based on their first intuition
  – Users’ background: Hispanics, Caucasians, Latin Americans
• Data set: 50 short sentences (10-15 words)
  – 30 sentences from language learning courses
  – 20 sentences from various domains (sports, politics, …)
  – Various levels of difficulty
  – 15 interpretations (on average) for each sentence
  – One interpretation for each translation scenario
  – Total of 15 × 3 × 50 = 2,250 interpretations
![Page 48: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/48.jpg)
Sample Interpretations
![Page 49: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/49.jpg)
Evaluation Results
• Manual and automatic evaluations:
  – Adequacy
  – NIST [BLEU]
  – GTM
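The automatic metrics listed above compare each interpretation against a reference translation by counting shared n-grams. As an illustration only, a minimal Python sketch of clipped (modified) n-gram precision, the building block of BLEU; it omits BLEU's brevity penalty and geometric averaging, NIST's information weighting, and GTM's F-measure formulation, and it is not the evaluation code used in the study.

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Counter of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Fraction of candidate n-grams found in the reference, with each n-gram's
    count clipped to the number of times it occurs in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

reference = "the house has four bedrooms and one kitchen"
interpretation = "a house with four bedrooms and a kitchen"
print(clipped_precision(interpretation, reference, 1))  # 0.625
print(clipped_precision(interpretation, reference, 2))  # 2/7, about 0.286
```

Clipping is what stops a degenerate candidate such as "the the the" from scoring perfectly against a reference that contains "the" only once.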
![Page 50: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/50.jpg)
Evaluation Results
• A significant amount of information can be conveyed through pictures
  – 76%, compared to the baseline of 0%
  – Due to the intuitive visual descriptions that can be assigned to some of the concepts in the text
  – Due to humans’ ability to contextualize
    • “Read a book” is a more common interpretation than “read about a book”
    • “He sees the riverbank illuminated by a torch”
![Page 51: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/51.jpg)
Evaluation Results
• S1 (pictures) vs. S2 (pictures with words)
  – 3.81 vs. 4.32
  – role played by context that cannot be described with visual representations
  – adjectives, adverbs, prepositions, abstract nouns and verbs cannot be translated into pictures but are important in the communication process
• S2 (pictures with words) vs. S3 (words)
  – 4.32 vs. 4.40
  – advantage of words over pictures in producing accurate interpretations
![Page 52: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/52.jpg)
Outline
[building resources]
• I: Building multilingual WordNets
• II: Building a (crosslingual) pictorial WordNet
[using resources]
• III: Applications
• IV: Conclusions
![Page 53: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/53.jpg)
Conclusions
• Multilingual and cross-lingual semantic networks can be constructed with the help of volunteer contributions over the Web
• Advantages
  – faster than manual construction by experts
  – more accurate than automatic approaches
  – taps into the “collective mind”
    • Potentially infinite
  – construct resources from scratch or validate automatically acquired knowledge
![Page 54: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/54.jpg)
Conclusions
• Challenges
  – (multilingual networks) require bilingual users
  – definition of difficult concepts
  – Quantity:
    • User motivation
    • Disguise scientific tasks as games
    • Rewards
  – Quality:
    • Multiple annotations
    • Human expert supervision
![Page 55: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/55.jpg)
Conclusions
• Multilingual and cross-lingual semantic networks can be used as knowledge bases for building communication tools
  – machine translation
  – pictorial translation
![Page 56: Building Multilingual and Crosslingual Semantic Resources with Volunteer Contributions over the Web Rada Mihalcea University of North Texas.](https://reader035.fdocuments.in/reader035/viewer/2022062714/56649d145503460f949e93b2/html5/thumbnails/56.jpg)
For More Information
• Rada Mihalcea and Ben Leong, Toward Communicating Simple Sentences Using Pictorial Representations, in Proceedings of the 7th Biennial Conference of the Association for Machine Translation in the Americas (AMTA), Boston, MA, August 2006.
• Andy Borman, Rada Mihalcea, and Paul Tarau, PicNet: Pictorial Representations for Illustrated Semantic Networks, in Proceedings of the AAAI Spring Symposium on Knowledge Collection from Volunteer Contributors, Stanford, CA, March 2005.
• Nathaniel Ayewah, Rada Mihalcea, and Vivi Nastase, Building Multilingual Semantic Networks with Non-Expert Contributions over the Web, in Proceedings of the KCAP 2003 Workshop on Distributed and Collaborative Knowledge Capture, Sanibel Island, Florida, November 2003.