Computational Linguistics (Part II) .. FCIS'13 - ASU


Computational Linguistics Summary For FCIS'13- ASU Part (II)


By: Abdohelal


-What is Computational Linguistics

-Approaches to the Study of Computational Linguistics
• Developmental
• Structural
• Production
• Comprehension

-Internet Linguistics
• What is Internet Linguistics
• Internet Linguistics Perspectives: Sociolinguistic, Educational, Stylistic, Applied
• Sociolinguistic themes: Multilingualism, Language Change, Conversational Discourse, Stylistic Diffusion, Metalanguage

-Linguistic Future of the Internet

-Translation Memory


What Is Computational Linguistics:

Computational linguistics is an interdisciplinary field concerned with the rule-based modeling of natural language from a computational perspective.

Computational linguistics brings together language experts and computer scientists. It draws upon the involvement of: 1-Linguists, 2-Mathematicians, 3-Computer scientists, 4-Experts in artificial intelligence, 5-Logicians, 6-Cognitive scientists, 7-Cognitive psychologists, 8-Psycholinguists.

It has a theoretical component, which takes up issues in theoretical linguistics and cognitive science, and an applied component, which focuses on the practical outcome of modeling human language use.

Computational linguistics originated with efforts in the United States in the 1950s to use computers to translate texts automatically.


What Is Computational Linguistics (continued):

Computational linguistics is a relatively new field of study devoted to developing algorithms and software for intelligently processing language data.

Artificial intelligence emerged as a field in the late 1950s and 1960s.

Levels of linguistic analysis:
• Morphology: the grammar of word forms.
• Syntax: the grammar of sentence structure.
• Semantics: the study of meaning.
• Lexicon: word meanings as listed in the dictionary.
• Pragmatics: the use of language in context.

Research in computational linguistics is carried out in computational linguistics departments. Some research aims to create working speech or text processing systems, while other research aims to create systems that allow human-machine interaction.

Conversational agents: programs meant for human-machine communication.


-Approaches to the Study of Computational Linguistics: Developmental Approach

Examines language acquisition and development.

Disadvantages:
1- Language takes a long time to learn.
2- Learners are given only positive evidence (examples of correct forms), and this alone is insufficient.

Language can be learned more efficiently when simple input is presented first and complexity is added incrementally.

-Contributions of the developmental approach:
1- Neural networks.
2- Robotic systems (used to test linguistic theories): these robots are able to acquire functioning word-to-meaning mappings without needing grammatical structure.
3- Prediction of future changes in language, and insight into the evolutionary history of modern-day languages.


-Approaches to the Study of Computational Linguistics: Structural Approach

One of the most important prerequisites for studying linguistic structure is the availability of large linguistic corpora.

Penn Treebank: one of the most cited linguistic corpora. It contains over 4.5 million words of American English and has been annotated with part-of-speech information and syntactic bracketing.

-Contributions of the structural approach:
1- Gives computational linguistics a framework for working out hypotheses that further the understanding of language in several ways.
2- Allows the discovery and implementation of similarity recognition between pairs of text utterances.
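As a rough illustration of similarity recognition between a pair of text utterances, the sketch below scores two utterances by the overlap of their word sets (Jaccard similarity). This is only one simple measure chosen for illustration, not the method used in the research summarized here; the function names and the tokenization rule are assumptions.

```python
import re


def tokenize(utterance: str) -> set[str]:
    """Lowercase the utterance and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9']+", utterance.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Shared distinct words divided by all distinct words (0.0 to 1.0)."""
    words_a, words_b = tokenize(a), tokenize(b)
    if not words_a and not words_b:
        return 1.0  # two empty utterances are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# Four shared words out of six distinct words in total.
score = jaccard_similarity("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 3))  # 0.667
```

A word-overlap score ignores word order and meaning, so it is best read as a baseline against which richer, corpus-driven similarity measures can be compared.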

Structural data is available not only for English but also for other languages, such as Japanese.

Computational linguistics allows scientists to parse large amounts of data reliably and efficiently, creating the possibility of discoveries unlike those of most other approaches.


-Approaches to the Study of Computational Linguistics: Production Approach

A very complex approach, as it deals with all the skills that a person needs to speak a language fluently.

Comprehension is only half the battle of communication; the other half is how a system produces language.

Alan Turing proposed the possibility that machines might one day be able to think. He proposed an "imitation game" in which a human subject has two text-only conversations, one with a human and another with a machine attempting to respond as a human. If the subject cannot tell the difference between the machine and the human, it may be concluded that the machine is capable of thinking.

Today, this test is called the "Turing test".

The ELIZA program is one of the earliest and best-known examples of a computer program designed to converse naturally with humans. It was developed by Joseph Weizenbaum at MIT in 1966.
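To give a feel for how an ELIZA-style program can appear to converse, here is a minimal sketch of keyword pattern matching with response templates. The rules below are hypothetical examples written for this summary, not Weizenbaum's original script, and the sketch leaves out features such as pronoun swapping.

```python
import re

# Hypothetical rules for illustration only: (pattern, response template).
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."


def respond(utterance: str) -> str:
    """Return the template of the first matching rule, filled with the captured text."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return FALLBACK  # no keyword matched


print(respond("I am tired."))       # How long have you been tired?
print(respond("I need a holiday"))  # Why do you need a holiday?
print(respond("Hello"))             # Please go on.
```

The program has no understanding of the input; it only reflects fragments of it back, which is why such systems can seem natural in short exchanges yet fail on anything outside their rules.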

In an effort to improve computer translation, several models have been compared, including: 1-Hidden Markov models, 2-Smoothing techniques, and specific refinements of those for applying them to verb translation.

Work in the production approach has also aimed at making computers produce language in a more naturalistic manner, making human-computer interaction much more natural.


-Approaches to the Study of Computational Linguistics: Comprehension Approach

Much of the focus of modern computational linguistics is on comprehension.

Bayesian statistics were applied to the task of character recognition, as illustrated by Bledsoe and Browning in 1959, and to language analysis, including the work of Mosteller and Wallace in 1963.
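The idea of applying Bayesian statistics to language analysis can be sketched with a tiny naive Bayes classifier that guesses which of two authors wrote a text from word counts. This is a simplified illustration, not a reconstruction of Mosteller and Wallace's actual model; the training sentences, author labels, and function names are invented for the example.

```python
import math
from collections import Counter


def train(docs_by_author: dict[str, list[str]]) -> dict[str, Counter]:
    """Count how often each word occurs in each author's documents."""
    return {
        author: Counter(word for doc in docs for word in doc.lower().split())
        for author, docs in docs_by_author.items()
    }


def classify(text: str, counts: dict[str, Counter]) -> str:
    """Pick the author with the highest log-likelihood (uniform prior assumed)."""
    vocabulary = set().union(*counts.values())
    best_author, best_score = "", -math.inf
    for author, word_counts in counts.items():
        total = sum(word_counts.values())
        # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
        score = sum(
            math.log((word_counts[word] + 1) / (total + len(vocabulary)))
            for word in text.lower().split()
        )
        if score > best_score:
            best_author, best_score = author, score
    return best_author


# Invented toy data: author_a favors "upon", author_b favors "while".
model = train({
    "author_a": ["upon the whole it depends upon the states",
                 "upon reflection the plan stands"],
    "author_b": ["while the states agree the plan stands",
                 "while it lasts the whole plan depends on the states"],
})
print(classify("upon the plan", model))   # author_a
print(classify("while the plan", model))  # author_b
```

The classifier treats words as independent given the author, which is a strong simplification, but it shows how word frequencies can be turned into evidence for one hypothesis over another.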

LUNAR was a system developed for NASA to answer written questions about the geological analysis of lunar rocks returned by the Apollo missions.

Modeling of the speech signal was achieved with the use of hidden Markov models, as detailed by Rabiner in 1989.
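A common way to use a hidden Markov model is to find the most probable sequence of hidden states behind a sequence of observations, using the Viterbi algorithm. The sketch below is a minimal version under simplifying assumptions: the two states, the "low"/"high" observations, and all probabilities are invented toy values rather than a real speech model, and the observation list is assumed to be non-empty.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Choose the predecessor that maximizes the path probability into s.
            column[s] = max(
                (best[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs], prev)
                for prev in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy model (invented numbers): is each audio frame silence or speech?
states = ("silence", "speech")
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {"silence": {"silence": 0.7, "speech": 0.3},
           "speech": {"silence": 0.2, "speech": 0.8}}
emit_p = {"silence": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.3, "high": 0.7}}

print(viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p))
# ['silence', 'speech', 'speech']
```

Real systems work with log probabilities to avoid numerical underflow on long sequences; plain products are used here only to keep the example short.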

Applications of the comprehension approach: 1-Topic identification, 2-Improved search engines, 3-Automated customer service, 4-Online education.


Internet Linguistics: A subdomain of linguistics advocated by David Crystal. It studies the new language styles and forms that have arisen under the influence of the Internet and other new media, such as SMS text messaging. Related terms: HCI (human-computer interaction), CMC (computer-mediated communication), and IMC (Internet-mediated communication).

Contribution of Internet linguistics: Studying the emerging language of the Internet helps to improve conceptual organization, translation, and web usability, which benefits both linguists and web users.

The four main perspectives of Internet linguistics are: sociolinguistic, educational, stylistic, and applied.


Sociolinguistic perspective:

Deals with how society views the impact of Internet development on language.

The Internet has changed the way people communicate and created new platforms with far-reaching social impact.

Ways of social communication: SMS, e-mail, chat groups, virtual worlds, and the Web.

Personal influence of Internet language: CMC such as SMS text messaging and e-mail has greatly enhanced instantaneous communication, for example on devices such as the BlackBerry and the iPhone.

Influence of Internet language on education: In schools, it is common for students and educators to be given personalized e-mail accounts for communication and interaction. Classroom discussions are increasingly brought onto the Internet in the form of discussion forums.

Professional influence of Internet language: It is common for companies to have their computers and laptops connected to the Internet, which facilitates internal and external communication. Mobile communication devices such as smartphones are increasingly making their way into the corporate world.


The sociolinguistic perspective can be examined through five themes:

Multilingualism: Looks at the status of the various languages on the Internet.

Language change: Explores how language changes over time, with emphasis on Internet lingo.

Conversational discourse: Explores changes in patterns of social interaction and communicative practice on the Internet.

Stylistic diffusion: Studies the spread of Internet jargon and related linguistic forms into common usage.

Meta-language and folk linguistics: Looks at the way these linguistic forms and changes on the Internet are labeled and discussed.


Educational perspective: Examines the impact of the Internet on formal language use.

The rapid spread of Internet use has brought new features, such as:
• An increase in the use of informal written language.
• Inconsistency in written styles and stylistics, and the use of new abbreviations in Internet chats and SMS.
• Technological constraints on word count, which contributed to the rise of new abbreviations such as acronyms, for example LOL (laughing out loud), GTG (got to go), and OMG (oh my God).

Disadvantages of Internet use:
• Informal language and incorrect word use in academic and formal situations, such as the casual word "guy" and the choice of "preclude" instead of "precede".
• Use of abbreviations in academic work, such as "u" for "you" and "2" for "to".

Advantages of Internet use:
• The Internet offers potential benefits to language learners through its communication tools (e-mail, discussion forums, chat messengers, and blogs).
• IMC allows greater interaction between language learners and native speakers of the language. This provides better error correction and more opportunities to learn the standard language, and lets learners pick up special skills such as negotiation and persuasion.


Stylistic perspective: Examines how the Internet and its related technologies have encouraged new and different forms of creativity in language.

This new mode of language is interesting to study because it is a mixture of both spoken and written language. Traditional writing is static compared with the dynamic nature of the new language on the Internet, where words can appear in different colors and font sizes on the computer screen.

This new mode of language also contains elements not found in natural languages. One example is the concept of framing found in e-mails and discussion forums.

Mobile phones (cell phones): Have expressive potential beyond their basic communicative functions. The 160-character limit on text messages has motivated users to exercise their linguistic creativity to overcome it. Cell phones have also created a new literary genre (cell phone novels).

Blogs: Blogging has brought about new ways of writing diaries. From a linguistic perspective, the language used in blogs is published for the world to see without undergoing a formal editing process. Blogs have become so popular that they have expanded beyond written blogs, with the emergence of the photoblog, videoblog, audioblog, and mobile blog.

Virtual worlds: Provide insight into how users adapt their natural language for communication within these new media. CMC strategies used include capitalization of words, such as "EMPHASIS"; creative use of punctuation, such as "??!?!?!"; and symbols such as the asterisk to enclose words, as in "*Stress*". Virtual worlds are good tools for language learning among younger learners, who already see such places as a "place to learn and play".


Stylistic Perspective (continued):

E-mail: One of the most popular Internet-related technologies is e-mail, which has expanded the stylistics of language in many ways. It is a hybrid of speech and writing in terms of format, grammar, and style. E-mail is rapidly replacing traditional letter writing because of its convenience, speed, and spontaneity.

Instant messaging: Has developed its own acronyms and short forms. Instant messaging differs from e-mail and chat groups because it allows participants to interact in real time while conversing in private. There is also more stylistic variation, because there can be a very wide age gap between participants.


Applied Perspective:

Views the linguistic exploitation of the Internet in terms of its communicative capabilities, both the good and the bad.

The Internet is a platform where speakers of minority and endangered languages seek to revive the use of their languages and to create awareness. It gives these languages opportunities to make progress in two important regards: 1-Language documentation, 2-Language revitalization (reactivation).

Language documentation:
• The Internet facilitates language documentation.
• Digital archives of media help to preserve language documentation and allow global dissemination through the Internet.
• Publicity about endangered languages has helped to spur worldwide interest in linguistic documentation.
• The HRELP (Hans Rausing Endangered Languages Project) seeks to document endangered languages and to preserve and disseminate documentation materials, among other aims.
• The Language Archive Newsletter provides news and articles about topics in endangered languages.


Applied Perspective :


Language revitalization:
• The Internet facilitates language revitalization.
• Virtual environments (e-mail, chat, instant messaging) have helped to bridge the distance between communicators.
• E-mail facilitates language revitalization in the sense that speakers of minority languages who have moved to a place where their native language is not spoken can use the Internet to communicate with family and friends, thus maintaining the use of their native language.

Leoki (powerful voice): A system developed in Hawaii in which the content, interface, and menus are entirely in the Hawaiian language.

Another use of the Internet is having students of minority languages write about their native cultures in their native language for a distant audience, in an attempt to preserve their language and culture.


Linguistic Future of the Internet:
• People will alter their language use to suit the dimensions of communication.
• The increase in Internet users brings cultural backgrounds, habits, and language differences to the Web.
• The Internet is on its way to becoming a more diverse, multilingual web.
• The interaction between English and other languages will be important to study.
• Minority languages will be promoted. However, minority languages will be affected by the majority ones.
• Speakers of minority languages will be encouraged to learn the majority languages in order to access more resources.
• The future of minority languages is in danger due to the spread of the Internet.


Translation Memory:
• A translation memory (TM) is a database that stores previously translated segments to aid human translation.
• It stores the source text and its corresponding translation in "translation units".
• Individual words are handled by "terminology bases".
• Software that uses a TM is called a translation memory manager (TMM).
• TM is used in CAT (computer-assisted translation) tools, word-processing programs, and terminology management systems.
• Many companies producing multilingual documentation use TM systems.

-How TMs work:
1- Break the source text into segments.
2- Look for matches between the segments and the previously translated source segments stored in the database.
3- Present the matching pairs as translation candidates.
4- The translator accepts a candidate, replaces it with a fresh translation, or modifies it to match the source.
5- Save the result to the database.
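The steps above can be sketched as a small in-memory translation memory. This is a minimal illustration, not how any particular TM product works: the class and method names, the sentence-splitting rule, and the 0.7 similarity threshold are all assumptions, and fuzzy matching is done with Python's standard `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: maps source segments to stored translations."""

    def __init__(self) -> None:
        self.units: dict[str, str] = {}  # translation units: source -> target

    @staticmethod
    def segment(text: str) -> list[str]:
        """Step 1: break the source text into sentence-like segments."""
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def candidates(self, segment: str, threshold: float = 0.7):
        """Steps 2-3: return (score, source, target) matches, best first."""
        scored = (
            (SequenceMatcher(None, segment.lower(), src.lower()).ratio(), src, tgt)
            for src, tgt in self.units.items()
        )
        return sorted((c for c in scored if c[0] >= threshold), reverse=True)

    def save(self, source: str, target: str) -> None:
        """Steps 4-5: store the accepted or freshly written translation."""
        self.units[source] = target


tm = TranslationMemory()
tm.save("Press the power button.", "Appuyez sur le bouton d'alimentation.")

for seg in tm.segment("Press the power button. Press the reset button."):
    matches = tm.candidates(seg)
    # An exact match scores 1.0; a near match is offered for the translator to edit.
    print(seg, "->", matches[0][2] if matches else "(translate manually)")
```

A segment with no candidate above the threshold would be translated by hand and then stored with `save`, so the memory grows as the translator works.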


Translation Memory (continued):
• Typical TMs search only for text in the source segment.
• Segments for which no match is found have to be translated manually and are then saved in the database.
• TMs work best on highly repetitive texts, such as technical manuals.

Main Benefits:
1- Ensuring that the document is completely translated.
2- Ensuring consistency, including common definitions and terminology.
3- Allowing documents in various formats to be translated.
4- Accelerating the overall translation process.
5- Reducing time and cost.


Translation Memory (continued):

Main Obstacles:
1- Recycled translations lose sight of an important principle of translation: "taking the message from the text".
2- Not all file types are supported.
3- TMs offer little help with texts that lack repetition.
4- The quality of the translated text is not guaranteed.
5- Text is handled sentence by sentence instead of as a whole meaning.
6- The software is expensive, and the cheaper the software, the fewer features it offers.

Also read the rest of these obstacles in the book.


Finished

Special Thanks To :: Farah El-Mowaled

Created By : Abdohelal www.abdelrahman-othman.blogspot.com