Plagiarism checker

Post on 18-Jul-2015

Plagiarism Checker

What is Plagiarism?

to steal and pass off (the ideas or words of another) as one's own

to use (another's production) without crediting the source

to commit literary theft

to present as new and original an idea or product derived from an existing source

Not just copying or borrowing

Types of Plagiarism

CLONE: Submitting another’s work, word-for-word, as one’s own

CTRL-C: Contains significant portions of text from a single source without alterations

FIND-REPLACE: Changing key words and phrases but retaining the essential content of the source

REMIX: Paraphrases from multiple sources, made to fit together

RECYCLE: Borrows generously from the writer’s previous work without citation

HYBRID: Combines perfectly cited sources with copied passages without citation

MASHUP: Mixes copied material from multiple sources

404 ERROR: Includes citations to non-existent or inaccurate information about sources

AGGREGATOR: Includes proper citation to sources but the paper contains almost no original work

RE-TWEET: Includes proper citation, but relies too closely on the text’s original wording and/or structure

Algorithm

How to do it in practice

Document 1

• A document is a written, drawn, presented or recorded representation of thoughts. Originating from the Latin Documentum meaning lesson -the verb doceō means to teach, and is pronounced similarly, in the past it was usually used as a term for a written proof used as evidence. In the computer age, a document is usually used to describe a primarily textual file, along with its structure and design, such as fonts, colors and additional images.

Document 2

• A document is a written, drawn, presented or recorded representation of thoughts. Originating from the Latin Documentum meaning lesson -the verb doceō means to teach, and is pronounced similarly, in the past it was usually used as a term for a written proof used as evidence. In the computer age, a document is usually used to describe a primarily textual file, along with its structure and design, such as fonts, colors and additional images.

Threshold

Algorithm 1 (document level),

Algorithm 2 (paragraph level),

Algorithm 3 (sentence level).

Lexical semantics: Lesk, WordNet

Document level

• Algorithm 1: Document level heuristic

• Input : DocA, DocB // Two input documents

• Output: similarity

• Begin

• DocMinSize = min (|DocA|, |DocB|)

• DocIntersectionSize = |DocA ∩ DocB|

• If (DocIntersectionSize >= DocMinSize*DocThreshold)

• Then

• //Possible similarity

• //Check similarity at paragraph level

• similarity = true

• Else

• similarity = false

• End
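The document-level check above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the team's implementation: it treats |DocA| as the number of distinct lowercase words, and the names `tokens`, `documents_similar` and the default threshold of 0.5 are chosen here for illustration only.

```python
import re

def tokens(text):
    """Return the set of distinct lowercase words in a text (assumed tokenization)."""
    return set(re.findall(r"\w+", text.lower()))

def documents_similar(doc_a, doc_b, doc_threshold=0.5):
    """Algorithm 1: True when shared words cover enough of the smaller document."""
    words_a, words_b = tokens(doc_a), tokens(doc_b)
    doc_min_size = min(len(words_a), len(words_b))
    doc_intersection_size = len(words_a & words_b)
    if doc_min_size == 0:
        # Assumption: an empty document is never reported as similar.
        return False
    return doc_intersection_size >= doc_min_size * doc_threshold
```

A pair that passes this test is only a candidate; the slides then hand it to the paragraph-level check.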

Paragraph level

• Algorithm 2: Paragraph level heuristic

• Input : ParA, ParB // Two input paragraphs

• Output: similarity

• Begin

• ParMinSize = min (|ParA|, |ParB|)

• ParIntersectionSize = |ParA ∩ ParB|

• If (ParIntersectionSize >= ParMinSize*ParThreshold)

• Then

• //Possible similarity

• //Check similarity at sentence level

• similarity = true

• Else

• similarity = false

• End
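One way to apply the paragraph-level test to two whole documents is to split each on blank lines and test every paragraph pair. The sketch below assumes that paragraphs are separated by blank lines and that sizes are counted in distinct words; `split_paragraphs`, `similar_paragraph_pairs` and the 0.6 threshold are hypothetical names and values, not taken from the slides.

```python
import re

def word_set(text):
    """Distinct lowercase words of a text."""
    return set(re.findall(r"\w+", text.lower()))

def split_paragraphs(document):
    """Split on blank lines and drop empty chunks (assumed paragraph convention)."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]

def similar_paragraph_pairs(doc_a, doc_b, par_threshold=0.6):
    """Return (i, j) index pairs of paragraphs that pass the Algorithm 2 test."""
    pairs = []
    sets_b = [word_set(p) for p in split_paragraphs(doc_b)]
    for i, par_a in enumerate(split_paragraphs(doc_a)):
        set_a = word_set(par_a)
        for j, set_b in enumerate(sets_b):
            par_min_size = min(len(set_a), len(set_b))
            par_intersection_size = len(set_a & set_b)
            # Skip empty paragraphs, then apply the threshold test.
            if par_min_size and par_intersection_size >= par_min_size * par_threshold:
                pairs.append((i, j))
    return pairs
```

Only the returned pairs need to be examined sentence by sentence, which keeps the expensive comparison off paragraphs that share few words.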

Sentence level

• Algorithm 3: Sentence level heuristic

• Input : SenA, SenB

• Output: similarity, similar substrings in SenA and SenB

• Begin

• SenMinSize = min(|SenA|, |SenB|)

• SenIntersectionSize = |SenA ∩ SenB|

• If (SenIntersectionSize >= SenMinSize*SenThreshold)

• Then

• //Similarity detected

• //Determine similar substrings

• similarity = true

• Else

• similarity = false

• End
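The slides do not say how the similar substrings are determined. As one possible approach, the sketch below uses `difflib.SequenceMatcher` from the Python standard library on word lists and reports shared runs of at least `min_run` words. The function name, the 0.7 threshold and the `min_run` parameter are assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

def words(sentence):
    """Lowercase word list of a sentence, in order."""
    return re.findall(r"\w+", sentence.lower())

def compare_sentences(sen_a, sen_b, sen_threshold=0.7, min_run=2):
    """Algorithm 3: return (similarity, shared word runs found in both sentences)."""
    words_a, words_b = words(sen_a), words(sen_b)
    set_a, set_b = set(words_a), set(words_b)
    sen_min_size = min(len(set_a), len(set_b))
    sen_intersection_size = len(set_a & set_b)
    if sen_min_size == 0 or sen_intersection_size < sen_min_size * sen_threshold:
        return False, []
    # Similarity detected: collect the word runs that appear in both sentences.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    runs = [" ".join(words_a[m.a:m.a + m.size])
            for m in matcher.get_matching_blocks() if m.size >= min_run]
    return True, runs
```

For "He sat on the bank of the river" and "She sat on the bank of a river", this reports the pair as similar with the shared run "sat on the bank of".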

WordNet

•A very large lexical database of English:

–117K nouns, 11K verbs, 22K adjectives, 4.5K adverbs

•Word senses grouped into synonym sets (“synsets”) linked into a conceptual-semantic hierarchy

–82K noun synsets, 13K verb synsets, 18K adjective synsets, 3.6K adverb synsets

–Avg. # of senses: 1.23/noun, 2.16/verb, 1.41/adj, 1.24/adverb

•Conceptual-semantic relations

–hypernym/hyponym
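To make synsets and the hypernym/hyponym relation concrete, here is a toy model in plain Python. The entries are hand-written for illustration and are not real WordNet data; a real checker would query WordNet through a library instead. The point is that synonym sets let a checker recognize FIND-REPLACE style substitutions, while hypernym chains relate more distant words.

```python
# Toy hypernym links (child -> parent). Hand-written, NOT actual WordNet data.
HYPERNYM = {
    "dog": "canine",
    "cat": "feline",
    "canine": "carnivore",
    "feline": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
}

# Toy synsets: each set groups words that share one sense.
SYNSETS = [
    {"car", "automobile"},
    {"big", "large"},
]

def hypernym_chain(word):
    """Walk from a word up to the root of the toy hierarchy."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def lowest_common_hypernym(word_a, word_b):
    """First node on word_b's chain that also lies on word_a's chain, or None."""
    ancestors_a = hypernym_chain(word_a)
    for node in hypernym_chain(word_b):
        if node in ancestors_a:
            return node
    return None

def are_synonyms(word_a, word_b):
    """True when both words belong to the same toy synset."""
    return any(word_a in s and word_b in s for s in SYNSETS)
```

In this toy hierarchy "dog" and "cat" meet at "carnivore", and "car" and "automobile" count as synonyms.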

Lesk algorithm

Compare the context with the dictionary definition of the sense

–Construct the signature of a word in context by the signatures of its senses in the dictionary

•Signature = set of context words (in examples/gloss or in context)

–Assign the dictionary sense whose gloss and examples are the most similar to the context in which the word occurs

•Similarity = size of intersection of context signature and sense signature

Sense signatures

bank1

Gloss: a financial institution that accepts deposits and channels the money into lending activities

Examples: “he cashed the check at the bank”, “that bank holds the mortgage on my home”

bank2

Gloss: sloping land (especially the slope beside a body of water)

Examples: “they pulled the canoe up on the bank”, “he sat on the bank of the river and watched the current”

Signature(bank1) = {financial, institution, accept, deposit, channel, money, lend, activity, cash, check, hold, mortgage, home}

Signature(bank2) = {slope, land, body, water, pull, canoe, sit, river, watch, current}
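A simplified Lesk step can be sketched directly from the two sense signatures for "bank" given above. This is a sketch under stated assumptions: the context is split into lowercase words with no lemmatization or stop-word removal, so inflected forms such as "deposited" would not match "deposit", and ties go to the sense listed first. The names `SIGNATURES` and `simplified_lesk` are chosen here for illustration.

```python
import re

# Sense signatures for "bank", copied from the slide (second sense labeled bank2).
SIGNATURES = {
    "bank1": {"financial", "institution", "accept", "deposit", "channel", "money",
              "lend", "activity", "cash", "check", "hold", "mortgage", "home"},
    "bank2": {"slope", "land", "body", "water", "pull", "canoe", "sit",
              "river", "watch", "current"},
}

def simplified_lesk(context, signatures):
    """Pick the sense whose signature shares the most words with the context."""
    context_words = set(re.findall(r"\w+", context.lower()))
    # Similarity = size of the intersection of context and sense signature.
    scores = {sense: len(context_words & sig) for sense, sig in signatures.items()}
    best = max(scores, key=scores.get)  # on a tie, the first sense listed wins
    return best, scores

print(simplified_lesk("I want to deposit money at the bank", SIGNATURES))
print(simplified_lesk("We pull the canoe out of the water near the river bank", SIGNATURES))
```

The first context overlaps only with bank1 (deposit, money) and the second only with bank2 (pull, canoe, water, river), so each is assigned the expected sense.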

Final Result: Unique

The result may also include a report with details

Team Members NLP

Eslam Hamouda

Ahmed Wahdan

Hossam Nabih

Mohamed Shalan

Demo

Thank You