Fingerprinting Text in Logical Markup Languages Author : Christian D. Jensen Department of Computer...


Fingerprinting Text in Logical Markup Languages

Author: Christian D. Jensen
Department of Computer Science, Trinity College Dublin
© Springer-Verlag Heidelberg 2001
ISC 2001, LNCS 2200, pp. 433-445, 2001.

Presenter: ChaoLi Ou


Summary

3 approaches to fingerprinting a text file:
• Open space: vulnerable to optical character recognition attack.
• Syntactic fingerprinting: easy to recognize.
• Semantic fingerprinting: based on synonym substitution.

Semantic fingerprinting is the best of the three approaches.
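To make the idea of synonym substitution concrete, here is a minimal Python sketch. It is not the scheme from the paper: the synonym table, the one-bit-per-word encoding, and the names `SYNONYMS`, `embed`, and `extract` are all assumptions made for illustration.

```python
# Hypothetical synonym pairs; a real system would need a curated thesaurus.
SYNONYMS = [("big", "large"), ("quick", "fast"), ("begin", "start")]

# Map every variant to (pair index, bit value it encodes).
LOOKUP = {
    word: (index, bit)
    for index, pair in enumerate(SYNONYMS)
    for bit, word in enumerate(pair)
}


def embed(text, bits):
    """Rewrite substitutable words so that the k-th one encodes bits[k]."""
    out, k = [], 0
    for word in text.split():
        if word in LOOKUP and k < len(bits):
            pair_index, _ = LOOKUP[word]
            word = SYNONYMS[pair_index][bits[k]]
            k += 1
        out.append(word)
    return " ".join(out)


def extract(text):
    """Read back the bits encoded by the substitutable words."""
    return [LOOKUP[word][1] for word in text.split() if word in LOOKUP]


original = "a big dog will begin a quick run"
marked = embed(original, [1, 0, 1])
print(marked)           # a large dog will begin a fast run
print(extract(marked))  # [1, 0, 1]
```

Each recipient would get a copy marked with a different bit pattern, so a leaked copy can be traced back by reading the pattern out again.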


How does it work?


Comments

• Appreciated Comment

The author explored a new approach to fingerprinting HTML or XML text. The algorithm is simple and clearly outlined.


Comments

Critical Comments:
1. This article doesn't mention the bit rate: how much data can be hidden in the text.


Comments

My bit-rate estimation:

Assumptions
1. One line contains one synonym-substitution word.
2. The key is: int [3] key

• Results:

Number of lines    Bit rate
<3                 0
6                  120 (6x5x4)
10                 720 (10x9x8)
100                970,200 (100x99x98)

Conclusion: A short text document may not work.
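The figures in the table are n x (n-1) x (n-2), the number of ordered ways to pick 3 distinct lines out of n. The short Python sketch below reproduces them. The function name `ordered_key_choices` is hypothetical, and the conversion to bits with log2 is an added interpretation, not something stated on the slide.

```python
import math


def ordered_key_choices(lines, key_length=3):
    """Number of ordered selections of key_length distinct lines.

    math.perm returns 0 when lines < key_length, matching the "<3" row.
    """
    return math.perm(lines, key_length)


for n in (2, 6, 10, 100):
    count = ordered_key_choices(n)
    # Assumption: treating each distinct key as one message, the payload
    # in bits is log2 of the number of keys.
    bits = math.log2(count) if count else 0.0
    print(f"{n} lines: {count} keys, about {bits:.1f} bits")
```

Under that reading, even a 100-line document carries roughly 20 bits, which is consistent with the conclusion that short documents leave little room.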


Comments (Cont.)

2. [4.2, Sub Attacks, Ln 9] Christian says: "…. However, treason is normally best committed in secret which reduces the risk of collusion among users"

One user may have several user IDs on a system (e.g., Hotmail).

No one can prevent a group of people from sharing their synonym-substituted texts.


Comments (Cont.)

3. [5. Conclusions, Ln 16] "…simple substitution requires intervention by the user who is fingerprinting the document"

The user may make a wrong substitution by misunderstanding the document. How does the system protect the author's "right of integrity"?

Question?

Do you agree that we can find the key easily by comparing two or three synonym-substituted documents?
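As a starting point for the question, the Python sketch below shows what colluders could do with a plain word-by-word comparison. The function `differing_positions` and the sample sentences are hypothetical, and the sketch assumes each substitution replaces exactly one word with one word. Whether the exposed positions are enough to recover the key depends on the actual scheme.

```python
def differing_positions(*documents):
    """Return the word positions at which the given copies do not all agree."""
    tokenized = [doc.split() for doc in documents]
    # Assumption: one-for-one word substitution, so every copy has the
    # same number of words and positions line up directly.
    if len({len(words) for words in tokenized}) != 1:
        raise ValueError("copies must have the same number of words")
    return [
        index
        for index, column in enumerate(zip(*tokenized))
        if len(set(column)) > 1
    ]


copy_a = "a large dog will begin a fast run"
copy_b = "a big dog will begin a quick run"
copy_c = "a big dog will start a quick run"

print(differing_positions(copy_a, copy_b))          # [1, 6]
print(differing_positions(copy_a, copy_b, copy_c))  # [1, 4, 6]
```

Adding a third copy exposes one more marked position here. Positions where all colluding copies happen to carry the same synonym stay hidden.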