Doc Name – 1 Telcordia Technologies Proprietary - Internal use only. See proprietary restrictions on title page.

The Scholars

BALD heads forgetful of their sins,
Old, learned, respectable bald heads
Edit and annotate the lines
That young men, tossing on their beds,
Rhymed out in love’s despair
To flatter beauty’s ignorant ear.

They’ll cough in the ink to the world’s end;
Wear out the carpet with their shoes
Earning respect; have no strange friend;
If they have sinned nobody knows.
Lord, what would they say
Should their Catullus walk that way?

W.B. Yeats (1865–1939). The Wild Swans at Coole. 1919.

From Data to Meta-Data

Part 1: Annotation

Telcordia Technologies Proprietary – Internal Use Only

This document contains proprietary information that shall be distributed, routed or made available only within Telcordia Technologies, except with written permission of Telcordia Technologies.

Prepared For: Real World Data Class

An SAIC Company


Adding Meta-Data to Sensor Streams

Annotation

Indexing

Linking

Chunking

[Figure: the sensor stream …, X-4, …, X0, …, X12, … drawn four times: with the label “Walking”; above a time index …, t-4, …, t0, …, t12, …; above a second stream …, Y-4, …, Y0, …, Y12, …; and on its own.]


Annotation

Annotation
– Etymology: Latin annotatus, past participle of annotare, from ad- + notare, to mark
– Intransitive senses: to make or furnish critical or explanatory notes or comment; to gloss a text.

Note
– Etymology: Middle English, from Old French noter, from Latin notare, to mark, note, from nota
– 1 a: to notice or observe with care; b: to record or preserve in writing
– 2 a: to make special mention of

Annotea Project
– By annotations we mean comments, notes, explanations, or other types of external remarks that can be attached to any Web document or a selected part of the document without actually needing to touch the document.


Examples of Data Annotation


An Information Theoretic View

Annotation
– Altering the output of one information source (T) using the output of another information source (N)
  Marking: specification of a location or address within the output of an information source
  – “Mark my words”
  – Highlighting, underlining, arrows
  Noting: including additional information
  – Footnotes, margin notes

In-Band Annotation
– I(N|T) = 0
  Marking atypical or unusual outputs from an information source. Summarization, glossing, aggregating

Out-of-Band Annotation
– I(N|T) > 0
  Marking with the output of an external information source that provides critical or explanatory information that cannot be derived from the original information source
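The in-band / out-of-band distinction can be illustrated numerically. The sketch below reads the slide's I(N|T) as the conditional entropy H(N|T), the uncertainty left in the annotation stream once the text stream is known; that reading is an assumption about the notation. The function name `conditional_entropy` and the toy streams are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def conditional_entropy(t_stream, n_stream):
    """Plug-in estimate of H(N|T) in bits from aligned samples (t, n)."""
    if len(t_stream) != len(n_stream):
        raise ValueError("streams must have the same length")
    total = len(t_stream)
    joint = Counter(zip(t_stream, n_stream))   # counts of (t, n) pairs
    marginal_t = Counter(t_stream)             # counts of t alone
    h = 0.0
    for (t, _n), count in joint.items():
        p_joint = count / total
        p_n_given_t = count / marginal_t[t]
        h -= p_joint * math.log2(p_n_given_t)
    return h


text = list("the cat sat on the mat")

# In-band (toy): the mark is a deterministic function of the symbol itself.
in_band = [1 if ch in "aeiou" else 0 for ch in text]

# Out-of-band (toy): the mark comes from elsewhere; here a fixed alternating
# pattern that is not a function of the symbol.
out_of_band = [i % 2 for i in range(len(text))]

h_in = conditional_entropy(text, in_band)        # 0 bits: N is derivable from T
h_out = conditional_entropy(text, out_of_band)   # > 0 bits: N adds information
```

This estimate conditions on one symbol at a time. An in-band mark that depends on a longer context (a phrase, a paragraph) would need T to stand for that whole context before the estimate drops to zero.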


In-Band Marking

What do we mark?
– Bookmark: where you left off
  – needed when switching between tasks
  – required for time-slicing
  – defined with respect to a consuming application
– Underlining/highlighting regions that:
  – are unusual
  – epitomize (prime examples)
  – summarize
  – define
– Regions that require more processing
  – ambiguous passages (“?” in the margin)
  – unknown, missing, or new words
– Semantically significant regions
  – subject, object, action, event


In-Band Marking

Why do we mark?

Given a data stream and a marking stream, what can we do?
– Learning/memory: the act of marking reinforces memory traces
– Efficient searching: landmarks providing a coordinate system (index, table of contents, titles, headings); finding a part in a book
– Compression: distillation, summarization
– Reminders: need for further processing in multi-pass or multi-stage processing
– Changes in context: marking transitions; indication of stationarity or non-stationarity


In-Band Marking

Empirical Estimates of Entropy

• Entropy of English
  • Shannon, 1951
    • N-gram frequencies (Markov models): 2.3 bits per character (bpc), 2nd order
    • Human experiments (number of guesses): 0.6-1.3 bpc
    Humans are much more efficient at estimating entropy. Humans are good at keeping track of long-range dependencies, following book plots for hundreds of pages. Markov models have bounded context depths and thus cannot capture the strong long-range dependencies encountered in written English that are influenced by context and semantics.
  • Cover and King, 1978
    • Human experiments (betting experiments): 1.25-1.35 bpc
  • Teahan and Cleary, 1996
    • Better models: Prediction by Partial Matching (PPM), statistical preprocessing and training: 1.46 bpc
  • Kontoyiannis, 1997
    • Compression-based entropy estimators, effect of style

How can we estimate the entropy of data streams?
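Two of the estimator families listed above can be sketched in a few lines. The following is a minimal illustration, not the procedure used in any of the cited studies: a plug-in order-k Markov estimate from n-gram counts, and a compression-based estimate that uses zlib as a stand-in compressor (the cited work used other models, such as PPM). The function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def markov_entropy_bpc(text, order=2):
    """Plug-in estimate of H(next char | previous `order` chars), in bits per character."""
    if len(text) <= order:
        raise ValueError("text must be longer than the model order")
    context_counts = Counter()
    joint_counts = Counter()
    for i in range(order, len(text)):
        context = text[i - order:i]
        context_counts[context] += 1
        joint_counts[(context, text[i])] += 1
    total = len(text) - order
    h = 0.0
    for (context, _ch), count in joint_counts.items():
        h -= (count / total) * math.log2(count / context_counts[context])
    return h


def compression_bpc(text, level=9):
    """Bits of zlib output per input character (includes zlib's fixed overhead)."""
    data = text.encode("utf-8")
    return 8 * len(zlib.compress(data, level)) / len(text)


sample = "the quick brown fox jumps over the lazy dog " * 50
order2_estimate = markov_entropy_bpc(sample, order=2)
zlib_estimate = compression_bpc(sample)
```

On short samples the plug-in estimate is biased low as the order grows, because most contexts are seen only once or twice; the compression-based figure errs in the other direction because of header overhead. Neither captures the long-range dependencies that the human experiments exploit.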


In-Band Marking

20 Questions and Variations

• Real-world 20 questions
  • What assumptions do we make?
    • Semantic assumptions
    • Contextual assumptions
• Game show variants, from most Shannon-like to least
  • Wheel of Fortune
    • Answers: semantically meaningful phrases or sentences
    • Questions: limited to single letters
    • Clues: textual unit, word boundaries
  • Lingo
    • Answers: 5-letter words
    • Questions: limited to single letters
    • Clues: initial letter
  • $100,000 Pyramid
    • Answers: words
    • Questions: words
    • Clues: topic, things semantically associated
  • Match Game
    • Answers: word
    • Clues: story


In-Band Marking

Estimating Entropy of Sensor Data

A marking stream can provide clues about:
– Units of correlation
  – Spacing, punctuation, capitalization, headings
– The information source (the author, as opposed to their works)
  – Dynamics: What state is it in? State transitions. Ex. heart rhythm
  – Statics: Modeling. Ex. stylistics (in-band): individual characteristics; biographical background, historical background (out-of-band)

[Figure: the sensor stream …, X-4, …, X0, …, X12, … with a row of counts beneath it reading 1 2 3 4 5 1 2 3 4.]


In-Band Marking

How do we mark?

Can we discover the transitions?
– Looking for transitions
– Templates/pattern matching
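Both approaches can be sketched on a numeric sensor stream. The slides do not specify a method, so everything below is an assumption made for illustration: transitions are flagged where the mean of the next few samples jumps away from the mean of the previous few, and templates are matched by mean absolute difference. The function names, window size, and thresholds are hypothetical.

```python
from statistics import mean


def transition_scores(samples, window):
    """Score each index by |mean(next window) - mean(previous window)|."""
    scores = [0.0] * len(samples)
    for i in range(window, len(samples) - window + 1):
        before = mean(samples[i - window:i])
        after = mean(samples[i:i + window])
        scores[i] = abs(after - before)
    return scores


def find_transitions(samples, window=5, threshold=1.0):
    """Mark indices whose score exceeds `threshold` and is a local peak."""
    scores = transition_scores(samples, window)
    marks = []
    for i in range(1, len(scores) - 1):
        if (scores[i] > threshold
                and scores[i] >= scores[i - 1]
                and scores[i] > scores[i + 1]):
            marks.append(i)
    return marks


def match_template(samples, template, max_distance=0.5):
    """Mark start indices where the window's mean absolute difference
    from `template` is at most `max_distance`."""
    m = len(template)
    hits = []
    for start in range(len(samples) - m + 1):
        window = samples[start:start + m]
        distance = sum(abs(a - b) for a, b in zip(window, template)) / m
        if distance <= max_distance:
            hits.append(start)
    return hits


# A level shift at index 20 and a repeated 1-3-1 pulse.
step = [0.0] * 20 + [5.0] * 20
pulses = [0, 0, 1, 3, 1, 0, 0, 1, 3, 1, 0]
step_marks = find_transitions(step, window=5, threshold=1.0)
pulse_marks = match_template(pulses, [1, 3, 1], max_distance=0.1)
```

Looking for transitions needs no prior knowledge of what the states look like, only that they differ; template matching needs an example of the pattern but can mark recurrences that do not coincide with a change in level.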


Information contained in marks – only location information

Conversion of data stream into binary stream

Entropy of timing of events – frequency of events

I(T|N)
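If marks carry only location information, a marking stream can be written as a binary stream with a 1 wherever a mark falls. The sketch below shows that conversion and one simple entropy figure driven by the frequency of events: the entropy of a memoryless binary source with the observed mark rate. Treating the marks as memoryless is an assumption made here for illustration, and the function names are hypothetical; the inter-event intervals are returned as well, since their distribution is what a model of event timing would examine next.

```python
import math


def marks_to_binary(mark_positions, length):
    """Convert mark locations into a 0/1 stream of the given length."""
    stream = [0] * length
    for pos in mark_positions:
        if not 0 <= pos < length:
            raise ValueError(f"mark position {pos} is outside the stream")
        stream[pos] = 1
    return stream


def binary_entropy(p):
    """Entropy in bits of a binary source that emits 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def event_rate_entropy(stream):
    """Bits per sample under a memoryless model with the observed mark frequency."""
    return binary_entropy(sum(stream) / len(stream))


def inter_event_intervals(stream):
    """Gaps, in samples, between consecutive marks."""
    positions = [i for i, bit in enumerate(stream) if bit]
    return [b - a for a, b in zip(positions, positions[1:])]


stream = marks_to_binary([3, 10, 17, 40], length=50)
rate_bits = event_rate_entropy(stream)
gaps = inter_event_intervals(stream)
```

Rare marks give a low per-sample figure: 4 marks in 50 samples is well under 1 bit per sample, while marks on half the samples reach the 1-bit maximum.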