Information retrieval wed sept 02 2015 data…. -start at 6.45.


Transcript of Information retrieval wed sept 02 2015 data…. -start at 6.45.

Page 1: Information retrieval wed sept 02 2015 data…. -start at 6.45.

information retrieval

wed sept 02 2015

data…

Page 2:

-start at 6.45

Page 3:

framework for today’s lecture…

Page 4:
Page 5:

STRUCTURED vs unstructured data

easy to envision structured data in terms of “tables”

Employee   Manager   Salary
Smith      Jones     68000
Chang      Smith     65000
Ivy        Smith     50000

Typically allows numerical range and exact match (for text) queries, e.g., Salary < 60000 AND Manager = Smith.
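The example query above can be sketched against the same three rows. This is a minimal illustration using Python's built-in sqlite3 module; the table name `employees` and the column names are assumptions made for the sketch, not something the slide specifies.

```python
import sqlite3

# In-memory database holding the three rows from the slide.
# The table and column names are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee TEXT, manager TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Smith", "Jones", 68000), ("Chang", "Smith", 65000), ("Ivy", "Smith", 50000)],
)

# Numerical range condition combined with an exact text match.
rows = conn.execute(
    "SELECT employee, manager, salary FROM employees "
    "WHERE salary < ? AND manager = ?",
    (60000, "Smith"),
).fetchall()

print(rows)  # [('Ivy', 'Smith', 50000)]
```

Only Ivy satisfies both conditions: Chang reports to Smith but earns more than 60000.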

Page 6:
Page 7:

tables in an MS Access relational database, each defining a social networking site

Page 8:

Data entry form in an MS Access relational database: create each record

Page 9:
Page 10:
Page 11:

structured vs UNSTRUCTURED data

• typically refers to free text

• email is a good example of unstructured data: it's indexed by date, time, sender, recipient, and subject, but the body of an email remains unstructured

• other examples of unstructured data include books, documents, medical records, and social media posts

Page 12:

a magazine article is an example of unstructured data

Page 13:
Page 14:

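[Diagram: the retrieval process. A document collection (corpus) passes through a representation function to produce an index; a query passes through its own representation function; a matching function compares the two and returns results.]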
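The parts named in the diagram can be sketched in a few lines of Python. This is a toy illustration, not a description of any particular search system: the three-document corpus is made up, the representation function is assumed to be simple lowercasing and word splitting, and the matching function is assumed to require every query term (Boolean AND).

```python
from collections import defaultdict

def represent(text):
    """Representation function: reduce text to a set of lowercase terms."""
    terms = {word.strip(".,;:!?\"'()").lower() for word in text.split()}
    return terms - {""}

def build_index(corpus):
    """Index: map each term to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in represent(text):
            index[term].add(doc_id)
    return index

def match(query, index):
    """Matching function: ids of documents that contain every query term."""
    terms = represent(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))

# Hypothetical corpus, used only for this sketch.
corpus = {
    1: "Metadata is data about data.",
    2: "Email is a good example of unstructured data.",
    3: "Facets are a kind of metadata.",
}
index = build_index(corpus)
results = match("metadata data", index)
print(results)  # [1]
```

The same representation function is applied to both the documents and the query, which is what lets the matching function compare them term by term.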

CATEGORIES

SUBJECT HEADINGS

Page 15:
Page 16:

KWIC: Key word in context

Page 17:

KWIC: Key word in context
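A key-word-in-context listing shows each occurrence of a term together with the words around it. The sketch below is one simple way to build such a listing; the function name `kwic`, the `width` parameter, and the sample sentence are all assumptions made for illustration.

```python
def kwic(text, keyword, width=3):
    """Return one line per occurrence of keyword, with `width` words of context on each side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare without surrounding punctuation and without regard to case.
        if word.strip(".,;:!?\"'()").lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} | {word} | {right}")
    return lines

# Hypothetical sample sentence for the sketch.
sample = "Metadata is structured information that describes an information resource."
lines = kwic(sample, "information", width=2)
for line in lines:
    print(line)
```

Aligning the keyword in a fixed column is what makes a KWIC index easy to scan: the eye moves down the keyword column and reads the context on either side.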

Page 18:

metadata

Page 19:

What is Metadata?

• Classic definition: data about data

• Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. (NISO)

• 3 primary “types”:
  – Descriptive
  – Structural
  – Administrative (rights management, preservation)

Page 20:
Page 21:

digital forensics

Page 22:

This reading really made me think about how easily accessible and organized information is today because of the implementation of metadata.

It sparked a few questions: Without metadata, how would accessing data, resources and information be different in today’s society?

-Chris

Page 23:

http://search.lib.unc.edu/search?R=UNCb7097376

More Metadata: A Cataloging Record

Page 24:

The Idea of Facets

• Facets are a way of labeling data
  – A kind of Metadata (data about data)
  – Can be thought of as properties of items

• Facets vs. Categories
  – Items are placed INTO a category system
  – Multiple facet labels are ASSIGNED TO items

Page 25:

Facets

Epicurious example: http://www.epicurious.com/

• Create INDEPENDENT categories (facets)
  – Each facet has labels (sometimes arranged in a hierarchy)

• Assign labels from the facets to every item
  – Example: recipe collection

Course: Main Course
Cooking Method: Stir-fry
Cuisine: Thai
Ingredient: Bell Pepper, Curry, Chicken

Page 26:

The Idea of Facets

• Break out all the important concepts into their own facets

• Sometimes the facets are hierarchical
  – Assign labels to items from any level of the hierarchy

Preparation Method
  Fry
  Saute
  Boil
  Bake
  Broil
  Freeze

Desserts
  Cakes
  Cookies
  Dairy
    Ice Cream
    Sherbet
    Flan

Fruits
  Cherries
  Berries
    Blueberries
    Strawberries
  Bananas
  Pineapple

Page 27:

Using Facets

• Now there are multiple ways to get to each item

Preparation Method
  Fry
  Saute
  Boil
  Bake
  Broil
  Freeze

Desserts
  Cakes
  Cookies
  Dairy
    Ice Cream
    Sherbet
    Flan

Fruits
  Cherries
  Berries
    Blueberries
    Strawberries
  Bananas
  Pineapple

Fruit > Pineapple
Dessert > Cake
Preparation > Bake

Dessert > Dairy > Sherbet
Fruit > Berries > Strawberries
Preparation > Freeze
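The idea that several facet paths lead to the same item can be sketched as a small lookup. In this illustration each item carries a set of facet labels, and a label is a path through its facet's hierarchy. The item names ("pineapple cake", "strawberry sherbet") and the helper functions are hypothetical; the slide gives only the label paths.

```python
# Each item is ASSIGNED several facet labels; each label is a path in its facet.
# Item names are hypothetical, chosen only to make the sketch readable.
recipes = {
    "pineapple cake": {
        ("Fruit", "Pineapple"),
        ("Dessert", "Cake"),
        ("Preparation", "Bake"),
    },
    "strawberry sherbet": {
        ("Dessert", "Dairy", "Sherbet"),
        ("Fruit", "Berries", "Strawberries"),
        ("Preparation", "Freeze"),
    },
}

def has_label(labels, prefix):
    """True if any label starts with the given path (so any hierarchy level matches)."""
    return any(label[:len(prefix)] == prefix for label in labels)

def find(items, *prefixes):
    """Names of items that carry a matching label for every requested path."""
    return sorted(
        name for name, labels in items.items()
        if all(has_label(labels, prefix) for prefix in prefixes)
    )

print(find(recipes, ("Fruit",)))                           # both items
print(find(recipes, ("Dessert", "Dairy")))                 # ['strawberry sherbet']
print(find(recipes, ("Preparation", "Bake"), ("Fruit",)))  # ['pineapple cake']
```

Because the facets are independent, a searcher can start from the fruit, the dessert type, or the preparation method and still arrive at the same item, and can narrow by combining facets in any order.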

Page 28:

labor intensive?

expensive?

Page 29:

UNC Libraries Online Catalog: http://www.lib.unc.edu/

e.g. personal crisis

Page 30:

caveat: semi-structured data

• in fact almost no data is absolutely “unstructured”

• e.g., this slide has distinctly identified zones such as the title and bullets

• facilitates “semi-structured” search such as
  – title contains data and bullets contain structure
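The zone query on this slide can be sketched by treating the slide itself as a document with two named zones. This is a minimal illustration: the dictionary layout and the `zone_contains` helper are assumptions, and for simplicity the match is a case-insensitive substring test, so "structure" also matches inside "unstructured".

```python
# The slide represented as a document with two zones (layout is an assumption).
slide = {
    "title": "caveat: semi-structured data",
    "bullets": [
        "in fact almost no data is absolutely unstructured",
        "this slide has distinctly identified zones such as the title and bullets",
        "facilitates semi-structured search",
    ],
}

def zone_contains(doc, zone, term):
    """Case-insensitive substring test restricted to one zone of the document."""
    value = doc.get(zone, "")
    text = " ".join(value) if isinstance(value, list) else value
    return term.lower() in text.lower()

# "title contains data AND bullets contain structure"
hit = zone_contains(slide, "title", "data") and zone_contains(slide, "bullets", "structure")
print(hit)  # True
```

The query is more selective than a plain full-text search because each term has to appear in a specific zone rather than anywhere in the document.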

Page 31:

Let’s look at a database of magazine & journal articles…

…Academic Search Complete

>> UNC Libraries Homepage: http://www.lib.unc.edu/

>> E-Research by Discipline

>> Frequently Used

>> Academic Search Premier [off-campus log in with onyen/password]

Page 32:

Organization / Search

• We organize to enable retrieval

• The more effort we put into organizing information, the more effectively it can be retrieved

• The more effort we put into retrieving information, the less it needs to be organized first

• We need to think in terms of investment, allocation of costs and benefits between the organizer and retriever

• The allocation differs according to the relationship between them; who does the work and who gets the benefit?