Structured Data and Metadata Evaluation Methodology for Organizations Looking to Improve Image Findability on the Web

School of Library and Information Studies

LIS 5733, taught by Dr. Susan Burke

Research proposal written by Emily Kolvitz, 2014

Research Setting: Primarily geared towards online ecommerce/business organizations, but the methodology could easily translate to Galleries, Museums, Archives, Libraries (GLAMs) or any institution looking to evaluate its structured data and metadata practices on the World Wide Web in an effort to improve findability of product offerings, general information, or services.

Introduction

The current state of findability on the web for many organizations is incipient. Search engine optimization (SEO) techniques change frequently and remain largely a mystery to many companies. The one variable in the equation of web findability that remains a staple is good-quality metadata under the hood of the website.

This research methodology will:

Provide an assessment of findability maturity on the web from an image-centric viewpoint.

Help improve findability on the web by establishing a baseline for where your organization stands in terms of structured data content and by visualizing gaps or areas for improvement from a search-engine-neutral perspective.

Introduction

Most searches now start with Google (Holman 2011) (Lippincott 2005).

Search algorithms shape what is most easily accessible (Connaway, Dickey & Radford 2011), and they are subject to frequent change (Kritzinger 2013).

Search algorithms look for your structured data and, in the future, possibly your embedded metadata (Cazier 2014) (Beall 2010).

Literature Review

Marshall Breeding (2013) assesses the limitations of the major search engine algorithms:

"But even with the most sophisticated relevancy algorithms, index-based search and retrieval lacks the ability to lead users to the potential related content. Semantic web technologies, in conjunction with repositories of open linked data, promise to deliver significant new capabilities in exploring and exploiting information resources on the web."

Literature Review

The semantic web is founded on good, high-quality structured data.

Future technologies could potentially utilize embedded metadata in search (Cazier 2014) (Beall 2010), but embedded metadata already has authenticity, provenance, and "breadcrumbs" value now (Reicks 2013).

Literature Review

Most users don't go past the first page of search results (Paz 2013).

Structured data practices can help your organization stay relevant (and findable) in the age of information overload.

Keeping it search-engine neutral is advisable (Paz 2013).

Topic/Proposed Research

A methodology for establishing a baseline or benchmark of where an organization stands in terms of structured data pertaining to image records, which ultimately helps findability on the web.

By using the proposed methodology to gather this data, an organization can make data-informed decisions about its structured data strategy going forward to maintain relevancy on the web.

Many structured data elements can affect online findability: file-naming standards, the presence of alt text in the HTML markup, the HTML markup itself, embedded metadata, schema.org markup and rich snippets, text descriptions at or near images, and more. IEEE uses metadata or full text for search (IEEE Xplore offers this; see next slide).

Full Text Search & Metadata Search

Topic/Proposed Research

It is also noteworthy that additional factors that do not involve structured data affect findability on the web, but this research focuses solely on structured data techniques within the control of individual organizations.

All of these structured data techniques pertaining to image records will be examined in conjunction with the relevancy of onsite and offsite search results.

Image search and retrieval is a more difficult area than text search and retrieval because access to the image content is largely dependent on side-car text (or metadata, if you will) that describes the aboutness and (hopefully) the context of the image record.

Questions

Research Questions Addressed in this Study

1. What methods of search are available on the organization's website?

2. What is the file-naming structure for images on the website?

3. What is the quality of search engine (onsite and offsite) results?

4. What kinds of search results appear in image search when searching by the organization's name and product description, both with onsite search and offsite search?

5. What kinds of search results appear in Google Image Search when searching by images taken from the organization's website?

6. What kinds of search results come up when looking for specific products (a measure of structured data) through onsite search and offsite search?

7. What are the results when looking for specific products on the offsite search engine?

8. What kinds of structured data are near or around the images on the organization's website? Alt text? Other?

9. What file types appear on the organization's website (JPEG, TIFF, PNG)?

10. What embedded metadata is available in images on the website?

11. What does the XMP/XML/RDF for these images look like, and how robust is it? What does the graph look like?

Variables

These measures are operationalized through Likert scales applied by the human researcher. For example, when rating the level of description for a filename, a researcher could conclude that the filename sp_18379847923.jpg is not very descriptive for a human, let alone for a search engine (unless, of course, this is a product SKU). The researcher would then assign it a low value for descriptiveness on a 1-5 Likert scale.

Type of page the image was on

The image file-naming convention/filename

Level of description for the filename

Quality and number of alt text tags

Quality and number of embedded metadata tags

Quality and number of structured data tags pertaining to the images

Quality and number of search results for onsite search utilizing filename or alt text

Quality and number of relevant search results utilizing offsite image search
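The variables above could be recorded as one row per image in the coding spreadsheet. The following is a minimal sketch of such a row, assuming a Python-based workflow; the class name `ImageRecord` and all field names are hypothetical and are not part of the proposal. It only enforces that every quality rating stays on the 1-5 Likert scale described above.

```python
from dataclasses import dataclass

LIKERT_MIN, LIKERT_MAX = 1, 5

# Names of the fields that hold a 1-5 Likert rating assigned by the researcher.
LIKERT_FIELDS = (
    "filename_descriptiveness",
    "alt_text_quality",
    "embedded_metadata_quality",
    "structured_data_quality",
    "onsite_search_quality",
    "offsite_search_quality",
)


@dataclass
class ImageRecord:
    """One coded image; field names are illustrative assumptions."""
    record_number: int
    page_type: str
    filename: str
    filename_descriptiveness: int
    alt_text_quality: int
    embedded_metadata_quality: int
    structured_data_quality: int
    onsite_search_quality: int
    offsite_search_quality: int

    def __post_init__(self):
        # Reject ratings that fall outside the 1-5 scale.
        for name in LIKERT_FIELDS:
            value = getattr(self, name)
            if not LIKERT_MIN <= value <= LIKERT_MAX:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")
```

A record for the example filename might rate `filename_descriptiveness` as 1, while a counts-based variable (such as the number of alt text tags) could be added as a plain integer field without the range check.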

Data Collection Methods

Participants: The study will include a single institution, anonymized for the protection of its business. The sample of image records will be limited to image assets appearing on the organization's website domain. Most data collection can take place on the organization's website itself; some procedures will take place on external sites, services, or programs.

Randomization of Sample: The sample of images can be randomized by extracting a site map of the organization of interest using xsitemap.com. After the site map is constructed, the list of URLs should be entered into a spreadsheet program and a record number assigned to each URL. From there, the researcher can use a randomizer program to select the order of pages to use in the study (e.g., Research Randomizer, available at http://www.randomizer.org/form.htm). This method will be used to take a random sample of pages from the organization of interest.

Consent: All data collected in this study are publicly and freely available on the web.
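The randomization step described above (number each URL, then draw pages at random) can also be scripted. This is a rough sketch that swaps Python's standard `random` module in for the spreadsheet and the Research Randomizer site; the function names and the optional `seed` parameter are assumptions made for illustration.

```python
import random


def assign_record_numbers(urls):
    """Give each site-map URL a record number, starting at 1."""
    return {number: url for number, url in enumerate(urls, start=1)}


def sample_pages(records, sample_size, seed=None):
    """Draw a random sample of (record number, URL) pairs without replacement.

    Passing a seed makes the draw repeatable, which helps when the
    sample needs to be documented or audited later.
    """
    rng = random.Random(seed)
    size = min(sample_size, len(records))
    chosen = rng.sample(sorted(records), k=size)
    return [(number, records[number]) for number in chosen]
```

The order of the returned list can serve as the order in which pages are visited during data collection.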

Data Collection Methods

Obtaining Data on the Website:
Navigate to the URL.
Right-click the image(s) and choose "Save As".
Right-click the page and choose "View Source"; save as a .txt file.
Collect the raw data from each image, either by opening it in Photoshop and navigating to the Raw Data column or by using Phil Harvey's ExifTool.
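Once the page source has been saved, the filenames and alt text needed for the file-naming and alt-text variables can be pulled out of it programmatically. The sketch below assumes the saved source is ordinary HTML and uses only Python's standard library; `ImageTagCollector` and `collect_images` are hypothetical names. It does not read embedded metadata, which still requires Photoshop or ExifTool as described above.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class ImageTagCollector(HTMLParser):
    """Collect the filename and alt text of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        self.images.append({
            # Keep only the last path segment, dropping any query string.
            "filename": basename(urlparse(src).path),
            # None means the tag has no alt attribute (or no alt value).
            "alt": attributes.get("alt"),
        })


def collect_images(page_source):
    """Return one dict per <img> tag found in the saved page source."""
    collector = ImageTagCollector()
    collector.feed(page_source)
    return collector.images
```

An image whose `alt` comes back as `None` or as an empty string would be a candidate for a low alt-text rating.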

Obtaining Data through the Structured Data Linter:
Navigate to the Linter website.
Enter the URL.
Screenshot the structured data results, or save them as a webpage.
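As a rough cross-check on the Linter output, the saved page source can be scanned for common structured data markers. The sketch below is an assumption about how that might be done, not a replacement for the Linter: it looks only for microdata `itemtype` and `itemprop` attributes and for JSON-LD script blocks, and the class name `StructuredDataCollector` is hypothetical.

```python
import json
from html.parser import HTMLParser


class StructuredDataCollector(HTMLParser):
    """Collect microdata attributes and JSON-LD blocks from page source."""

    def __init__(self):
        super().__init__()
        self.itemtypes = []
        self.itemprops = []
        self.json_ld = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("itemtype"):
            self.itemtypes.append(attributes["itemtype"])
        if attributes.get("itemprop"):
            self.itemprops.append(attributes["itemprop"])
        if tag == "script" and attributes.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents arrive here as raw text.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Malformed JSON-LD is skipped rather than treated as data.
                pass
```

Counting the collected `itemprop` values that sit on or around `<img>` tags gives one input for the "quality and number of structured data tags" variable.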

Obtaining Data through the W3C RDF Validator:
Copy the raw data XML extracted earlier and paste it into the RDF Validator.
Select "Graph Only" in the options.
Parse the RDF.
Save or screenshot the graph.
Store it in a folder with the other data.
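Before (or alongside) pasting the extracted XMP into the validator, it can help to list which properties the packet actually contains. The following sketch uses Python's built-in XML parser rather than a real RDF library, so it is only an approximation: it names the properties found on each `rdf:Description` element and does not build the graph. The function name `list_xmp_properties` is an assumption.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_xmp_properties(xmp_xml):
    """Return the sorted property names found in an XMP (RDF/XML) packet.

    Names are in ElementTree's "{namespace}local" form. Properties may be
    written either as attributes or as child elements of rdf:Description.
    """
    root = ET.fromstring(xmp_xml)
    rdf_prefix = "{" + RDF_NS + "}"
    properties = set()
    for description in root.iter(rdf_prefix + "Description"):
        for name in description.attrib:
            # Skip RDF bookkeeping attributes such as rdf:about.
            if not name.startswith(rdf_prefix):
                properties.add(name)
        for child in description:
            properties.add(child.tag)
    return sorted(properties)
```

The length of the returned list is a crude indicator of how robust the embedded metadata is; an image with no XMP packet at all has nothing to parse.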

Answer Research Questions:
Systematically go through the collected data and enter the findings into a spreadsheet.

Data Analysis Methods

Descriptive statistics: bell curve and measures of central tendency using Likert scale data.

Bell curve image by Vierge Marie (own work) [public domain], via Wikimedia Commons: http://upload.wikimedia.org/wikipedia/commons/f/f6/Gaussian_Filter.svg
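The descriptive statistics named above can be computed directly from a column of Likert ratings. This is a minimal sketch using Python's standard `statistics` module; the function name `summarize_likert` and the shape of the returned dictionary are assumptions. Because Likert data are ordinal, the median and mode are often reported alongside (or instead of) the mean.

```python
from collections import Counter
from statistics import mean, median, multimode


def summarize_likert(ratings):
    """Summarize a list of 1-5 Likert ratings for one variable."""
    counts = Counter(ratings)
    return {
        "n": len(ratings),
        "mean": mean(ratings),
        "median": median(ratings),
        # multimode returns every most-common value, in order of first appearance.
        "modes": multimode(ratings),
        # Frequency of each scale point, including points nobody chose.
        "distribution": {score: counts.get(score, 0) for score in range(1, 6)},
    }
```

The `distribution` entry maps directly onto a bar chart of the ratings, which shows whether the scores cluster around a central value.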

Data Analysis Methods

Graphical Analysis (Charts and Graphs)

Summary Report

Discussion of Findings

Visualizing the Results

The Structured Data Linter uses URLs to display the structured data around the images. Available at http://linter.structured-data.org

The W3C RDF Validator provides a graph visualization using the raw data markup extracted from the image. Available at http://www.w3.org/RDF/Validator

A summary analysis will be crafted using all of these data points to show what we are able to understand about an image versus what a machine or search engine is able to know about an image.

Structured Data Linter: shows all structured data tags around the images and in the page markup.

RDF Validator: visualization of embedded data for images and their subsequent relationships to other data.

Summary Report: complete picture of structured data, metadata, and analysis of the study.

Expected Outcomes

The anticipated results of this project include a benchmark for where this specific organization stands in terms of structured data in the online environment and a methodology for other organizations looking to assess their structured data maturity in the digital space. These results will be used to create a roadmap for improving resource findability both on the web and within websites. Other organizations may also wish to reuse this methodology to assess their own current state of structured data. Future areas of research could include using metadata/RDF-driven search engines in conjunction with Vector Space Models to assess findability of image records on the web and within websites.

References (Slides & Full Paper)

Algebraix Data Corporation. 0005. "Algebraix Data Launches Industry's First Cost-Effective Automated Implementation of Schema.org." Business Wire (English), 5.

Beall, Jeffrey. 2010. "How Google Uses Metadata to Improve Search Results." Serials Librarian 59, no. 1: 40-53.

Breeding, Marshall. 2013. "Linked Data: The Next Big Wave or Another Tech Fad?" Computers in Libraries 33, no. 3: 20-22.

Cafarella, M.J., A.Y. Halevy, Y. Zhang, D.Z. Wang, and E. Wu. "Uncovering the Relational Web." In Proceedings of the 11th International Workshop on the Web and Databases (Vancouver, BC, June 13, 2008). http://web.eecs.umich.edu/~michjc/papers/webtables_webdb08.pdf

Cazier, Clay. 2014. "The Future of Exif Image Data." PM Digital Marketing Blog. Last accessed November 20, 2014. http://www.pmdigital.com/blog/2014/04/future-exif-image-data

Connaway, Lynn Silipigni, Timothy J. Dickey, and Marie L. Radford. 2011. "'If it is too inconvenient I'm not going after it': Convenience as a critical factor in information-seeking behaviors." Library & Information Science Research (07408188) 33, no. 3: 179-190.

Diagram Center: Digital Image and Graphic Resources for Accessible Materials. 2014. "Content Model." Last accessed November 23, 2014. http://diagramcenter.org/standards-and-practices/content-model.html

Google. 2014. "Image Publishing Guidelines." Last accessed November 21, 2014. https://support.google.com/webmasters/answer/114016?hl=en

Holman, Lucy. 2011. "Millennial Students' Mental Models of Search: Implications for Academic Librarians and Database Developers." Journal of Academic Librarianship 37, no. 1: 19-27.

International Business Times. 0006. "Bing, Google and Yahoo merge to make search easier with schema.org." International Business Times, April.

IPTC (International Press Telecommunications Council). 2014. "Embedded Metadata Manifesto." Last accessed November 20, 2014. http://www.embeddedmetadata.org/social-media-test-results.php (Embedded Metadata Manifesto 2014)

Kritzinger, W. T. "Search Engine Optimization and Pay-per-Click Marketing Strategies." Journal of Organizational Computing and Electronic Commerce, no. 3 (2013): 273-86.

Lippincott, Joan K. "Net Generation Students and Libraries." EDUCAUSE (2005). Accessed November 19, 2014. http://www.educause.edu/research-and-publications/books/educating-net-generation/net-generation-students-and-libraries

Nakanishi, T. "Semantic Context-Dependent Weighting for Vector Space Model." Semantic Computing (ICSC), 2014 IEEE International Conference on, pp. 262-266, 16-18 June 2014. doi: 10.1109/ICSC.2014.49

Paz, Anita. 2013. "In Search of Meaning: The Written Word in the Age of Google." Italian Journal of Library & Information Science 4, no. 2: 255-266.

Priebe, T., C. Schlager, and G. Pernul. "A Search Engine for RDF Metadata." Database and Expert Systems Applications, 2004. Proceedings. 15th International Workshop on, pp. 168-172, 2004. doi: 10.1109/DEXA.2004.1333468

Reicks, David. 2010. "Why Embedded Metadata Won't Help Your SEO." Last updated December 30, 2013. Last accessed November 23, 2014. http://www.controlledvocabulary.com/blog/embedded-metadata-wont-help-seo.html

Page 2: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

Introduction

The current state of findability on the web for many organizations is incipient Search

Engine Optimization (SEO) techniques change frequently and remain much a mystery

to many companies The one variable in the equation of web findability that remains a

staple is good quality metadata under the hood of the website

This research methodology will allow for

An assessment of findability maturity on the web from an image-centric viewpoint

Help improve findability on the web by establishing a baseline for where your

organization is at in terms of structured data content and visualize gaps or areas

for improvement from a search engine neutral perspective

Introduction

Most Searches Start with Google now (Holman 2011) (Lippincott 2013)

Search Algorithms Shaping what is most Easily Accessible (Connaway Dickey amp

Radford 2011) and they are subject to change frequently (Kritzinger 2013)

Search Algorithms Look for Your Structured Data and in the future and possibly

your embedded metadata (Cazier 2014) (Beall 2010)

Literature Review

Marshall Breeding (2013) assesses the limitations of the major search engine algorithms

ldquoBut even with the most sophisticated relevancy

algorithms index-based search and retrieval lacks the

ability to lead users to the potential related content

Semantic web technologies in conjunction with

repositories of open linked data promise to deliver

significant new capabilities in exploring and exploiting

information resources on the webrdquo

Literature Review

Semantic web is founded on good high-

quality structured data

Future technologies could potentially utilize

embedded metadata in search (Cazier 2014)

(Beall 2010) but there is authenticity

provenance and ldquobreadcrumbsrdquo value now

(Reicks 2013)

Literature Review

Most users donrsquot go past the first page of

search results (Paz 2013)

Structured Data Practices can help your

organization stay relevant (and findable) in

the age of information overload

Keeping it Search Engine Neutral is

advisable (Paz 2013)

TopicProposed Research

Methodology for establishing a baseline or benchmark of where an organization is at

in terms of structured data pertaining to image records that ultimately helps findability

on the web

By utilizing the proposed methodology for gathering this data for an organization

data-informed decisions can be made about structured data strategy going forward to

maintain relevancy on the web

Many structured data elements can affect online findability from file-naming

standards presence of alt text tags in html markup html markup itself embedded

metadata schemaorg markup and rich snippets text description at or nearby images

and more IEEE uses metadata or full-text for search (IEEE Xplore offers this--see

next slide)

Full Text Search amp Metadata Search

TopicProposed Research

It is also noteworthy that there are additional factors that affect findability on

the web that do not involve structured data but this research focuses solely on

structured data techniques within the control of individual organizations

All of these structured data techniques pertaining to image records will be

utilized in conjunction with the relevancy of onsite and offsite search results

Image search and information retrieval is a more difficult area than text search

and retrieval because accessibility to the image content is largely dependent on

side-car text (or metadata if you will) that describes the aboutness and

(hopefully) the context for the image record

Questions

Research Questions Addressed in this Study

1 What methods of search are available on the organizationrsquos online website

1 What is the file-naming structure for images on the website

1 What is the quality of search engine (onsite and offsite) results

1 What kinds of search results appear in Image Search when searching by the

organizationrsquos name and product description both with onsite search and offsite

search

Questions

Research Questions Addressed in this Study

5 What kinds of search results appear in Google Image Search when searching

by images taken from the organizationrsquos website

5 What kinds of search results come up when looking for specific products

(measure of structured data) through onsite search and offsite search

5 What are the results when looking for specific products on the offsite search

engine

Questions

Research Questions Addressed in this Study

8 What kinds of structured data are near or around the images on the organizationrsquos

website Alt Text Other

9 What file types appear on the organizationrsquos website (JPEG TIFF PNG)

9 What embedded metadata is available in images on the website

11 What does the XMPXMLRDF for these images look like and how robust is it

What does the graph look like

Variables

These measures are operationalized by utilization of likert scales applied by the human researcher For

example when rating the level of description for the file-name a research could conclude that the

filename sp_18379847923jpg is not very descriptive filename for a human let alone for a search engine

(unless of course this is a product sku) The researcher would then choose to assign it a low value on

descriptiveness on a 1-5 likert scale

Type of page

the image was

on

The image file naming

conventionfilename

Level of description for the

filename

Quality and number

of alt text tags

Quality and number

of embedded

metadata tags

Quality and number of structured

data tags pertaining to the images

Quality and number of search

results for onsite search

utilizing filename or alt text

Quality and number

of relevant search

results utilizing

offsite image search

Data Collection Methods

ParticipantsParticipants will include a single institution anonymized for the protection of their business The sample of image records utilized

in this study will be limited to image assets appearing on the organizationrsquos website domain Most data collection can take place

from the organizationrsquos website itself Some procedures will take place on external sites services or programs

Randomization of SampleThe sample of images utilized in this study can be randomized by extracting a site map of the particular organization of interest

using xsitemapcom After the site map is constructed the list of URLs should be inputted into a spreadsheet program and a record

number should be assigned to each URL From there the researcher can use a randomizer program to select the order of pages to

utilize in the study (ie Research Randomizer Available at httpwwwrandomizerorgformhtm) This method will be utilized for

taking a random sample of pages from the organization of interest

ConsentAll data collected in this study are publicly available and freely available on the web

Data Collection Methods

Obtaining Data on the website

Navigate to the URL

Right Click Image(s) and ldquoSave Asrdquo

Right Click Page and ldquoView Sourcerdquo Save as

txt file

Collect raw data from image by either

opening in Photoshop and Navigating to Raw

Data Column or utilize Phil Harveyrsquos

ExifTool

Obtaining Data through Structured Data Linter

Navigate to the Linter website

Enter URL

Screenshot Structured Data Results -or- save

as webpage

Obtaining Data through W3C RDF validator

Copy raw data xml extracted earlier and input

into RDF Validator

Select Graph Only on the Options

Parse RDF

Save Graph or Screenshot Graph

Store in Folder with other Data

Answer Research Questions

Systematically go through the collected data

and input findings into spreadsheet

Data Analysis Methods

Descriptive Statisticso Bell Curve - measures

towards a central tendency

using likert scale data

Bell Curve Image By Vierge Marie

(Own work) [Public domain] via

Wikimedia Commons

httpuploadwikimediaorgwikipe

diacommonsff6Gaussian_Filter

svg

Data Analysis Methods

Graphical Analysis

(Charts and Graphs)

Summary Report

Discussion of Findings

Visualizing the Results

The Structured Data Linter

utilizing URLs to display

structured data around the images

Available at

httplinterstructured-dataorg

Summary analysis will be

crafted utilizing all of these data

points to show what we are able

to understand about an image

versus what a machine or search

engine is able to know about an

image

W3C RDF Validator Graph

Visualization utilizing the raw

data markup extracted from the

image

Available at

httpwwww3orgRDFValidator

Structured Data Linter

Shows all

structured Data

Tags around the

images and in

the page markup

RDF Validator

Visualization of

embedded data

for images and

their subsequent

relationships to

other data

Summary Report

Complete Picture of Structured

Data Metadata and Analysis

of Study

Expected Outcomes

The anticipated results of this project include a benchmark for where this specific

organization is at in terms of structured data in the online environment and a

methodology for other organizations looking to assess their structured data maturity in

the digital space These results will be used to create a roadmap for improving resource

findability both on the web and within websites Other organizations may also aspire to

reuse this methodology for assessing their own current state of structured data Future

areas of research could include utilizing metadataRDF-driven search engines in

conjuncture with Vector Space Models to assess findability of image records on the

web and within websites

References (Slides amp Full Paper)

Algebraix Data Corporation 0005 Algebraix Data Launches Industryrsquos First Cost-Effective Automated Implementation

of Schemaorg Business Wire (English) 5

Beall Jeffrey 2010 How Google Uses Metadata to Improve Search Results Serials Librarian 59 no 1 40-53

Breeding Marshall 2013 Linked Data The Next Big Wave or Another Tech Fad Computers In Libraries 33 no 3

20-22

Cafarella MJ Halevy AY Zhang Y Wang DZ and Wu E Uncovering the relational Web In Proceedings of the

11th International Workshop on the Web and Databases (Vancouver BC June 13 2008)

httpwebeecsumichedu~michjcpaperswebtables_webdb08pdf

Connaway Lynn Sillipigni Timothy J Dickey and Marie L Radford 2011 ldquoIf it is too inconvenient Im not going after itrdquo

Convenience as

a critical factor in information-seeking behaviors Library amp Information Science Research (07408188) 33 no 3 179-190

References (Slides amp Full Paper)

Cazier Clay 2014 PM Digital Marketing Blog ldquoThe Future of Exif Image Datardquo Last accessed November 20 2014

httpwwwpmdigitalcomblog201404future-exif-image-data

Diagram Center Digital Image and Graphic Resources for Accessible Materials 2014 ldquoContent Modelrdquo Last Accessed

November 23 2014 httpdiagramcenterorgstandards-and-practicescontent-modelhtml

Google 2014 ldquoImage Publishing Guidelinesrdquo Last accessed November 21 2014

httpssupportgooglecomwebmastersanswer114016hl=en

Holman Lucy 2011 Millennial Students Mental Models of Search Implications for Academic Librarians and Database

Developers Journal Of Academic Librarianship 37 no 1 19-27

References (Slides amp Full Paper)

International Business Times 0006 BingGoogle and Yahoo merge to make search easier with schemaorg

International Business Times April

IPTC International Press Telecommunications Council 2014 ldquoEmbedded Metadata Manifestordquo Last accessed November

20 2014 httpwwwembeddedmetadataorgsocial-media-test-resultsphp (Embedded Metadata Manifesto 2014)

Kritzinger W T Search Engine Optimization and Pay-per-Click Marketing Strategies Journal of Organizational

Computing and Electronic Commerce no 3 (2013) 273-86

Lippincott Joan K ldquoNet Generation Students and Librariesrdquo EDUCAUSE (2005) accessed November 19 2014

httpwwweducauseeduresearch-and-publicationsbookseducating-net-generationnet-generation-students-and-libraries

References (Slides amp Full Paper)

Nakanishi T Semantic Context-Dependent Weighting for Vector Space Model Semantic Computing (ICSC) 2014

IEEE International Conference on vol no pp262266 16-18 June 2014 doi 101109ICSC201449

Paz Anita 2013 In search of Meaning The Written Word in the Age of Google Italian Journal Of Library amp

Information Science 4 no 2 255-266

Priebe T Schlager C Pernul G A search engine for RDF metadata Database and Expert Systems Applications

2004 Proceedings 15th International Workshop on vol no pp168172 2004 doi 101109DEXA20041333468

Reicks David 2010 ldquoWhy Embedded Metadata Wonrsquot Help Your SEOrdquo Last Updated December 30 2013 Last

Accessed November 23 2014 httpwwwcontrolledvocabularycomblogembedded-metadata-wont-help-seohtml

Page 3: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

Introduction

Most Searches Start with Google now (Holman 2011) (Lippincott 2013)

Search Algorithms Shaping what is most Easily Accessible (Connaway Dickey amp

Radford 2011) and they are subject to change frequently (Kritzinger 2013)

Search Algorithms Look for Your Structured Data and in the future and possibly

your embedded metadata (Cazier 2014) (Beall 2010)

Literature Review

Marshall Breeding (2013) assesses the limitations of the major search engine algorithms

ldquoBut even with the most sophisticated relevancy

algorithms index-based search and retrieval lacks the

ability to lead users to the potential related content

Semantic web technologies in conjunction with

repositories of open linked data promise to deliver

significant new capabilities in exploring and exploiting

information resources on the webrdquo

Literature Review

Semantic web is founded on good high-

quality structured data

Future technologies could potentially utilize

embedded metadata in search (Cazier 2014)

(Beall 2010) but there is authenticity

provenance and ldquobreadcrumbsrdquo value now

(Reicks 2013)

Literature Review

Most users donrsquot go past the first page of

search results (Paz 2013)

Structured Data Practices can help your

organization stay relevant (and findable) in

the age of information overload

Keeping it Search Engine Neutral is

advisable (Paz 2013)

TopicProposed Research

Methodology for establishing a baseline or benchmark of where an organization is at

in terms of structured data pertaining to image records that ultimately helps findability

on the web

By utilizing the proposed methodology for gathering this data for an organization

data-informed decisions can be made about structured data strategy going forward to

maintain relevancy on the web

Many structured data elements can affect online findability from file-naming

standards presence of alt text tags in html markup html markup itself embedded

metadata schemaorg markup and rich snippets text description at or nearby images

and more IEEE uses metadata or full-text for search (IEEE Xplore offers this--see

next slide)

Full Text Search amp Metadata Search

TopicProposed Research

It is also noteworthy that there are additional factors that affect findability on

the web that do not involve structured data but this research focuses solely on

structured data techniques within the control of individual organizations

All of these structured data techniques pertaining to image records will be

utilized in conjunction with the relevancy of onsite and offsite search results

Image search and information retrieval is a more difficult area than text search

and retrieval because accessibility to the image content is largely dependent on

side-car text (or metadata if you will) that describes the aboutness and

(hopefully) the context for the image record

Questions

Research Questions Addressed in this Study

1 What methods of search are available on the organizationrsquos online website

1 What is the file-naming structure for images on the website

1 What is the quality of search engine (onsite and offsite) results

1 What kinds of search results appear in Image Search when searching by the

organizationrsquos name and product description both with onsite search and offsite

search

Questions

Research Questions Addressed in this Study

5 What kinds of search results appear in Google Image Search when searching

by images taken from the organizationrsquos website

5 What kinds of search results come up when looking for specific products

(measure of structured data) through onsite search and offsite search

5 What are the results when looking for specific products on the offsite search

engine

Questions

Research Questions Addressed in this Study

8 What kinds of structured data are near or around the images on the organizationrsquos

website Alt Text Other

9 What file types appear on the organizationrsquos website (JPEG TIFF PNG)

9 What embedded metadata is available in images on the website

11 What does the XMPXMLRDF for these images look like and how robust is it

What does the graph look like

Variables

These measures are operationalized by utilization of likert scales applied by the human researcher For

example when rating the level of description for the file-name a research could conclude that the

filename sp_18379847923jpg is not very descriptive filename for a human let alone for a search engine

(unless of course this is a product sku) The researcher would then choose to assign it a low value on

descriptiveness on a 1-5 likert scale

Type of page the image was on

The image file-naming convention/filename

Level of description for the filename

Quality and number of alt text tags

Quality and number of embedded metadata tags

Quality and number of structured data tags pertaining to the images

Quality and number of search results for onsite search utilizing filename or alt text

Quality and number of relevant search results utilizing offsite image search
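The variables listed above could be captured as one record per image. The following is a minimal sketch, assuming Python and a 1-5 Likert scale for every quality rating; the class name, field names, and the example page URL are hypothetical and are not part of the proposed methodology.

```python
from dataclasses import asdict, dataclass, fields

LIKERT_MIN, LIKERT_MAX = 1, 5  # the 1-5 scale used for researcher ratings


@dataclass
class ImageObservation:
    """One coded image record. Field names are illustrative assumptions."""
    page_url: str
    page_type: str
    filename: str
    filename_description: int       # Likert rating
    alt_text_quality: int           # Likert rating
    alt_text_count: int
    embedded_metadata_quality: int  # Likert rating
    embedded_metadata_count: int
    structured_data_quality: int    # Likert rating
    structured_data_count: int
    onsite_results_quality: int     # Likert rating
    onsite_results_count: int
    offsite_results_quality: int    # Likert rating
    offsite_results_count: int

    def __post_init__(self):
        # Reject ratings outside the scale and negative counts at entry time.
        for f in fields(self):
            value = getattr(self, f.name)
            if f.name.endswith(("_quality", "_description")):
                if not LIKERT_MIN <= value <= LIKERT_MAX:
                    raise ValueError(
                        f"{f.name} must be {LIKERT_MIN}-{LIKERT_MAX}, got {value}"
                    )
            elif f.name.endswith("_count") and value < 0:
                raise ValueError(f"{f.name} cannot be negative, got {value}")


example = ImageObservation(
    page_url="https://www.example.com/products/kettle",  # hypothetical page
    page_type="product page",
    filename="sp_18379847923.jpg",
    filename_description=1,  # low descriptiveness, as in the example above
    alt_text_quality=2,
    alt_text_count=1,
    embedded_metadata_quality=1,
    embedded_metadata_count=0,
    structured_data_quality=3,
    structured_data_count=4,
    onsite_results_quality=2,
    onsite_results_count=5,
    offsite_results_quality=1,
    offsite_results_count=0,
)

row = asdict(example)  # a plain dict, ready to be written as a spreadsheet row
```

Validating at entry time keeps out-of-range ratings from reaching the analysis stage unnoticed.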

Data Collection Methods

Participants

Participants will include a single institution, anonymized for the protection of its business. The sample of image records utilized in this study will be limited to image assets appearing on the organization's website domain. Most data collection can take place on the organization's website itself. Some procedures will take place on external sites, services, or programs.

Randomization of Sample

The sample of images utilized in this study can be randomized by extracting a site map of the organization of interest using xsitemap.com. After the site map is constructed, the list of URLs should be entered into a spreadsheet program and a record number should be assigned to each URL. From there, the researcher can use a randomizer program to select the order of pages to utilize in the study (e.g., Research Randomizer, available at http://www.randomizer.org/form.htm). This method will be utilized for taking a random sample of pages from the organization of interest.
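The numbering and randomizing steps can also be scripted. The sketch below assumes the site-map URLs have already been exported to a list; the function names, the example URLs, and the fixed seed are illustrative assumptions. Python's random module stands in for the Research Randomizer service named above.

```python
import random


def number_urls(urls):
    """Assign a 1-based record number to each URL, as in the spreadsheet step."""
    return {number: url for number, url in enumerate(urls, start=1)}


def random_page_order(records, sample_size=None, seed=None):
    """Return (record number, URL) pairs in random order.

    A seed makes the draw repeatable so the sample can be documented.
    """
    rng = random.Random(seed)
    numbers = list(records)
    rng.shuffle(numbers)
    if sample_size is not None:
        numbers = numbers[:sample_size]
    return [(number, records[number]) for number in numbers]


# Hypothetical site-map export for an anonymized organization.
site_map = [
    "https://www.example.com/",
    "https://www.example.com/products/kettle",
    "https://www.example.com/products/teapot",
    "https://www.example.com/about",
    "https://www.example.com/contact",
]

records = number_urls(site_map)
order = random_page_order(records, sample_size=3, seed=2014)
for number, url in order:
    print(number, url)
```

Recording the seed alongside the results would let another researcher reproduce the same page order.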

Consent

All data collected in this study are publicly and freely available on the web.

Obtaining Data on the Website

Navigate to the URL

Right-click the image(s) and choose "Save As"

Right-click the page and choose "View Source"; save as a .txt file

Collect raw data from the image either by opening it in Photoshop and navigating to the Raw Data column, or by using Phil Harvey's ExifTool
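For the ExifTool route, the extraction can be scripted once the images are saved. This is a minimal sketch, assuming ExifTool is installed and on the PATH; the helper names are hypothetical. It uses ExifTool's -json and -G options for grouped JSON output and -xmp -b for the raw XMP packet.

```python
import json
import subprocess


def exiftool_json_command(image_path):
    """Command that prints all tags as JSON, with group names (EXIF, IPTC, XMP)."""
    return ["exiftool", "-json", "-G", image_path]


def exiftool_xmp_command(image_path):
    """Command that prints the raw XMP packet (RDF/XML) embedded in the image."""
    return ["exiftool", "-xmp", "-b", image_path]


def parse_exiftool_json(text):
    """ExifTool's JSON output is a list with one object per file."""
    records = json.loads(text)
    return records[0] if records else {}


def count_tags_by_group(tags):
    """Count tags per metadata group, using the 'Group:Tag' key format."""
    counts = {}
    for key in tags:
        if ":" not in key:
            continue  # SourceFile carries no group prefix
        group = key.split(":", 1)[0]
        counts[group] = counts.get(group, 0) + 1
    return counts


def read_metadata(image_path):
    """Run ExifTool on one saved image and return its tags as a dict."""
    result = subprocess.run(
        exiftool_json_command(image_path),
        capture_output=True, text=True, check=True,
    )
    return parse_exiftool_json(result.stdout)
```

Calling read_metadata("sp_18379847923.jpg") on a saved image and passing the result to count_tags_by_group gives a quick count of how many EXIF, IPTC, and XMP tags survived on the published file.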

Obtaining Data through the Structured Data Linter

Navigate to the Linter website

Enter the URL

Screenshot the structured data results, or save as a webpage

Obtaining Data through the W3C RDF Validator

Copy the raw XML data extracted earlier and paste it into the RDF Validator

Select Graph Only in the options

Parse RDF

Save or screenshot the graph

Store in the folder with the other data
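Alongside the validator graph, the same raw XMP can be inspected programmatically to get a rough sense of how robust it is. The sketch below assumes the packet is plain RDF/XML with simple properties; the function name and the sample packet are made up for illustration. Nested structures would need fuller handling than shown here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A small, made-up XMP packet of the kind extracted from an image.
SAMPLE_XMP = """
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
      photoshop:Credit="Example Studio">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Red kettle</rdf:li></rdf:Alt></dc:title>
   <dc:subject><rdf:Bag><rdf:li>kettle</rdf:li><rdf:li>kitchen</rdf:li></rdf:Bag></dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
"""


def xmp_properties(xmp_xml):
    """List (property name, values) pairs found in rdf:Description elements."""
    root = ET.fromstring(xmp_xml)
    properties = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        # Properties written in attribute form; rdf:about and friends are skipped.
        for name, value in description.attrib.items():
            if not name.startswith(f"{{{RDF_NS}}}"):
                properties.append((name, [value]))
        # Properties written as child elements, including Alt/Bag/Seq lists.
        for child in description:
            items = [li.text or "" for li in child.iter(f"{{{RDF_NS}}}li")]
            values = items if items else [(child.text or "").strip()]
            properties.append((child.tag, values))
    return properties


for name, values in xmp_properties(SAMPLE_XMP):
    print(name, values)
```

The number of distinct properties and the number of non-empty values are two simple counts that could feed the "quality and number of embedded metadata tags" variable.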

Answer Research Questions

Systematically go through the collected data and input the findings into a spreadsheet
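Part of this pass can be automated for the saved page source. The sketch below assumes the "View Source" text file from the earlier step has been read into a string; the class name, function names, column names, and sample markup are hypothetical. It lists each image's filename and alt text and writes them as CSV rows for the spreadsheet.

```python
import csv
import io
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageTagCollector(HTMLParser):
    """Collect the filename and alt text of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        self.images.append({
            "filename": posixpath.basename(urlparse(src).path),
            "alt": (attributes.get("alt") or "").strip(),
        })


def collect_images(page_source):
    collector = ImageTagCollector()
    collector.feed(page_source)
    return collector.images


def write_findings(images, out):
    """Write one spreadsheet row per image; has_alt flags missing alt text."""
    writer = csv.DictWriter(out, fieldnames=["filename", "alt", "has_alt"])
    writer.writeheader()
    for image in images:
        writer.writerow({**image, "has_alt": bool(image["alt"])})


# Made-up page source standing in for a saved "View Source" text file.
page_source = (
    '<img src="/img/sp_18379847923.jpg">'
    '<img src="https://www.example.com/img/red-kettle.jpg?w=200" '
    'alt="Red enamel kettle" />'
)

images = collect_images(page_source)
buffer = io.StringIO()
write_findings(images, buffer)
print(buffer.getvalue())
```

The Likert ratings themselves still come from the researcher; a script like this only gathers the raw material to be rated.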

Data Analysis Methods

Descriptive Statistics

Bell curve: measures of central tendency using Likert scale data
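As an illustration of the descriptive statistics, the sketch below summarizes a set of 1-5 Likert ratings with Python's standard library. The ratings are invented sample values, not study data, and the function name is hypothetical. Because Likert data are ordinal, the median and mode are reported alongside the mean.

```python
from collections import Counter
from statistics import mean, median, mode


def describe_likert(ratings, scale=(1, 2, 3, 4, 5)):
    """Frequency distribution and central tendency for Likert ratings."""
    counts = Counter(ratings)
    return {
        "n": len(ratings),
        "frequency": {point: counts.get(point, 0) for point in scale},
        "mode": mode(ratings),
        "median": median(ratings),
        "mean": mean(ratings),
    }


# Invented filename-descriptiveness ratings for ten sampled images.
ratings = [1, 2, 2, 3, 2, 4, 1, 2, 5, 3]
summary = describe_likert(ratings)

for point, count in summary["frequency"].items():
    print(point, "#" * count)  # a quick text histogram of the distribution
print("mode:", summary["mode"], "median:", summary["median"], "mean:", summary["mean"])
```

The text histogram gives a first look at whether the ratings cluster around a central value before any charts are drawn.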

Bell Curve Image: By Vierge Marie (Own work) [Public domain], via Wikimedia Commons. http://upload.wikimedia.org/wikipedia/commons/f/f6/Gaussian_Filter.svg

Graphical Analysis (Charts and Graphs)

Summary Report

Discussion of Findings

Visualizing the Results

Structured Data Linter: utilizes URLs to display all structured data tags around the images and in the page markup. Available at http://linter.structured-data.org

W3C RDF Validator: graph visualization utilizing the raw data markup extracted from the image, showing the embedded data for images and their subsequent relationships to other data. Available at http://www.w3.org/RDF/Validator/

Summary Report: a summary analysis will be crafted utilizing all of these data points to show what we are able to understand about an image versus what a machine or search engine is able to know about an image. It gives a complete picture of the structured data, metadata, and analysis of the study.

Expected Outcomes

The anticipated results of this project include a benchmark for where this specific organization stands in terms of structured data in the online environment and a methodology for other organizations looking to assess their structured data maturity in the digital space. These results will be used to create a roadmap for improving resource findability both on the web and within websites. Other organizations may also reuse this methodology to assess their own current state of structured data. Future areas of research could include utilizing metadata/RDF-driven search engines in conjunction with Vector Space Models to assess findability of image records on the web and within websites.

References (Slides & Full Paper)

Algebraix Data Corporation. 0005. Algebraix Data Launches Industry's First Cost-Effective Automated Implementation of Schema.org. Business Wire (English), 5.

Beall, Jeffrey. 2010. How Google Uses Metadata to Improve Search Results. Serials Librarian 59, no. 1: 40-53.

Breeding, Marshall. 2013. Linked Data: The Next Big Wave or Another Tech Fad? Computers In Libraries 33, no. 3: 20-22.

Cafarella, M.J., Halevy, A.Y., Zhang, Y., Wang, D.Z., and Wu, E. Uncovering the relational Web. In Proceedings of the 11th International Workshop on the Web and Databases (Vancouver, BC, June 13, 2008). http://web.eecs.umich.edu/~michjc/papers/webtables_webdb08.pdf

Connaway, Lynn Silipigni, Timothy J. Dickey, and Marie L. Radford. 2011. "If it is too inconvenient I'm not going after it:" Convenience as a critical factor in information-seeking behaviors. Library & Information Science Research (07408188) 33, no. 3: 179-190.

Cazier, Clay. 2014. PM Digital Marketing Blog. "The Future of Exif Image Data." Last accessed November 20, 2014. http://www.pmdigital.com/blog/2014/04/future-exif-image-data

Diagram Center: Digital Image and Graphic Resources for Accessible Materials. 2014. "Content Model." Last accessed November 23, 2014. http://diagramcenter.org/standards-and-practices/content-model.html

Google. 2014. "Image Publishing Guidelines." Last accessed November 21, 2014. https://support.google.com/webmasters/answer/114016?hl=en

Holman, Lucy. 2011. Millennial Students' Mental Models of Search: Implications for Academic Librarians and Database Developers. Journal Of Academic Librarianship 37, no. 1: 19-27.

International Business Times. 0006. Bing, Google and Yahoo merge to make search easier with schema.org. International Business Times, April.

IPTC: International Press Telecommunications Council. 2014. "Embedded Metadata Manifesto." Last accessed November 20, 2014. http://www.embeddedmetadata.org/social-media-test-results.php (Embedded Metadata Manifesto 2014)

Kritzinger, W. T. Search Engine Optimization and Pay-per-Click Marketing Strategies. Journal of Organizational Computing and Electronic Commerce, no. 3 (2013): 273-86.

Lippincott, Joan K. "Net Generation Students and Libraries." EDUCAUSE (2005). Accessed November 19, 2014. http://www.educause.edu/research-and-publications/books/educating-net-generation/net-generation-students-and-libraries

Nakanishi, T. Semantic Context-Dependent Weighting for Vector Space Model. Semantic Computing (ICSC), 2014 IEEE International Conference on, pp. 262-266, 16-18 June 2014. doi: 10.1109/ICSC.2014.49

Paz, Anita. 2013. In search of Meaning: The Written Word in the Age of Google. Italian Journal Of Library & Information Science 4, no. 2: 255-266.

Priebe, T., Schlager, C., Pernul, G. A search engine for RDF metadata. Database and Expert Systems Applications, 2004. Proceedings. 15th International Workshop on, pp. 168-172, 2004. doi: 10.1109/DEXA.2004.1333468

Reicks, David. 2010. "Why Embedded Metadata Won't Help Your SEO." Last updated December 30, 2013. Last accessed November 23, 2014. http://www.controlledvocabulary.com/blog/embedded-metadata-wont-help-seo.html

Page 6: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

Literature Review

Most users donrsquot go past the first page of

search results (Paz 2013)

Structured Data Practices can help your

organization stay relevant (and findable) in

the age of information overload

Keeping it Search Engine Neutral is

advisable (Paz 2013)

TopicProposed Research

Methodology for establishing a baseline or benchmark of where an organization is at

in terms of structured data pertaining to image records that ultimately helps findability

on the web

By utilizing the proposed methodology for gathering this data for an organization

data-informed decisions can be made about structured data strategy going forward to

maintain relevancy on the web

Many structured data elements can affect online findability from file-naming

standards presence of alt text tags in html markup html markup itself embedded

metadata schemaorg markup and rich snippets text description at or nearby images

and more IEEE uses metadata or full-text for search (IEEE Xplore offers this--see

next slide)

Full Text Search amp Metadata Search

Topic/Proposed Research

It is also noteworthy that additional factors affect findability on the web that do not involve structured data, but this research focuses solely on structured data techniques within the control of individual organizations.

All of these structured data techniques pertaining to image records will be utilized in conjunction with the relevancy of onsite and offsite search results.

Image search and retrieval is a more difficult area than text search and retrieval because access to the image content is largely dependent on side-car text (or metadata, if you will) that describes the aboutness and (hopefully) the context of the image record.

Questions

Research Questions Addressed in this Study

1. What methods of search are available on the organization's website?

2. What is the file-naming structure for images on the website?

3. What is the quality of search engine (onsite and offsite) results?

4. What kinds of search results appear in Image Search when searching by the organization's name and product description, both with onsite search and offsite search?

Questions

Research Questions Addressed in this Study

5. What kinds of search results appear in Google Image Search when searching by images taken from the organization's website?

6. What kinds of search results come up when looking for specific products (a measure of structured data) through onsite search and offsite search?

7. What are the results when looking for specific products on the offsite search engine?

Questions

Research Questions Addressed in this Study

8. What kinds of structured data are near or around the images on the organization's website? Alt text? Other?

9. What file types appear on the organization's website (JPEG, TIFF, PNG)?

10. What embedded metadata is available in images on the website?

11. What does the XMP/XML/RDF for these images look like, and how robust is it? What does the graph look like?

Variables

These measures are operationalized through Likert scales applied by the human researcher. For example, when rating the level of description for the filename, a researcher could conclude that the filename sp_18379847923.jpg is not very descriptive for a human, let alone for a search engine (unless, of course, it is a product SKU). The researcher would then assign it a low value for descriptiveness on a 1-5 Likert scale.

Type of page the image was on

The image file naming convention/filename

Level of description for the filename

Quality and number of alt text tags

Quality and number of embedded metadata tags

Quality and number of structured data tags pertaining to the images

Quality and number of search results for onsite search utilizing filename or alt text

Quality and number of relevant search results utilizing offsite image search
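
One way to keep these variables consistent from image to image is a small record type that rejects ratings outside the 1-5 Likert range. The sketch below is illustrative only: the class name, the field names, and the choice to store a quality rating separately from a count are assumptions, and only a few of the variables listed above are shown.

```python
from dataclasses import dataclass

LIKERT_MIN, LIKERT_MAX = 1, 5  # the 1-5 scale described above


def check_likert(name: str, value: int) -> int:
    """Return value unchanged if it is a whole number from 1 to 5, else raise."""
    if not isinstance(value, int) or not LIKERT_MIN <= value <= LIKERT_MAX:
        raise ValueError(f"{name} must be an integer from 1 to 5, got {value!r}")
    return value


@dataclass
class ImageRecord:
    """One spreadsheet row for one image (hypothetical field names)."""
    page_type: str
    filename: str
    filename_description: int  # Likert rating assigned by the researcher
    alt_text_quality: int      # Likert rating assigned by the researcher
    alt_text_count: int        # number of alt attributes found for the image

    def __post_init__(self) -> None:
        check_likert("filename_description", self.filename_description)
        check_likert("alt_text_quality", self.alt_text_quality)
        if self.alt_text_count < 0:
            raise ValueError("alt_text_count cannot be negative")


# The filename from the example above, rated low on descriptiveness.
record = ImageRecord("product page", "sp_18379847923.jpg", 1, 2, 1)
print(record)
```

The ratings themselves still come from the human researcher; the code only guards against data-entry slips such as a 0 or a 6.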

Data Collection Methods

Participants: Participants will include a single institution, anonymized for the protection of its business. The sample of image records utilized in this study will be limited to image assets appearing on the organization's website domain. Most data collection can take place from the organization's website itself. Some procedures will take place on external sites, services, or programs.

Randomization of Sample: The sample of images utilized in this study can be randomized by extracting a site map of the organization of interest using xsitemap.com. After the site map is constructed, the list of URLs should be entered into a spreadsheet program and a record number assigned to each URL. From there the researcher can use a randomizer program to select the order of pages to utilize in the study (e.g., Research Randomizer, available at http://www.randomizer.org/form.htm). This method will be utilized for taking a random sample of pages from the organization of interest.
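
The numbering and random draw described above can also be scripted. The following is a minimal sketch, not the procedure the proposal prescribes: it uses Python's standard random module in place of Research Randomizer, and the function names, placeholder URLs, sample size, and seed are all assumptions chosen for illustration.

```python
import random


def number_urls(urls):
    """Assign record numbers 1..N to the site-map URLs, in listed order."""
    return {number: url for number, url in enumerate(urls, start=1)}


def sample_pages(records, sample_size, seed=None):
    """Pick sample_size record numbers at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    chosen = rng.sample(sorted(records), sample_size)
    return [(number, records[number]) for number in chosen]


# Placeholder URLs standing in for a real site-map export.
site_map = [f"https://example.com/page-{i}" for i in range(1, 21)]
records = number_urls(site_map)
selected = sample_pages(records, sample_size=5, seed=2014)
for number, url in selected:
    print(number, url)
```

Recording the seed alongside the results lets another researcher reproduce the same selection of pages.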

Consent: All data collected in this study are publicly and freely available on the web.

Data Collection Methods

Obtaining Data on the Website

Navigate to the URL

Right-click the image(s) and "Save As"

Right-click the page and "View Source"; save as a .txt file

Collect raw data from the image either by opening it in Photoshop and navigating to the Raw Data column or by utilizing Phil Harvey's ExifTool

Obtaining Data through Structured Data Linter

Navigate to the Linter website

Enter the URL

Screenshot the structured data results or save as a webpage

Obtaining Data through W3C RDF Validator

Copy the raw data XML extracted earlier and input it into the RDF Validator

Select "Graph Only" in the options

Parse RDF

Save the graph or screenshot the graph

Store in a folder with the other data

Answer Research Questions

Systematically go through the collected data and input findings into a spreadsheet
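
Once a page's source has been saved, the filenames and alt text needed for the spreadsheet can be pulled out automatically rather than read by eye. The sketch below assumes the saved source is ordinary HTML and uses only Python's standard library; the class name and the sample markup (including the second, descriptive filename) are made up for illustration.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class ImageTagCollector(HTMLParser):
    """Collect the file name and alt text of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        self.images.append({
            "filename": basename(urlparse(src).path),
            "alt": attributes.get("alt"),  # None when the attribute is absent
        })


# Stand-in for the text of a saved "View Source" file.
page_source = """
<html><body>
  <img src="/images/sp_18379847923.jpg">
  <img src="/images/red-wool-scarf.jpg" alt="Red wool scarf, folded">
</body></html>
"""

collector = ImageTagCollector()
collector.feed(page_source)
for image in collector.images:
    print(image["filename"], "|", image["alt"])
```

The output only lists what is present in the markup; judging the quality of each filename and alt text remains a task for the researcher.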

Data Analysis Methods

Descriptive Statistics

Bell curve: measures of central tendency using Likert scale data

Bell curve image by Vierge Marie (own work) [Public domain], via Wikimedia Commons, http://upload.wikimedia.org/wikipedia/commons/f/f6/Gaussian_Filter.svg

Data Analysis Methods

Graphical Analysis (Charts and Graphs)

Summary Report

Discussion of Findings
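
As a rough illustration of the descriptive statistics named above, the sketch below summarizes a set of Likert ratings with Python's standard library. The ratings are invented sample values, not study data. Because Likert ratings are ordinal, the median and mode are usually the safer measures of central tendency; the mean is included only for comparison.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical 1-5 Likert ratings for filename descriptiveness.
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 2]

summary = {
    "n": len(ratings),
    "median": median(ratings),
    "modes": multimode(ratings),
    "mean": round(mean(ratings), 2),
}
frequencies = Counter(ratings)

print(summary)
for score in range(1, 6):
    # A text histogram shows the shape of the distribution at a glance.
    print(score, "#" * frequencies[score])
```

The same frequency counts can feed the charts and graphs mentioned under graphical analysis.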

Visualizing the Results

The Structured Data Linter: utilizes URLs to display structured data around the images. Available at http://linter.structured-data.org

Summary analysis will be crafted utilizing all of these data points to show what we are able to understand about an image versus what a machine or search engine is able to know about an image.

W3C RDF Validator: graph visualization utilizing the raw data markup extracted from the image. Available at http://www.w3.org/RDF/Validator

Structured Data Linter: shows all structured data tags around the images and in the page markup

RDF Validator: visualization of embedded data for images and their subsequent relationships to other data

Summary Report: complete picture of structured data, metadata, and analysis of study
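
Alongside the validator's graph, a quick property listing can indicate how robust an image's embedded XMP is. The sketch below is one possible approach under stated assumptions: the packet is a made-up example (a real one would first be extracted from the image, for instance with ExifTool), the function name is hypothetical, and nested structures beyond simple rdf:li lists are not handled.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up XMP packet; real packets are usually much longer.
xmp_packet = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
      photoshop:Credit="Example Studio">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Red wool scarf</rdf:li></rdf:Alt></dc:title>
   <dc:subject><rdf:Bag><rdf:li>scarf</rdf:li><rdf:li>wool</rdf:li></rdf:Bag></dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def list_properties(packet):
    """Return {property name: list of values} for each rdf:Description."""
    properties = {}
    root = ET.fromstring(packet)
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        # Simple properties may be written as XML attributes.
        for name, value in description.attrib.items():
            if not name.startswith(f"{{{RDF_NS}}}"):
                properties.setdefault(name, []).append(value)
        # Structured properties are child elements holding rdf:li items.
        for child in description:
            items = [li.text for li in child.iter(f"{{{RDF_NS}}}li")]
            properties.setdefault(child.tag, []).extend(items or [child.text])
    return properties


found = list_properties(xmp_packet)
for name, values in found.items():
    print(name, values)
```

Counting the properties found, and noting which are empty, gives a simple number to record next to the researcher's Likert rating for embedded metadata.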

Expected Outcomes

The anticipated results of this project include a benchmark for where this specific organization stands in terms of structured data in the online environment, and a methodology for other organizations looking to assess their structured data maturity in the digital space. These results will be used to create a roadmap for improving resource findability both on the web and within websites. Other organizations may also reuse this methodology for assessing their own current state of structured data. Future areas of research could include utilizing metadata/RDF-driven search engines in conjunction with Vector Space Models to assess findability of image records on the web and within websites.

References (Slides & Full Paper)

Algebraix Data Corporation. 0005. "Algebraix Data Launches Industry's First Cost-Effective Automated Implementation of Schema.org." Business Wire (English), 5.

Beall, Jeffrey. 2010. "How Google Uses Metadata to Improve Search Results." Serials Librarian 59, no. 1: 40-53.

Breeding, Marshall. 2013. "Linked Data: The Next Big Wave or Another Tech Fad?" Computers in Libraries 33, no. 3: 20-22.

Cafarella, M. J., Halevy, A. Y., Zhang, Y., Wang, D. Z., and Wu, E. "Uncovering the Relational Web." In Proceedings of the 11th International Workshop on the Web and Databases (Vancouver, BC, June 13, 2008). http://web.eecs.umich.edu/~michjc/papers/webtables_webdb08.pdf

Connaway, Lynn Silipigni, Timothy J. Dickey, and Marie L. Radford. 2011. "'If it is too inconvenient I'm not going after it:' Convenience as a critical factor in information-seeking behaviors." Library & Information Science Research (07408188) 33, no. 3: 179-190.

References (Slides & Full Paper)

Cazier, Clay. 2014. PM Digital Marketing Blog. "The Future of Exif Image Data." Last accessed November 20, 2014. http://www.pmdigital.com/blog/2014/04/future-exif-image-data

Diagram Center: Digital Image and Graphic Resources for Accessible Materials. 2014. "Content Model." Last accessed November 23, 2014. http://diagramcenter.org/standards-and-practices/content-model.html

Google. 2014. "Image Publishing Guidelines." Last accessed November 21, 2014. https://support.google.com/webmasters/answer/114016?hl=en

Holman, Lucy. 2011. "Millennial Students' Mental Models of Search: Implications for Academic Librarians and Database Developers." Journal of Academic Librarianship 37, no. 1: 19-27.

References (Slides & Full Paper)

International Business Times. 0006. "Bing, Google and Yahoo merge to make search easier with schema.org." International Business Times, April.

IPTC (International Press Telecommunications Council). 2014. "Embedded Metadata Manifesto." Last accessed November 20, 2014. http://www.embeddedmetadata.org/social-media-test-results.php (Embedded Metadata Manifesto 2014)

Kritzinger, W. T. "Search Engine Optimization and Pay-per-Click Marketing Strategies." Journal of Organizational Computing and Electronic Commerce, no. 3 (2013): 273-86.

Lippincott, Joan K. "Net Generation Students and Libraries." EDUCAUSE (2005). Accessed November 19, 2014. http://www.educause.edu/research-and-publications/books/educating-net-generation/net-generation-students-and-libraries

References (Slides & Full Paper)

Nakanishi, T. "Semantic Context-Dependent Weighting for Vector Space Model." Semantic Computing (ICSC), 2014 IEEE International Conference on, pp. 262-266, 16-18 June 2014. doi: 10.1109/ICSC.2014.49

Paz, Anita. 2013. "In Search of Meaning: The Written Word in the Age of Google." Italian Journal of Library & Information Science 4, no. 2: 255-266.

Priebe, T., Schlager, C., and Pernul, G. 2004. "A search engine for RDF metadata." Database and Expert Systems Applications, 2004. Proceedings. 15th International Workshop on, pp. 168-172. doi: 10.1109/DEXA.2004.1333468

Riecks, David. 2010. "Why Embedded Metadata Won't Help Your SEO." Last updated December 30, 2013. Last accessed November 23, 2014. http://www.controlledvocabulary.com/blog/embedded-metadata-wont-help-seo.html



Page 9: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

TopicProposed Research

It is also noteworthy that there are additional factors that affect findability on

the web that do not involve structured data but this research focuses solely on

structured data techniques within the control of individual organizations

All of these structured data techniques pertaining to image records will be

utilized in conjunction with the relevancy of onsite and offsite search results

Image search and information retrieval is a more difficult area than text search

and retrieval because accessibility to the image content is largely dependent on

side-car text (or metadata if you will) that describes the aboutness and

(hopefully) the context for the image record

Questions

Research Questions Addressed in this Study

1. What methods of search are available on the organization's website?
2. What is the file-naming structure for images on the website?
3. What is the quality of search engine results (onsite and offsite)?
4. What kinds of search results appear in Image Search when searching by the organization's name and product description, with both onsite search and offsite search?
5. What kinds of search results appear in Google Image Search when searching by images taken from the organization's website?
6. What kinds of search results come up when looking for specific products (a measure of structured data) through onsite search and offsite search?
7. What are the results when looking for specific products on the offsite search engine?
8. What kinds of structured data are near or around the images on the organization's website (alt text, other)?
9. What file types appear on the organization's website (JPEG, TIFF, PNG)?
10. What embedded metadata is available in images on the website?
11. What does the XMP/XML/RDF for these images look like, and how robust is it? What does the graph look like?

Variables

These measures are operationalized through Likert scales applied by the human researcher. For example, when rating the level of description for a filename, a researcher could conclude that the filename sp_18379847923.jpg is not very descriptive for a human, let alone for a search engine (unless, of course, it is a product SKU). The researcher would then assign it a low value for descriptiveness on a 1-5 Likert scale.

Type of page the image was on
The image file-naming convention/filename
Level of description for the filename
Quality and number of alt text tags
Quality and number of embedded metadata tags
Quality and number of structured data tags pertaining to the images
Quality and number of search results for onsite search utilizing filename or alt text
Quality and number of relevant search results utilizing offsite image search
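One way to keep these variables consistent across image records is to give every record the same shape before any rating begins. The sketch below is only an illustration in Python: the class name ImageRecordRating, its field names, and the decision to store a single 1-5 rating per variable are assumptions made for the example, not part of the proposed methodology. Counts (for example, the number of alt text tags) could be added as further fields in the same way.

```python
from dataclasses import dataclass

# Bounds of the 1-5 Likert scale described in the Variables slide.
LIKERT_MIN, LIKERT_MAX = 1, 5

# Hypothetical field names; each one holds a researcher-assigned rating.
RATING_FIELDS = (
    "filename_descriptiveness",
    "alt_text_quality",
    "embedded_metadata_quality",
    "structured_data_quality",
    "onsite_search_quality",
    "offsite_search_quality",
)


@dataclass
class ImageRecordRating:
    """One row of the coding spreadsheet for a single image record."""

    page_type: str  # e.g. "product page" or "home page"
    filename: str   # the image filename as found on the site
    filename_descriptiveness: int
    alt_text_quality: int
    embedded_metadata_quality: int
    structured_data_quality: int
    onsite_search_quality: int
    offsite_search_quality: int

    def __post_init__(self):
        # Reject any rating that falls outside the Likert scale.
        for name in RATING_FIELDS:
            value = getattr(self, name)
            if not LIKERT_MIN <= value <= LIKERT_MAX:
                raise ValueError(
                    f"{name} must be between {LIKERT_MIN} and {LIKERT_MAX}, got {value}"
                )


if __name__ == "__main__":
    record = ImageRecordRating("product page", "sp_18379847923.jpg", 1, 2, 1, 3, 2, 2)
    print(record)
```

Validating at construction time means an out-of-range rating is caught when it is entered rather than when the data are analyzed.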

Data Collection Methods

Participants: Participants will include a single institution, anonymized for the protection of its business. The sample of image records utilized in this study will be limited to image assets appearing on the organization's website domain. Most data collection can take place on the organization's website itself; some procedures will take place on external sites, services, or programs.

Randomization of Sample: The sample of images can be randomized by extracting a site map of the organization of interest using xsitemap.com. After the site map is constructed, the list of URLs should be entered into a spreadsheet program and a record number assigned to each URL. From there, the researcher can use a randomizer program to select the order of pages to use in the study (e.g., Research Randomizer, available at http://www.randomizer.org/form.htm). This method will be used to take a random sample of pages from the organization of interest.

Consent: All data collected in this study are publicly and freely available on the web.
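The randomization step described above (number each URL, then draw pages at random) can also be reproduced in a few lines of code. This is a minimal sketch that assumes the site map has already been exported as a plain list of URLs; the function names, the example.com URLs, and the optional seed are illustrative assumptions, and the sketch stands in for, rather than replicates, the Research Randomizer tool named in the proposal.

```python
import random


def assign_record_numbers(urls):
    """Pair each URL with a 1-based record number, as in the spreadsheet step."""
    return list(enumerate(urls, start=1))


def random_page_order(records, sample_size, seed=None):
    """Draw sample_size records without replacement, in random order.

    Passing a seed makes the draw repeatable, which helps when the
    sampling needs to be documented or re-run.
    """
    rng = random.Random(seed)
    return rng.sample(records, sample_size)


if __name__ == "__main__":
    # Placeholder URLs; a real run would load the exported site map instead.
    site_map = [f"https://example.com/page-{i}" for i in range(1, 11)]
    records = assign_record_numbers(site_map)
    for number, url in random_page_order(records, sample_size=4, seed=2014):
        print(number, url)
```

Recording the seed alongside the results lets another researcher regenerate the same sample of pages.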

Obtaining Data on the website
1. Navigate to the URL.
2. Right-click the image(s) and choose "Save As".
3. Right-click the page and choose "View Source"; save as a .txt file.
4. Collect raw data from the image by either opening it in Photoshop and navigating to the Raw Data column, or by using Phil Harvey's ExifTool.
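Once the page source has been saved, the filenames and alt text needed for the research questions can be pulled out of it programmatically rather than by eye. The following is a rough sketch using only Python's standard library; the class and function names are made up for this example, and it only looks at plain img tags, so images loaded by scripts or CSS would be missed.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class ImageTagCollector(HTMLParser):
    """Collect the src, filename, and alt text of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        self.images.append(
            {
                "src": src,
                # Filename is the last path segment, without any query string.
                "filename": basename(urlparse(src).path),
                # None means the alt attribute is missing entirely.
                "alt": attributes.get("alt"),
            }
        )


def collect_images(page_source):
    """Return one dictionary per <img> tag found in the saved page source."""
    collector = ImageTagCollector()
    collector.feed(page_source)
    return collector.images


if __name__ == "__main__":
    # Hypothetical markup standing in for a saved "View Source" file.
    saved_source = '<img src="/media/sp_18379847923.jpg"><img src="/media/red-shoe.png" alt="Red running shoe">'
    for image in collect_images(saved_source):
        print(image["filename"], "|", image["alt"])
```

The resulting rows map directly onto the filename and alt text variables, which the researcher would still rate by hand.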

Obtaining Data through Structured Data Linter
1. Navigate to the Linter website.
2. Enter the URL.
3. Screenshot the structured data results, or save them as a webpage.

Obtaining Data through W3C RDF Validator
1. Copy the raw data XML extracted earlier and paste it into the RDF Validator.
2. Select "Graph Only" in the options.
3. Parse the RDF.
4. Save or screenshot the graph.
5. Store it in a folder with the other data.

Answer Research Questions
Systematically go through the collected data and enter the findings into a spreadsheet.
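The raw data XML pasted into the RDF Validator is typically the XMP packet embedded in the image file. As an alternative to copying it out of Photoshop or ExifTool, the sketch below scans a saved image for that packet. It is a simple heuristic under stated assumptions: it assumes the packet is stored as UTF-8 text wrapped in an x:xmpmeta element, it returns only the first packet it finds, and it will not reassemble XMP that has been split across multiple segments. The function name is hypothetical.

```python
XMP_START = b"<x:xmpmeta"
XMP_END = b"</x:xmpmeta>"


def extract_xmp_packet(image_bytes):
    """Return the first embedded XMP packet as text, or None if none is found."""
    start = image_bytes.find(XMP_START)
    if start == -1:
        return None
    end = image_bytes.find(XMP_END, start)
    if end == -1:
        return None
    packet = image_bytes[start : end + len(XMP_END)]
    # Assumption: the packet is UTF-8; undecodable bytes are replaced, not dropped.
    return packet.decode("utf-8", errors="replace")


if __name__ == "__main__":
    import sys

    # Usage: python extract_xmp.py saved_image.jpg
    with open(sys.argv[1], "rb") as handle:
        xmp = extract_xmp_packet(handle.read())
    print(xmp if xmp is not None else "No XMP packet found")
```

An image that returns no packet is itself a data point for the embedded metadata variable, since many sites strip metadata when images are published.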

Data Analysis Methods

Descriptive Statistics
Bell curve: measures of central tendency using Likert scale data.
[Figure: bell curve. Image by Vierge Marie (own work), public domain, via Wikimedia Commons: http://upload.wikimedia.org/wikipedia/commons/f/f6/Gaussian_Filter.svg]

Graphical Analysis (charts and graphs)
Summary Report
Discussion of Findings
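The descriptive statistics named above can be computed directly from the spreadsheet of ratings. The sketch below is an illustration only: the function name and the sample ratings are invented, and the 1-5 scale is taken from the Variables slide. Because Likert ratings are ordinal, the median and mode are usually the more defensible summaries; the mean is reported alongside them for comparison.

```python
from collections import Counter
from statistics import mean, median, mode


def summarize_likert(ratings, scale=(1, 2, 3, 4, 5)):
    """Summarize the central tendency and spread of one rated variable."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(rating not in scale for rating in ratings):
        raise ValueError("every rating must be a point on the scale")
    counts = Counter(ratings)
    return {
        "n": len(ratings),
        "median": median(ratings),
        "mode": mode(ratings),
        "mean": mean(ratings),
        # Frequency of each scale point, including points nobody chose.
        "distribution": {point: counts.get(point, 0) for point in scale},
    }


if __name__ == "__main__":
    # Invented ratings for filename descriptiveness across seven images.
    filename_ratings = [1, 2, 2, 3, 2, 4, 1]
    for key, value in summarize_likert(filename_ratings).items():
        print(key, value)
```

The distribution dictionary is the natural input for the charts and graphs in the graphical analysis step.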

Visualizing the Results

The Structured Data Linter uses URLs to display the structured data around the images. Available at http://linter.structured-data.org

The W3C RDF Validator provides a graph visualization from the raw data markup extracted from the image. Available at http://www.w3.org/RDF/Validator

A summary analysis will be crafted from all of these data points to show what we are able to understand about an image versus what a machine or search engine is able to know about an image.

Structured Data Linter: shows all structured data tags around the images and in the page markup.
RDF Validator: visualization of embedded data for images and their subsequent relationships to other data.
Summary Report: complete picture of structured data, metadata, and analysis of the study.

Expected Outcomes

The anticipated results of this project include a benchmark of where this specific organization stands in terms of structured data in the online environment, and a methodology for other organizations looking to assess their structured data maturity in the digital space. These results will be used to create a roadmap for improving resource findability both on the web and within websites. Other organizations may also reuse this methodology to assess their own current state of structured data. Future areas of research could include using metadata/RDF-driven search engines in conjunction with Vector Space Models to assess the findability of image records on the web and within websites.

References (Slides & Full Paper)

Algebraix Data Corporation. 0005. Algebraix Data Launches Industry's First Cost-Effective Automated Implementation of Schema.org. Business Wire (English) 5.

Beall, Jeffrey. 2010. How Google Uses Metadata to Improve Search Results. Serials Librarian 59, no. 1: 40-53.

Breeding, Marshall. 2013. Linked Data: The Next Big Wave or Another Tech Fad? Computers In Libraries 33, no. 3: 20-22.

Cafarella, M.J., Halevy, A.Y., Zhang, Y., Wang, D.Z., and Wu, E. Uncovering the relational Web. In Proceedings of the 11th International Workshop on the Web and Databases (Vancouver, BC, June 13, 2008). http://web.eecs.umich.edu/~michjc/papers/webtables_webdb08.pdf

Connaway, Lynn Sillipigni, Timothy J. Dickey, and Marie L. Radford. 2011. "If it is too inconvenient I'm not going after it:" Convenience as a critical factor in information-seeking behaviors. Library & Information Science Research (07408188) 33, no. 3: 179-190.

Cazier, Clay. 2014. PM Digital Marketing Blog. "The Future of Exif Image Data." Last accessed November 20, 2014. http://www.pmdigital.com/blog/2014/04/future-exif-image-data

Diagram Center: Digital Image and Graphic Resources for Accessible Materials. 2014. "Content Model." Last accessed November 23, 2014. http://diagramcenter.org/standards-and-practices/content-model.html

Google. 2014. "Image Publishing Guidelines." Last accessed November 21, 2014. https://support.google.com/webmasters/answer/114016?hl=en

Holman, Lucy. 2011. Millennial Students' Mental Models of Search: Implications for Academic Librarians and Database Developers. Journal Of Academic Librarianship 37, no. 1: 19-27.

International Business Times. 0006. Bing, Google and Yahoo merge to make search easier with schema.org. International Business Times, April.

IPTC (International Press Telecommunications Council). 2014. "Embedded Metadata Manifesto." Last accessed November 20, 2014. http://www.embeddedmetadata.org/social-media-test-results.php (Embedded Metadata Manifesto 2014)

Kritzinger, W. T. Search Engine Optimization and Pay-per-Click Marketing Strategies. Journal of Organizational Computing and Electronic Commerce, no. 3 (2013): 273-86.

Lippincott, Joan K. "Net Generation Students and Libraries." EDUCAUSE (2005). Accessed November 19, 2014. http://www.educause.edu/research-and-publications/books/educating-net-generation/net-generation-students-and-libraries

Nakanishi, T. Semantic Context-Dependent Weighting for Vector Space Model. Semantic Computing (ICSC), 2014 IEEE International Conference on, pp. 262-266, 16-18 June 2014. doi: 10.1109/ICSC.2014.49

Paz, Anita. 2013. In search of Meaning: The Written Word in the Age of Google. Italian Journal Of Library & Information Science 4, no. 2: 255-266.

Priebe, T., Schlager, C., Pernul, G. A search engine for RDF metadata. Database and Expert Systems Applications, 2004. Proceedings. 15th International Workshop on, pp. 168-172, 2004. doi: 10.1109/DEXA.2004.1333468

Reicks, David. 2010. "Why Embedded Metadata Won't Help Your SEO." Last updated December 30, 2013. Last accessed November 23, 2014. http://www.controlledvocabulary.com/blog/embedded-metadata-wont-help-seo.html



Page 12: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

Questions

Research Questions Addressed in this Study

8 What kinds of structured data are near or around the images on the organizationrsquos

website Alt Text Other

9 What file types appear on the organizationrsquos website (JPEG TIFF PNG)

9 What embedded metadata is available in images on the website

11 What does the XMPXMLRDF for these images look like and how robust is it

What does the graph look like

Variables

These measures are operationalized by utilization of likert scales applied by the human researcher For

example when rating the level of description for the file-name a research could conclude that the

filename sp_18379847923jpg is not very descriptive filename for a human let alone for a search engine

(unless of course this is a product sku) The researcher would then choose to assign it a low value on

descriptiveness on a 1-5 likert scale

Type of page

the image was

on

The image file naming

conventionfilename

Level of description for the

filename

Quality and number

of alt text tags

Quality and number

of embedded

metadata tags

Quality and number of structured

data tags pertaining to the images

Quality and number of search

results for onsite search

utilizing filename or alt text

Quality and number

of relevant search

results utilizing

offsite image search

Data Collection Methods

ParticipantsParticipants will include a single institution anonymized for the protection of their business The sample of image records utilized

in this study will be limited to image assets appearing on the organizationrsquos website domain Most data collection can take place

from the organizationrsquos website itself Some procedures will take place on external sites services or programs

Randomization of SampleThe sample of images utilized in this study can be randomized by extracting a site map of the particular organization of interest

using xsitemapcom After the site map is constructed the list of URLs should be inputted into a spreadsheet program and a record

number should be assigned to each URL From there the researcher can use a randomizer program to select the order of pages to

utilize in the study (ie Research Randomizer Available at httpwwwrandomizerorgformhtm) This method will be utilized for

taking a random sample of pages from the organization of interest

ConsentAll data collected in this study are publicly available and freely available on the web

Data Collection Methods

Obtaining Data on the website

Navigate to the URL

Right Click Image(s) and ldquoSave Asrdquo

Right Click Page and ldquoView Sourcerdquo Save as

txt file

Collect raw data from image by either

opening in Photoshop and Navigating to Raw

Data Column or utilize Phil Harveyrsquos

ExifTool

Obtaining Data through Structured Data Linter

Navigate to the Linter website

Enter URL

Screenshot Structured Data Results -or- save

as webpage

Obtaining Data through W3C RDF validator

Copy raw data xml extracted earlier and input

into RDF Validator

Select Graph Only on the Options

Parse RDF

Save Graph or Screenshot Graph

Store in Folder with other Data

Answer Research Questions

Systematically go through the collected data

and input findings into spreadsheet

Data Analysis Methods

Descriptive Statisticso Bell Curve - measures

towards a central tendency

using likert scale data

Bell Curve Image By Vierge Marie

(Own work) [Public domain] via

Wikimedia Commons

httpuploadwikimediaorgwikipe

diacommonsff6Gaussian_Filter

svg

Data Analysis Methods

Graphical Analysis

(Charts and Graphs)

Summary Report

Discussion of Findings

Visualizing the Results

The Structured Data Linter

utilizing URLs to display

structured data around the images

Available at

httplinterstructured-dataorg

Summary analysis will be

crafted utilizing all of these data

points to show what we are able

to understand about an image

versus what a machine or search

engine is able to know about an

image

W3C RDF Validator Graph

Visualization utilizing the raw

data markup extracted from the

image

Available at

httpwwww3orgRDFValidator

Structured Data Linter

Shows all

structured Data

Tags around the

images and in

the page markup

RDF Validator

Visualization of

embedded data

for images and

their subsequent

relationships to

other data

Summary Report

Complete Picture of Structured

Data Metadata and Analysis

of Study

Expected Outcomes

The anticipated results of this project include a benchmark for where this specific

organization is at in terms of structured data in the online environment and a

methodology for other organizations looking to assess their structured data maturity in

the digital space These results will be used to create a roadmap for improving resource

findability both on the web and within websites Other organizations may also aspire to

reuse this methodology for assessing their own current state of structured data Future

areas of research could include utilizing metadataRDF-driven search engines in

conjuncture with Vector Space Models to assess findability of image records on the

web and within websites



Variables

These measures are operationalized through Likert scales applied by the human researcher. For example, when rating the level of description for the filename, a researcher could conclude that the filename sp_18379847923.jpg is not very descriptive for a human, let alone for a search engine (unless, of course, this is a product SKU). The researcher would then assign it a low value for descriptiveness on a 1-5 Likert scale.

Type of page the image was on

The image file naming convention/filename

Level of description for the filename

Quality and number of alt text tags

Quality and number of embedded metadata tags

Quality and number of structured data tags pertaining to the images

Quality and number of search results for onsite search utilizing filename or alt text

Quality and number of relevant search results utilizing offsite image search
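The variables above could be captured as one record per sampled image, with the Likert ratings checked against the 1-5 range at entry time. The following is only a minimal sketch: the class name, field names, and the choice to store a quality rating and a count as separate fields are assumptions made for illustration, not part of the proposal.

```python
from dataclasses import dataclass, asdict

LIKERT_MIN, LIKERT_MAX = 1, 5

# Hypothetical names for the fields that hold researcher-assigned Likert ratings.
LIKERT_FIELDS = (
    "filename_descriptiveness",
    "alt_text_quality",
    "embedded_metadata_quality",
    "structured_data_quality",
    "onsite_search_quality",
    "offsite_search_quality",
)


@dataclass
class ImageRecord:
    """One spreadsheet row for a sampled image (all field names are illustrative)."""
    page_url: str
    page_type: str
    filename: str
    filename_descriptiveness: int
    alt_text_quality: int
    alt_text_count: int
    embedded_metadata_quality: int
    embedded_metadata_count: int
    structured_data_quality: int
    structured_data_count: int
    onsite_search_quality: int
    onsite_search_count: int
    offsite_search_quality: int
    offsite_search_count: int

    def __post_init__(self):
        # Reject ratings outside the 1-5 scale so data entry errors surface early.
        for name in LIKERT_FIELDS:
            value = getattr(self, name)
            if not LIKERT_MIN <= value <= LIKERT_MAX:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")
```

Calling `asdict(record)` turns a record into a plain dictionary, which could then be written out as a spreadsheet row.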

Data Collection Methods

Participants

Participants will include a single institution, anonymized for the protection of its business. The sample of image records utilized in this study will be limited to image assets appearing on the organization’s website domain. Most data collection can take place from the organization’s website itself. Some procedures will take place on external sites, services, or programs.

Randomization of Sample

The sample of images utilized in this study can be randomized by extracting a site map of the organization of interest using xsitemap.com. After the site map is constructed, the list of URLs should be entered into a spreadsheet program and a record number assigned to each URL. From there, the researcher can use a randomizer program (e.g., Research Randomizer, available at http://www.randomizer.org/form.htm) to select the order of pages to utilize in the study. This method will be utilized for taking a random sample of pages from the organization of interest.
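The numbering and random selection described above can also be done in a few lines of code once the site map URLs are in hand. This is a sketch only, using Python's standard `random` module in place of Research Randomizer; the function name `sample_pages` and the optional `seed` parameter (added so a draw can be repeated) are assumptions, not part of the proposal.

```python
import random


def sample_pages(urls, sample_size, seed=None):
    """Number each URL, then draw a simple random sample without replacement."""
    # Record numbers start at 1, mirroring the spreadsheet step.
    numbered = list(enumerate(urls, start=1))
    rng = random.Random(seed)
    # Never ask for more pages than the site map contains.
    return rng.sample(numbered, k=min(sample_size, len(numbered)))
```

Each returned pair is a record number and its URL, in the randomized order in which the pages would be visited.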

Consent

All data collected in this study are publicly and freely available on the web.

Data Collection Methods

Obtaining Data on the website

Navigate to the URL

Right-click the image(s) and “Save As”

Right-click the page and “View Source”; save as a .txt file

Collect raw data from the image by either opening it in Photoshop and navigating to the Raw Data column, or utilizing Phil Harvey’s ExifTool
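Once the page source has been saved, the filename and alt text of each image can be pulled out of it for rating. The sketch below uses Python's standard `html.parser`; the names `ImageCollector` and `collect_images` are hypothetical, and it only looks at `img` tags, so images delivered through CSS or scripts would be missed.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class ImageCollector(HTMLParser):
    """Collect the src, filename, and alt text of every img tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        self.images.append({
            "src": src,
            # Keep only the last path segment, dropping any query string.
            "filename": basename(urlparse(src).path),
            # None means the alt attribute is missing altogether.
            "alt": attributes.get("alt"),
        })


def collect_images(page_source):
    """Return one dictionary per img tag found in the saved page source."""
    collector = ImageCollector()
    collector.feed(page_source)
    return collector.images
```

The saved .txt file can be read into a string and passed to `collect_images`; the researcher still assigns the descriptiveness ratings by hand.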

Obtaining Data through Structured Data Linter

Navigate to the Linter website

Enter URL

Screenshot the Structured Data results, or save as a webpage
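To turn the Linter output into the "number of structured data tags" variable, a rough local tally can be made from the saved page source. The sketch below counts HTML microdata attributes only (`itemtype` and `itemprop`); it does not handle RDFa or JSON-LD and is not a substitute for the Linter. The class name `MicrodataTally` is hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class MicrodataTally(HTMLParser):
    """Count microdata item types and properties, overall and on img tags."""

    def __init__(self):
        super().__init__()
        self.item_types = Counter()
        self.properties = Counter()
        self.image_properties = Counter()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        item_type = attributes.get("itemtype")
        if item_type:
            self.item_types[item_type] += 1
        # itemprop may hold several space-separated property names.
        for name in (attributes.get("itemprop") or "").split():
            self.properties[name] += 1
            if tag == "img":
                self.image_properties[name] += 1
```

After calling `feed()` with the page source, `image_properties` shows which properties are attached directly to images, while `properties` covers the whole page markup.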

Obtaining Data through W3C RDF Validator

Copy the raw data XML extracted earlier and input it into the RDF Validator

Select “Graph Only” in the options

Parse RDF

Save the graph or screenshot the graph

Store in a folder with the other data
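Alongside the validator graph, the same raw data can be listed as plain property and value pairs, which makes it easier to count embedded metadata tags. The sketch below assumes the raw data is an XMP packet serialized as RDF/XML and reads it with Python's standard `xml.etree.ElementTree`; the function name `xmp_properties` is hypothetical, and nested structures are flattened rather than modelled faithfully.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def xmp_properties(xmp_xml):
    """Return {property name in {namespace}local form: list of text values}."""
    root = ET.fromstring(xmp_xml)
    found = {}
    for description in root.iter(f"{{{RDF}}}Description"):
        # Simple properties may be written as attributes on rdf:Description.
        for name, value in description.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):
                found.setdefault(name, []).append(value)
        # Other properties are child elements, often wrapping rdf:li lists.
        for child in description:
            items = [li.text or "" for li in child.iter(f"{{{RDF}}}li")]
            values = items or [(child.text or "").strip()]
            found.setdefault(child.tag, []).extend(values)
    return found
```

An empty result for an image suggests that no embedded metadata survived publication, which is itself a finding worth recording.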

Answer Research Questions

Systematically go through the collected data and input findings into a spreadsheet

Data Analysis Methods

Descriptive Statistics

Bell Curve: measures of central tendency using Likert scale data

Bell Curve image by Vierge Marie (own work) [public domain], via Wikimedia Commons: http://upload.wikimedia.org/wikipedia/commons/f/f6/Gaussian_Filter.svg
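The measures of central tendency for one Likert variable could be computed as in the sketch below, which uses Python's standard `statistics` module. The function name `summarize_likert` is hypothetical. Because Likert ratings are ordinal, the median and mode are the more defensible summaries; the mean is included because it is commonly reported, on the assumption that the ratings are treated as interval data.

```python
from collections import Counter
from statistics import mean, median, multimode


def summarize_likert(ratings):
    """Summarize a list of 1-5 Likert ratings for a single variable."""
    counts = Counter(ratings)
    return {
        "n": len(ratings),
        # Frequency of every scale point, including those never chosen.
        "frequencies": {score: counts.get(score, 0) for score in range(1, 6)},
        "median": median(ratings),
        # multimode returns every most-frequent value, so ties are visible.
        "mode": multimode(ratings),
        "mean": mean(ratings),
    }
```

The frequency table is also the input a charting tool would need to plot the distribution of ratings for each variable.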






Visualizing the Results

The Structured Data Linter, utilizing URLs to display structured data around the images. Available at: http://linter.structured-data.org

A summary analysis will be crafted utilizing all of these data points to show what we are able to understand about an image versus what a machine or search engine is able to know about an image.

W3C RDF Validator graph visualization, utilizing the raw data markup extracted from the image. Available at: http://www.w3.org/RDF/Validator
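The contrast between what a person understands about an image and what a machine can read from the page can be illustrated with a minimal sketch. It uses only Python's standard html.parser module to list the descriptive attributes present on each image tag. The sample markup, the choice of attributes, and the function names are hypothetical illustrations; this is not how the Structured Data Linter or the W3C RDF Validator work internally.

```python
from html.parser import HTMLParser

# Attributes on <img> assumed here to carry machine-readable description
# (an illustrative choice, not a list taken from the study).
IMAGE_ATTRS = ("src", "alt", "title", "itemprop")


class ImageMetadataParser(HTMLParser):
    """Collects descriptive attributes for every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        found = dict(attrs)
        # Keep only the attributes of interest; absent ones are recorded as None.
        self.images.append({name: found.get(name) for name in IMAGE_ATTRS})


def image_metadata(html):
    """Return one dict of descriptive attributes per image in the markup."""
    parser = ImageMetadataParser()
    parser.feed(html)
    return parser.images


def missing_fields(record):
    """Names of descriptive attributes that are absent or empty for an image."""
    return [name for name in IMAGE_ATTRS if not record[name]]


# Hypothetical product page fragment: one described image, one bare image.
SAMPLE_PAGE = """
<div itemscope itemtype="http://schema.org/Product">
  <img src="/img/red-mug.jpg" alt="Red ceramic mug" itemprop="image">
  <img src="/img/IMG_0042.jpg">
</div>
"""

for record in image_metadata(SAMPLE_PAGE):
    print(record["src"], "missing:", missing_fields(record))
```

A person looking at the second image would still recognize the product, while a crawler reading this markup is left with only a file name, which is the kind of gap the summary analysis is meant to surface.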

Page 19: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

Structured Data Linter

Shows all structured data tags around the images and in the page markup.

Page 20: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

RDF Validator

Visualization of embedded data for images and their subsequent relationships to other data.

Page 21: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

Summary Report

Complete picture of structured data, metadata, and analysis of study.

Page 22: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

Expected Outcomes

The anticipated results of this project include a benchmark of where this specific organization stands in terms of structured data in the online environment, and a methodology for other organizations looking to assess their structured data maturity in the digital space. These results will be used to create a roadmap for improving resource findability both on the web and within websites. Other organizations may also reuse this methodology to assess their own current state of structured data. Future areas of research could include utilizing metadata/RDF-driven search engines in conjunction with Vector Space Models to assess the findability of image records on the web and within websites.
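As a rough illustration of the future-research idea above, the following minimal sketch ranks image metadata records against a query using a basic Vector Space Model (TF-IDF weights compared by cosine similarity). The sample records, the whitespace tokenizer, and the smoothed IDF formula are assumptions made for illustration; the proposal does not specify a particular weighting scheme or search engine.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_vectors(documents):
    """Return sparse TF-IDF vectors (term -> weight) and the IDF table."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_count = len(tokenized)
    # Document frequency: number of records that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant) so common terms keep a positive weight.
    idf = {term: math.log((1 + doc_count) / (1 + freq)) + 1
           for term, freq in df.items()}
    vectors = [{term: count * idf[term] for term, count in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, record index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(documents)
    # Query terms unseen in the records get weight 0 and cannot match anything.
    query_vec = {term: count * idf.get(term, 0.0)
                 for term, count in Counter(tokenize(query)).items()}
    scores = [(cosine(query_vec, vec), index) for index, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


# Hypothetical metadata strings for three image records.
RECORDS = [
    "red ceramic mug product photo",
    "IMG_0042",
    "blue ceramic bowl product photo",
]

for score, index in rank("red mug", RECORDS):
    print(round(score, 3), RECORDS[index])
```

In this sketch, a record described only by a camera-generated file name scores zero for any descriptive query, which is one way findability could be quantified across a set of image records.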

Page 23: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

References (Slides & Full Paper)

Algebraix Data Corporation. 0005. Algebraix Data Launches Industry’s First Cost-Effective Automated Implementation of Schema.org. Business Wire (English), 5.

Beall, Jeffrey. 2010. How Google Uses Metadata to Improve Search Results. Serials Librarian 59, no. 1: 40-53.

Breeding, Marshall. 2013. Linked Data: The Next Big Wave or Another Tech Fad? Computers In Libraries 33, no. 3: 20-22.

Cafarella, M.J., Halevy, A.Y., Zhang, Y., Wang, D.Z., and Wu, E. Uncovering the relational Web. In Proceedings of the 11th International Workshop on the Web and Databases (Vancouver, BC, June 13, 2008). http://web.eecs.umich.edu/~michjc/papers/webtables_webdb08.pdf

Connaway, Lynn Silipigni, Timothy J. Dickey, and Marie L. Radford. 2011. “If it is too inconvenient I’m not going after it:” Convenience as a critical factor in information-seeking behaviors. Library & Information Science Research (07408188) 33, no. 3: 179-190.

Page 24: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

References (Slides & Full Paper)

Cazier, Clay. 2014. PM Digital Marketing Blog. “The Future of Exif Image Data.” Last accessed November 20, 2014. http://www.pmdigital.com/blog/2014/04/future-exif-image-data

Diagram Center: Digital Image and Graphic Resources for Accessible Materials. 2014. “Content Model.” Last accessed November 23, 2014. http://diagramcenter.org/standards-and-practices/content-model.html

Google. 2014. “Image Publishing Guidelines.” Last accessed November 21, 2014. https://support.google.com/webmasters/answer/114016?hl=en

Holman, Lucy. 2011. Millennial Students’ Mental Models of Search: Implications for Academic Librarians and Database Developers. Journal Of Academic Librarianship 37, no. 1: 19-27.

Page 25: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

References (Slides & Full Paper)

International Business Times. 0006. Bing, Google and Yahoo merge to make search easier with schema.org. International Business Times, April.

IPTC: International Press Telecommunications Council. 2014. “Embedded Metadata Manifesto.” Last accessed November 20, 2014. http://www.embeddedmetadata.org/social-media-test-results.php (Embedded Metadata Manifesto 2014)

Kritzinger, W. T. Search Engine Optimization and Pay-per-Click Marketing Strategies. Journal of Organizational Computing and Electronic Commerce, no. 3 (2013): 273-86.

Lippincott, Joan K. “Net Generation Students and Libraries.” EDUCAUSE (2005). Accessed November 19, 2014. http://www.educause.edu/research-and-publications/books/educating-net-generation/net-generation-students-and-libraries

Page 26: Structured data and metadata evaluation methodology for organizations looking to improve image findability on the web emily kolvitz_2014

References (Slides & Full Paper)

Nakanishi, T. Semantic Context-Dependent Weighting for Vector Space Model. Semantic Computing (ICSC), 2014 IEEE International Conference on, pp. 262-266, 16-18 June 2014. doi: 10.1109/ICSC.2014.49

Paz, Anita. 2013. In search of Meaning: The Written Word in the Age of Google. Italian Journal Of Library & Information Science 4, no. 2: 255-266.

Priebe, T., Schlager, C., Pernul, G. A search engine for RDF metadata. Database and Expert Systems Applications, 2004. Proceedings. 15th International Workshop on, pp. 168-172, 2004. doi: 10.1109/DEXA.2004.1333468

Riecks, David. 2010. “Why Embedded Metadata Won’t Help Your SEO.” Last updated December 30, 2013. Last accessed November 23, 2014. http://www.controlledvocabulary.com/blog/embedded-metadata-wont-help-seo.html