Digital cultural heritage spring 2015 day 2

Seminar at IMT Lucca - Spring 2015 Prof. Stefano Gazziano [email protected] Data, Value, People

Transcript of Digital cultural heritage spring 2015 day 2

Page 1: Digital cultural heritage spring 2015 day 2

Seminar at IMT Lucca - Spring 2015

Prof. Stefano Gazziano

[email protected]

Data, Value, People

Page 2: Digital cultural heritage spring 2015 day 2

The Internet is a powerful channel for spreading information and culture, and its power for managing cultural heritage is only beginning to be unleashed.

Topics

Pros and cons of using the Internet in managing cultural heritage assets.

The "death of distance" and the motivation to cross real distances. "Being digital" helps increase real visits.

Virtual museums, virtual reality, augmented reality: technologies and content to improve the user experience of cultural heritage sites.

Internet platforms, on-site installations, mobile devices, cloud computing platforms.

Stefano A Gazziano [email protected] 2

Page 3: Digital cultural heritage spring 2015 day 2

Internet is a gold mine, users are the nuggets. Let us learn how we can enrich culture.

Topics

What “Big data” is and what it is used for.

“Analytics”: who are our Internet visitors, what are they looking for, and do they find it on our Internet presence?

Data acquisition. Open data standards.

Digital contact with users. Before and after the visit.

Museum analytics, assessing user satisfaction. Case study.


Page 4: Digital cultural heritage spring 2015 day 2

Internet has rules, netiquette, and we must conform and be smart. A few “musts” to put cultural heritage on the net.

Topics

Search Engine Optimization. Content updates, internet staff.

Web reputation management.

Search engine marketing: crawling, indexing, ranking.

Analytics and conversions of a web site.


Page 5: Digital cultural heritage spring 2015 day 2

The web is really a wide world, and there is a lot more to do than just publish a web site.

Topics

Social networks: engagement techniques and online tools.

Going viral. Case study


Page 6: Digital cultural heritage spring 2015 day 2

Internet is a gold mine, users are the nuggets. Let us learn how we can enrich culture.

Topics

What “Big data” is and what it is used for.

“Analytics”: who are our Internet visitors, what are they looking for, and do they find it on our Internet presence?

Data acquisition. Open data standards.

Digital contact with users.

Museum analytics, assessing user satisfaction. Case study.


Page 7: Digital cultural heritage spring 2015 day 2

As a general reference:

Head First Data Analysis - A learner's guide to big numbers, statistics, and good decisions. By Michael Milton. Publisher: O'Reilly Media, July 2009.

Big Data in Big Companies. SAS Institute, International Institute for Analytics, May 2013. Authored by Thomas H. Davenport and Jill Dyché. http://www.sas.com/resources/asset/Big-Data-in-Big-Companies.pdf

Web analytics on Wikipedia: http://en.wikipedia.org/wiki/Web_analytics

Google Analytics home page: http://www.google.com/analytics/

Open Web Analytics: http://www.openwebanalytics.com/

Open data Wikipedia page: http://en.wikipedia.org/wiki/Open_data

Opencultuurdata, at the Rijksmuseum, the Regionaal Archief Leiden and Visserijmuseum Zoutkamp, The Netherlands: http://www.opencultuurdata.nl/english/

The Rijksmuseum API (Application Programming Interface): https://www.rijksmuseum.nl/en/api

How the Rijksmuseum opened up its collection, a case study: http://pro.europeana.eu/pro-blog/-/blogs/how-the-rijksmuseum-opened-up-its-collection-a-case-study

http://www.museumsandtheweb.com/mw2012/papers/sharing_cultural_heritage_the_linked_open_data

Museum Analytics: http://www.museum-analytics.org/


Page 8: Digital cultural heritage spring 2015 day 2

Now: a Video !!

And a loong one on visual overviews, just in case (MIT video, such stuff!)


Page 10: Digital cultural heritage spring 2015 day 2

90% of the world's data generated over the last two years

Page 11: Digital cultural heritage spring 2015 day 2

Few technology phenomena have taken both the technical and the mainstream media by storm the way “big data” has.

From the analyst communities to the front pages of the most respected sources of journalism, the world seems to be awash in big data projects, activities, analyses, and so on.

However, as with many technology fads, there is some murkiness in its definition, which leads to confusion, uncertainty, and doubt when attempting to understand how the methodologies can benefit the organization. Therefore, it is best to begin with a definition of big data. The analyst firm Gartner can be credited with the most frequently used (and perhaps somewhat abused) definition:

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.


Page 12: Digital cultural heritage spring 2015 day 2

For the most part, in popularizing the big data concept, the analyst community and the media have seemed to latch onto the alliteration that appears at the beginning of the definition, hyperfocusing on what is referred to as the “3Vs”: volume, velocity, and variety. Others have built upon that meme to inject additional Vs such as “value” or “variability,” intended to capitalize on an apparent improvement to the definition.

The challenge with Gartner’s definition is twofold. First, the impact of truncating the definition to concentrate on the Vs effectively distils out two other critical components of the message:

1. “cost-effective innovative forms of information processing” (the means by which the benefit can be achieved);

2. “enhanced insight and decision-making” (the desired outcome).


Page 13: Digital cultural heritage spring 2015 day 2

Big data is fundamentally about applying innovative and cost-effective techniques for solving existing and future business problems whose resource requirements (for data management space, computation resources, or immediate, in-memory representation needs) exceed the capabilities of traditional computing environments as currently configured within the enterprise.

Page 14: Digital cultural heritage spring 2015 day 2

Page 15: Digital cultural heritage spring 2015 day 2

Page 16: Digital cultural heritage spring 2015 day 2

unstructured data

By Vangie Beal

The phrase "unstructured data" usually refers to information that doesn't reside in a traditional row-column database. As you might expect, it's the opposite of structured data, the data stored in fields in a database.

Unstructured data files often include text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. Note that while these sorts of files may have an internal structure, they are still considered "unstructured" because the data they contain doesn't fit neatly in a database.

Experts estimate that 80 to 90 percent of the data in any organization is unstructured. And the amount of [...] with unstructured data.

Big data refers to extremely large datasets that are difficult to analyze with traditional tools. Big data can include both structured and unstructured data, but IDC estimates that 90 percent of big data is unstructured data. Many of the tools designed to analyze big data can handle unstructured data.

Implementing Unstructured Data Management

Organizations use a variety of different software tools to help them organize and manage unstructured data. These can include the following:

◦ Big data tools: Software like Hadoop can process stores of both unstructured and structured data that are extremely large, very complex and changing rapidly.
◦ Business intelligence software: Also known as BI, this is a broad category of analytics, data mining, dashboards and reporting tools that help companies make sense of their structured and unstructured data for the purpose of making better business decisions.
◦ Data integration tools: These tools combine data from disparate sources so that they can be viewed or analyzed from a single application. They sometimes include the capability to unify structured and unstructured data.
◦ Document management systems: Also called "enterprise content management systems," a DMS can track, store and share unstructured data that is saved in the form of document files.
◦ Information management solutions: This type of software tracks structured and unstructured enterprise data throughout its lifecycle.
◦ Search and indexing tools: These tools retrieve information from unstructured data files such as documents, Web pages and photos.

Unstructured Data Technology

A group called the Organization for the Advancement of Structured Information Standards (OASIS) has published the Unstructured Information Management Architecture (UIMA) standard. The UIMA "defines platform-independent data representations and interfaces for software components or services called analytics, which analyze unstructured information and assign semantics to regions of that unstructured information."

Many industry watchers say that Hadoop has become the de facto industry standard for managing Big Data. This open source project is managed by the Apache Software Foundation.

Page 17: Digital cultural heritage spring 2015 day 2

Page 18: Digital cultural heritage spring 2015 day 2

Page 19: Digital cultural heritage spring 2015 day 2

Page 20: Digital cultural heritage spring 2015 day 2

Page 21: Digital cultural heritage spring 2015 day 2

Page 22: Digital cultural heritage spring 2015 day 2

Page 23: Digital cultural heritage spring 2015 day 2

Page 24: Digital cultural heritage spring 2015 day 2

The current work on e-Infrastructures relevant to digital cultural heritage, such as DARIAH and CLARIN, and large-scale aggregators of digital content, like Europeana, changes the current landscape of digital cultural heritage.

A better understanding is needed of the implications of big data for the content, architectures, and functionality of large digital collections, and of its effects on users, quality, and policy.

The digital cultural heritage community needs a forum to discuss current work and theoretical advancements, consolidate state-of-the-art research, share current experiences, and brainstorm future developments in the area.


Page 26: Digital cultural heritage spring 2015 day 2

Being a new domain, it also requires an in-depth discussion on integrating aspects of big data in curricula in librarianship, information science, archival science and a range of Humanities disciplines.

Novel research relates to big data in the following domains:

◦ Cultural heritage objects and big data: aspects of capture, storage, sharing, and analysis
◦ Visualisation of large digital cultural heritage collections
◦ Curation of big cultural heritage collections
◦ Searching big data: Information retrieval and data mining
◦ Natural language processing: statistical NLP in cultural heritage
◦ Semantic web technologies and large scales of cultural data
◦ Web intelligence
◦ Cultural cloud
◦ Issues of aggregation of vast resources
◦ Distributed service architectures: SaaS, PaaS, IaaS
◦ Big data economics and digital heritage
◦ Evaluation, usability and use
◦ Visualisation methods and tools
◦ e-Infrastructures and large digital resources
◦ Citizen science: the challenges of scale in engaging citizens
◦ Educational aspects: how to introduce big data aspects in digital humanities and in Library and Information Science schools?

Page 27: Digital cultural heritage spring 2015 day 2

Page 28: Digital cultural heritage spring 2015 day 2

Justify and quantify NH impact to the communities they serve while knowing relatively little about their visitors.

Understanding of visitor behavior in museums lags significantly behind common practice in the commercial sector, and does not yet provide adequate insight into how best to achieve the field’s mission.

Simple attendance statistics are not enough.

Invest little in the detailed understanding of the actions, experiences, and ongoing participation of visitors once they enter the building.

Tools to know how to achieve long-term relevance.


Page 29: Digital cultural heritage spring 2015 day 2

Data Acquisition

Digital contact with users

Assessing user satisfaction


Page 30: Digital cultural heritage spring 2015 day 2

And open data standards, a little bit

Page 31: Digital cultural heritage spring 2015 day 2

Surveys vs. digital interaction

The danger of garbage in / garbage out

Wrong email (misspelling), incorrect statistical sampling and “confounders”

The importance of digital interaction
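To make the "garbage in / garbage out" point concrete, here is a minimal sketch of how misspelled email addresses collected in a survey might be flagged before any analysis. It is only an illustration: the pattern is a rough syntactic check, and the table of domain misspellings, the function name check_email and the sample addresses are all hypothetical, not part of the seminar material.

```python
import re

# Rough syntactic check only (an assumption for this sketch): exactly one "@",
# no spaces, and at least one dot inside the domain part.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical table of common domain misspellings and their corrections.
DOMAIN_TYPOS = {
    "gmial.com": "gmail.com",
    "gmai.com": "gmail.com",
    "hotmial.com": "hotmail.com",
}


def check_email(address):
    """Return (status, cleaned), where status is 'ok', 'fixed' or 'invalid'."""
    cleaned = address.strip().lower()
    if not EMAIL_PATTERN.match(cleaned):
        # Unusable contact: better to drop it than to count it as a reachable visitor.
        return "invalid", None
    local, domain = cleaned.rsplit("@", 1)
    if domain in DOMAIN_TYPOS:
        return "fixed", f"{local}@{DOMAIN_TYPOS[domain]}"
    return "ok", cleaned


if __name__ == "__main__":
    for raw in [" Anna@Example.org ", "mario@gmial.com", "luca.example.org"]:
        print(repr(raw), "->", check_email(raw))
```

A check like this only catches malformed input; it says nothing about sampling bias or confounders, which need to be handled in the survey design itself.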


Page 32: Digital cultural heritage spring 2015 day 2

Actually, we’ll present a brief overview, just what is necessary to then interact with a data analyst and not look too dumb

Page 33: Digital cultural heritage spring 2015 day 2

Page 34: Digital cultural heritage spring 2015 day 2

Our source

Page 35: Digital cultural heritage spring 2015 day 2

Page 36: Digital cultural heritage spring 2015 day 2

Page 37: Digital cultural heritage spring 2015 day 2

My problem? Get more votes than the others.

A tough job that requires quantitative directions. The best agency (progressive) is probably GQRR Research. I thank IPR Marketing, who graciously allowed me to disclose this study for IMT.

Get voters to the polls

Create consensus on your proposal and candidate

Case study: Italian parliamentary elections, 2013.


Page 38: Digital cultural heritage spring 2015 day 2

Identify segments of electorate

Survey voters

Target segments with proper message

Focus groups

Evaluate results


Page 39: Digital cultural heritage spring 2015 day 2

Loyal voters


Mobile voters

Swing voters

Non voters

Page 40: Digital cultural heritage spring 2015 day 2

Loyal voters


Mobile voters

Swing voters

Non voters

Now: profile, profile and profile again (8 – 12)

Page 41: Digital cultural heritage spring 2015 day 2

We want to get as many votes as possible given the campaign budget.

Where to allocate how much, given the data analysis results?

Constraints:

“Profitability” of target by segments

Total campaign budget

Time to election day

Decision variable:

How many ads to run per target
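The allocation problem above can be sketched as a small optimization: choose how many ads to run per segment so that expected votes are maximized without exceeding the budget. The sketch below is an illustration under stated assumptions, not the method used in the case study: every figure (votes per ad, cost per ad, maximum ads that fit before election day, total budget) is invented, and the time constraint is modelled simply as a cap on the number of ads per segment. It enumerates all integer plans, which is only practical for a handful of segments; a spreadsheet solver or a linear programming library would be the usual tool for larger cases.

```python
from itertools import product

# Hypothetical figures for illustration only; none come from the seminar.
# name: (expected votes gained per ad, cost per ad in euros, max ads before election day)
SEGMENTS = {
    "loyal": (15, 1000, 5),
    "mobile": (50, 2000, 4),
    "swing": (90, 4000, 3),
}
BUDGET = 12000  # total campaign budget in euros (assumed)


def best_allocation(segments, budget):
    """Try every integer ad plan and return (plan, votes) with the most expected votes."""
    names = list(segments)
    # Time to election day is modelled as an upper bound on ads per segment.
    ranges = [range(segments[name][2] + 1) for name in names]
    best_votes, best_plan = -1, None
    for plan in product(*ranges):
        cost = sum(ads * segments[name][1] for name, ads in zip(names, plan))
        if cost > budget:
            continue  # violates the budget constraint
        votes = sum(ads * segments[name][0] for name, ads in zip(names, plan))
        if votes > best_votes:
            best_votes, best_plan = votes, dict(zip(names, plan))
    return best_plan, best_votes


if __name__ == "__main__":
    plan, votes = best_allocation(SEGMENTS, BUDGET)
    print(plan, votes)  # {'loyal': 0, 'mobile': 4, 'swing': 1} 290
```

With these made-up numbers the budget goes first to the segment with the most votes per euro ("mobile"), and the remainder to "swing"; changing the "profitability" figures changes the plan, which is why the profiling step matters.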

Page 42: Digital cultural heritage spring 2015 day 2

Head First Data Analysis (HFDA), p. 76

Page 43: Digital cultural heritage spring 2015 day 2

«Digital friends»


Page 44: Digital cultural heritage spring 2015 day 2

Beyond paper: actual observational digital data

Web site analytics, user experience

Social networks engagement

Direct contact by targeted mail

Digital membership programs

Online polls

Newsletters

Virtual / 3D museums

Augmented reality

Marketing & Upselling

E-commerce


Page 46: Digital cultural heritage spring 2015 day 2


Page 47: Digital cultural heritage spring 2015 day 2

Sorry, but the technicalities are exactly the same for a museum and a supermarket

Surveys are not enough, and are expensive

Social networks and web site presence could offer a deluge of data

Day 3 will be exactly on how to produce content suitable for data collection

Day 4 will focus on activities to engage prospects on social networks

Today we have a look at how selected CH institutions assess user satisfaction


Page 48: Digital cultural heritage spring 2015 day 2

Assessing user satisfaction


Page 51: Digital cultural heritage spring 2015 day 2

The ten largest museums in the world: off and online


Page 52: Digital cultural heritage spring 2015 day 2

The annual conference of Museums and the Web
◦ April 2-5, 2014 Baltimore, MD, USA

MW2014: Museums and the Web 2014

Tourist Satisfaction with Cultural Heritage destinations in India: with special reference to Kolkata, West Bengal

Tourist Satisfaction with Cultural / Heritage Sites: The Virginia Historic Triangle

A Study of Service Quality and Satisfaction for Museums - Taking the National Museum of Prehistory as an Example

The Contribution of Technology-Based Heritage Interpretation to the Visitor Satisfaction in Museums

Page 53: Digital cultural heritage spring 2015 day 2