
Facets and structured research data in the Digital Humanities

John Bradley and Michele Pasin
john.bradley@kcl.ac.uk
michele.pasin@gmail.com

Annual Bliss Classification Association Lecture, 19 April 2013

Facetted modelling in the dynamic world of structured humanities scholarship
John Bradley

Structured Data DDH projects

Bpi1700: British Printed Images before 1700: http://www.bpi1700.org.uk
CVMA: Corpus Vitrearum Medii Aevi (Medieval stained glass): http://www.cvma.ac.uk
PBE: Prosopography of the Byzantine Empire (published on CD)
PBW: Prosopography of the Byzantine World: http://www.pbw.kcl.ac.uk
PASE: Prosopography of Anglo-Saxon England: http://www.pase.ac.uk
CCEd: Clergy of the Church of England: http://www.theclergydatabase.org.uk
EMLoT: Early Modern London Theatres: http://www.emlot.kcl.ac.uk
DIAMM: Digital Image Archive of Medieval Music: http://www.diamm.ac.uk/index.html
OCVE: Online Chopin Variorum Edition: http://www.ocve.org.uk/index.html
PoMS: Paradox of Medieval Scotland: http://www.poms.ac.uk
Art of Making: http://www.artofmaking.ac.uk
Records of Early English Drama
The Making of Charlemagne’s Europe
The Breaking of Britain

Text in, Text out

[Diagram labels: Primary Sources, Secondary Source, Historical Research, Book / Article]

“the historical work as what it most manifestly is: a verbal structure in the form of a narrative prose discourse”

Hayden White (1973), quoted in Jörn Rüsen (1987). “Historical Narration: Foundation, Types, Reason” in History and Theory Vol 26 No 4

Historians and narrative text

“Multiplicity is inherent in the word-narratives used to communicate history. Words are complex forms of information; they have 'halos of meaning', making them wonderfully evocative but imprecise and slippery. [...] Historians embrace this range of meanings. We prefer the medium of words and narratives because it permits us to represent the past as multidimensional, complex, and nonlinear, even though structurally our prose and our logic are sequential.”

David J. Bodenhamer (2008), "History and GIS: Implications for the Discipline", in Anne Kelly Knowles (ed.) (2008). Placing History: How Maps, Spatial Data, and the GIS Are Changing Historical Scholarship. Redlands, CA: ESRI Press. p. 224


Structured Data: Appropriate to the Humanities?

“humanistic inquiry reveals itself as an activity fundamentally dependent upon the location of pattern.”

“Of all the technologies in use among computing humanists, databases are perhaps the best suited to facilitating and exploiting [pattern].”

“To build a database one must be willing to move from the forest to the trees and back again; to use a database is to reap the benefits of the enhanced vision which the system affords.”

Stephen Ramsay (2004). “Databases” in A Companion to Digital Humanities.

Appropriate to the Humanities?

the underlying ontology [that a database represents] has considerable intellectual value.

A well-designed database that contains information about people, buildings, and events in New York City contains not static information, but an entire set of ontological relations capable of generating statements about a domain.

A truly relational database, in other words, contains not merely "Central Park", "Frederick Law Olmstead", and "1857", but a far more suggestive string of logical relationships (e.g., "Frederick Law Olmstead submitted his design for Central Park in New York during 1857").

(from Stephen Ramsay, “Databases” in A Companion to Digital Humanities)

Three example projects

BPI1700: British Printed Images before 1700
EMLoT: Early Modern London Theatres
PASE: Prosopography of Anglo-Saxon England

“This website, funded by the Arts and Humanities Research Council, makes available a database of thousands of prints and book illustrations from early modern Britain in fully-searchable form.” (from bpi1700 webpage)

Online version gives free access to all these images

bpi1700


Purpose behind bpi1700

“Printed images are striking and revealing, potentially serving a wide range of illustrative and interpretative uses. They range from high art to crude satire, and significant conclusions can be drawn from their circulation and consumption about the culture of their period. Yet they are surprisingly little used by researchers, partly because they are currently difficult to access. This project seeks to rectify this by making a comprehensive collection of early modern British prints available online, and by promoting research on their relationship to their milieu.” (proposal to the AHRC)


BPI1700 online


Bpi1700 DB structure overview

[Diagram. Entities: Work, State, Impression, Producer, Producer Type, Person, Subject, Technique. Relationship labels: Evidenced by, Has these Role in production, Created by, Created using, Represented in, Appears in.]

FRBR?

Other data

The “real” DB structure

Impression data: from Merlin and the V&A

Bpi1700 added value: Works and subject index


Early Modern London Theatres

Transmission and understanding: “Most of what we know about the early London theatres, which developed before, during and shortly after the life of Shakespeare, has been passed down to us through a complex process of filtration. Documents written at the time have been selected, copied, adapted, and interpreted over subsequent centuries, and that process has shaped our understanding. In turn, what we do with this received information will determine how future generations view the early theatres.”

“EMLoT lets you see what direct use has been made, over the last four centuries, of pre-1642 documents related to professional performance in purpose-built theatres and other permanent structures in the London area. [...] It tells you who used them, and when, and where you can find evidence of that use. It also gives you some access to what was used, because it includes a brief account (or ‘abstract’) of the transcription’s contents, together with a reference to the location of the original document.” (from the EMLoT website)

Elements of the EMLoT structure

[Diagram. Entities: Event, Event Type, Venue, Troupe, Auspices, Document Record, Source (Primary Source, 2ndary Source), Transcription.]

Example values shown in the diagram:

Auspices: Privy Council, Office of the Revels, Court of Requests, Lord Chamberlain’s Office [...]
Troupe: Admiral’s Men, Queen’s Men, Worcester’s Men, King’s Men, Oxford’s Boys [...]
Venue: Globe Theatre, Fortune, Blackfriars, Bel Savage, Boar’s Heads [...]
Event Type: playhouse context, court case, playhouse business, payment, player context [...]

Structured Prosopographical projects

PBE: Prosopography of the Byzantine Empire (published on CD)
PBW: Prosopography of the Byzantine World: http://www.pbw.kcl.ac.uk
PASE: Prosopography of Anglo-Saxon England: http://www.pase.ac.uk
CCEd: Clergy of the Church of England: http://www.theclergydatabase.org.uk
Breaking of Britain
PoMS: Paradox of Medieval Scotland: http://www.poms.ac.uk
PoNE: People of Northern England Database
The Making of Charlemagne’s Europe

A “Source Assertion”

An assertion made by the project team that a source “S” at reference “R” states something (“F”) about a person or persons (“P”).
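The definition above maps naturally onto a small record type. What follows is a minimal sketch in Python, not the schema of any DDH database: the class name `SourceAssertion`, its field names, the helper `assertions_about`, and all sample sources and persons are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SourceAssertion:
    """One claim by the project team: source S, at reference R, states F about P."""
    source: str               # S: the source making the statement
    reference: str            # R: where in the source the statement occurs
    statement: str            # F: what the source states
    persons: Tuple[str, ...]  # P: the person or persons concerned


def assertions_about(person: str, assertions: List[SourceAssertion]) -> List[SourceAssertion]:
    """Return every assertion in which the given person appears."""
    return [a for a in assertions if person in a.persons]


# Hypothetical sample data, for illustration only.
SAMPLE = [
    SourceAssertion("Chronicle X", "annal 1", "married", ("Person A", "Person B")),
    SourceAssertion("Charter Y", "witness list", "held office", ("Person A",)),
    SourceAssertion("Chronicle X", "annal 2", "died", ("Person C",)),
]

for a in assertions_about("Person A", SAMPLE):
    print(f"{a.source} at {a.reference} states '{a.statement}' about {', '.join(a.persons)}")
```

Keeping the source and reference on every record means that two sources which disagree about the same person simply yield two assertions, rather than forcing the database to pick one.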


Core structure for DDH’s Prosopographical databases

[Diagram. Entities: Person, Assertion, Assertion Type, Authority Lists, Source, Location, Possession. Relationship labels: Instance of, Typed by, Connected to, Appears in.]

PASE’s “real” structure

[Diagram labels: Person, Assertion (Factoid), Authority Lists, Sources, Factoid Types, Possession, Place]

Marriages in PASE


Facetted Thinking and structure

Facetted Classification: An approach to organise a body of materials using facetted principles.

Facetted Browsing: The exploiting of facets to facilitate the exploration of a body of materials.

Facetted Browsing Principles

“Remember the purpose of the classification and the users. Who will use it? Why? Will they search it, browse it, or both? How well do they know the subject? Always remember it is meant for them to use.”

Denton 2003: “How to Make a Faceted Classification and Put It On the Web”, referencing Kwasnik, Barbara H. 1999. The role of classification in knowledge representation and discovery. Library Trends 48 (1): 22-47.

Faceted classification: advantages

“Kwasnik (1999, 40-42) lists several things in favour of faceted classifications: they do not require complete knowledge of the entities or their relationships; they are hospitable (can accommodate new entities easily); they are flexible; they are expressive; they can be ad hoc and free-form; and they allow many different perspectives on and approaches to the things classified.”

Denton 2003.

Searching the Clergy of the Church of England database


WWW Facetted browsing principles

1. The user should not be able to form a query that is known to have no results.
2. Users must always know where they are in the classification.
3. Users must always be able to refine their query or adjust their navigation to see what is nearby in the classification.
4. The URL is the notation of the classification.

Denton 2003
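Principles 1 and 4 can be illustrated in a few lines of Python. This is a sketch under stated assumptions rather than the code behind any of the project websites: the records, the facet names `venue`, `troupe` and `event_type`, the `/browse` path, and both function names are hypothetical. The idea is that a browsing interface offers only those facet values that still have matching records, and that the current selection is written into the URL.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical records, loosely modelled on EMLoT-style facets.
RECORDS: List[Dict[str, str]] = [
    {"venue": "Globe Theatre", "troupe": "King's Men", "event_type": "payment"},
    {"venue": "Fortune", "troupe": "Admiral's Men", "event_type": "court case"},
    {"venue": "Globe Theatre", "troupe": "King's Men", "event_type": "court case"},
]


def matching(records: List[Dict[str, str]], selection: Dict[str, str]) -> List[Dict[str, str]]:
    """Records that agree with every facet value in the current selection."""
    return [r for r in records if all(r.get(f) == v for f, v in selection.items())]


def available_refinements(records: List[Dict[str, str]],
                          selection: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Principle 1: count only facet values that occur in the current result set,
    so every refinement offered to the user is known to have results."""
    counts: Dict[str, Dict[str, int]] = {}
    for record in matching(records, selection):
        for facet, value in record.items():
            if facet in selection:
                continue  # already fixed by the user
            by_value = counts.setdefault(facet, {})
            by_value[value] = by_value.get(value, 0) + 1
    return counts


def selection_url(base: str, selection: Dict[str, str]) -> str:
    """Principle 4: the URL encodes the user's position in the classification."""
    if not selection:
        return base
    return base + "?" + urlencode(sorted(selection.items()))


current = {"venue": "Globe Theatre"}
print(available_refinements(RECORDS, current))
print(selection_url("/browse", current))
```

Sorting the selection before encoding gives one URL per position in the classification regardless of the order in which the user chose the facets.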


Facetted Browsing in bpi1700

[Screenshot labels: Facets, Selected Works]

Facets in PASE


Facets in Early Modern London Theatres (EMLoT)


Faceted classification: problems

“She [Kwasnik 1999] lists three major problems: the difficulty of choosing the right facets; the lack of the ability to express the relationships between them; and the difficulty of visualizing it all.”

Denton 2003

Metadata Definition:

“Metadata is sometimes defined literally as 'data about data,' but the term is normally understood to mean structured data about resources that can be used to help support a wide range of operations.”

(UKOLN (2001): “Metadata in a nutshell”, http://www.ukoln.ac.uk/metadata/publications/nutshell/)

Metadata and Dublin Core

“Perhaps the most well-known metadata initiative is the Dublin Core.” (UKOLN 2001)

DC: The fifteen elements

Creator Title Subject

Contributor Date Description

Publisher Type Format

Coverage Rights Relation

Source Language Identifier

From Weibel, Stuart (2007), Dublin Core Metadata Tutorial. OCLC Research

DC (Metadata) base syntax

Resource (implied subject) has (implied verb) property (one of 15 properties: DC:Creator, DC:Title, DC:Subject, DC:Date ...) X (property value: an appropriate literal)

Two [optional qualifier] slots appear in the diagram; qualifiers act as adjectives.

From Weibel, Stuart (2007), Dublin Core Metadata Tutorial. OCLC Research

Resource has Date "2000-06-13" (qualifiers: Revised, ISO8601)

Resource has Subject "Languages -- Grammar" (qualifier: LCSH)

From Weibel, Stuart (2007), Dublin Core Metadata Tutorial. OCLC Research
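The “Resource has property value” pattern, with its optional qualifiers, can be written down as a small data structure. The sketch below is illustrative only: the class name `DCStatement`, the method `describe`, and the decision to hold qualifiers as a plain tuple of strings are assumptions, not part of the Dublin Core specification or of Weibel's tutorial. The element names are the fifteen listed earlier in the talk.

```python
from dataclasses import dataclass
from typing import Tuple

# The fifteen Dublin Core elements named in the talk.
DC_ELEMENTS = {
    "Creator", "Title", "Subject", "Contributor", "Date", "Description",
    "Publisher", "Type", "Format", "Coverage", "Rights", "Relation",
    "Source", "Language", "Identifier",
}


@dataclass(frozen=True)
class DCStatement:
    """One 'Resource has <element> <value>' statement with optional qualifiers."""
    element: str                     # one of the fifteen properties
    value: str                       # the property value, a literal
    qualifiers: Tuple[str, ...] = () # optional qualifiers ("adjectives")

    def __post_init__(self) -> None:
        if self.element not in DC_ELEMENTS:
            raise ValueError(f"{self.element!r} is not one of the fifteen DC elements")

    def describe(self) -> str:
        """Render the statement with its implied subject and verb spelled out."""
        text = f'Resource has {self.element} "{self.value}"'
        if self.qualifiers:
            text += " [" + ", ".join(self.qualifiers) + "]"
        return text


statements = [
    DCStatement("Date", "2000-06-13", ("Revised", "ISO8601")),
    DCStatement("Subject", "Languages -- Grammar", ("LCSH",)),
]
for s in statements:
    print(s.describe())
```

Every statement has the same shape whatever the resource is, which is what makes this a rather “flat” model: there is one kind of entity (the resource) and a list of properties hanging off it.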


http://www.tutorialsonline.info/Common/DublinCore.html

DC.Subject: Dublin Core Meta Tags

DC.Creator: Alan Kelsey

DC.Format: text/html

DC.Date: 2007-01-06

DC.Publisher: Alan Kelsey, Ltd.

DC.Coverage: Hennepin Technical College

DC.Rights: Copyright 2011, ...

DC.Language: EN

Metadata: a “world view” of structure

Metadata: two kinds of data:
Resource: The object being classified
Metadata: The classification data

Classification data could be used as facets

Does this rather “flat” model suit our purposes?

[Diagram labels: Resources, Metadata]

BLISS: BC2 standard facets

thing/entity, kind, part, property, material, process, operation, patient, product, by-product, agent, space, time

"These fundamental thirteen categories have been found to be sufficient for the analysis of vocabulary in almost all areas of knowledge. It is however quite likely that other general categories exist; it is certainly the case that there are some domain specific categories, such as those of form and genre in the field of literature" (pp 79-80).

Vanda Broughton (2001): Faceted classification as a basis for knowledge organization in a digital environment; the Bliss Bibliographic Classification as a model for vocabulary management and the creation of multidimensional knowledge structures, New Review of Hypermedia and Multimedia, 7:1, 67-102

“BC2 makes an excellent starting point for thinking of how to make a faceted classification. Its facets can be renamed and adapted to suit your particular circumstances.” (Denton 2008)

Modelling: elements of the EMLoT structure

[Diagram. Entities: Event, Event Type, Venue, Troupe, Auspices, Document Record, Source (Primary Source, 2ndary Source), Transcription.]

Example values shown in the diagram:

Auspices: Privy Council, Office of the Revels, Court of Requests, Lord Chamberlain’s Office [...]
Troupe: Admiral’s Men, Queen’s Men, Worcester’s Men, King’s Men, Oxford’s Boys [...]
Venue: Globe Theatre, Fortune, Blackfriars, Bel Savage, Boar’s Heads [...]
Event Type: playhouse context, court case, playhouse business, payment, player context [...]

Modelling

"In terms of humanities computing, modelling is an iterative process of constructing and developing something like a computational 'knowledge representation' as this is defined in computer science. In fact we might say that a model is a manipulable knowledge representation.”

Willard McCarty 2002. “Humanities Computing: Essential Problems, Experimental Practice” in Literary and Linguistic Computing Vol 17 No 1. pp.103-125


Analytical Modelling: the utility of failure

“the digital model illumines analytically by isolating what would not compute. In other words, the failures of analytic modelling are where its success is to be found.”

Willard McCarty (2008). “What’s going on?” in Literary and Linguistic Computing, Vol 23 No 3. p. 256

Structure as a scholarly outcome, and its public presentation

The tension between that and the need for a public face to the project.
Classification: “user focus”: focus on universal structure
Modelling Scholarly structure: “scholar focus”: focus on individual scholarly exploration and assertion

Where are the facets in a DB structure?


Spiteri 1998: CRG Principles: Fundamental Categories

g) Fundamental Categories: "there exist no categories that are fundamental to all subjects, and ... categories should be derived based upon the nature of the subject being classified" (pp 18-19)

Spiteri, Louise. (1998). “A Simplified Model for Facet Analysis”. Now online at http://iainstitute.org/en/learn/research/a_simplified_model_for_facet_analysis.php

Spiteri 1998: CRG Principles: Relevance

b) Relevance: "when choosing facets by which to divide entities, it is important to make sure that the facets reflect the purpose, subject, and scope of the classification system" (1998, 6).

Spiteri 1998: CRG (Classification Research Group) Principles: Differentiation

a) Differentiation: "when dividing an entity into its component parts, it is important to use characteristics of division (i.e., facets) that will distinguish clearly among these component parts" (Spiteri 1998, 5). For example, dividing humans by sex.

Structured data requires clear categories: authority lists

Authority lists provide a classification mechanism

alGenderKey  alGender     alGenderAbrv
1
2            (Other)      (Other)
3            Male         M
4            Female       F
5            Institution  Inst
6            M/F          M/F
7            Undefined    (Undefined)

alOfficeTermKey  alOfficeTerm
1
2                (Other)
3                King
4                Secundarius
5                Judge
6                Pincerna
7                Comes
8                Pope
9                Queen
10               Bishop
11               Counsellor
12               Abbess
13               Archbishop
14               Dux
15               Priest
16               Minister
17               Fasellus
18               Princeps
19               Cleric
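An authority list of this kind is a lookup table whose keys are stored on other records, so the permitted values stay fixed and countable. The following Python sketch shows the idea using the gender list from the slide; the person records, the field name `gender_key`, and the functions `resolve` and `facet_counts` are hypothetical and do not reflect the actual PASE schema.

```python
from collections import Counter
from typing import Dict, List

# Keys and labels as shown in the gender authority list (key 1 has a blank label).
GENDER: Dict[int, str] = {
    1: "", 2: "(Other)", 3: "Male", 4: "Female",
    5: "Institution", 6: "M/F", 7: "Undefined",
}

# Hypothetical person records that store only the authority-list key.
PERSONS: List[Dict[str, object]] = [
    {"name": "Person A", "gender_key": 3},
    {"name": "Person B", "gender_key": 4},
    {"name": "Person C", "gender_key": 3},
    {"name": "House D", "gender_key": 5},
]


def resolve(authority: Dict[int, str], key: int) -> str:
    """Turn a stored key into its label, rejecting keys outside the list."""
    if key not in authority:
        raise ValueError(f"{key} is not in the authority list")
    return authority[key]


def facet_counts(records: List[Dict[str, object]], field: str,
                 authority: Dict[int, str]) -> Dict[str, int]:
    """Count records per authority-list label: the raw material for a facet."""
    return dict(Counter(resolve(authority, r[field]) for r in records))


print(facet_counts(PERSONS, "gender_key", GENDER))
```

Because records hold keys rather than free text, each list can be offered directly as a facet, and a new term is added in one place only.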


CaseType (PoNE)

id type

29 agreement

64 appeal of breach of peace

63 appeal of homicide

28 assize of last presentation (darrein presentm

1 assize of mort d'ancestor

45 assize of novel disseisin

14 deforcement

52 grand assize

37 last presentation (darrein presentment)

38 mort d'ancestor

2 novel disseisin

59 plea de namio vetito

47 plea in ecclesiastical court

25 plea of acquittal

7 plea of advowson

18 plea of agreement

27 plea of an extent

42 plea of appeal

62 plea of breach of peace

8 plea of charter-warrant

56 plea of death

5 plea of debt

21 plea of detention

23 plea of disseisin

10 plea of dower

22 plea of ejection

31 plea of false judgment

58 plea of false testimony

[...]

Spiteri 1998: CRG Principles: Ascertainability

c) Ascertainability: "it is important to choose facets that are definite and can be ascertained" (1998, 6).

Location Data in CCE: kinds of locations

Spiteri 1998: CRG Principles: Homogeneity & Mutual Exclusivity

e) Homogeneity: "facets must be homogeneous" (1998, 18).
f) Mutual Exclusivity: facets must be "mutually exclusive," "each facet must represent only one characteristic of division" (1998, 18), “i.e., that the contents of any two facets cannot overlap, and that each facet must represent only one characteristic of division.”
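The requirement that the contents of two facets cannot overlap is easy to test mechanically once each facet is held as a set of terms. The sketch below is one possible check, not a procedure described by Spiteri or used by the projects: the function name `overlapping_terms` and the grouping of terms into `office`, `occupation` and `status` facets are hypothetical, and the terms "Scribe" and "Free" are invented for the example.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def overlapping_terms(facets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return, for each pair of facets, the terms that appear in both.

    An empty result means the facets are mutually exclusive.
    """
    overlaps: Dict[Tuple[str, str], Set[str]] = {}
    for name_a, name_b in combinations(sorted(facets), 2):
        shared = facets[name_a] & facets[name_b]
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Hypothetical grouping of terms into facets, for illustration only.
FACETS = {
    "office": {"King", "Bishop", "Abbess"},
    "occupation": {"Scribe", "Bishop"},
    "status": {"Free"},
}

print(overlapping_terms(FACETS))
```

A term such as "Bishop" turning up under two facets is the kind of failure that signals the categories need rethinking, which is where a combined heading like Office/Status/Occupation becomes attractive.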


PASE: Office/Status/Occupation

Spiteri 1998: CRG Principles: Permanence

d) Permanence: facets should "represent permanent qualities of the item being divided" (1998, 18).

PASE’s event types: an evolving understanding

Revised event types (PASE II)

Acts of crime, law-breaking/violence
    Hostility, Burh-abandonment, Lust, Disobedience, Burning, Insulting ...

Legal/governmental/administrative acts and legitimate use of violence
    Legal/governmental/administrative acts
        Challenge, Archiepiscopal see: restoration, Property-given/selling ...
    Legitimate use of violence
        Imprisonment, Execution, Campaigning, War, Outlawing ...

Life-events/social and economic acts and relations
    Life Events
        Retirement, Journey, Naming, Betrothal, Marriage, Birth ...
    Social/economic acts and relations
        Visit, Promise, Begging, Ship-building, Slave-selling, Godparenting ...

Power-taking and power-leaving
    Political Acts
        Conquest, Agreement, Throne-sitting, Message-sending ...
    Taking/leaving power
        Appointment of abbot, royal insignia-entrusting, Coronation, Deposition of bishop, ...

Religious/ecclesiastical acts
    Acts of Christian piety
        Commemoration of the dead, Martyrdom, Church going, Easter-observance, Confession ...
    Acts of ecclesiastical authority
        Baptism, Confraternity, Tonsuring, Liturgical celebration, Ecclesiastical reform, Mission sending ...
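The revised event types form a hierarchy: broad categories, sub-categories, then individual types. A nested mapping is one simple way to hold such a hierarchy so that it can drive a drill-down facet. The Python sketch below covers only an excerpt of the list, and the names `EVENT_TYPES` and `path_to` are hypothetical; it is not how PASE stores its event types.

```python
from typing import Dict, List, Optional, Tuple

# Excerpt of the PASE II event types: category -> sub-category -> event types.
EVENT_TYPES: Dict[str, Dict[str, List[str]]] = {
    "Life-events/social and economic acts and relations": {
        "Life Events": ["Retirement", "Journey", "Naming", "Betrothal", "Marriage", "Birth"],
        "Social/economic acts and relations": ["Visit", "Promise", "Begging"],
    },
    "Power-taking and power-leaving": {
        "Political Acts": ["Conquest", "Agreement", "Throne-sitting", "Message-sending"],
        "Taking/leaving power": ["Appointment of abbot", "Coronation", "Deposition of bishop"],
    },
}


def path_to(event_type: str,
            tree: Dict[str, Dict[str, List[str]]] = EVENT_TYPES) -> Optional[Tuple[str, str, str]]:
    """Return (category, sub-category, event type), or None if the type is not listed."""
    for category, groups in tree.items():
        for group, types in groups.items():
            if event_type in types:
                return (category, group, event_type)
    return None


print(path_to("Marriage"))
print(path_to("Coronation"))
```

Holding the hierarchy as data rather than in code means that a later revision of the categories changes only the mapping, which suits event types that reflect an evolving understanding.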


Conclusions

Facetted thinking in our structured projects arises out of an exploratory and somewhat dynamic modelling rather than classification activity.

It provides a way for the public to have better access to a data structure that emerges from the project team’s emerging and shifting understanding and interests in their data.
It has to fit with a model of data that has a mix of different entity types and no specific entity centre.
It has to fit with a model that is subject to change and evolution.

Although facetted representation of our models is not a perfect fit with their nature, it has allowed for a browsing view of the data that enables the public to engage much better with the complexities of these projects’ materials.


Why facets here?

Complex structure?
Sparse data
Public interface
CCE query example