Linked Books - DH Venice Fall School 2014

Linked Books

Giovanni Colavizza EPFL

Motivation: a question

How do you find sources for research in the humanities?

How do you find literature for research in the “hard” sciences?

Motivation: the differences between humanities and “hard” sciences

• Primary and secondary sources
• Citation history (e.g. Google Scholar)
• Citation semantics

Motivation: primary and secondary sources

Approx. half of the citations in humanities are to primary sources [Wiberley (2009)].

Their use has hardly ever been studied with citation analytic methods.

“For scholarship in the humanities there are three kinds of literature: primary literature that contains the evidence on which humanists base their scholarship, secondary literature in which humanists write up their scholarship, and access services that describe and index the publications written by humanists.” (Wiberley, 2009)

Motivation: citation history

Lack of data [Sula and Miller (2014)]. Why?
• Sparse and local sub-fields
• Nationality (language and schools)
• Proliferation of editorial practices

Motivation: citation semantics

• Humanists are less prone to credit each other than scientists [Heinzkill, 1980; Swales, 1990; Hellqvist, 2010].

• They are less prone to work together: the average number of authors per publication was 1.06 in a study by Linmans (2010).

• They use citations in a great variety of ways and with a great variety of meanings: agreement, disagreement, full association, minor reference, etc. [Harwood (2008), Cano (1989)].

Examples:
• Strongly negative: “Professor Epstein’s comment presents no new findings and ignores the theoretical issues I raise.”, citing Epstein (2008). From Ogilvie (2008).
• Association: “non basta ridimensionare gli aspetti strutturali del declino economico, che per Venezia fu comunque solo “relativo”, ..” [“it is not enough to scale down the structural aspects of the economic decline, which for Venice was in any case only ‘relative’, ..”], citing Rapp (1979). From Trivellato (2000).

Motivation: our answer

Citation analysis for the humanities is an almost non-existent field, yet the results could be very rich.

We cannot simply use traditional citation analysis methods on humanities data. We need new questions and methods.

The project: goals

• Digitise as much of the historiography on Venice as we can (for now, limited to history).

• Extract all citations and populate a database.

• Analyse the history of the history of Venice and develop a framework for citation analysis for the humanities.

• Publish an open access search engine for scholars and the general public.

The project: goals

“Side effects”: we have the full text of most publications on Venice, and we are also digitising documents at the Archive.
• Indexes of keywords (e.g. named entities)
• Direct link between publications and sources
• Topic modelling and fine-grained classification of publications (currently at most Dewey subjects)
• Enhanced library catalogue

The project: partners and materials

Partnership with Ca’ Foscari Library System (humanities library) and discussion with major Venetian libraries.

Digitisation goal: digitise all secondary literature on Venice from the last 200 years (monographs, journals, editions, etc.). Current estimate: circa 5000 items (there are many more). Digitisation is ongoing (1513 items done as of last Friday).

Methods: overview

Methods I: data extraction

The steps:
• OCR
• Citation detection
• Citation parsing
• Model and populate the db (ontologies for citations)

Basic tools:
• Active annotation for supervised learning (minimise training data to annotate)
• Conditional Random Fields for parsing
• RDF and triple stores as database
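To make the parsing and database steps concrete, here is a minimal sketch in Python. It is not the project's code: the feature names, the label set (`author`, `year`, `other`), the `ex:` predicates and the identifiers are all hypothetical, and the labels are assigned by hand where a trained Conditional Random Field would predict them. It shows the kind of per-token features a sequence labeller could consume, and how a labelled citation could become subject-predicate-object triples.

```python
import re

def token_features(tokens, i):
    """Features for token i, of the kind a CRF sequence labeller could use.
    The feature set is an illustrative assumption."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalised": tok[:1].isupper(),
        "is_year": bool(re.fullmatch(r"(1[5-9]|20)\d{2}", tok)),
        "has_digit": any(c.isdigit() for c in tok),
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }

def labelled_to_triples(citing_id, tokens, labels):
    """Group labelled tokens into fields and emit (subject, predicate, object)
    triples. The 'ex:' predicates are placeholders, not a real ontology."""
    fields = {}
    for tok, lab in zip(tokens, labels):
        fields.setdefault(lab, []).append(tok)
    key = fields.get("author", []) + fields.get("year", [])
    cited_id = "ref:" + "_".join(key).lower()
    triples = [(citing_id, "ex:cites", cited_id)]
    for lab, toks in fields.items():
        if lab != "other":  # punctuation and filler tokens carry no field
            triples.append((cited_id, "ex:" + lab, " ".join(toks)))
    return triples

tokens = ["Rapp", ",", "1979"]
labels = ["author", "other", "year"]  # hand-assigned, standing in for CRF output
features = [token_features(tokens, i) for i in range(len(tokens))]
triples = labelled_to_triples("pub:trivellato_2000", tokens, labels)
```

In a real pipeline the triples would be loaded into a triple store and the predicates taken from a citation ontology; plain tuples are used here only to keep the sketch self-contained.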

Methods II: citation analysis, networks

Network-based models. Remember primary and secondary sources: how many graphs can we build?

Bibliographic coupling and co-citation
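As a minimal sketch of the two measures, assuming toy data: two citing publications are bibliographically coupled when they share references, and two cited items are co-cited when the same publication cites both. The identifiers below are invented, and the `src:` prefix is a hypothetical convention for marking primary sources. Restricting the cited sets to one kind of source gives a separate graph, which is one way to read the question of how many graphs can be built.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: citing publication -> set of cited items.
# Items prefixed "src:" stand for primary sources (an assumed convention).
citations = {
    "A": {"X", "Y", "src:1"},
    "B": {"X", "src:1"},
    "C": {"Y", "Z"},
}

def bibliographic_coupling(citations):
    """Weight of a pair of citing publications = number of shared references."""
    weights = {}
    for p, q in combinations(sorted(citations), 2):
        shared = len(citations[p] & citations[q])
        if shared:
            weights[(p, q)] = shared
    return weights

def co_citation(citations):
    """Weight of a pair of cited items = number of publications citing both."""
    weights = defaultdict(int)
    for refs in citations.values():
        for r, s in combinations(sorted(refs), 2):
            weights[(r, s)] += 1
    return dict(weights)

coupling = bibliographic_coupling(citations)
cocited = co_citation(citations)

# Same measure, restricted to primary sources only.
primary_only = {p: {r for r in refs if r.startswith("src:")}
                for p, refs in citations.items()}
coupling_primary = bibliographic_coupling(primary_only)
```

Here A and B share two references, A and C share one, and B and C share none, so the last pair has no edge in the coupling graph.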

Methods II: citation analysis

Network-based models:
• Global analysis
• Local analysis (communities and nodes)
• Temporal analysis
• Publication classification and analysis

Big questions:
• Key works, authors, sources
• Disciplinary segmentations
• Measure intellectual influence and schools of thought
• Map scholarly debates
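A first, crude approximation to “key works” and to temporal analysis can be sketched as follows, on invented toy edges: count citations received overall, then per period of the citing publication. Raw counts are only a proxy. Given the variety of citation meanings in the humanities (agreement, disagreement, minor reference), they should not be read as a measure of influence on their own.

```python
from collections import Counter

# Hypothetical toy edges: (citing publication, its year, cited item).
edges = [
    ("A", 1975, "X"),
    ("B", 1982, "X"),
    ("C", 1994, "X"),
    ("C", 1994, "Y"),
    ("D", 1998, "Y"),
]

def citation_counts(edges):
    """Citations received by each cited item (in-degree in the citation graph)."""
    return Counter(cited for _, _, cited in edges)

def counts_by_period(edges, start, width):
    """Citations received per item, grouped by period of the citing publication.
    Periods are [start, start + width), [start + width, start + 2 * width), ..."""
    periods = {}
    for _, year, cited in edges:
        period = start + ((year - start) // width) * width
        periods.setdefault(period, Counter())[cited] += 1
    return periods

overall = citation_counts(edges)
by_period = counts_by_period(edges, start=1970, width=10)
```

In this toy data X is the most cited item overall, while Y overtakes it among publications from the 1990s, the kind of shift a temporal analysis is meant to surface.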

Linked Books Thank you

Giovanni Colavizza EPFL