The East Asian Studies Macroscope: Infrastructure for Collaborative Scholarship across Corpora and...
The East Asian Studies Macroscope
@PeterBroadwell, UCLA Digital Library
Digital Research in East Asian Studies: July 12, 2016
The East Asian Studies Macroscope: Infrastructure for Collaborative Scholarship across Corpora and Institutions
Peter Broadwell, Academic Projects Developer
UCLA Digital Library
@PeterBroadwell
EASM | 東亞研究宏觀鏡 | ヒュー:マ |인문학매크로스콥
Prof. Tim Tangherlini Prof. Jack Chen
Vision for a humanities macroscope:
An integrated suite of digital tools and interfaces that allows researchers to model the complexity of cultural phenomena, moving between close reading, distant reading, and all levels in between.
micro-scale | meso-scale | macro-scale
Timothy R. Tangherlini, “The Folklore Macroscope: Challenges for a Computational Folkloristics,” The 34th Archer Taylor Memorial Lecture, Western Folklore 72, no. 1 (2013): 7-27.
Key features of a distributed humanities macroscope infrastructure
Facilitates secure use of restricted-access collections
• Sensitive data can remain on its home server
• Access to data is via secure protocols
• Support for server-side processing: if necessary, only summaries and/or results are exported from corpus servers
Researchers can run analyses across multiple collections hosted at different participating institutions
• This enables novel types of research that cannot be done on locally downloaded data, or even on the host’s servers
• Multi-corpus stylometry, topic modeling
• Cross-corpus network analysis, geo-coding, etc.
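As a rough illustration of the first feature, server-side processing can be sketched in Python as below. The `CorpusServer` class, its `allow_full_text` flag, and the `size_summary` function are hypothetical names invented for this sketch, not part of any EASM code. The point is only that the texts stay inside the server object while a small summary is returned and merged at the portal.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CorpusServer:
    """Hypothetical stand-in for one institution's corpus server."""
    name: str
    texts: list = field(default_factory=list)  # stays on the home server
    allow_full_text: bool = False              # assumed access-policy flag

    def fetch_texts(self):
        # Full-text export is refused unless the policy allows it.
        if not self.allow_full_text:
            raise PermissionError(f"{self.name}: full text is restricted")
        return list(self.texts)

    def run_summary(self, summarize):
        # Server-side processing: the function runs next to the data
        # and only its result is handed back to the caller.
        return summarize(self.texts)


def size_summary(texts):
    """Example summary: number of texts and total character count."""
    return Counter(texts=len(texts), characters=sum(len(t) for t in texts))


def cross_corpus_summary(servers, summarize):
    """Merge the per-server summaries at the portal."""
    merged = Counter()
    for server in servers:
        merged.update(server.run_summary(summarize))
    return merged


# Toy data, invented for the example.
institution_a = CorpusServer("Institution A", ["秋風秋雨", "秋色"])
institution_b = CorpusServer("Institution B", ["笙歌", "歌吹秋風"])
merged = cross_corpus_summary([institution_a, institution_b], size_summary)
```

Calling `institution_a.fetch_texts()` raises `PermissionError` here, while `merged` holds only aggregate counts from both servers.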
The macroscope research environment
[Diagram. Labels: Macroscope portal, Institution A, Institution B, collections X, Y, Z, access policies, Tool A, Tool B, findings]
Features of a macroscope research portal
1. Users and group accounts
• Users may choose with whom to share materials
2. Corpus access management
• Authenticates access to external or local data sets
3. Analytical tools and workflow development
• Researchers may run existing tools, or create their own
4. Visualization and sharing of research results
• Scholars can present, view, and comment on findings
• Analytical results can be made available for download
Features of a macroscope research portal
1. Users and group accounts
• Users may choose with whom to share materials
Example: Liferay user portal (www.liferay.com)
User and group accounts
[Diagram. Labels: Macroscope portal, Institution A, Institution B, collections X, Y, Z, access policies, Tool A, Tool B, findings]
Features of a macroscope research portal
2. Corpus access management
• Authenticates access to external or local data sets
Example: Alveo corpus selection interface (http://alveo.edu.au/)
Corpus access policies
[Diagram. Labels: Macroscope portal, Institution A, Institution B, collections X, Y, Z, access policies, Tool A, Tool B, findings]
Features of a macroscope research portal
3. Analytical tools and workflow development
• Researchers may run existing tools, or create their own
Example: Network creation and analysis workflow in Knime (https://www.knime.org/)
Tools and workflow development
[Diagram. Labels: Macroscope portal, Institution A, Institution B, collections X, Y, Z, access policies, Tool A, Tool B, custom workflows, findings]
Features of a macroscope research portal
4. Visualization and sharing of research results
• Scholars can present, view, and comment on findings
• Analytical results can be made available for download
Example: Network analysis results visualized in multiple offline tools: Knime, Visone, Cytoscape, Gephi
Sharing of research results
[Diagram. Labels: Macroscope portal, Institution A, Institution B, collections X, Y, Z, access policies, Tool A, Tool B, findings]
An example: Communication & empire(s) – 全唐詩 and Heian 漢詩
Special thanks: Tomoko Bialock, Japanese Studies Librarian, UCLA Library
Image sources: Wikimedia Commons
The Hentaigana mobile app
Funded by the Tadashi Yanai Initiative for Globalizing Japanese Humanities
Supported by the UCLA Library
Hentaigana, classical Japanese, and digital scholarship
Image analysis of thumbnails from IIIF manifest
Topics in Genji monogatari (ca. 1020)
Advanced n-gram viewers
t-SNE dimensionality reduction
“Confusion matrix” of original vs. naïve Bayes poem genre classifications
Mimno, Broadwell, Tangherlini. 2014. “The Telltale Hat: LDA and Classification Problems in a Large Folklore Corpus.” DH 2014, Lausanne, Switzerland.
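A confusion matrix of this kind counts, for every pair of genre labels, how many poems carry the first label in the original collection and were assigned the second by the classifier. The sketch below assumes the predicted labels have already been produced by some classifier (naïve Bayes on the slide). The function name, the genre names, and the label lists are made-up placeholders, not data from the talk.

```python
from collections import Counter


def confusion_matrix(original, predicted):
    """Rows are original labels, columns are predicted labels."""
    labels = sorted(set(original) | set(predicted))
    pair_counts = Counter(zip(original, predicted))
    matrix = [[pair_counts[(o, p)] for p in labels] for o in labels]
    return labels, matrix


# Toy labels, invented for the example.
original = ["ritual", "ritual", "jest", "jest", "dream"]
predicted = ["ritual", "jest", "jest", "jest", "ritual"]

labels, matrix = confusion_matrix(original, predicted)
for label, row in zip(labels, matrix):
    print(label, row)
```

Off-diagonal cells show which original categories the classifier tends to merge, which is the information such a plot is meant to surface.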
Heian/Kamakura kanshi collections
• Kaifūsō 懐風藻 – 751 (116 poems)
• Ryōunshū 凌雲集 – 814 (91 poems)
• Bunka shūreishū 文華秀麗集 – 818 (140 poems)
• Keikokushū 経国集 – 827 (213 poems)
• Toshi bunshū 都氏文集 – 879, poems probably by 都良香 (71 poems)
• Den-shikashū 田氏家集 – 891 (written or collected by 島田忠臣) (217 poems)
• Kanke bunzō 菅家文草 – ca. 900; 468 poems by 菅原道眞, rest Buddhist texts
• Zenshūsai-taku shi-awase 善秀才宅詩合 – 963, from a poetry contest (12 poems)
• Fusōshū 扶桑集 – 995-999 (100 poems)
• Honchō reisō 本朝麗藻 – 1010 (153 poems)
• Gōrihōshū 江吏部集 – 1011 (135 poems)
• Wakan rōeishū 和漢朗詠集 – 1013 (225 poems)
• Jishin shi-awase 侍臣詩合 – 1051, from a courtiers’ poetry contest (8 poems)
• Hosshōji Kanpaku goshū 法性寺關白御集 – by 藤原忠通 (1097-1164) (102 poems)
• Honchō mudaishi 本朝無題詩 – 1162-64 (658 poems)
• Tenjō shi-awase 殿上詩合 – from a palace competition, year unknown (40 poems)
• Sukezane Nagakane ryōkyō hyakuban shi-awase 資實長兼百番詩合 – Sukezane and Nagakane Lords’ 100 poem contests, year unknown (200 poems)
• Poems about Genji monogatari 賦光源氏物語詩 – 1291 (55 poems)
Heian/Kamakura 漢詩 collections
3,004 total poems
Historical source: Gunsho Ruijū (群書類従), published 1894-1912
Partially digitized in Waseda University’s Kanshi Database, the Internet Archive (?)
Major source: an enthusiast’s site, http://miko.org/~uraki/kuon/furu/furu_index1.htm
Internet Archive
The organization of the 全唐詩
• poems are organized according to categories (e.g., imperial authorship, “Music Bureau” poetry, insult poetry)
• bulk of the poems belong to individual authors, organized historically (別集)
• authors may be excluded from historical organization based on certain traits (women, Buddhists, Daoists, ghosts)
• later fascicles are dominated by a kind of miscellaneous quality
全唐詩 table of contents
1-9 Emperors, empresses, and imperial members
10-16 Ritual and ceremonial poems
17-29 “Music Bureau” poetry (樂府詩)
30-731 Individual Tang poets
732-733 Dynastic villains and rebels
734-766 Individual Five Dynasties poets
767-784 Poets with partial biographical information
785-787 Poems without authorial attribution
788-794 Linked verse poems (聯句)
795 Incomplete poems and lines by poets not listed above
796 Incomplete poems and lines without authorial attribution
797-805 Poems by women authors
806-851 Poems by Buddhist figures
852-859 Poems by Daoist figures
860-862 Poems by male immortals (仙)
863 Poems by female immortals (女仙)
864 Poems by divinities (神)
865-866 Poems by ghosts (鬼)
全唐詩 table of contents, continued
867 Poems by weirds (怪)
868 Dream poems (夢)
869-872 Jest and insult poems (諧謔)
873 Poems inscribed on walls (提語) and judgments (判)
874 Songs (歌) sung by local communities or groups
875 Prophetic verse (讖記)
876 Sayings in verse form (語)
877 Orally transmitted enigmatic verse (諺謎)
878 Orally transmitted ditties (謠)
879 Drinking songs (酒令)
880 Divination songs (占辭)
881 The Mengqiu 蒙求 by Li Han 李瀚
882-888 Poems left out of previous sections
889-900 Song-lyrics (詞)
http://etkspace.scandinavian.ucla.edu/~broadwell/poem_clusters.html
Macro-scale: clustering by shared n-grams
[Plot label: The 900 卷 of the 全唐詩 + 18 volumes of 平安時代の漢詩]
18 kanshi collections
Individual poets given fairly contiguous 卷 ranges in the 全唐詩, in roughly chronological order (別集)
卷 424-462: 白居易 Bai Juyi (772-846)
卷 216-234: 杜甫 Du Fu (712-770)
卷 161-185: 李白 Li Bai (701-762)
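One simple way to compare volumes by shared n-grams is to collect each volume's character bigrams and compute pairwise set overlap. The slides do not state which similarity measure or clustering algorithm was used, so the Jaccard measure, the function names, and the toy volume texts below are all assumptions for illustration. A clustering step would take the resulting similarity table as its input.

```python
from itertools import combinations


def char_ngrams(text, n=2):
    """Return the set of character n-grams in a text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Fraction of n-grams shared by two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(volumes, n=2):
    """Map each pair of volume names to the Jaccard overlap of their n-grams."""
    grams = {name: char_ngrams(text, n) for name, text in volumes.items()}
    return {
        (x, y): jaccard(grams[x], grams[y])
        for x, y in combinations(sorted(grams), 2)
    }


# Toy "volumes", invented for the example.
volumes = {"v1": "秋風秋雨", "v2": "秋風笙歌", "v3": "笙歌歌吹"}
similarity = pairwise_similarity(volumes)
for pair, score in similarity.items():
    print(pair, round(score, 2))
```

Volumes that share many bigrams score closer to 1.0, so contiguous 卷 by the same poet would be expected to sit near each other under a measure like this.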
Meso-scale: LDA topic modeling (全唐詩)
EASM: topic modeling the Quan Tang shi
Meso-scale: LDA topic modeling (全唐詩)
Meso-scale: LDA topic modeling (漢詩)
EASM: topic modeling the Quan Tang shi
Meso-scale: LDA topic modeling (漢詩)
Subcorpus Topic Modeling (STM)
[Diagram, shown in successive builds: Subcorpus Topic Modeling (STM). Labels: Macroscope portal, STM tool, Institution A, Institution B, access policies, summary tool (twice), corpora X and Y; later builds add the label “(topics)”]
X = well-known corpus, e.g., the 13 Classics
Y = large, less well-known corpus, e.g., Tang prose
Micro-scale: word embedding
Distributed macroscope infrastructure
[Diagram, Option 1: secure data access. Labels: Macroscope portal, Institution A, Institution B, collections X, Y, Z, access policies, Tool A, Tool B, findings]
[Diagram, Option 2. Labels: Macroscope portal, Institution A, Institution B, collections X, Y, Z, access policies, Tool A, Tool B, summary tool, findings]
Summary data only, e.g., bibliographic metadata
Summary data (n-gram counts):
• 時時: 14, 磷緇: 7, 日遲: 6, 相隨: 5, 移時: 4
• 蟬鳴: 25, 秋色: 19, 唧唧: 9, 秋風: 3, 秋雨: 2
• 笙歌: 33, 多少: 12, 歌吹: 8, 曲歌: 4, 爭唱: 2
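Summary data of the n-gram-count kind could be produced on the corpus server along the following lines, so that only the counts leave the server. This is a minimal sketch: the function name `top_ngrams`, the choice of character bigrams, and the cutoff of the top five entries are assumptions, and the slides do not say how the counts shown were grouped or selected.

```python
from collections import Counter


def top_ngrams(lines, n=2, k=5):
    """Count character n-grams over all lines and return the k most frequent."""
    counts = Counter()
    for line in lines:
        counts.update(line[i:i + n] for i in range(len(line) - n + 1))
    return counts.most_common(k)


# Toy lines, invented for the example.
lines = ["秋風秋風", "秋風"]
summary = top_ngrams(lines)
for ngram, count in summary:
    print(f"{ngram}: {count}")
```

Only the `(n-gram, count)` pairs in `summary` would be exported; the lines themselves stay with the host institution.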
[Diagram, Option 3 (not as desirable): results only. Labels: Macroscope portal, Institution A, Institution B, collections X, Y, Z, access policies, Tool A, Tool B, findings]
Joël de Rosnay, The Macroscope (New York: Harper & Row, 1979).
A macroscope lets us “observe what is at once too great, slow, or complex for the human eye and mind to notice and comprehend.”
Katy Börner, “Plug-and-Play Macroscopes,” Communications of the ACM 54, no. 3 (2011): 60-69.
Sample macroscope sites at UCLA
East Asian Studies (EASM): http://macroscope.cdh.ucla.edu
Funding source:
• The Andrew W. Mellon Foundation
The Danish Folklore Macroscope: http://etkspace.scandinavian.ucla.edu/macroscope.html
Funding sources:
• American Council of Learned Societies
• The National Endowment for the Humanities
• UCLA Council on Research
• Nordic Council of Ministers
• UCLA Institute for Pure and Applied Mathematics (NSF)
The East Asian Studies Macroscope (EASM)
Exploratory phase (Phase 0): 2014-2015
• Supported by the Andrew W. Mellon Foundation; Profs. Jack Chen and Timothy Tangherlini, UCLA Department of Asian Languages and Cultures, Co-PIs
• Developed sample macroscope tools and analyses based on a classical Chinese text corpus (collected poetry of the Tang Dynasty): http://macroscope.cdh.ucla.edu
• Meetings with faculty and archivists at Academia Sinica, National Taiwan University, National Tsing Hua University, Dharma Drum Buddhist College, and National Chengchi University, Jan. & Dec. 2015
Implementation Phase 1: 2016-2019 (pending)
• Prospectus submitted to the Andrew W. Mellon Foundation, January 2016
• Plan to develop software infrastructure and tools, establish partnerships with archival institutions in Taiwan, Korea, others (?)
Other macroscope development projects
Sub-corpus topic modeling for large literary corpora
• Supported by a Google Books research fellowship at UCLA, 2013-2014
• Resulting publication: Tangherlini, T and P Leonard. 2013. “Trawling in the Sea of the Great Unread: Sub-corpus topic modeling and Humanities research.” Poetics 41 (6): 725-749.
Collaborations with Scandinavian partners
• Project title: “New Digital Resources and Computational Methods for the Study of Literature in a Global Context,” 2015-present
• Funded by the Transatlantic program for collaborative work in the field of digital humanities, Fondation Maison des Sciences de l'Homme (France) and the Andrew W. Mellon Foundation (USA)
• Core participants: UCLA, Aarhus University (Denmark). Exploring collaborations with archives in Denmark, Norway, Sweden
EASM: mapping places mentioned in poems
Special thanks: David Shepard, Lead Academic Developer, UCLA Center for Digital Humanities
EASM: network graph of poem communities
Special thanks: David Shepard, Lead Academic Developer, UCLA Center for Digital Humanities