More HTRC
Loretta Auvil, Boris Capitanu
University of Illinois at Urbana-Champaign
Topic Modeling References
• http://www.matthewjockers.net/2011/09/29/the-lda-buffet-is-now-open-or-latent-dirichlet-allocation-for-english-majors/
• http://dsl.richmond.edu/dispatch/pages/intro
• http://historying.org/2010/04/01/topic-modeling-martha-ballards-diary/
• http://www.ics.uci.edu/~newman/pubs/JASIST_Newman.pdf
• https://dhs.stanford.edu/visualization/topic-networks/
• Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 96–104, Portland, OR, USA, 24 June 2011. © 2011 Association for Computational Linguistics
• Matthew Jockers, Macroanalysis: Digital Methods and Literary History, University of Illinois Press, 2013
• Termite: Visualization Techniques for Assessing Textual Topic Models, Jason Chuang, Christopher D. Manning, Jeffrey Heer, Advanced Visual Interfaces, 2012
• Mallet website: http://mallet.cs.umass.edu
• David Mimno’s website: http://www.cs.princeton.edu/~mimno/
Spellchecking Analysis
• Not just OCR error detection but also OCR error correction
• Can also be used for cleaning other messy data
Learning Exercises (1)
1. Run Meandre_Topic_Modeling Algorithm
A. Click on “Algorithms”
B. Click on “Meandre_Topic_Modeling”
1. Provide Job Name (required)
2. Select a Workset (required)
3. Adjust Additional Parameters (optional)
a. Provide the number of tokens to be displayed in the tag cloud (default: 200)
b. Provide the number of topics to be created (default: 10)
4. Click “Submit” button
C. Once Job finishes, select Job Name
D. View Results by clicking on “topic_tagclouds.html”
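As a rough illustration of what the token-count parameter controls, the sketch below trims a topic's word list to the highest-weighted tokens before they would be drawn in a tag cloud. It is not the Meandre implementation: the function name top_tokens, the toy weights, and the tie-breaking rule are all assumptions made for this example.

```python
def top_tokens(topic_weights, limit=200):
    """Return up to `limit` tokens, highest weight first.

    `topic_weights` maps token -> weight for ONE topic (hypothetical input;
    the real algorithm's output format is not shown in these slides).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(topic_weights.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:limit]]


# Toy topic with made-up weights, purely for illustration.
toy_topic = {"whale": 30, "ship": 12, "sea": 25, "captain": 12}
print(top_tokens(toy_topic, limit=3))
```

Lowering the limit gives a sparser, easier-to-read cloud; raising it shows more of each topic's long tail.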
Learning Exercises (2)
2. Run Meandre_Spellcheck_Report_Per_Volume
A. Click on “Algorithms”
B. Click on “Meandre_Spellcheck_Report_Per_Volume”
1. Provide Job Name (required)
2. Select a Workset (required)
3. Adjust Additional Parameters (optional)
a. Provide the transformation rules as text, e.g. h=li; li=h; rn=m; m=rn; s=f;
b. Provide a URL that contains the dictionary
c. Provide a URL for token counts, which are used to choose the best correctly spelled word based on popularity
4. Click “Submit” button
C. Once Job finishes, select Job Name
D. View Results by clicking on “spellcheck_report.html”, “replacement_rules.txt”, etc.
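The three optional parameters above can be read as a small correction pipeline: transformation rules generate candidate spellings, the dictionary filters them, and token counts pick the most popular survivor. The sketch below is one possible reading of that idea, not the Meandre component itself. The function names, the one-rule-at-one-position strategy, and the toy dictionary and counts are all assumptions; only the rule string comes from the exercise.

```python
def parse_rules(text):
    """Parse a rule string such as 'h=li; li=h;' into (source, target) pairs."""
    rules = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing ';'
        source, target = part.split("=", 1)
        rules.append((source.strip(), target.strip()))
    return rules


def candidates(token, rules):
    """Yield every spelling obtained by applying one rule at one position."""
    for source, target in rules:
        start = token.find(source)
        while start != -1:
            yield token[:start] + target + token[start + len(source):]
            start = token.find(source, start + 1)


def correct(token, rules, dictionary, counts):
    """Return the most popular dictionary word reachable from `token`.

    Tokens already in the dictionary, or with no valid candidate, are
    returned unchanged. Ties on count are broken alphabetically.
    """
    if token in dictionary:
        return token
    valid = sorted({c for c in candidates(token, rules) if c in dictionary})
    if not valid:
        return token
    return max(valid, key=lambda word: counts.get(word, 0))


# Rule string from the exercise; dictionary and counts are made-up toy data.
rules = parse_rules("h=li; li=h; rn=m; m=rn; s=f;")
dictionary = {"modern", "the", "fist", "sift"}
counts = {"modern": 50, "the": 1000, "fist": 120, "sift": 15}

for token in ["rnodern", "tlie", "sist", "xyz"]:
    print(token, "->", correct(token, rules, dictionary, counts))
```

In this toy run, "sist" has two valid candidates ("fist" and "sift"), so the token counts decide between them; this is the role the slides give to the popularity file.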
Attendee Project Plan
• Study/Project Title
• Team Members and their Affiliation
• Procedural Outline of Study/Project
– Research Question/Purpose of Study
– Data Sources
– Analysis Tools
• Activity Timeline or Milestones
• Report or Project Outcome(s)
• Ideas on what your team needs from SEASR staff to help you achieve your goal.
Identify Research Question