International Internet Preservation Consortium Research Slides from Ian Milligan
![Page 1: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/1.jpg)
Ian Milligan, PhD, Assistant Professor of History
‘An Infinite Archive?’ Historical Explorations in
the Internet Archive’s Wide Web Scrape
![Page 2: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/2.jpg)
[http://en.wikipedia.org/wiki/File:Internet_map_1024.jpg]
![Page 3: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/3.jpg)
Why?
Historians need to think about Computational Methods in an era of
web archives.
![Page 4: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/4.jpg)
“.... [n]ow expectations have inverted. Everything may be recorded and preserved*, at
least potentially.”
- James Gleick, The Information
* an overstatement, of course, but a useful one
![Page 5: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/5.jpg)
We have too much information to make sense
of with normal methods.
![Page 6: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/6.jpg)
The 80TB Wide Web Scrape
[March - December 2011]
![Page 7: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/7.jpg)
ca,yorku,justlabour)/ 20110714073726 http://www.justlabour.yorku.ca/ text/html 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ http://www.justlabour.yorku.ca/index.php?page=toc&volume=16 - 462 880654831 WIDE-20110714062831-crawl416/WIDE-20110714070859-02373.warc.gz
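The index line on this slide can be split into its fields with a few lines of Python. This is a minimal sketch, not the method used in the talk: it assumes the line follows the common 11-field, space-separated CDX layout, and the field names in `CdxRecord` as well as the helpers `parse_cdx_line` and `tld_of` are illustrative names chosen here, not taken from the slides.

```python
from typing import NamedTuple


class CdxRecord(NamedTuple):
    """One index line; field names are assumptions based on the common CDX layout."""
    urlkey: str        # reversed-host key, e.g. "ca,yorku,justlabour)/"
    timestamp: str     # capture time, YYYYMMDDhhmmss
    original_url: str
    mimetype: str
    status: str        # HTTP response code
    digest: str
    redirect: str      # redirect target, or "-"
    meta: str          # meta tags, or "-"
    length: str        # compressed record size in bytes
    offset: str        # byte offset of the record inside the WARC file
    filename: str      # WARC file holding the record


def parse_cdx_line(line: str) -> CdxRecord:
    """Split one space-separated index line into named fields."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    return CdxRecord(*fields)


def tld_of(record: CdxRecord) -> str:
    """Return the top-level domain, which comes first in the reversed-host key."""
    host_part = record.urlkey.split(")", 1)[0]
    return host_part.split(",", 1)[0]


line = (
    "ca,yorku,justlabour)/ 20110714073726 http://www.justlabour.yorku.ca/ "
    "text/html 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ "
    "http://www.justlabour.yorku.ca/index.php?page=toc&volume=16 - 462 880654831 "
    "WIDE-20110714062831-crawl416/WIDE-20110714070859-02373.warc.gz"
)
record = parse_cdx_line(line)
print(tld_of(record), record.timestamp, record.status, record.filename)
```

Because the top-level domain leads the key, filtering an index for one TLD (for example `.ca`) only needs a prefix check on the first field, without opening any WARC file.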
![Page 8: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/8.jpg)
![Page 9: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/9.jpg)
Methods (or the fun of playing with WARC files themselves)
![Page 10: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/10.jpg)
![Page 11: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/11.jpg)
![Page 12: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/12.jpg)
![Page 13: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/13.jpg)
![Page 14: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/14.jpg)
![Page 15: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/15.jpg)
![Page 16: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/16.jpg)
Named Entity Recognition as another approach?
![Page 17: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/17.jpg)
Countries Mentioned in .ca TLD (excluding Canada)
![Page 18: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/18.jpg)
Provinces Mentioned in .ca TLD
![Page 19: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/19.jpg)
Countries Mentioned in .mil TLD
![Page 20: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/20.jpg)
Countries Mentioned in .gov TLD
![Page 21: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/21.jpg)
Countries Mentioned in .edu TLD
![Page 22: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/22.jpg)
Countries Mentioned in .uk TLD (excluding UK)
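The per-TLD country tallies on the preceding slides can be approximated with a simple count over extracted page text. The sketch below is an assumption-laden stand-in: it matches names from a fixed list rather than running a Named Entity Recognition tagger, and the function `count_country_mentions`, the country list, and the sample sentences are all hypothetical, made up here for illustration only.

```python
import re
from collections import Counter
from typing import Iterable


def count_country_mentions(
    texts: Iterable[str],
    countries: Iterable[str],
    exclude: Iterable[str] = (),
) -> Counter:
    """Count whole-word, case-insensitive mentions of each country name.

    Names listed in `exclude` are skipped entirely, mirroring the
    "excluding Canada" and "excluding UK" charts.
    """
    texts = list(texts)
    excluded = {name.lower() for name in exclude}
    counts: Counter = Counter()
    for country in countries:
        if country.lower() in excluded:
            continue
        pattern = re.compile(r"\b" + re.escape(country) + r"\b", re.IGNORECASE)
        for text in texts:
            counts[country] += len(pattern.findall(text))
    return counts


# Hypothetical page text standing in for documents from one TLD.
sample_texts = [
    "Trade between Canada and China grew last year.",
    "France and china signed an agreement.",
    "Visit Canada this summer.",
]
counts = count_country_mentions(
    sample_texts,
    countries=["Canada", "China", "France"],
    exclude=["Canada"],
)
for country, n in counts.most_common():
    print(country, n)
```

A fixed-list match of this kind misses variant names and cannot tell a country from a same-named person or place, which is the gap an NER tagger is meant to close.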
![Page 23: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/23.jpg)
.ca montage
![Page 24: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/24.jpg)
.ca montage (zoomed in)
![Page 25: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/25.jpg)
.mil montage
![Page 26: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/26.jpg)
.mil montage (zoomed in)
![Page 27: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/27.jpg)
.cn montage
![Page 28: International Internet Preservation Consortium Research Slides from Ian Milligan](https://reader033.fdocuments.in/reader033/viewer/2022060205/55a0fe2b1a28ab1e2e8b45ad/html5/thumbnails/28.jpg)
.cn montage (zoomed in)