Groeling, Tim: NewsScape: Preserving TV News
![Page 2: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/2.jpg)
Who Are We?
• Prof. Francis Steen: Director, Communication Studies Archive
• Me: Leading the analog digitization effort
• Predecessor & Archive Founder: Prof. Paul Rosenthal (emeritus)
• UCLA Library: Helps support the collection, store files, and host main “public” site (tvnews.library.ucla.edu)
• Other supporters: UCLA Chancellor and Dean of Social Sciences, Arcadia Fund, UCLA Social Sciences Computing, the NSF, the California Endowment, UCLA Office for Instructional Development, and UCLA CCLE.
![Page 3: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/3.jpg)
Collections?
• Oldest collection: UCLA Campus Speakers, from 1950s-1980s (over 500 audio recordings)
• Digitized to coincide with 40th anniversary of my department.
• Originally planned to exhibit & host on our website. Targeted at alumni.
• Moved to YouTube: Now most traffic (77%) comes from YouTube search/suggested videos/browsing.
• Issues: commenters & copyright
![Page 4: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/4.jpg)
NewsScape
• Largest collection: TV news and public affairs programs (local & national)
• Started during Watergate (preserve ephemera). Shoestring budget (until recently, only about $10k per year plus volunteer labor & donated equipment)
• 1979: Started trying to record all the local and national TV news viewable in LA.
• 2006: Started daily straight-to-digital recording.
• Since 2006: Added other cities.
![Page 5: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/5.jpg)
Pre-2006 Holdings?
• Recordings spread across three campus organizations (Comm Studies, Library, and TFT) and at least four on- and off-campus storage sites.
• Good records for some portions; very poor records for others.
• Not sure how many tapes are in the collection overall.
• Even where we know what should be on a tape, some problems with tape, VCR, or schedule.
![Page 6: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/6.jpg)
Tapes
• Earliest recordings (1970s): Around 500 U-Matic tapes.
• Middle period (1979-early 1990s): about 50k hours on Betamax
• Late period (1990s-2006): Around 160k hours on VHS, plus some redundancy.
![Page 7: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/7.jpg)
Preservation
• VHS are actually most threatened, despite being newest tapes.
• Coincided with cable TV expansion of news programming: stretched same budget to cover more news programming.
• 8 hours per consumer VHS tape (Betas and U-matics were higher quality tapes; less recorded on each tape)
• Poor quality consumer-grade VCRs
• Limited spot-checking for quality (failing VCRs or poor signal quality not noticed for long stretches).
• Originals still in hand, but dead.
• Improperly stored (even faculty didn’t have A/C)
![Page 8: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/8.jpg)
Cost to Digitize?
• Got bids from another archive and commercial providers: $1.5 million (just for first 150k hours of VHS).
• Instead, shoestring again.
• $20k (and some donated surplus machines) for hardware, software, and furniture for digitization lab.
• Run by me, a part-time lab manager, and 10 work-study students. Steen handles files.
• [Shifting to Betamax will be costly, though]
![Page 9: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/9.jpg)
Lab Details
• 22 digitization stations (VCR, encoder, computer). 3 local RAID file servers and 1 Filemaker Server.
• All computers: surplus or eBay Macs (circa 2008)
• Encoders: EyeTV using Hauppauge 950q or EyeTV Hybrid hardware MPEG-2 encoders (get CC). Export and sync scripted.
• VCRs: After testing, settled on JVC S-VHS VCRs (consumer to pro).
• Use pre-printed barcode stickers for inventory. Custom Filemaker database for tracking digitization attempts and quality control. Filemaker Go Mobile (via cell phones) for asset tracking.
• Files are quality-checked, compressed to H.264, closed captioning extracted via Hoffman Cluster.
![Page 10: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/10.jpg)
Progress
• Fall 2015: Process design & workstation configuration testing
• Winter 2016: hired students, fixed network issues, and ramped up workstations.
• Summer 2016: Filemaker inventory control; two daily shifts.
• March-Oct 2016: 4.5k tapes encoded (about 36k hours).
• Delaying splitting files into shows.
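
The tape and hour counts above line up with the 8-hours-per-tape VHS format. A minimal arithmetic sketch follows, assuming every tape is a full 8-hour tape and that encoding continues at the March-Oct 2016 pace. Both are assumptions made for illustration, and the projection is not a figure from the talk.

```python
# Figures taken from the slides.
TAPES_ENCODED = 4_500        # March-Oct 2016
HOURS_PER_TAPE = 8           # consumer VHS, as recorded by the archive
VHS_HOURS_TOTAL = 160_000    # "around 160k hours on VHS"

# Assumption: March through October counted inclusively.
MONTHS_ELAPSED = 8


def hours_encoded(tapes, hours_per_tape=HOURS_PER_TAPE):
    """Hours of video represented by a number of full tapes."""
    return tapes * hours_per_tape


def months_remaining(total_hours, done_hours, months_elapsed):
    """Naive projection: remaining hours divided by the average monthly rate so far."""
    rate_per_month = done_hours / months_elapsed
    return (total_hours - done_hours) / rate_per_month


done = hours_encoded(TAPES_ENCODED)
print(done)  # 36000, matching "about 36k hours"
print(round(months_remaining(VHS_HOURS_TOTAL, done, MONTHS_ELAPSED), 1))
```

The projection ignores re-encoding attempts, redundant tapes, and changes in staffing, so it is only a rough order-of-magnitude check.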
![Page 11: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/11.jpg)
Problems: Lots
• Recording/playback VCRs out of spec (solution: quality control aggregation helps find love connection)
• Varying program names over time (database tracking alternate show names; day/time/channel 30-minute block)
• Buzzing audio? Computer RF interference with VCR audio. (Used dead VCRs as spacers.)
• Lots of other problems, but in most cases they just mean another encoding attempt.
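
The show-name tracking described above (alternate names plus a day/time/channel 30-minute slot) can be sketched roughly as follows. This is a hypothetical illustration, not the archive's Filemaker schema: the alias table, the show titles, and the channel label are all invented for the example.

```python
from datetime import datetime

# Hypothetical alias table: lowercase alternate title -> canonical series name.
ALIASES = {
    "evening report at 6": "Evening Report",
    "the evening report": "Evening Report",
}


def canonical_name(title):
    """Map an alternate show title to its canonical name (unknown titles pass through)."""
    key = " ".join(title.lower().split())  # normalize case and whitespace
    return ALIASES.get(key, title.strip())


def block_key(channel, start):
    """Return (channel, weekday, 'HH:MM') for the 30-minute slot containing `start`.

    weekday follows datetime.weekday(): 0 = Monday ... 6 = Sunday.
    """
    minute = 0 if start.minute < 30 else 30
    return (channel, start.weekday(), f"{start.hour:02d}:{minute:02d}")


print(canonical_name("  The Evening  Report "))
print(block_key("CH7", datetime(2016, 3, 7, 18, 42)))
```

Keying on the slot rather than the title means a program can still be matched to its series when the on-air name changed over the years.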
![Page 12: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/12.jpg)
Post-2006 Straight-to-Digital
• 46 networks (US and beyond)
• 2,525 Series
• Total video files: 383,550
• Duration in hours: 297,596
• Closed caption files: 383,739
• Words in caption files: 2,419,185,351
• OCR files: 371,426
• Words in OCR files: 825,662,597
• Total thumbnail images: 107,134,425
• Storage: 106.93 terabytes
• Limited public access link: tvnews.library.ucla.edu
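
A few per-file and per-hour averages can be derived from the totals above. The sketch below uses only the numbers on this slide; the one assumption is that a terabyte is counted as 1,000 gigabytes.

```python
# Totals from the slide.
VIDEO_FILES = 383_550
DURATION_HOURS = 297_596
CAPTION_WORDS = 2_419_185_351
STORAGE_TB = 106.93

# Assumption: decimal units (1 TB = 1000 GB).
GB_PER_TB = 1000

minutes_per_file = DURATION_HOURS * 60 / VIDEO_FILES
caption_words_per_hour = CAPTION_WORDS / DURATION_HOURS
gb_per_hour = STORAGE_TB * GB_PER_TB / DURATION_HOURS

print(f"average file length: {minutes_per_file:.1f} minutes")
print(f"caption words per hour: {caption_words_per_hour:.0f}")
print(f"storage per hour: {gb_per_hour:.2f} GB")
```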
![Page 13: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/13.jpg)
Unlocking the Content
• Preservation is just first step: Needs to be more than "world’s best DVR"
• Want to provide tools to make the collection more useful and relevant beyond UCLA: Help people
• Understand TV news, and…
• Share what they find
![Page 14: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/14.jpg)
Example: Obama
• Preservation is just first step: Needs to be more than world’s "best VCR"
• Want to provide tools to make the collection more useful and relevant beyond UCLA: Help people
• Understand TV news, and…
• Share what they find
Good to know, but…
![Page 15: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/15.jpg)
Tools to Understand News
• Not just view, but analyze
• Help understand and visualize patterns of news coverage, not just individual stories. Forest, not just trees.
• Tools are already being developed, but are complex
![Page 16: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/16.jpg)
Tools to Understand Text
• Not just view, but analyze
• Help understand and visualize patterns of news coverage, not just individual stories (copyright, too)
• Tools are already being developed, but are complex
![Page 17: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/17.jpg)
Ambitious Goal: Visuals
• Text analysis is fairly mature (more than 2 billion words in NewsScape index)
• Named entities, parts of speech, topic detection are all working now (sentiment is harder)
• Analysis of visuals is challenging.
• Facial detection & analysis tools are becoming more useful and scalable
![Page 18: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/18.jpg)
Automated Analysis of Visuals
• Goal: Be able to understand patterns of visual communication in election news. Hard to study.
• Existing work is mostly hand-coded & focuses on still newspaper or web photos
• Trouble scaling to massive volume of images.
• Subjectivity
• Machine learning and big data as solution
• Presented pilot study at this year’s American Political Science Association conference categorizing presidential candidate faces (smiling or not).
![Page 19: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/19.jpg)
Face Validity
[Figure: legend bins in steps of 3, from < -13 to > 17]
![Page 20: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/20.jpg)
Face Validity
[Figure: legend bins in steps of 3, from < -13 to > 17]
![Page 21: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/21.jpg)
![Page 22: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/22.jpg)
Weekly topic tracking (filter by outlet) with metadata (who, what, where, how much)
Daily topic trajectory. News topics are detected by clustering every day, and the detected topics are then linked to generate topic tracking trajectories (Li, Joo, Qi, & Zhu, 2015)
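
The linking step can be illustrated with a toy sketch. This is not the method of Li, Joo, Qi, & Zhu (2015): it assumes each day's clusters have already been reduced to keyword sets, and it swaps in a simple Jaccard-overlap rule with an arbitrary threshold to connect a topic to its best match from the previous day.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_topics(days, threshold=0.3):
    """Link per-day topics into trajectories.

    days: list with one entry per day; each entry is a list of keyword sets.
    Returns a list of trajectories, each a list of (day_index, topic_index).
    A topic joins the trajectory of its most similar unclaimed topic from the
    previous day if the similarity reaches `threshold`; otherwise it starts
    a new trajectory.
    """
    trajectories = []
    prev = {}  # topic index on the previous day -> trajectory list
    for d, topics in enumerate(days):
        current = {}
        used = set()  # previous-day topics already extended today
        for t, words in enumerate(topics):
            candidates = [
                (jaccard(words, days[d - 1][p]), p)
                for p in prev
                if p not in used
            ]
            best_sim, best_p = max(candidates, default=(0.0, None))
            if best_p is not None and best_sim >= threshold:
                track = prev[best_p]
                used.add(best_p)
            else:
                track = []
                trajectories.append(track)
            track.append((d, t))
            current[t] = track
        prev = current
    return trajectories


days = [
    [{"election", "debate", "poll"}, {"storm", "flood"}],
    [{"storm", "flood", "rescue"}, {"election", "poll", "vote"}],
    [{"election", "vote", "result"}],
]
for track in link_topics(days):
    print(track)
```

In this toy input the election topic is followed across all three days, while the storm topic ends after the second day.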
![Page 23: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/23.jpg)
![Page 24: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/24.jpg)
![Page 25: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/25.jpg)
Other Goal: Sharing
• Help people share what they learn (within bounds of copyright)
• Solution #1: Share analysis, rather than raw material
• Solution #2: Use familiar, copyright-compliant tools to create and share.
![Page 26: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/26.jpg)
Social Sharing Tools
• Trying to develop two tools:
• Animated GIF generator (short clip; small file; on-screen captioning; easy to play)
• "Supercut" generator: Assemble short examples from archives; share compilation
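
One way an animated GIF generator could work is by wrapping the ffmpeg command line. The sketch below only builds the argument list and does not run it. It is a hypothetical illustration, not the tool the project is developing: the file names, the 6-second cap, and the frame rate and width defaults are assumptions, and on-screen captioning is not handled.

```python
def gif_command(src, start_s, duration_s, out, max_s=6.0, fps=10, width=320):
    """Build an ffmpeg argument list that cuts a short clip and writes a GIF.

    The clip length is capped at `max_s` seconds (a hypothetical limit meant
    to keep excerpts short and files small).
    """
    if start_s < 0 or duration_s <= 0:
        raise ValueError("start must be >= 0 and duration must be > 0")
    clip = min(duration_s, max_s)
    return [
        "ffmpeg",
        "-ss", f"{start_s:.2f}",   # seek to the clip start in the input
        "-t", f"{clip:.2f}",       # limit how much of the input is read
        "-i", src,
        "-vf", f"fps={fps},scale={width}:-1",  # lower frame rate, shrink width
        "-loop", "0",              # loop the GIF indefinitely
        out,
    ]


cmd = gif_command("show.mp4", 12.5, 10, "clip.gif")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.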
![Page 27: Groeling, Tim: NewsScape: Preserving TV News](https://reader034.fdocuments.in/reader034/viewer/2022051520/588001741a28ab3a1e8b7a9b/html5/thumbnails/27.jpg)
Summing Up
• Preservation as goal, but also as starting point.
• Excited to be able to understand long-term changes in news content & norms.
• Lots of work ahead of us.
• Appreciate any help or advice (or funding) you can offer.