High quality, low maintenance content tagging @ ZEIT Online Breno Faria, Christoph Goller
About Us
(c) 2014 I IntraFind Software AG 2
IntraFind Software AG Elasticsearch Partner (we also do consulting)
Specialist for Information Retrieval and Text Analytics
Founded 2000, 30 employees
More than 850 customers mainly in Germany, Austria, and Switzerland
Lucene Committers: B. Messer, C. Goller
Independent Software Vendor, entirely self-financed
Products are a combination of Open Source Components and in-house Development
High quality Linguistic Analyzers for most European Languages (also available as Solr and Elasticsearch plugins)
Named Entity Recognition
Text Classification
Tagging Service – extraction of semantic meta data
Outline
1. The ZEIT Online Project 2010: tagging the archive and making it searchable
2. Editorial Workflow @ ZEIT Online
3. Feedback from the Editors
4. Meeting the Expectations
The ZEIT Online Project
Die ZEIT is a weekly newspaper founded in 1946 and one of the most renowned in Germany
ZEIT Online, the web edition, has existed since 1996
2010: organize the entire archive based on semantic metadata and make it searchable
Persons, locations and organizations mentioned
Statistically significant keywords
Classification into corresponding department
Amazingly, there is an API for accessing this tagged content! See developer.zeit.de
Editorial Workflow @ ZEIT Online
The second step in the project was to integrate the content tagging system into the editorial workflow at ZEIT Online
It's not as simple as that
Keywords will be visible to humans! You cannot rely on a robot's good judgement and publish everything that comes out…
Ever heard of "inter-indexer consistency"? Letting every editor choose freely probably wouldn't work
Solution:
curated list of allowed keywords
AND editor picks a subset of allowed keywords for the article
Curating the keyword list is expensive
… and so is going through large lists of keyword candidates
We want to solve this problem
Feedback from the editorial staff
Tradeoff: relevance vs. completeness
generic better than specific (Stuxnet vs. Stuxnet-Virus)
expand to similar keywords (Prism NSA)
no 'stop-keywords' (e.g. Angela Merkel)
no out-of-context keywords
consider trends!
all possible keywords, don't miss anything!
Oh, and please don't make us work more with your changes.
Meeting the Expectations
Provide a perfect ranking of keywords
This allows us to present only the relevant keywords to the editor
… and we still have all possible keywords for the archive
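The split between what the editor sees and what the archive keeps can be sketched as follows. This is only a minimal Python illustration, not the actual system: the helper name `split_for_editor`, the cutoff `top_k` and the score values are assumptions made up for the example.

```python
def split_for_editor(scored_tags, top_k=5):
    """Return (editor_suggestions, archive_tags) from a dict tag -> score.

    Hypothetical helper: the archive keeps every tag in ranked order,
    the editor is only shown the top_k highest-ranked ones.
    """
    ranked = sorted(scored_tags.items(), key=lambda item: item[1], reverse=True)
    archive_tags = [tag for tag, _ in ranked]
    editor_suggestions = archive_tags[:top_k]
    return editor_suggestions, archive_tags


# Made-up scores for illustration only.
scores = {"NSA": 0.91, "Prism": 0.88, "Angela Merkel": 0.35, "Berlin": 0.12}
editor, archive = split_for_editor(scores, top_k=2)
```

With a good ranking, relevance and completeness no longer compete: the cutoff serves the editor, the full list serves the archive.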
Meeting the Expectations: Baseline Scoring
First problem: how do we compare apples and bananas? (different sorts of entities and keywords)
For each tag found, we compute the document hit count by searching the archive for it
We can rely on our linguistic analyzers to account for different forms of the same tag: e.g. „Bundeswirtschaftsminister“ == „Bundesminister für Wirtschaft“ (both mean "Federal Minister for Economic Affairs")
Use a Lucene Similarity to compute the TF-IDF of each tag
might hurt context
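A rough sketch of the baseline idea in Python: every tag, whatever its type, is reduced to the same two numbers (frequency in the document, hit count in the archive) and scored by TF-IDF. The formula below follows the classic square-root / logarithm shape associated with Lucene's TF-IDF similarity, but the exact formula differs between Lucene versions, so treat it as an assumption. The function names and all numbers are invented for the example.

```python
import math


def tf_idf(term_freq, doc_freq, num_docs):
    """Classic-style TF-IDF: sqrt(tf) * (1 + ln(N / (df + 1)))."""
    tf = math.sqrt(term_freq)
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))
    return tf * idf


def baseline_scores(tag_counts, archive_hits, num_docs):
    """Score each tag found in a document.

    tag_counts:   tag -> occurrences in the current document
    archive_hits: tag -> number of archive documents matching the tag
                  (the document hit count obtained by searching the archive)
    """
    return {
        tag: tf_idf(freq, archive_hits.get(tag, 0), num_docs)
        for tag, freq in tag_counts.items()
    }


# Made-up numbers for illustration only.
scores = baseline_scores(
    tag_counts={"Stuxnet": 4, "Angela Merkel": 1},
    archive_hits={"Stuxnet": 120, "Angela Merkel": 45000},
    num_docs=500_000,
)
```

Because persons, locations, organizations and keywords all go through the same hit-count query, their scores become comparable. A tag that appears all over the archive gets a low score, which matches the editors' wish to avoid 'stop-keywords'.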
Meeting the Expectations: Context Scoring
Idea: compare the document with other documents containing a particular tag
compute the typical contexts of the tag
these contexts form a kind of prototypical document for all documents containing the keyword
we compare the current context with this prototypical context, i.e. we compute a similarity
We can use the same method to expand our tags with related keywords!
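One way to picture context scoring is a bag-of-words prototype compared by cosine similarity. The slides do not say which representation or similarity measure is used, so both are assumptions here, as are the function names and the toy documents; a real system would use the linguistic analyzers rather than whitespace splitting.

```python
import math
from collections import Counter


def term_vector(text):
    """Very naive bag of words (assumption: whitespace tokenization)."""
    return Counter(text.lower().split())


def prototype(docs):
    """Sum the term vectors of all documents containing the tag."""
    total = Counter()
    for doc in docs:
        total.update(term_vector(doc))
    return total


def cosine(a, b):
    """Cosine similarity of two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def context_score(document, docs_with_tag):
    """Similarity between a document and the tag's prototypical context."""
    return cosine(term_vector(document), prototype(docs_with_tag))


# Toy archive documents that contain the tag "Schweinsteiger".
docs = [
    "Schweinsteiger scores goal in football match",
    "football team wins match with Schweinsteiger",
]
on_topic = "Schweinsteiger scores late goal in football match"
off_topic = "chancellor discusses budget and mentions Schweinsteiger"
```

Here `context_score(on_topic, docs)` comes out higher than `context_score(off_topic, docs)`, which is the signal needed to demote out-of-context mentions. Under the same assumptions, the highest-weighted terms of a prototype could serve as candidates for related keywords.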
Meeting the Expectations: Trend Scoring
But what if the mention of "Schweinsteiger" is not incidental? Maybe it's World Cup time?
In our case, a trend is a measure of the variation of hit counts over a timespan
We can compute trends from our archive by counting hits in different timespans
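A small Python sketch of trend computation from hit counts, assuming nested timespans that all end now (64, 32, 16, 8 and 4 days back). Each span needs one count query, and each pair of adjacent spans yields one trend value. The slides do not give a formula, so the normalized difference used here is an assumption, and `count_hits` merely stands in for a real archive query.

```python
from datetime import datetime, timedelta


def count_hits(hit_dates, now, days):
    """Stand-in for one archive query: hits within the last `days` days."""
    start = now - timedelta(days=days)
    return sum(1 for d in hit_dates if start < d <= now)


def trends(hit_dates, now, spans=(64, 32, 16, 8, 4)):
    """N spans -> N counts -> N-1 trend values.

    Each trend compares the recent half of a span with its older half,
    e.g. n32 against (n64 - n32). The normalization is an assumption.
    """
    counts = [count_hits(hit_dates, now, days) for days in spans]
    result = []
    for longer, shorter in zip(counts, counts[1:]):
        older = longer - shorter   # hits in the older half of the span
        recent = shorter           # hits in the recent half of the span
        result.append((recent - older) / max(longer, 1))
    return result


now = datetime(2014, 7, 1)
# Made-up data: one old mention, then a burst in the last few days.
hit_dates = [now - timedelta(days=d) for d in (50, 3, 2, 2, 1, 1)]
values = trends(hit_dates, now)
```

Positive values mean the tag is being mentioned more often recently than before, so a burst such as the one above would push "Schweinsteiger" up the ranking during a tournament.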
Meeting the Expectations: Consolidating Scores
We combine the scores by
1. Individually scaling them onto the same interval
2. Multiplying each one by a weight
3. Summing up and again scaling the result
There's a lot to configure, and there is no such thing as the perfect configuration
ZEIT Online has the freedom to fine-tune the ranking
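The three steps can be sketched in Python as below. Min-max scaling onto [0, 1] is an assumption (the slides only say the scores are scaled onto the same interval), and the function names, score names, weights and values are all made up for illustration. The sketch expects at least one non-empty score set.

```python
def min_max_scale(values):
    """Scale a dict tag -> score onto [0, 1]; constant input maps to 0.0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {tag: 0.0 for tag in values}
    return {tag: (v - lo) / (hi - lo) for tag, v in values.items()}


def consolidate(score_sets, weights):
    """score_sets: name -> {tag: raw score}; weights: name -> weight."""
    # Step 1: scale each score type individually onto the same interval.
    scaled = {name: min_max_scale(scores) for name, scores in score_sets.items()}
    tags = set().union(*(scores.keys() for scores in score_sets.values()))
    # Step 2 and 3a: multiply each scaled score by its weight and sum up.
    combined = {
        tag: sum(weights[name] * scaled[name].get(tag, 0.0) for name in scaled)
        for tag in tags
    }
    # Step 3b: scale the summed result again.
    return min_max_scale(combined)


# Made-up raw scores and weights for illustration only.
score_sets = {
    "baseline": {"NSA": 12.0, "Prism": 9.0, "Berlin": 3.0},
    "context": {"NSA": 0.8, "Prism": 0.9, "Berlin": 0.1},
    "trend": {"NSA": 0.2, "Prism": 0.7, "Berlin": -0.1},
}
weights = {"baseline": 1.0, "context": 2.0, "trend": 1.0}
final = consolidate(score_sets, weights)
```

The weights are the tuning knobs: raising the `context` weight favours topical fit, raising the `trend` weight favours current topics.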
Summary
An editorial office's requirements for a tagging system are complex
There is a tradeoff between relevance and completeness of tags
You need both. We can solve this problem the same way information retrieval systems do: with ranking
There is a lot one can do to enrich tags just by looking at a representative archive
Thanks for Listening
Thanks to Ron Drongowski and the ZEIT Online team!
Breno Faria (@brealbfar) & Christoph Goller (@ChGoller)
Phone: +49 89 3090446-0
Fax: +49 89 3090446-29
Email: {christoph.goller,breno.faria}@intrafind.de
Web: www.intrafind.de
IntraFind Software AG
Landsberger Straße 368
80687 München
Germany
The persons graph and most screen-shots are copyright material of ZEIT Online.
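[Figure, trend scoring: timeline of timespans reaching back from NOW (-64d, -32d, -16d, -8d, -4d), with hit counts n64, n32, n16 and differences n64 - n32, n32 - n16; N spans, N queries, N-1 trends]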