NaBIC 2012 presentation
Transcript of NaBIC 2012 presentation
![Page 1: NaBIC 2012 presentation](https://reader035.fdocuments.in/reader035/viewer/2022071722/55c0c50ebb61ebb2198b459e/html5/thumbnails/1.jpg)
What's going on out there right now? A beehive based machine to give snapshot of the ongoing stories on the Web
Štefan Sabo and Pavol Návrat
![Page 2: NaBIC 2012 presentation](https://reader035.fdocuments.in/reader035/viewer/2022071722/55c0c50ebb61ebb2198b459e/html5/thumbnails/2.jpg)
General overview
• Method to extract keywords related to stories from news articles is proposed.
• Multiple agents inspired by honey bees foraging for food are used.
• Connections between articles are explored one keyword at a time.
• Most promising keywords that provide links between articles are propagated, uninteresting keywords are discarded.
![Page 3: NaBIC 2012 presentation](https://reader035.fdocuments.in/reader035/viewer/2022071722/55c0c50ebb61ebb2198b459e/html5/thumbnails/3.jpg)
Outline of presentation
• Motivation
• Method overview
• Results
• Summary
• Future work
![Page 4: NaBIC 2012 presentation](https://reader035.fdocuments.in/reader035/viewer/2022071722/55c0c50ebb61ebb2198b459e/html5/thumbnails/4.jpg)
Motivation
• News stories are often represented by terms that identify the story by providing an easily recognizable label for it.
• These keywords are interesting for navigation in the space of news stories.
• It is difficult to predict in advance which articles will develop into stories over time and which keywords will represent them.
• Dynamic system is needed to follow new articles and account for the changes in the old ones.
• Corpus of all the articles is unavailable.
![Page 5: NaBIC 2012 presentation](https://reader035.fdocuments.in/reader035/viewer/2022071722/55c0c50ebb61ebb2198b459e/html5/thumbnails/5.jpg)
Method overview
• Most representative keywords are chosen by comparing relevance of multiple articles to a given keyword.
• If two articles are both relevant to a keyword, a link is established between them.
• Keywords that provide links between most articles are selected as most interesting.
• Comparison between every two articles regarding every keyword would be impractical.
• To facilitate the comparison, it is performed by a swarm of agents inspired by honey bees.
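The linking rule above can be illustrated with a minimal sketch. It is the exhaustive baseline that the slides call impractical, shown only to make the rule concrete. The article data, the relevance scores and the `THRESHOLD` cut-off are all hypothetical; the slides do not say how relevance is measured.

```python
from collections import Counter
from itertools import combinations

# Hypothetical relevance scores: article id -> {keyword: relevance in [0, 1]}.
ARTICLES = {
    "a1": {"Syria": 0.9, "Aleppo": 0.7},
    "a2": {"Syria": 0.8, "attack": 0.6},
    "a3": {"Apple": 0.9, "Samsung": 0.8, "court": 0.5},
    "a4": {"Samsung": 0.7, "court": 0.6, "Syria": 0.1},
    "a5": {"Syria": 0.7},
}
THRESHOLD = 0.5  # assumed cut-off for calling an article "relevant"


def is_relevant(article_id, keyword):
    """True if the article's score for the keyword reaches the threshold."""
    return ARTICLES[article_id].get(keyword, 0.0) >= THRESHOLD


def count_links():
    """Count, per keyword, the article pairs in which both are relevant."""
    keywords = sorted({k for scores in ARTICLES.values() for k in scores})
    links = Counter()
    for a, b in combinations(sorted(ARTICLES), 2):
        for k in keywords:
            if is_relevant(a, k) and is_relevant(b, k):
                links[k] += 1
    return links


links = count_links()
best_keyword, best_count = links.most_common(1)[0]
print(best_keyword, best_count)
```

In this toy data "Syria" links three article pairs and is selected as the most interesting keyword. The nested loop visits every pair for every keyword, which is the cost the swarm of agents is meant to avoid.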
![Page 6: NaBIC 2012 presentation](https://reader035.fdocuments.in/reader035/viewer/2022071722/55c0c50ebb61ebb2198b459e/html5/thumbnails/6.jpg)
Method overview - agents
• Every agent carries a single keyword at a time and can independently perform one of 3 actions:
  o foraging – comparing articles
  o dancing – propagating its current keyword
  o observing – selecting a new keyword
• Based on the keyword quality, an agent may decide to propagate an interesting keyword through dancing or select a new keyword through observation.
• This mechanism focuses the swarm on the most interesting keywords for currently visited articles.
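One possible reading of the three actions is sketched below. It is not the authors' implementation: the slides do not define keyword quality, so this sketch assumes that a successful comparison triggers a dance and a failed one may trigger observation. The data and the constants `THRESHOLD`, `ABANDON_PROB` and `SCOUT_PROB` are hypothetical.

```python
import random
from collections import Counter, deque

# Hypothetical data: article id -> {keyword: relevance score in [0, 1]}.
ARTICLES = {
    "a1": {"Syria": 0.9, "Aleppo": 0.7},
    "a2": {"Syria": 0.8, "attack": 0.6},
    "a3": {"Apple": 0.9, "Samsung": 0.8, "court": 0.5},
    "a4": {"Samsung": 0.7, "court": 0.6, "Syria": 0.1},
    "a5": {"Syria": 0.7},
}
THRESHOLD = 0.5     # assumed relevance cut-off
ABANDON_PROB = 0.5  # assumed chance of dropping a keyword after a failed comparison
SCOUT_PROB = 0.2    # assumed chance of ignoring the dances and scouting at random


def forage(keyword, rng):
    """Foraging: compare two random articles; True if the keyword links them."""
    a, b = rng.sample(sorted(ARTICLES), 2)
    return all(ARTICLES[x].get(keyword, 0.0) >= THRESHOLD for x in (a, b))


def observe(dance_floor, rng):
    """Observing: adopt a recently danced keyword, or scout a random one."""
    if dance_floor and rng.random() >= SCOUT_PROB:
        return rng.choice(list(dance_floor))
    article = rng.choice(sorted(ARTICLES))
    return rng.choice(sorted(ARTICLES[article]))


def run_swarm(n_agents=10, n_steps=200, seed=0):
    """Run the swarm and return how often each keyword was danced."""
    rng = random.Random(seed)
    dance_floor = deque(maxlen=20)  # only recent dances, so stale keywords fade
    keywords = [observe(dance_floor, rng) for _ in range(n_agents)]
    danced = Counter()
    for _ in range(n_steps):
        for i, keyword in enumerate(keywords):
            if forage(keyword, rng):
                dance_floor.append(keyword)  # dancing: propagate the keyword
                danced[keyword] += 1
            elif rng.random() < ABANDON_PROB:
                keywords[i] = observe(dance_floor, rng)
    return danced


danced = run_swarm()
print(danced.most_common())
```

Each agent makes one random comparison per step instead of checking every pair, and only keywords that actually link two articles are ever danced. The bounded dance floor is one simple way to let the swarm forget keywords that stop producing links.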
![Page 7: NaBIC 2012 presentation](https://reader035.fdocuments.in/reader035/viewer/2022071722/55c0c50ebb61ebb2198b459e/html5/thumbnails/7.jpg)
Results
• News articles from the Reuters web page were checked daily for a period of 9 days.
• 298 unique keywords were identified.
• On average, 287 articles were assigned keywords each day.
• An increased prevalence of proper nouns amongst the top keywords can be noted.
![Page 8: NaBIC 2012 presentation](https://reader035.fdocuments.in/reader035/viewer/2022071722/55c0c50ebb61ebb2198b459e/html5/thumbnails/8.jpg)
Results – best keywords
| keyword | n(k) | n(k) / N |
|---|---|---|
| Syria | 177.30 | 6.87 % |
| Egypt | 98.10 | 3.80 % |
| Apple | 92.65 | 3.59 % |
| Afghan | 78.23 | 3.03 % |
| Euro | 75.50 | 2.92 % |
| shooting | 56.32 | 2.18 % |
| Samsung | 55.71 | 2.16 % |
| China | 55.30 | 2.14 % |
| court | 49.90 | 1.93 % |
| ECB | 49.85 | 1.93 % |
| attack | 49.41 | 1.91 % |
| Colorado | 41.79 | 1.62 % |
| trial | 28.90 | 1.12 % |
| Libor | 27.75 | 1.07 % |
| murder | 26.38 | 1.02 % |
| Aleppo | 25.31 | 0.98 % |
![Page 9: NaBIC 2012 presentation](https://reader035.fdocuments.in/reader035/viewer/2022071722/55c0c50ebb61ebb2198b459e/html5/thumbnails/9.jpg)
Results – development over time
[Figure: chart of keyword values per day from 4.8. to 12.8., vertical axis from 0 to 120, with one series each for Colorado, China, shooting, Afghan, Egypt, Apple, Euro and Syria.]
![Page 10: NaBIC 2012 presentation](https://reader035.fdocuments.in/reader035/viewer/2022071722/55c0c50ebb61ebb2198b459e/html5/thumbnails/10.jpg)
Summary
• Proposed approach utilizes agents inspired by honey bees foraging for food to extract story related keywords from a set of news articles.
• Articles are compared and their proximity is evaluated multiple times with regard to various keywords.
• To reduce the number of performed comparisons, agents use the mechanisms of propagation and observation to select the best keywords and discard those less desirable.
• Dynamic nature of the process enables agents to react to new articles as well as to changes in the old ones without the need for an article corpus or machine learning.
![Page 11: NaBIC 2012 presentation](https://reader035.fdocuments.in/reader035/viewer/2022071722/55c0c50ebb61ebb2198b459e/html5/thumbnails/11.jpg)
Future work
• Multi-level hierarchical grouping of keywords based on their generality.
• Visualization of stories.