Transcript
Page 1: Trend Detection and Visualization and Custom Search Applications

Trend Detection and Visualization and Custom Search Applications

Seminar for PG PUSHPIN

January 12, 2012

Pranav Kadam (6641525)

Universität Paderborn

Page 2: Trend Detection and Visualization and Custom Search Applications

Overview

• Trend Detection

- Trend Detection in Numbers

- Trend Detection in Text

- Trend Visualization

• Custom Search Applications

- Apache Solr

- Semantic Search

- Linked Data Approach

Page 3: Trend Detection and Visualization and Custom Search Applications

Overview

• Prototypes

• Q&A

Page 4: Trend Detection and Visualization and Custom Search Applications

Trend Detection

Page 5: Trend Detection and Visualization and Custom Search Applications

Trend Detection

What is a trend?

• A general direction in which something is changing

• An inclination

• A pattern of gradual change in a condition over time

• A trend is

- always associated with time

- often described using a 'time series'

• Long-term change in the mean level of a 'time series'

Page 6: Trend Detection and Visualization and Custom Search Applications

Trend Detection

Trend Analysis

• Practice of collecting information and trying to detect a trend in it

• Process of identifying a pattern in the behavior of a time series by minimizing noise

• Useful in forecasting future events

• Science of studying changes in social patterns

E.g. Google Trends, YouTube Trends, trendwatching.com, Facebook Insights, Tag Cloud (on the PG PUSHPIN blog)

Page 7: Trend Detection and Visualization and Custom Search Applications

Trend Detection

Trend Detection in Numbers

Page 8: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Numbers

Time series and statistical methods

• Time series: ordered sequence of values at equally

spaced time intervals

• Trend detection in numbers: Statistical methods to

interpret time series and determine behavior

• Assumption: pattern in past data can be used to forecast

future data points

• Models: Autoregressive (AR), Integrated (I), Moving Average (MA)

Page 9: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Numbers

Moving Average

• Average of time series data taken at consecutive periods

• New data in, old data out as the series progresses

E.g. MA of temperature for six months: Temp from January

to June, February to July, March to August, and so on.

• Minimizes temporal fluctuations

• Establishes trend, distinguishes any value above or

below trendline

• Applications in the fields of financial analysis, trade, economics, mathematics

Page 10: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Numbers

Moving Average

• Simple Moving Average (SMA): plain average of the data points over a specific number of periods

• The period selected can be short, medium or long according to interest (e.g. the standard SMA periods for stock market analysis are 50 days and 200 days)

• A longer period gives a smoother curve but increases the lag

• SMA always lags behind the latest data point
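
The sliding-window idea behind the SMA can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the function name `simple_moving_average` and the temperature values are assumptions made up for the example.

```python
def simple_moving_average(values, period):
    """Return the SMA for every full window of `period` consecutive values."""
    if period <= 0:
        raise ValueError("period must be positive")
    averages = []
    # Each window drops the oldest value and takes in the next one.
    for end in range(period, len(values) + 1):
        window = values[end - period:end]
        averages.append(sum(window) / period)
    return averages


# Hypothetical monthly temperatures (January to August), for illustration only.
temps = [2, 4, 8, 12, 16, 20, 22, 21]
# Three windows: January-June, February-July, March-August.
sma6 = simple_moving_average(temps, 6)
```

With eight monthly values and a six-month period there are only three averages, which illustrates the lag: the first SMA value is available only once six data points exist.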

Page 11: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Numbers

Moving Average

• Exponential Moving Average (EMA): weights applied to the data points, with more weight on recent ones, to reduce the lag

• Weight decreases exponentially and never reaches zero

• EMA has less lag and is more sensitive to the changes in

data points

• SMA vs EMA: though the difference is apparent, neither can be stated as better than the other

The preferred MA depends on objectives and time horizon
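
A minimal sketch of the EMA recurrence follows. It assumes the commonly used smoothing factor alpha = 2 / (period + 1) and seeds the series with the first value; both are conventions chosen for this example, not something the slides prescribe.

```python
def exponential_moving_average(values, period):
    """EMA with smoothing factor alpha = 2 / (period + 1) (assumed convention)."""
    if not values:
        return []
    alpha = 2 / (period + 1)
    ema = [values[0]]  # seed with the first observation (assumption)
    for x in values[1:]:
        # New value gets weight alpha; all older values decay by (1 - alpha).
        ema.append(alpha * x + (1 - alpha) * ema[-1])
    return ema


# Hypothetical series with a jump at the last point.
series = [10, 10, 10, 20]
ema3 = exponential_moving_average(series, 3)
sma3_last = sum(series[-3:]) / 3
```

After the jump, the last EMA value is closer to the new data point than the 3-period SMA of the same points, which is the reduced lag described above.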

Page 12: Trend Detection and Visualization and Custom Search Applications

Trend Detection

Trend Detection in Text

Page 13: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Trend detection system

• Emerging trend: a topic area growing in interest and utility over time

• The study of emerging trends depends on an automated process

• A trend detection (TD) system processes a collection of textual data and identifies an upward (growing), downward (falling) or sideways (constant) tendency

• The TD system then highlights the emerging topics in the trial period
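
One simple way to label a tendency as upward, downward or sideways is to fit a least-squares line to per-period topic counts and look at the sign of its slope. The sketch below assumes that approach; the function name `classify_tendency` and the `tolerance` cut-off are hypothetical choices, and real TD systems may use quite different criteria.

```python
def classify_tendency(counts, tolerance=0.1):
    """Label a sequence of per-period counts by the slope of its fitted line."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two periods")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    # Ordinary least-squares slope with the period index as x.
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(counts))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    slope = numerator / denominator
    if slope > tolerance:
        return "upward"
    if slope < -tolerance:
        return "downward"
    return "sideways"
```

For example, `classify_tendency([1, 2, 4, 7])` labels the topic as upward, while a flat sequence such as `[5, 5, 5, 5]` is labeled sideways.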

Page 14: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Trend detection system

• Trend detection methods can be classified as:

- Fully-automatic

- Semi-automatic

• Fully-automatic systems:

- The system generates a list of emerging topics from the input (a collection of textual data)

- A reviewer examines the data and evidence provided to conclude which are actual emerging trends

- Results are supported with graphical visualization

Page 15: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Trend detection system

• Semi-automatic:

- User inputs a topic

- System outputs the evidence that helps to determine whether the topic is emerging or not

- Evidence provided either as a summary or a descriptive

report

Page 16: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Useful models, schemes and tools

• Term-Document Matrix

• Scheme: Term Frequency – Inverse Document

Frequency (tf-idf)

• Latent Semantic Analysis

• Science Citation Index or Web of Science database

• Inspec, Compendex database
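
To make the first two items concrete, here is a small sketch that builds a term-document matrix weighted by tf-idf. It assumes one common variant (raw term count multiplied by log(N / document frequency)) and whitespace tokenization; the function name `tf_idf` and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return {term: [weight per document]} using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    counts = [Counter(tokens) for tokens in tokenized]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(documents)
    matrix = {}
    for term in vocabulary:
        # Document frequency: number of documents containing the term.
        doc_freq = sum(1 for c in counts if term in c)
        idf = math.log(n_docs / doc_freq)
        matrix[term] = [c[term] * idf for c in counts]
    return matrix


# Hypothetical mini-corpus.
docs = [
    "semantic search with solr",
    "solr faceted search",
    "trend detection in text",
]
weights = tf_idf(docs)
```

Each key of `weights` is a row of the term-document matrix. A term that occurs in only one document, such as "trend", gets the highest weight there, while a term shared by several documents is weighted lower.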

Page 17: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

1. Tracing a trend via citation linkages:

- Determine a potential trend or select a topic of interest

- Find recent documents on the topic

- Examine whether they really discuss the topic

- Extract keywords

- Using citation information, fetch the abstracts of the documents that are frequently referenced

- Examine the abstracts to verify their relation to the topic

Page 18: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

1. Tracing a trend via citation linkages:

- Examine the references used above and make a subset

where author names are referenced in more than, say, 3

documents

- As an improvement, query the repositories of citation

linkage information and other sources

- Graph document frequency, repeated authors and no. of

venues by year

Page 19: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

1. Tracing a trend via citation linkages:

- Years with overall higher document frequency are likely to contain the points where a trend is emerging

Finally, to determine a trend, apply a series of thresholds, such as at least one repeated author, at least 10 venues present, etc.
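
The final thresholding step could look roughly like the sketch below. It assumes the per-year statistics (document frequency, repeated authors, venues) have already been collected; the dictionary layout, the function name `emerging_years` and all numbers are hypothetical.

```python
def emerging_years(stats_by_year, min_repeated_authors=1, min_venues=10):
    """Return the years whose statistics pass every threshold."""
    return [
        year
        for year, stats in sorted(stats_by_year.items())
        if stats["repeated_authors"] >= min_repeated_authors
        and stats["venues"] >= min_venues
    ]


# Hypothetical per-year counts, for illustration only.
stats_by_year = {
    2009: {"documents": 4, "repeated_authors": 0, "venues": 3},
    2010: {"documents": 15, "repeated_authors": 1, "venues": 9},
    2011: {"documents": 31, "repeated_authors": 4, "venues": 12},
}
result = emerging_years(stats_by_year)
```

Here only 2011 passes both thresholds; 2010 has a repeated author but falls one venue short.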

Page 20: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

2. Using web resources:

- Select a main topic area first

- Knowledge in this area is essential to identify trends in

later stages

- Validate it as a possible research area using sources like

Inspec database

- Search workshop websites and technical papers for

discussions on the main topic area

Page 21: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

2. Using web resources:

- Search the web using helper terms like "most recent contribution", "hot topic", "cutting edge strategy", etc.

- Search an indexing database again with the query: main topic 'AND' newly found candidate trend, from the year of origin to the current year

Page 22: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

2. Using web resources:

✓ If document frequency increases over the years, the candidate trend is a genuine trend

✗ If documents from the same author appear in different years, it is not a trend
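
These two checks can be sketched as follows. The sketch makes two interpretive assumptions that go beyond the slide: "increases over the years" is read as a strictly growing yearly count, and the author rule is read as "all matching documents come from a single author". The function name `is_genuine_trend` and the records are hypothetical.

```python
from collections import Counter


def is_genuine_trend(records):
    """records: (year, author) pairs, one per document matching the query."""
    per_year = Counter(year for year, _ in records)
    counts = [per_year[year] for year in sorted(per_year)]
    # Assumption: the yearly document count must grow strictly.
    increasing = len(counts) >= 2 and all(
        earlier < later for earlier, later in zip(counts, counts[1:])
    )
    # Assumption: a single author publishing every year is not a trend.
    several_authors = len({author for _, author in records}) > 1
    return increasing and several_authors


# Hypothetical search results.
growing = [(2009, "A"), (2010, "A"), (2010, "B"),
           (2011, "B"), (2011, "C"), (2011, "D")]
single_author = [(2009, "A"), (2010, "A"), (2010, "A")]
```

`is_genuine_trend(growing)` is true, while `is_genuine_trend(single_author)` is false even though its document count also grows.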

Page 23: Trend Detection and Visualization and Custom Search Applications

Trend Detection

Trend Visualization

Page 24: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Trend visualization techniques

• Trends can be visualized using

- Line graphs

- Bar graphs

- Word clouds

- Frequency tables

- Sparklines

- Histograms

Page 25: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• ThemeRiver

- Visualizes thematic variations over time

- Changing widths depict changes in thematic strength of

the associated documents

- Flow represents time

- Colors represent themes

- Vertical section represents an ordered time slice

Page 26: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• ThemeRiver

Page 27: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• ThemeRiver

- Assigning the same color group to related themes simplifies tracking them

Page 28: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• SparkClouds

- SparkClouds = Sparklines + Tag Clouds

- Sparklines, characterized by small size and high data density, visualize trends and variations in a simple, condensed way

Page 29: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• SparkClouds

- Tag clouds are text-based visualizations showing the frequency, popularity or importance of words

Page 30: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• SparkClouds

- Sparklines are added to tag clouds to represent the trend across a series of tag clouds

- An overview of trends is provided in limited space

- It is compact and aesthetic
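
The SparkClouds idea can be imitated in plain text: each tag is paired with a sparkline drawn from Unicode block characters. This is only a rough text-mode sketch under that assumption; the names `sparkline` and `spark_cloud` and the tag counts are invented for the example, and the original technique renders graphics rather than text.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block characters, scaled to the series."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return BARS[0] * len(values)
    scale = (len(BARS) - 1) / (highest - lowest)
    return "".join(BARS[round((v - lowest) * scale)] for v in values)


def spark_cloud(tag_history):
    """Return (tag, latest count, sparkline) rows, most frequent tag first."""
    rows = [
        (tag, counts[-1], sparkline(counts))
        for tag, counts in tag_history.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical tag counts over successive periods.
history = {"rdf": [3, 3], "solr": [0, 7]}
cloud = spark_cloud(history)
```

In a real tag cloud the latest count would drive the font size; here it only drives the ordering, and the sparkline next to each tag shows how that tag's count changed over time.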

Page 31: Trend Detection and Visualization and Custom Search Applications

Custom Search Applications

Page 32: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Apache Solr

• Open source search platform from the Apache Lucene project

• Provides full-text search, faceted search, dynamic clustering, database integration, rich document handling, geospatial search

• High scalability, distributed search

• The core of the search and navigation engine of some of the world's largest internet sites

Page 33: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Apache Solr

• Written in Java, runs as a standalone search server

within a servlet container like Jetty or Tomcat

• REST-like API eases its use from any programming language

• Input: documents indexed as XML, JSON or binary over HTTP; queries sent via HTTP GET

• Output: XML, JSON or binary

• Highly customizable
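
Because queries are plain HTTP GET requests, a client only needs to build a URL. The sketch below assembles one using Solr's `q`, `rows` and `wt` query parameters without sending it. The base URL `http://localhost:8983/solr` and the field name `title` are assumptions for illustration; an actual deployment usually includes a core or collection name in the path.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, query, rows=10, response_format="json"):
    """Build a GET URL for Solr's /select handler (nothing is sent)."""
    params = {"q": query, "rows": rows, "wt": response_format}
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local installation and a hypothetical 'title' field.
url = build_solr_query_url("http://localhost:8983/solr", "title:trend")
```

Fetching the resulting URL with any HTTP client would return the matching documents in the requested format.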

Page 34: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Apache Solr

• Operations:

- Indexing data

- Updating data

- Deleting data

- Querying data

- Sorting

- Highlighting

- Faceted search

Page 35: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Semantic Web

• An extension to current Web

• Information is given well-defined meaning

• Goes beyond media objects to link people, places, events,

organizations, etc.

• Resources connected by multiple relations

• Data modeled using directed labeled graph

• Based on W3C's RDF; querying and exchanging of instance data in RDF is done using SOAP

Page 36: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Semantic Web

[Figure: example directed labeled graph. Nodes: Steve Jobs, Businessman, February 24, 1955, October 5, 2011, San Francisco, City, USA, 9°C, Apple Inc., Pixar, Company. Edge labels: type, co-founder, birthplace, located in, born on, died on, temp.]
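
A directed labeled graph like this one can be written down as a list of (subject, predicate, object) triples. The sketch below does so in plain Python and follows a chain of edge labels through it. Which nodes each edge connects is an assumption reconstructed from the labels, since the figure itself did not survive in the transcript; the helper names `objects` and `follow_path` are hypothetical.

```python
# Assumed edges, reconstructed from the node and edge labels of the figure.
TRIPLES = [
    ("Steve Jobs", "type", "Businessman"),
    ("Steve Jobs", "born on", "February 24, 1955"),
    ("Steve Jobs", "died on", "October 5, 2011"),
    ("Steve Jobs", "birthplace", "San Francisco"),
    ("Steve Jobs", "co-founder", "Apple Inc."),
    ("Steve Jobs", "co-founder", "Pixar"),
    ("San Francisco", "type", "City"),
    ("San Francisco", "located in", "USA"),
    ("San Francisco", "temp", "9°C"),
    ("Apple Inc.", "type", "Company"),
    ("Pixar", "type", "Company"),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects reachable from `subject` over one edge labeled `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow_path(start, predicates, triples=TRIPLES):
    """Follow a chain of edge labels from `start`; return the nodes reached."""
    frontier = [start]
    for predicate in predicates:
        frontier = [
            o for node in frontier for o in objects(node, predicate, triples)
        ]
    return frontier
```

For instance, `follow_path("Steve Jobs", ["birthplace", "located in"])` walks two edges and answers the question "in which country was Steve Jobs born?".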

Page 37: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Semantic Search

• Context-based search results

• Can possibly enhance, but cannot replace the traditional

navigational search

• Disambiguation

• Data divided as ontological data and instance data

• Determines the meaning of every word and establishes a context between them to achieve coherence for a sentence

Page 38: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Semantic Search

• Search Methodologies:

- RDF Path Traversal

- Keyword Concept Mapping

- Graph Patterns

- Logics

- Fuzzy Concepts, Fuzzy Relations, Fuzzy Logics

• Examples

- Hakia, SenseBot, DeepDyve

Page 39: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Linked Data Approach

• Linked data: method of publishing structured data that

can be interlinked

• Based on HTTP and URIs, extended to be read by

computers

• Components:

- URIs

- HTTP

- RDF

- Serialization formats (RDFa, RDF/XML, N3)

Page 40: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Linked Data Approach

• KiWi – a Linked Media Framework

• Easy-to-set-up server application bundling Semantic Web technologies

• Consists of LMF core and LMF modules

Page 41: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Linked Data Approach

• KiWi LMF core:

- Use URIs as names for things.

- Use HTTP URIs, so that people can look up those names.

- When someone looks up a URI, provide useful information,

using the standards (RDF, SPARQL).

- Include links to other URIs, so that they can discover more

things.

Page 42: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Linked Data Approach

• KiWi LMF module:

- LMF Semantic Search (highly configurable Semantic Search service based on Apache Solr)

- LMF Linked Data Cache (implements a cache to the Linked Data Cloud)

- LMF Reasoner (implements a rule-based reasoner that allows processing Datalog-style rules over RDF triples)
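
To give a feel for what a Datalog-style rule over triples does, here is a tiny forward-chaining sketch in plain Python. It does not use the LMF Reasoner or its rule syntax; the rule encoding (variables start with "?"), the function names and the derived predicate "born in country" are all assumptions made for this illustration.

```python
def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    new_bindings = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere in the rule.
            if new_bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_bindings


def apply_rule(premises, conclusion, triples):
    """Return every triple the rule derives from `triples`."""
    candidates = [{}]
    for premise in premises:
        candidates = [
            extended
            for bindings in candidates
            for triple in triples
            if (extended := match(premise, triple, bindings)) is not None
        ]
    return {tuple(b.get(term, term) for term in conclusion) for b in candidates}


def forward_chain(rules, triples):
    """Apply all rules repeatedly until no new triple appears."""
    known = set(triples)
    while True:
        derived = set()
        for premises, conclusion in rules:
            derived |= apply_rule(premises, conclusion, known)
        if derived <= known:
            return known
        known |= derived


# Hypothetical facts and one hypothetical rule.
facts = [
    ("Steve Jobs", "birthplace", "San Francisco"),
    ("San Francisco", "located in", "USA"),
]
rules = [
    (
        [("?person", "birthplace", "?city"), ("?city", "located in", "?country")],
        ("?person", "born in country", "?country"),
    )
]
closure = forward_chain(rules, facts)
```

The rule reads "if a person's birthplace is a city and that city is located in a country, then the person was born in that country", so the closure contains one new triple in addition to the two facts.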

Page 43: Trend Detection and Visualization and Custom Search Applications

Prototypes

Page 44: Trend Detection and Visualization and Custom Search Applications

Questions and Answers

Page 45: Trend Detection and Visualization and Custom Search Applications

Thank you!
