Trend Detection and Visualization and Custom Search Applications

Trend Detection and Visualization and Custom Search Applications Seminar for PG PUSHPIN January 12, 2012 Pranav Kadam (6641525) Universität Paderborn

description

This seminar deals with Trend detection in numbers and text and its visualization. In the second part, it focuses on Custom Search Application, Apache Solr, Semantic search and Linked data approach.

Transcript of Trend Detection and Visualization and Custom Search Applications

Page 1: Trend Detection and Visualization and Custom Search Applications

Trend Detection and Visualization and

Custom Search Applications

Seminar for

PG PUSHPIN

January 12, 2012

Pranav Kadam (6641525)

Universität Paderborn

Page 2: Trend Detection and Visualization and Custom Search Applications

Overview

• Trend Detection

- Trend Detection in Numbers

- Trend Detection in Text

- Trend Visualization

• Custom Search Applications

- Apache Solr

- Semantic Search

- Linked Data Approach

Page 3: Trend Detection and Visualization and Custom Search Applications

Overview

• Prototypes

• Q&A

Page 4: Trend Detection and Visualization and Custom Search Applications

Trend Detection

Page 5: Trend Detection and Visualization and Custom Search Applications

Trend Detection

What is a trend?

• A general direction in which something is changing

• An inclination

• A pattern of gradual change in a condition over time

• A trend is

- always associated with time

- often described using a ‘time series’

• Long-term change in the mean level of a ‘time series’.

Page 6: Trend Detection and Visualization and Custom Search Applications

Trend Detection

Trend Analysis

• Practice of collecting information and trying to detect a trend in it

• Process of identifying a pattern in the behavior of a time series by minimising noise

• Useful in forecasting future events

• Science of studying changes in social patterns

E.g. Google Trends, YouTube Trends, trendwatching.com, Facebook Insights, Tag Cloud (on the PG PUSHPIN blog)

Page 7: Trend Detection and Visualization and Custom Search Applications

Trend Detection

Trend Detection in Numbers

Page 8: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Numbers

Time series and statistical methods

• Time series: ordered sequence of values at equally spaced time intervals

• Trend detection in numbers: statistical methods to interpret a time series and determine its behavior

• Assumption: patterns in past data can be used to forecast future data points

• Models: AutoRegressive (AR), Integrated (I), Moving Average (MA)

Page 9: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Numbers

Moving Average

• Average of time series data taken over consecutive periods

• New data in, old data out as the series progresses

E.g. MA of temperature for six months: temperature from January to June, February to July, March to August, and so on.

• Minimizes temporal fluctuations

• Establishes the trend and distinguishes any value above or below the trendline

• Applications in the fields of financial analysis, trade, economics and mathematics

Page 10: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Numbers

Moving Average

• Simple Moving Average (SMA): plain average of data points over a specific number of periods

• The period selected can be short, medium or long according to interest (e.g. the standard SMA periods for stock market analysis are 50 days or 200 days)

• A longer period gives a smoother curve but increases the lag

• SMA always lags behind the latest data point
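As a minimal sketch of the simple moving average described above, the function below averages each window of consecutive values, adding the newest point and dropping the oldest as the window slides. The function name and the temperature figures are hypothetical and only serve as illustration.

```python
def simple_moving_average(values, period):
    """Return the plain average of every window of `period` consecutive values."""
    if period <= 0:
        raise ValueError("period must be positive")
    averages = []
    window_sum = 0.0
    for i, value in enumerate(values):
        window_sum += value                    # new data in
        if i >= period:
            window_sum -= values[i - period]   # old data out
        if i >= period - 1:
            averages.append(window_sum / period)
    return averages


# Hypothetical monthly temperatures, January to September, in degrees Celsius.
temperatures = [2, 4, 8, 12, 17, 20, 22, 21, 17]

# Windows: January-June, February-July, March-August, April-September.
six_month_sma = simple_moving_average(temperatures, 6)
```

The first average only becomes available once six values have been seen, which is one way to see why the SMA lags behind the latest data point.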

Page 11: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Numbers

Moving Average

• Exponential Moving Average (EMA): weights are applied to the data points to reduce the lag

• The weight of older data points decreases exponentially and never reaches zero

• EMA has less lag and is more sensitive to changes in the data points

• SMA vs EMA: though the difference is apparent, neither can be said to be better than the other

MA preference depends on objectives and time horizon
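The exponential weighting can be sketched as follows. Two choices here are assumptions rather than anything stated on the slide: the smoothing factor alpha = 2 / (period + 1), which is a common convention, and seeding the series with the first value.

```python
def exponential_moving_average(values, period):
    """Return the EMA of `values`; recent points get exponentially more weight."""
    if period <= 0:
        raise ValueError("period must be positive")
    if not values:
        return []
    alpha = 2.0 / (period + 1)     # assumed smoothing factor (common convention)
    ema = [float(values[0])]       # assumed seed: the first observation
    for value in values[1:]:
        # The previous EMA carries all older points with a weight that
        # shrinks by (1 - alpha) at every step but never reaches zero.
        ema.append(alpha * value + (1 - alpha) * ema[-1])
    return ema


# A sudden jump from 0 to 10: the EMA moves halfway at once when period = 3.
jump_response = exponential_moving_average([0, 10], 3)
```

A smaller period gives a larger alpha, so the curve reacts faster to new data but smooths less.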

Page 12: Trend Detection and Visualization and Custom Search Applications

Trend Detection

Trend Detection in Text

Page 13: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Trend detection system

• Emerging trend: topic area growing in interest and utility over time

• The study of emerging trends depends on an automated process

• A trend detection (TD) system processes a collection of textual data and identifies an upward (growing), downward (falling) or sideways (constant) tendency

• The TD system then highlights the emerging topics in the trial period
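One simple way to label a tendency as upward, downward or sideways is to fit a straight line to the per-period counts of a topic and look at its slope. This is only an illustrative sketch: the least-squares fit and the `tolerance` threshold are assumptions, not the method of any particular TD system.

```python
def classify_tendency(counts, tolerance=0.1):
    """Label a series of per-period counts as 'upward', 'downward' or 'sideways'."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two periods")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    # Least-squares slope of counts against the period index 0, 1, 2, ...
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope > tolerance:
        return "upward"
    if slope < -tolerance:
        return "downward"
    return "sideways"
```

For example, `classify_tendency([1, 2, 3, 4])` yields `"upward"`, while a flat series such as `[5, 5, 5, 5]` yields `"sideways"`.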

Page 14: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Trend detection system

• Trend detection methods can be classified as:

- Fully-automatic

- Semi-automatic

• Fully-automatic systems:

- Generate a list of emerging topics from the input (a collection of textual data)

- A reviewer examines the data and evidence provided to conclude which are actual emerging trends

- Results are supported with graphical visualization

Page 15: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Trend detection system

• Semi-automatic:

- User inputs a topic

- System outputs the evidence that helps to determine whether the topic is emerging or not

- Evidence is provided either as a summary or as a descriptive report

Page 16: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Useful models, schemes and tools

• Term-Document Matrix

• Scheme: Term Frequency – Inverse Document

Frequency (tf-idf)

• Latent Semantic Analysis

• Science Citation Index or Web of Science database

• Inspec, Compendex database
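To make the first two items concrete, the sketch below builds a small term-document matrix and weights it with tf-idf. It assumes whitespace tokenisation, raw counts as term frequency and idf = log(N / df), which is one of several common variants. The sample documents are made up.

```python
import math
from collections import Counter


def term_document_matrix(documents):
    """Map each term to a row of raw counts, one column per document."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    terms = sorted({term for c in counts for term in c})
    return {term: [c[term] for c in counts] for term in terms}


def tf_idf(matrix, n_docs):
    """Weight every cell by log(N / document frequency) of its term."""
    weights = {}
    for term, row in matrix.items():
        document_frequency = sum(1 for count in row if count > 0)
        idf = math.log(n_docs / document_frequency)
        weights[term] = [count * idf for count in row]
    return weights


# Hypothetical three-document collection.
documents = ["solr search server", "semantic search", "trend detection in text"]
matrix = term_document_matrix(documents)
weights = tf_idf(matrix, len(documents))
```

Here "search" occurs in two of the three documents, so it receives a lower weight than "solr", which occurs in only one.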

Page 17: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

1. Tracing a trend via citation linkages:

- Determine a potential trend or select a topic of interest

- Find recent documents on the topic

- Examine whether they really discuss the topic

- Extract keywords

- Fetch abstracts of the documents that are frequently referenced, using citation information

- Examine the abstracts to verify their relation to the topic

Page 18: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

1. Tracing a trend via citation linkages:

- Examine the references used above and make a subset where author names are referenced in more than, say, 3 documents

- As an improvement, query the repositories of citation linkage information and other sources

- Graph document frequency, repeated authors and number of venues by year

Page 19: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

1. Tracing a trend via citation linkages:

- Years with overall higher document frequency are likely to have points where a trend is emerging

Finally, to determine a trend, apply a series of thresholds such as at least one repeated author, at least 10 venues present, etc.
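The final thresholding step might look like the sketch below. The record layout (a dict with `authors` and `venue` keys) is a hypothetical choice for illustration. The default thresholds mirror the examples given above.

```python
from collections import Counter


def meets_trend_thresholds(documents, min_repeated_authors=1, min_venues=10):
    """Check the example thresholds against a list of document records.

    Each record is assumed to look like
    {"authors": ["A. Author", ...], "venue": "Some Workshop"}.
    """
    author_counts = Counter(
        author for doc in documents for author in doc["authors"]
    )
    # An author is "repeated" when they appear on more than one document.
    repeated_authors = [a for a, n in author_counts.items() if n > 1]
    venues = {doc["venue"] for doc in documents}
    return (
        len(repeated_authors) >= min_repeated_authors
        and len(venues) >= min_venues
    )


# Hypothetical records: one repeated author ("A"), two distinct venues.
sample = [
    {"authors": ["A"], "venue": "V1"},
    {"authors": ["A", "B"], "venue": "V2"},
]
```

With this sample, `meets_trend_thresholds(sample, min_venues=2)` holds, while the default of 10 venues is not met.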

Page 20: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

2. Using web resources:

- Select a main topic area first

- Knowledge in this area is essential to identify trends in

later stages

- Validate it as a possible research area using sources like

Inspec database

- Search workshop websites and technical papers for

discussions on the main topic area

Page 21: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

2. Using web resources:

- Search the web using helper terms like ‘most recent contribution’, ‘hot topic’, ‘cutting edge strategy’, etc.

- Then search an indexing database again with main topic ‘AND’ newly found candidate trend, from the year of origin to the current year

Page 22: Trend Detection and Visualization and Custom Search Applications

Trend Detection in Text

Approaches for Trend Detection

2. Using web resources:

If document frequency increases over the years, the candidate trend is a genuine trend

x If documents from the same author appear in different years, it's not a trend
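The document-frequency check can be sketched as below, starting from the publication years of the documents returned for the combined query. Treating "increases" as "never drops from one year to the next and ends higher than it started" is an assumption, since the slide does not define it more precisely.

```python
from collections import Counter


def document_frequency_by_year(publication_years):
    """Count documents per year, filling years without documents with 0."""
    counts = Counter(publication_years)
    return [(year, counts[year]) for year in range(min(counts), max(counts) + 1)]


def frequency_increases(yearly_counts):
    """True if the counts never drop year over year and end above the start."""
    frequencies = [count for _, count in yearly_counts]
    never_drops = all(
        later >= earlier for earlier, later in zip(frequencies, frequencies[1:])
    )
    return never_drops and frequencies[-1] > frequencies[0]


# Hypothetical publication years of the documents found.
years = [2008, 2009, 2009, 2010, 2010, 2010]
yearly = document_frequency_by_year(years)
```

For the sample years, the counts rise from one to three documents per year, so `frequency_increases(yearly)` holds.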

Page 23: Trend Detection and Visualization and Custom Search Applications

Trend Detection

Trend Visualization

Page 24: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Trend visualization techniques

• Trends can be visualized using

- Line graphs

- Bar graphs

- Word clouds

- Frequency tables

- Sparklines

- Histograms

Page 25: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• ThemeRiver

- Visualizes thematic variations over time

- Changing widths depict changes in thematic strength of

the associated documents

- Flow represents time

- Colors represent themes

- Vertical section represents an ordered time slice
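The geometry behind such a river can be sketched by stacking the theme strengths of each time slice. The sketch assumes the river is centred on the time axis. The function name and the sample strengths are hypothetical, and drawing and smoothing are left out.

```python
def theme_river_layers(strengths_by_theme):
    """Return, per theme, the (lower, upper) band edges for every time slice.

    `strengths_by_theme` maps a theme name to a list of strengths,
    one value per time slice; all lists must have the same length.
    """
    themes = list(strengths_by_theme)
    n_slices = len(strengths_by_theme[themes[0]])
    layers = {theme: [] for theme in themes}
    for t in range(n_slices):
        total = sum(strengths_by_theme[theme][t] for theme in themes)
        lower = -total / 2              # centre the stack on the time axis
        for theme in themes:
            upper = lower + strengths_by_theme[theme][t]
            layers[theme].append((lower, upper))   # band width = strength
            lower = upper
    return layers


# Hypothetical strengths of two themes over two time slices.
layers = theme_river_layers({"solr": [2, 4], "rdf": [2, 0]})
```

The width of each band at a slice equals the strength of its theme there, so a theme that fades out, like "rdf" in the second slice, shrinks to zero width.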

Page 26: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• ThemeRiver

Page 27: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• ThemeRiver

- Assigning the same color group to related themes simplifies their tracking

Page 28: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• SparkClouds

- SparkClouds = Sparklines + Tag Clouds

- Sparklines, characterized by small size and high data density, visualize trends and variations in a simple, condensed way

Page 29: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• SparkClouds

- Tag clouds are text-based visualizations showing frequency, popularity or importance of words

Page 30: Trend Detection and Visualization and Custom Search Applications

Trend Visualization

Other ways to visualize trends

• SparkClouds

- Sparklines are added to tag clouds to represent trends across a series of tag clouds

- Overview of trends provided in limited space

- Compact and aesthetic
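A text-only approximation of the idea is sketched below: each tag is followed by a sparkline built from Unicode block characters. An actual SparkCloud would also vary the font size of the tag. Here the tags are merely sorted by their latest frequency, and all names and figures are made up.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a series as one block character per value, scaled to its range."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return BARS[0] * len(values)
    return "".join(
        BARS[int((value - low) / span * (len(BARS) - 1))] for value in values
    )


def spark_cloud(tag_histories):
    """Return one 'tag sparkline' line per tag, most frequent tag first.

    `tag_histories` maps a tag to its frequency in each tag cloud of the series.
    """
    ordered = sorted(tag_histories.items(), key=lambda item: -item[1][-1])
    return [f"{tag} {sparkline(history)}" for tag, history in ordered]


# Hypothetical tag frequencies over a series of five tag clouds.
cloud = spark_cloud({"solr": [1, 2, 2, 3, 5], "rdf": [4, 4, 3, 2, 2]})
```

Printing the lines of `cloud` shows at a glance that "solr" is rising while "rdf" is falling.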

Page 31: Trend Detection and Visualization and Custom Search Applications

Custom Search Applications

Page 32: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Apache Solr

• Open source search platform from the Apache Lucene project

• Provides full-text search, faceted search, dynamic clustering, database integration, rich document handling, geo-spatial search

• High scalability, distributed search

• The core of the search and navigation engine of some of the world's largest internet sites

Page 33: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Apache Solr

• Written in Java, runs as a standalone search server within a servlet container like Jetty or Tomcat

• REST-like API eases its use from any programming language

• Input: documents indexed as XML, JSON or binary over HTTP; queries sent via HTTP GET

• Output: XML, JSON or binary

• Highly customizable

Page 34: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Apache Solr

• Operations:

- Indexing data

- Updating data

- Deleting data

- Querying data

- Sorting

- Highlighting

- Faceted search
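The sketch below only builds the requests for some of these operations and sends nothing. It assumes a local Solr instance at `http://localhost:8983/solr` with the default `/select` handler. The field names (`id`, `title`, `author`) are hypothetical and depend on the schema in use.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"   # assumed local installation


def build_add_message(document):
    """Build the XML <add> message used to index one document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in document.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_query_url(text, sort=None, highlight_field=None, facet_field=None):
    """Build a GET URL for querying, with optional sort, highlight and facet."""
    params = [("q", text), ("wt", "json")]          # wt selects the output format
    if sort:
        params.append(("sort", sort))
    if highlight_field:
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{SOLR_BASE}/select?{urlencode(params)}"


add_xml = build_add_message({"id": 1, "title": "Trend Detection"})
query_url = build_query_url("trend detection", sort="score desc", facet_field="author")
```

The add message would be sent in the body of an HTTP POST to the update handler, while the query URL can be fetched with a plain HTTP GET.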

Page 35: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Semantic Web

• An extension to the current Web

• Information is given well-defined meaning

• Goes beyond media objects to link people, places, events,

organizations, etc.

• Resources connected by multiple relations

• Data modeled using directed labeled graph

• Based on W3C's RDF; instance data in RDF is queried and exchanged using SOAP

Page 36: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Semantic Web

[Figure: directed labeled graph centred on Steve Jobs. Nodes: Steve Jobs, Businessman, Apple Inc., Pixar, Company, San Francisco, City, USA, February 24, 1955, October 5, 2011, 9°C. Edge labels: type, co-founder, birthplace, located in, born on, died on, temp.]
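The graph on this slide can be written down as subject-predicate-object triples, the basic unit of RDF. Which edge connects which pair of nodes is inferred here from the labels alone, so the exact pairing is an assumption. Plain strings stand in for URIs to keep the sketch short.

```python
from collections import deque

# (subject, predicate, object); the pairing of edges and nodes is assumed.
TRIPLES = [
    ("Steve Jobs", "type", "Businessman"),
    ("Steve Jobs", "co-founder", "Apple Inc."),
    ("Steve Jobs", "co-founder", "Pixar"),
    ("Steve Jobs", "birthplace", "San Francisco"),
    ("Steve Jobs", "born on", "February 24, 1955"),
    ("Steve Jobs", "died on", "October 5, 2011"),
    ("Apple Inc.", "type", "Company"),
    ("Pixar", "type", "Company"),
    ("San Francisco", "type", "City"),
    ("San Francisco", "located in", "USA"),
    ("San Francisco", "temp", "9°C"),
]


def find_path(triples, start, goal):
    """Breadth-first search along edge direction; returns the edges used."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, p, o in triples:
            if s == node and o not in visited:
                visited.add(o)
                queue.append((o, path + [(s, p, o)]))
    return None   # goal is not reachable from start


path_to_usa = find_path(TRIPLES, "Steve Jobs", "USA")
```

Because the graph is directed, a path exists from "Steve Jobs" to "USA" via his birthplace, but not in the opposite direction.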

Page 37: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Semantic Search

• Context-based search results

• Can possibly enhance, but cannot replace the traditional

navigational search

• Disambiguation

• Data divided into ontological data and instance data

• Determines the meaning of every word and establishes a context between them to achieve coherence for a sentence

Page 38: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Semantic Search

• Search Methodologies:

- RDF Path Traversal

- Keyword Concept Mapping

- Graph Patterns

- Logics

- Fuzzy Concepts, Fuzzy Relations, Fuzzy Logics

• Examples

- Hakia, SenseBot, DeepDyve
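Of the methodologies listed, graph patterns are perhaps the easiest to illustrate. In the sketch below a pattern is a triple in which any position may be left open, and matching returns every stored triple that fits. The data and the function name are hypothetical, and real systems would typically use a query language such as SPARQL instead.

```python
# Hypothetical instance data as (subject, predicate, object) triples.
triples = [
    ("Steve Jobs", "co-founder", "Apple Inc."),
    ("Steve Jobs", "co-founder", "Pixar"),
    ("Apple Inc.", "type", "Company"),
    ("Pixar", "type", "Company"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples fitting the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Pattern (?x, type, Company): which resources are companies?
companies = [s for s, _, _ in match(triples, predicate="type", obj="Company")]
```

Combining several such patterns over shared variables is what turns simple lookups into a graph query.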

Page 39: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Linked Data Approach

• Linked data: method of publishing structured data that

can be interlinked

• Based on HTTP and URIs, extended to be read by

computers

• Components:

- URIs

- HTTP

- RDF

- Serialization formats (RDFa, RDF/XML, N3)
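To show how these components fit together, the sketch below writes triples whose names are HTTP URIs in N-Triples, a simple line-based RDF syntax. N-Triples is swapped in for brevity and is not one of the formats named above. The `example.org` URIs are placeholders, and treating every object that starts with `http://` as a URI is a simplification.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        if obj.startswith("http://"):
            obj_term = f"<{obj}>"                     # object is a resource
        else:
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            obj_term = f'"{escaped}"'                 # object is a literal
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "\n".join(lines)


# Placeholder URIs; a real dataset would use URIs that can be looked up.
data = [
    ("http://example.org/SteveJobs", "http://example.org/name", "Steve Jobs"),
    ("http://example.org/SteveJobs", "http://example.org/coFounder",
     "http://example.org/Pixar"),
]
serialized = to_ntriples(data)
```

Each output line is a complete statement, which makes the format easy to stream and to merge from several sources.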

Page 40: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Linked Data Approach

• KiWi – a Linked Media Framework

• Easy-to-set-up server application bundling Semantic Web technologies

• Consists of LMF core and LMF modules

Page 41: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Linked Data Approach

• KiWi LMF core:

- Use URIs as names for things.

- Use HTTP URIs, so that people can look up those names.

- When someone looks up a URI, provide useful information,

using the standards (RDF, SPARQL).

- Include links to other URIs, so that they can discover more

things.

Page 42: Trend Detection and Visualization and Custom Search Applications

Custom Search Application

Linked Data Approach

• KiWi LMF module:

- LMF Semantic Search (highly configurable Semantic Search service based on Apache Solr)

- LMF Linked Data Cache (implements a cache to the Linked Data Cloud)

- LMF Reasoner (implements a rule-based reasoner that allows processing Datalog-style rules over RDF triples)
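What processing a Datalog-style rule over triples means can be illustrated with a single recursive rule: if x is located in y and y is located in z, then x is located in z. The sketch applies the rule repeatedly until no new triple appears. It does not use the LMF rule syntax or API, and the facts are made up.

```python
def apply_transitive_rule(triples, predicate):
    """Forward-chain the rule (x p y), (y p z) -> (x p z) to a fixpoint."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for x, p1, y in list(facts):
            for y2, p2, z in list(facts):
                if p1 == p2 == predicate and y == y2:
                    derived = (x, predicate, z)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True      # a new fact may enable more
    return facts


# Hypothetical facts.
facts = [
    ("Paderborn", "located in", "North Rhine-Westphalia"),
    ("North Rhine-Westphalia", "located in", "Germany"),
    ("Germany", "located in", "Europe"),
]
inferred = apply_transitive_rule(facts, "located in")
```

Starting from three stated facts, the rule derives three more, including that Paderborn is located in Europe.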

Page 43: Trend Detection and Visualization and Custom Search Applications

Prototypes

Page 44: Trend Detection and Visualization and Custom Search Applications

Questions and Answers

Page 45: Trend Detection and Visualization and Custom Search Applications

Thank you!
