Markov Chains for the Web - SEO, Usability, Search Engine Scoring, and More

Posted on 19-Oct-2014

Markov chains can take predictive theory to a new level, with large-scale applications for digital marketing. From social media network modeling to user pathing, site scoring and recommended pages, Markov chains can quantify, rank, and return likely outcomes on the web. In other words, they can demystify demographics. Here's how.

Transcript of Markov Chains for the Web - SEO, Usability, Search Engine Scoring, and More

Using Markov Chains to Predict User Behavior

Rivka Fogel

Markov Chains: Probability without History

COPYRIGHT 2013 CATALYST. ALL RIGHTS RESERVED. JANUARY 23, 2014 | PAGE 2

Andrey Markov

What Are Probability Spaces?

• Also known as stochastic processes

[Diagram labels: Focal Object / Function Co-Domain; Function/Possibility 1; Function/Possibility 2]

Presentation notes: A stochastic process is a random process that describes the evolution of a random value over time. This contrasts with a deterministic process, which is typically described by ordinary differential equations and involves no randomness.

Type 1: Time Series

[Diagram labels: First Event; Function/Possibility 1; Function/Possibility 2; Time; annotation: also called “states”]

Application: Personalization

Identifying user-specific authorities

• To return more accurate SERPs (E) for that user

[Diagram labels: User; A; B; C; D; E]

Type 2: Spatial Field

• Variable interactions are often statistically correlated

[Diagram label: Shared Event]

Addition of The Markov Property

The Next State Depends Only on the Current State:

E because of B or D, not because of A

• The probability of B causing E, as opposed to D causing E, is calculated by Bayes' theorem

[Diagram labels: A; B; C; D]
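As a rough illustration of the Bayes' theorem step on this slide, the sketch below computes how likely it is that B, rather than D, preceded the outcome E. The function name and every number are hypothetical and are not taken from the slides.

```python
def posterior_given_outcome(priors, likelihoods):
    """Return P(state | E) for each candidate previous state via Bayes' theorem.

    priors[s]      = P(s): how often users are in state s (hypothetical numbers).
    likelihoods[s] = P(E | s): chance of moving from state s to the outcome E.
    """
    # P(E) is the total probability of reaching E from any candidate state.
    evidence = sum(priors[s] * likelihoods[s] for s in priors)
    return {s: priors[s] * likelihoods[s] / evidence for s in priors}


# Hypothetical example: B is visited less often than D but leads to E more often.
priors = {"B": 0.4, "D": 0.6}
likelihoods = {"B": 0.5, "D": 0.25}
posterior = posterior_given_outcome(priors, likelihoods)
print(posterior)
```

With these made-up inputs, B comes out as the more probable cause of E even though D is the more common state.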

Application: (not provided)

• The Markov Property enables the marketer to model paths without knowing every state.

• Where only some keyphrase data is known, the model can also infer a missing keyphrase from other users’ paths in which the keyphrase is known.

[Diagram labels: Keyphrase?; Homepage; Model Landing Page; Inventory; Homepage Video View; Gallery Page Video View; Bounce]

Presentation notes: In deterministic modeling (for the web), the user keyphrase is the focal point, and all subsequent stages are based on it. In stochastic modeling, the Markov property makes every stage except the focal point and the preceding stage irrelevant to the current stage. You can also define the preceding stage as the focal point. This means that (not provided) is irrelevant when the focal point changes from the keyword to a SERP, landing page, or behavior (see relational Markov models/user behavior). Other users’ paths: see multichannel attribution.
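One way to picture inferring a (not provided) keyphrase from other users' paths is a simple conditional frequency count. The sketch below assumes each known path is recorded as a (keyphrase, landing page) pair; the data and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def keyphrase_probabilities(known_paths, landing_page):
    """Estimate P(keyphrase | landing page) from paths where the keyphrase is known."""
    counts = defaultdict(Counter)
    for keyphrase, page in known_paths:
        counts[page][keyphrase] += 1
    total = sum(counts[landing_page].values())
    if total == 0:
        return {}  # no known path reaches this page, so nothing can be inferred
    return {k: c / total for k, c in counts[landing_page].items()}


# Hypothetical known paths: (keyphrase, landing page).
known_paths = [
    ("used sedan", "Inventory"),
    ("used sedan", "Inventory"),
    ("sedan reviews", "Inventory"),
    ("brand name", "Homepage"),
]
guess = keyphrase_probabilities(known_paths, "Inventory")
print(max(guess, key=guess.get), guess)
```

A visit that lands on Inventory without a keyphrase would be attributed to the most probable known keyphrase for that page.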

Application: Multichannel Attribution

• Identify A (or predict D) via multiple probability states within a Markov chain.

Monitoring and prediction can be based on the probability of a user’s path given other users’ paths.

[Diagram labels: A; B; C; D; Probability of B; Probability of C; Known Path 1; Known Path 2; nodes 1, 2, 4, 5]

Presentation notes: The Markov chain formula is generative, so modeling is easily automated. Monitoring and prediction are defined by Bayes' theorem. E.g., the probability of the hypothesis given evidence from the initial source depends on the probability of the hypothesis given evidence from a different source.

Application: Audience Segmentation

[Diagram labels: Referral Paths; On-Site Paths; Landing Page; A; B; C; D; Probability of B; Probability of C; Known Path 1; Known Path 2; nodes 1, 2, 4, 5]

Relational Markov Properties

Relational Markov Models allow states to be of different types.

• Relational Markov Models group objects of multiple types into relations and calculate the probability of a relation’s appearance in a state.

• They work off of Dynamic Bayesian Networks

E because of B or D’s type, not because of A or C’s type

[Diagram labels: State A; State B; State C; State D; Type 1; Type 2]

Application: Audience Segmentation 2

[Diagram labels: B; C; nodes 1, 2; Paid; Organic; Known]

Application: User Experience

[Diagram labels: Homepage; Model Landing Page; Inventory; Homepage Video View; Gallery Page Video View; Bounce. Types: Page Visit; Video View; Bounce]

Presentation notes: For example, one can calculate the probability of a user picking a landing page and then picking an object on that landing page, as opposed to the probability of picking a different object, a different landing page, or a different path entirely. This is modeled spatially, not temporally, and can be combined with probabilities as well.
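To make the comparison of alternative paths concrete, the sketch below multiplies one-step transition probabilities along each path. It is a simplification that treats the site as a plain first-order chain, not the typed, spatial model the notes describe, and the transition values are hypothetical.

```python
def path_probability(transitions, path):
    """Multiply the one-step transition probabilities along a path."""
    probability = 1.0
    for current, nxt in zip(path, path[1:]):
        # Under the Markov property each step depends only on the current state.
        probability *= transitions.get(current, {}).get(nxt, 0.0)
    return probability


# Hypothetical transition probabilities between page and interaction states.
transitions = {
    "Homepage": {"Model Landing Page": 0.5, "Homepage Video View": 0.25, "Bounce": 0.25},
    "Model Landing Page": {"Inventory": 0.5, "Gallery Page Video View": 0.25, "Bounce": 0.25},
}
a = path_probability(transitions, ["Homepage", "Model Landing Page", "Inventory"])
b = path_probability(transitions, ["Homepage", "Model Landing Page", "Gallery Page Video View"])
print(a, b)
```

Comparing the two results shows which of the competing paths the model considers more likely.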

Application: Social Network Modeling

• This function will answer: if the user ended up converting/visiting the landing page, which [type(s)] of social interaction[s] came into play?

[Diagram labels: Site Landing Page; Rich Media Play; Rich Media Host Page; User Share; Influencer; News Feed; Brand Social Profile]

Presentation notes: Possible only via a spatial model, because the nature of the co-domain means that you’d be modeling backwards.

Application: HTTP Service Request Prediction

• Prefetch Page A given the probability that the user will want to see it.

• The keyphrase cluster is predicted by the function with co-domain B and is then used to predict the incidence of B where the first state isn’t known.

[Diagram labels: Keyphrase 1; Keyphrase 2; Keyphrase Cluster; Known Paths; A; nodes 1, 2, 3; Probability of 3]

Presentation notes: The keyphrase cluster is post-Hummingbird.
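The prefetch decision can be sketched as a threshold on next-page probabilities. The threshold, the limit, the page probabilities, and the function name below are all assumptions chosen for illustration.

```python
def pages_to_prefetch(next_page_probs, threshold=0.3, limit=2):
    """Pick the most probable next pages whose probability meets the threshold."""
    ranked = sorted(next_page_probs.items(), key=lambda item: item[1], reverse=True)
    return [page for page, p in ranked if p >= threshold][:limit]


# Hypothetical next-page probabilities from the user's current page.
next_page_probs = {"Page A": 0.55, "Page B": 0.35, "Page C": 0.10}
print(pages_to_prefetch(next_page_probs))
```

Raising the threshold trades fewer wasted requests for fewer cache hits, so the right value depends on the cost of a prefetch.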

Application: Agent Suggestion

• Auto-suggests searches (Search C) and links (URL E) that the user is likely to want to access, based on user history and other users’ history

[Diagram labels: First words of Query; Search A; Search B; Search C; Keyphrase Cluster or Authority; URL A; URL B; URL C; URL D; URL E]
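A minimal sketch of suggesting the next search from both the user's history and other users' history, using first-order transition counts. The weighting of the user's own sessions, the session data, and "Search D" are hypothetical.

```python
from collections import Counter


def suggest_next(current, user_sessions, pool_sessions, user_weight=3, top_n=2):
    """Rank likely next searches after `current` using first-order transition counts."""
    scores = Counter()
    # The user's own transitions count more than those of the wider pool.
    for sessions, weight in ((pool_sessions, 1), (user_sessions, user_weight)):
        for session in sessions:
            for prev, nxt in zip(session, session[1:]):
                if prev == current:
                    scores[nxt] += weight
    return [search for search, _ in scores.most_common(top_n)]


user_sessions = [["Search A", "Search B", "Search C"]]
pool_sessions = [
    ["Search B", "Search D"],
    ["Search B", "Search D"],
    ["Search A", "Search B", "Search C"],
]
print(suggest_next("Search B", user_sessions, pool_sessions))
```

Here the user's single past session outweighs the pool's preference, so Search C is suggested ahead of Search D.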

Application: Search Engine Scoring

• The function identifies hubs of authority that are probable next steps in many systems (each with individual focus objects).

Identifying Authority 2:

[Diagram labels: Keyphrase Cluster; Page A; Page B; Page C; Link 1; Link 2; Authority 1; Authority 2]
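One common way to find pages that are probable next steps from many starting points is to compute the long-run visit probabilities of a Markov chain over the link graph. The sketch below uses a PageRank-style power iteration with a damping factor; it is an illustration under those assumptions, not a description of how any particular search engine scores pages, and the link graph is hypothetical.

```python
def stationary_scores(links, damping=0.85, iterations=50):
    """Score pages by the long-run visit probability of a random surfer (power iteration)."""
    pages = sorted(links)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small share from random jumps.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links[page] or pages  # a page with no links jumps anywhere
            share = damping * scores[page] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical link graph: both Page A and Page B link to Page C.
links = {"Page A": ["Page C"], "Page B": ["Page C"], "Page C": ["Page A"]}
scores = stationary_scores(links)
print(max(scores, key=scores.get))
```

Page C collects probability from two sources, so it ends up as the strongest authority in this toy graph.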

Appendix: Formal Definitions

Where, Probability Spaces:

• The measurable space $(S, \Sigma)$ and an object $X$ on the measurable space

• The probability space is defined by the function $P$, the assignment of probabilities to events, where $\Omega$ is the set of possible outcomes and $F$ is the set of events, each event containing zero or more outcomes: $P(x) = \sum_{i=1}^{k} P(t_i)$ for all $x$ on $\Omega$, where $t_1, \dots, t_k$ are the outcomes in $x$

• The finite-dimensional distribution of $X$: $(X_{t_1}, \dots, X_{t_k}) : \Omega \to S^k$

• That arrow, or the pushforward measure, or the random distribution of events, or the matrix of transition probabilities $P$: $P_{t_1, \dots, t_k}(\cdot) = P\big((X_{t_1}, \dots, X_{t_k})^{-1}(\cdot)\big)$ on $S^k$

– Where Bayes' theorem allows for: $P(H \mid E) = P(H)\,P(E \mid H) / P(E)$, with $P(H)$ the probability of the hypothesis before the new evidence and $P(E)$ the probability of the evidence over the entire set

Then, Markov Property:

• $P(X_{l+1} = s \mid X_l = s_l, X_{l-1} = s_{l-1}, \dots, X_0 = s_0) = P(X_{l+1} = s \mid X_l = s_l)$

– The random distribution of events is defined because the system is finite.

• So, in the matrix of transition probabilities, defined as $P^{(l,\,l+1)}_{ij} = P(X_{l+1} = j \mid X_l = i)$, $P^{(l,\,l+1)}$ is independent of $l$.

• That is, $\hat{s}(t) = \hat{s}(t-1)\,A$

– $s$ is the state space, $A$ is the matrix of transition probabilities, and $\lambda$ is the initial probability distribution of the states in $s$. $\hat{s}(t)$ is the probability vector for states at time $t$.
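The update of the state probability vector by the transition matrix can be sketched in a few lines. The two-state chain, its probabilities, and the function name are hypothetical.

```python
def step(distribution, A):
    """One Markov step: s(t) = s(t-1) A, a row vector times the transition matrix."""
    n = len(A)
    return [sum(distribution[i] * A[i][j] for i in range(n)) for j in range(n)]


# Hypothetical two-state chain: state 0 = "on site", state 1 = "bounced".
A = [
    [0.5, 0.5],  # from state 0
    [0.0, 1.0],  # from state 1 (bouncing is absorbing here)
]
s = [1.0, 0.0]  # initial distribution: every user starts on site
for t in range(1, 4):
    s = step(s, A)
    print(t, s)
```

Because the same matrix is applied at every step, the chain is time-homogeneous, which is what independence from the step index means in practice.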

Markov Restatement 1: When a User’s History is Available

• $A(s, s') = C(s, s') / \sum_{s''} C(s, s'')$ and $\lambda(s) = C(s) / \sum_{s'} C(s')$

– $C(s, s')$ counts the instances where $s'$ follows $s$

– This can be applied to HTTP prediction and agent suggestion
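The count-based estimates can be sketched as relative frequencies over recorded sessions. The slide does not define C(s), so the code assumes it counts how often a session starts in s; the sessions and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_chain(sessions):
    """Estimate A(s, s') and lambda(s) from observed sessions by relative frequency."""
    pair_counts = defaultdict(Counter)  # C(s, s'): times s' follows s
    start_counts = Counter()            # C(s): times a session starts in s (assumption)
    for session in sessions:
        if not session:
            continue
        start_counts[session[0]] += 1
        for s, s_next in zip(session, session[1:]):
            pair_counts[s][s_next] += 1
    A = {
        s: {s_next: c / sum(followers.values()) for s_next, c in followers.items()}
        for s, followers in pair_counts.items()
    }
    total_starts = sum(start_counts.values())
    lam = {s: c / total_starts for s, c in start_counts.items()}
    return A, lam


# Hypothetical sessions.
sessions = [
    ["Homepage", "Inventory", "Bounce"],
    ["Homepage", "Bounce"],
    ["Inventory", "Bounce"],
    ["Homepage", "Inventory"],
]
A, lam = estimate_chain(sessions)
print(A["Homepage"], lam)
```

Each row of the estimated matrix sums to one, so it can be used directly as a set of next-state probabilities.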

Markov Restatement 2: When the Evidence Comes from a User Pool

• The Markov function becomes a generative chain link system that can store counts and probabilities

– Where probabilities are used: $\hat{s}(t) = a_0\,\hat{i}(t-1)A + a_1\,\hat{i}(t-2)A^2 + a_2\,\hat{i}(t-3)A^3 + \dots$ and $= \max\big(a_0\,\hat{i}(t-1)A + a_1\,\hat{i}(t-2)A^2 + a_2\,\hat{i}(t-3)A^3 + \dots\big)$

– $\hat{s}(t)$ is normalized to select a list of probable states.

This can be applied to authority hubs as well, where collected user path traversal patterns are represented in a traversal connectivity matrix.
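A sketch of the weighted-history sum, under the assumption that each history term is a one-hot indicator vector of the state seen k steps ago (the slide does not define it). The weights, the transition matrix, and the function names are hypothetical.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]


def predict_next(history, A, weights):
    """Combine a_0 i(t-1) A + a_1 i(t-2) A^2 + ... and normalize the result.

    history: state indices, most recent last. weights: a_0, a_1, ... (hypothetical values).
    Assumes i(t-k) is a one-hot indicator of the state seen k steps ago.
    """
    n = len(A)
    combined = [0.0] * n
    for k, weight in enumerate(weights, start=1):
        if k > len(history):
            break
        v = [0.0] * n
        v[history[-k]] = 1.0
        for _ in range(k):  # multiplying k times applies A^k
            v = vec_mat(v, A)
        combined = [c + weight * x for c, x in zip(combined, v)]
    total = sum(combined)
    return [c / total for c in combined] if total else combined


A = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
# The user visited state 0, then state 1; the cyclic chain makes state 2 the only candidate.
print(predict_next([0, 1], A, weights=[0.75, 0.25]))
```

Older states contribute through higher powers of the matrix, so both terms here point at the same next state.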

Markov Restatement 3: When Groupings of States Are Estimated

• These are Relational Markov Models

• These groupings are also seen as abstractions. $A(Q)$ forms a lattice of abstractions.

– $\{\mathcal{D}, R, Q, A, \pi\}$, where each $D \in \mathcal{D}$ is a tree representing a hierarchy of values. $R$ is a set of relations. Each relation is defined by nodes on leaves of $D$. $Q$ is the set of states. $A$ is the transition probability matrix. $\pi$ is the initial probability, that is, the probability of each state being the initial state in the chain. States are defined as abstractions on $Q$.

– The rank of an abstraction $a = R(d_1, \dots, d_k)$ in the lattice is defined as $1 + \sum_{i=1}^{k} \mathrm{depth}(d_i)$. Depth is a node’s depth on the tree, so an abstraction’s rank increases with the depth of its nodes. The rank of $Q$ (the most general abstraction) is 0.

• States that have nodes on common leaves will more frequently appear in abstractions together.
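The rank formula can be checked on a toy hierarchy. The sketch assumes the root of each tree has depth 0 and does not model the special case of Q, whose rank is fixed at 0; the hierarchy and the function names are hypothetical.

```python
def depth(node, parent):
    """Depth of a node in a tree given child -> parent links (the root has depth 0)."""
    d = 0
    while parent.get(node) is not None:
        node = parent[node]
        d += 1
    return d


def abstraction_rank(nodes, parent):
    """Rank of an abstraction R(d_1, ..., d_k): 1 + sum of the depths of its nodes."""
    return 1 + sum(depth(node, parent) for node in nodes)


# Hypothetical value hierarchy for a "page" domain.
parent = {
    "AllPages": None,
    "ModelPages": "AllPages",
    "SedanPage": "ModelPages",
}
print(abstraction_rank(["AllPages"], parent))   # most general node in this tree
print(abstraction_rank(["SedanPage"], parent))  # a leaf, so a more specific abstraction
```

More specific abstractions sit deeper in the tree and therefore receive a higher rank.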

Further Reading

• Anderson, Corin R., Domingos, Pedro, and Weld, Daniel S. “Relational Markov Models and their Application to Adaptive Web Navigation.” Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. (2002): 143-152. Electronic. http://homes.cs.washington.edu/~pedrod/papers/kdd02a.pdf

• Downey, Allen. “Bayesian statistics made (as) simple (as possible).” Pycon US. 7 March 2012. http://pyvideo.org/video/608/bayesian-statistics-made-as-simple-as-possible

• Flesch, Ildikó and Lucas, Peter. “Markov Equivalence in Bayesian Networks.” Electronic. http://www.cs.ru.nl/P.Lucas/markoveq.pdf

• Sarukkai, Ramesh R. “Link prediction and path analysis using Markov chains.” Computer Networks 3 (June 2000): 377-386. Electronic. http://www.sciencedirect.com/science/article/pii/S138912860000044X

Questions?