Language networks


Transcript of Language networks

Page 1: Language networks

LANGUAGE NETWORKS

SI/EECS 767, Yang Liu, January 29, 2010

Page 2: Language networks

INTRODUCTION
Human language described as a complex network (Solé et al., 2005)

Page 3: Language networks

INCENTIVES
Analyzing statistical properties
Building models to explain the patterns
Studying the origins and evolution of human language
Statistical approaches to natural language processing

Page 4: Language networks

CATEGORIZATION
Words as vertices
  Co-occurrence networks (Dorogovtsev & Mendes, 2001; Masucci & Rodgers, 2008)
  Semantic networks (Steyvers & Tenenbaum, 2005)
  Syntactic networks (Ferrer i Cancho et al., 2004)
Sentences as vertices (Erkan & Radev, 2004)
Documents as vertices (Menczer, 2004)

Page 5: Language networks

CO-OCCURRENCE AND SYNTACTIC NETWORKS

Page 6: Language networks

SEMANTIC NETWORKS

Page 7: Language networks

 Language as an evolving word web

(Dorogovtsev & Mendes, 2001)

Page 8: Language networks

INTRODUCTION
Proposes a theory of how language evolves
Treats human language as a complex network of distinct words
Words are connected with their nearest neighbors (co-occurrence networks)
In the papers of Ferrer and Solé (2001, 2002), the degree distribution consists of two power-law parts with different exponents

Page 9: Language networks

THE MODEL
Preferential attachment provides a power-law degree distribution, but with plain preferential attachment the average degree does not change
In the word web, the total number of connections increases more rapidly than the number of vertices, so the average degree grows

Page 10: Language networks

THE MODEL
At each time step (a code sketch follows below):
a new vertex (word) is added; t, the total number of vertices, plays the role of time
the new vertex is connected to some old vertex i with probability proportional to its degree k_i
ct new edges emerge between old words (c is a constant coefficient)
these new edges emerge between vertices i and j with probability proportional to k_i k_j
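A minimal Python sketch of this growth rule; the graph representation, parameter values, and initial seed edge are illustrative choices of mine, not taken from the paper:

```python
import random

def dm_word_web(T=5000, c=0.01, seed=None):
    """Sketch of the Dorogovtsev-Mendes word-web growth rule.

    Each step adds one new word attached to an old word chosen with
    probability proportional to degree, plus about c*t extra edges whose
    endpoints are both chosen with probability proportional to degree.
    """
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}        # seed: a single edge between words 0 and 1
    endpoints = [0, 1]           # degree-weighted sampling pool (each edge end once)

    def add_edge(u, v):
        degree[u] += 1
        degree[v] += 1
        endpoints.extend([u, v])

    for t in range(2, T):
        degree[t] = 0
        add_edge(t, rng.choice(endpoints))       # new word -> old word, p ∝ k_i
        for _ in range(int(c * t)):              # ~c*t new edges among old words
            i, j = rng.choice(endpoints), rng.choice(endpoints)
            if i != j:
                add_edge(i, j)                   # both ends chosen with p ∝ degree
    return degree
```

Sampling uniformly from the list of edge endpoints is a standard way to draw a vertex with probability proportional to its degree, which is why no explicit weights are needed.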

Page 11: Language networks

DATA
Two word webs built by Ferrer and Solé (2001, 2002)
Obtained from about ¾ of a million words of the British National Corpus
470,000 vertices
Average degree = 72

Page 12: Language networks

SOLVING THE MODEL
Continuum approximation
k(s,t): the average degree of the vertices born at time s and observed at time t
ct ≈ 70 ≫ 1
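For reference, the continuum (rate-equation) treatment of the growth rule above takes roughly the following form; the normalization details here are my reconstruction from the model's definition and should be checked against the paper:

\[
\frac{\partial k(s,t)}{\partial t} \;=\; \bigl(1 + 2ct\bigr)\,
\frac{k(s,t)}{\int_0^t k(u,t)\,du},
\qquad
\int_0^t k(u,t)\,du \;\approx\; 2t + ct^2 ,
\]

since each step distributes one edge end from the new word plus 2ct edge ends among old words, all preferentially by degree.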

Page 13: Language networks

SOLVING THE MODEL

The degree distribution has two regions separated by the crossover point

Page 14: Language networks

SOLVING THE MODEL
Below this point: stationary degree distribution
Above this point: non-stationary degree distribution
Empty and filled circles show the degree distributions for the two word webs of Ferrer and Solé (2001, 2002)

Page 15: Language networks

DISCUSSION
The model is concerned only with the degree distribution
The clustering coefficients do not match
The total number of words of degree greater than k_cross does not change
The size of the kernel lexicon does not depend on the total number of distinct words in the language

Page 16: Language networks

 Network properties of written human language

(Masucci & Rodgers, 2008)

Page 17: Language networks

TOPOLOGY OF THE NETWORK
The words (including punctuation marks) are vertices, and two vertices are linked if they are neighbors in the text
Directed network

Page 18: Language networks

NETWORK STATISTICS
8,992 vertices, 117,687 edges, mean degree <k> = 13.1
P(k) ∝ k^(-1.9)
Zipf’s law slope -1.2

Page 19: Language networks
Page 20: Language networks

GROWTH PROPERTIES
The number of edges between words grows faster than the number of vertices
N(t) ∝ t^1.8

Page 21: Language networks

NEAREST-NEIGHBOR PROPERTIES
The mean clustering coefficient <c> = 0.19

Page 22: Language networks

REPEATED BINARY STRUCTURES OF WORDS

Reproduced by local preferential attachment (PA)

Page 23: Language networks

THE MODELS (D-M MODEL)
Start with a chain of 20 connected vertices
At each time step, add a new vertex and connect it to some vertex i with p ∝ k_i
m(t) − 1 new edges emerge between old words with p ∝ k_i k_j

Page 24: Language networks

D-M MODEL
<c(k)> = 0.16
Catches the average clustering and the global growth behavior
Misses the internal structure

Page 25: Language networks

MODEL 2
Includes local PA with probability p(t) ≈ 0.1 t^0.16
Start with a chain of 20 connected vertices
At each time step, add a new vertex and connect it to some vertex i with global PA (p ∝ k_i)
m(t) − 1 times: with probability p(t), link the last linked vertex to an old vertex i in its nearest neighborhood through local PA (p ∝ k_i); with probability 1 − p(t), link it to an old vertex i outside its nearest neighborhood with global PA (a code sketch follows below)
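A rough Python sketch of this rule; the slide leaves m(t) and several details unspecified, so the prefactors and the treatment of the "last linked vertex" below are illustrative guesses rather than the paper's implementation:

```python
import random

def local_pa_model(T=2000, seed=None):
    """Sketch of a local-preferential-attachment growth rule (Model 2 style).

    Growth functions follow the scalings quoted on the slides
    (p(t) ~ 0.1 t^0.16, edges growing roughly as t^1.8); exact prefactors
    and the chaining of links are illustrative choices.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(20)}          # start with a chain of 20 vertices
    for i in range(19):
        adj[i].add(i + 1); adj[i + 1].add(i)

    def pa_choice(candidates):
        # choose a vertex from `candidates` with probability ∝ its degree
        weights = [len(adj[v]) for v in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0]

    for t in range(20, T):
        adj[t] = set()
        m = max(1, round(t ** 0.8 / 10))          # edges added this step (illustrative)
        p_local = min(1.0, 0.1 * t ** 0.16)       # probability of a local-PA link
        last = pa_choice(list(adj.keys() - {t}))  # global PA for the first link
        adj[t].add(last); adj[last].add(t)
        for _ in range(m - 1):
            neigh = list(adj[last] - {t})                      # nearest neighborhood of `last`
            others = list(adj.keys() - adj[last] - {last, t})  # everything else
            if neigh and rng.random() < p_local:
                target = pa_choice(neigh)         # local PA within the neighborhood
            elif others:
                target = pa_choice(others)        # global PA elsewhere
            else:
                target = pa_choice(neigh)
            adj[last].add(target); adj[target].add(last)
            last = target                         # links chain from the last linked vertex
    return adj
```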

Page 26: Language networks

MODEL 2
<c> = 0.08
Catches the global and nearest-neighbor behavior, but not the average clustering coefficient

Page 27: Language networks

MODEL 3
Different words in written human language display different statistical distributions, according to their functions

Page 28: Language networks

MODEL 3
Start with a chain of 20 connected vertices
At each time step, add a new vertex and connect it to some vertex i with global PA (p ∝ k_i)
m(t) − 1 times: with probability q = 0.05, link the last linked vertex to one of the three fixed vertices; with probability p(t), link it to an old vertex i in its nearest neighborhood through local PA (p ∝ k_i); with probability 1 − p(t) − 3q, link it to an old vertex i outside its nearest neighborhood with global PA

Page 29: Language networks

MODEL 3 <c> = 0.20

Page 30: Language networks

CONCLUSIONS
New growth mechanisms:
1. local PA
2. the allocation of a set of preselected vertices

Page 31: Language networks

The large-scale structure of semantic networks: statistical analyses and a model of semantic growth

(Steyvers & Tenenbaum, 2005)

Page 32: Language networks

INTRODUCTION
There are general principles governing the structure of network representations of natural language semantics
The small-world structure arises from a scale-free organization

Page 33: Language networks

MODEL
Concepts that enter the network early are expected to show higher connectivity
One aspect of semantic development: growth of semantic networks by differentiation of existing nodes
The model grows through a process of differentiation analogous to mechanisms of semantic development, which allows it to produce both small-world and scale-free structure

Page 34: Language networks

ANALYSIS OF SEMANTIC NETWORKS
Free association norms
WordNet
Roget’s thesaurus

Page 35: Language networks

METHODS
Associative networks
Two networks were created: directed and undirected

Page 36: Language networks

ROGET’S THESAURUS
Bipartite graph: word nodes and semantic category nodes
A connection is made between a word node and a category node when the word falls into that semantic category
Converted to a simple graph (one-mode projection) for calculating the clustering coefficient

Page 37: Language networks

WORDNET
120,000+ word forms
99,000+ word meanings
Links between forms and forms, meanings and meanings, and forms and meanings
Treated as an undirected graph

Page 38: Language networks

RESULTS

Page 39: Language networks

ZIPF’S “LAW OF MEANING”

Page 40: Language networks

GROWING NETWORK MODEL
Previous models:
BA model: low clustering coefficient
WS model: no scale-free structure

Page 41: Language networks

MODEL A: UNDIRECTED
At each time step, a new node with M links is added to the network by randomly choosing some existing node i for differentiation, and then connecting the new node to M randomly chosen nodes in the semantic neighborhood of node i
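A minimal sketch of Model A, following the slide's description of uniform random choices (the original paper may weight these choices by node utility); the value of M and the seed network here are illustrative:

```python
import random

def semantic_growth_model_a(n=5000, M=11, seed=None):
    """Sketch of an undirected growth-by-differentiation model (Model A style).

    Each new node picks an existing node i to differentiate and connects
    to M nodes drawn from the neighborhood of i (including i itself here;
    that inclusion is an assumption of this sketch).
    """
    rng = random.Random(seed)
    # seed network: a small fully connected graph of M+1 nodes
    adj = {i: set(range(M + 1)) - {i} for i in range(M + 1)}

    for new in range(M + 1, n):
        i = rng.choice(list(adj))                 # node chosen for differentiation
        neighborhood = list(adj[i] | {i})         # i and its current neighbors
        k = min(M, len(neighborhood))
        targets = rng.sample(neighborhood, k)     # M nodes from i's neighborhood
        adj[new] = set(targets)
        for v in targets:
            adj[v].add(new)
    return adj
```

Because a new node copies links from an existing node's neighborhood, well-connected nodes keep acquiring links, which is what yields the scale-free degree distribution alongside high clustering.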

Page 42: Language networks

TWO PROBABILITY DISTRIBUTIONS

Page 43: Language networks

Set n equal to the size of the target network

Set M equal to ½ <k>

Page 44: Language networks

MODEL B: DIRECTED
Assume the direction of each arc is chosen randomly and independently of the other arcs
An arc points toward the old node with probability α and toward the new node with probability 1 − α

Page 45: Language networks

RESULTS
Tested only on the association networks with Models A and B
α set to 0.95
Average of 50 simulations

Page 46: Language networks
Page 47: Language networks
Page 48: Language networks

 Patterns in syntactic dependency networks

(Ferrer i Cancho et al., 2004)

Page 49: Language networks

INTRODUCTION
Co-occurrence networks fail to capture the characteristic long-distance correlations of words in sentences
In co-occurrence networks, the proportion of incorrect syntactic dependency links is high
A precise definition of the syntactic link is required

Page 50: Language networks

THE SYNTACTIC DEPENDENCY NETWORK
Defined according to the dependency grammar formalism
Vertices are words; links go from the modifier to its head

Page 51: Language networks

CORPORA
1. Czech corpus: annotated by hand; the proportion of included links is about 0.65 (links between function words are missing)
2. Romanian corpus: annotated by hand
3. German corpus: annotated automatically; the proportion of included links is about 0.16 (the omissions obey no regularity)

Page 52: Language networks

NETWORK PROPERTIES
Small-world structure: small average path length D and high clustering coefficient C
Heterogeneity: power-law degree distribution
Hierarchical organization: C(k) ~ k^(-θ)
Betweenness centrality: P(g) ~ g^(-η)
Assortativeness

Page 53: Language networks

RESULTS

Page 54: Language networks

RESULTS

Page 55: Language networks

RESULTS

Page 56: Language networks

RESULTS

Page 57: Language networks

RESULTS

Page 58: Language networks

GLOBAL VS SENTENCE-LEVEL PATTERNS

Page 59: Language networks

DISCUSSION
(1) Disassortative mixing tells us that labor is divided in human language: linking words tend to avoid connections among themselves.
(2) Hierarchical organization tells us that syntactic dependency networks not only define the syntactically correct links but also a top-down hierarchical organization that is the basis of phrase-structure formalisms.
(3) Small-worldness is a necessary condition for recursion.

Page 60: Language networks

LexRank: graph-based lexical centrality as salience in text summarization

(Erkan & Radev, 2004)

Page 61: Language networks

INTRODUCTION
Graph-based methods in NLP
Random walks on sentence-based graphs help in text summarization (TS)
Extractive vs. abstractive summarization
Assess the centrality of each sentence in a cluster and extract the most important ones to include in the summary

Page 62: Language networks

Centrality measures: degree, LexRank with threshold, and continuous LexRank
Vertices represent sentences; edges are defined in terms of the similarity relation between pairs of sentences
Toolkit: MEAD; test data: DUC 2003 and 2004

Page 63: Language networks

CENTROID-BASED SUMMARIZATION
Centroid of the document cluster in a vector space
The centroid is a pseudo-document consisting of the words whose tf*idf scores are above a predefined threshold
The sentences that contain more words from the centroid are considered central (see the sketch below)
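A small illustrative sketch of that scoring step, assuming precomputed idf values and whitespace tokenization; the threshold value and function name are hypothetical:

```python
from collections import Counter

def centroid_scores(sentences, idf, tfidf_threshold=3.0):
    """Sketch of centroid-based sentence scoring: build a centroid
    pseudo-document from the words whose tf*idf in the cluster exceeds
    a threshold, then score each sentence by the total centroid weight
    of the words it contains."""
    tokens = [s.lower().split() for s in sentences]
    tf = Counter(w for sent in tokens for w in sent)       # cluster-wide term frequencies
    centroid = {w: tf[w] * idf.get(w, 0.0)
                for w in tf if tf[w] * idf.get(w, 0.0) >= tfidf_threshold}
    return [sum(centroid.get(w, 0.0) for w in set(sent)) for sent in tokens]
```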

Page 64: Language networks

CENTRALITY-BASED SENTENCE SALIENCE
Hypothesis: sentences that are similar to many of the other sentences are more central (salient) to the topic
Cosine similarity between two sentences:
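The formula itself is not reproduced in the transcript; the idf-modified cosine used in LexRank has the following standard form (reconstructed here, so it should be checked against the original paper):

\[
\text{idf-modified-cosine}(x,y) =
\frac{\sum_{w \in x,\,y} \mathrm{tf}_{w,x}\,\mathrm{tf}_{w,y}\,(\mathrm{idf}_w)^2}
{\sqrt{\sum_{x_i \in x} \bigl(\mathrm{tf}_{x_i,x}\,\mathrm{idf}_{x_i}\bigr)^2}\;
 \sqrt{\sum_{y_i \in y} \bigl(\mathrm{tf}_{y_i,y}\,\mathrm{idf}_{y_i}\bigr)^2}}
\]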

Page 65: Language networks

DEGREE CENTRALITY
Significantly similar sentences are connected to each other
Choice of the cosine threshold

Page 66: Language networks

EIGENVECTOR CENTRALITY AND LEXRANK
PageRank: a sentence’s score is the sum of its neighbors’ scores divided by their degrees
d: damping factor, set to 0.85
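Written out in the PageRank convention implied by the slide (damping factor d = 0.85), the score of sentence u in a graph of N sentences is:

\[
p(u) \;=\; \frac{1-d}{N} \;+\; d \sum_{v \in \mathrm{adj}(u)} \frac{p(v)}{\deg(v)}
\]

Note that conventions differ: some statements of LexRank place the damping factor on the summation term, others on the uniform term.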

Page 67: Language networks

CONTINUOUS LEXRANK
Improves on thresholded LexRank by using the strength of the similarity links rather than a binary cutoff (see the sketch below)
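A minimal power-iteration sketch of continuous LexRank, assuming a precomputed sentence-similarity matrix; the damping convention matches the PageRank form above and the variable names are mine:

```python
import numpy as np

def continuous_lexrank(sim, d=0.85, tol=1e-6):
    """Sketch of continuous LexRank: a PageRank-style power iteration where
    transition weights are the cosine similarities themselves rather than
    0/1 links above a threshold.  `sim` is an n x n symmetric similarity
    matrix; d is the damping factor."""
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    # column-normalize so each sentence distributes its score in
    # proportion to its similarity with the others
    col_sums = sim.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    transition = sim / col_sums
    p = np.full(n, 1.0 / n)
    while True:
        p_new = (1 - d) / n + d * transition @ p
        if np.abs(p_new - p).sum() < tol:
            return p_new
        p = p_new
```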

Page 68: Language networks

CENTRALITY VS. CENTROID
1. Centrality accounts for information subsumption among sentences
2. It prevents unnaturally high idf scores from boosting the score of a sentence that is unrelated to the topic

Page 69: Language networks

EXPERIMENT
Data sets: DUC 2003 and 2004
Evaluation method: ROUGE
MEAD toolkit: feature extraction (centroid, position, and length features with relative weights), combiner, reranker

Page 70: Language networks

RESULTS AND DISCUSSION
Effects of the threshold

Page 71: Language networks

COMPARISON OF CENTRALITY MEASURES

Page 72: Language networks

EXPERIMENT ON NOISY DATA

Page 73: Language networks

  Evolution of document networks

(Menczer, 2004)

Page 74: Language networks

BACKGROUND
Content similarity
Link probability approximated by a link-similarity metric (Jaccard coefficient)

Page 75: Language networks

JOINT DISTRIBUTION MAPS

Page 76: Language networks

DEPENDENCY OF THE WEB’S LINK TOPOLOGY ON CONTENT
Conditional probability that the link neighborhood between two web pages is above some threshold λ, given that the two pages have content similarity κ, as a function of κ
Phase transition around κ*
For κ > κ*, the probability that two pages are neighbors does not seem to depend on their content similarity; for κ < κ*, the probability decreases according to a power law: Pr(λ | κ) ~ κ^γ

Page 77: Language networks

MODEL
At each step t, one new page t is added, and m new links are created from t to m existing pages, each selected from {i : i < t} with probability:
(m, κ*, and γ are constants, and c is a normalization factor)

Page 78: Language networks

VALIDATING PRIOR MODELS

Page 79: Language networks

Look for a model capable of predicting both the degree distribution and the similarity distributions among linked documents

Page 80: Language networks

DEGREE-SIMILARITY MIXTURE MODEL
At each step, one new document is added, and m new links (references) are created from it to existing documents
At time t, the probability that the i-th document is selected and linked from the t-th document is:
α is a preferential attachment parameter
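The attachment probability itself appears only as an image on the slide and is not reproduced here. Purely as an illustration of the kind of rule the name suggests, the sketch below mixes degree-based preferential attachment with content-similarity attachment using the weight α; this is my own illustrative form, not the formula from Menczer (2004):

```python
import random

def degree_similarity_attachment(degrees, sims, alpha, m, rng=random):
    """Illustrative sketch only: one way a new document could choose m
    existing documents to link to, mixing preferential attachment with
    content similarity.

    degrees[i]: in-degree of existing document i
    sims[i]:    content similarity between the new document and document i
    alpha:      weight on the preferential-attachment term
    """
    total_k = sum(degrees) or 1
    total_s = sum(sims) or 1
    weights = [alpha * k / total_k + (1 - alpha) * s / total_s
               for k, s in zip(degrees, sims)]
    return rng.choices(range(len(degrees)), weights=weights, k=m)
```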

Page 81: Language networks

VALIDATE THE MODEL

Page 82: Language networks
Page 83: Language networks

CONCLUSION
Page content cannot be neglected when we try to understand the evolution of document networks
There is a tension between referring to popular documents and referring to related documents

Page 84: Language networks

Questions?