569: Sampling Graphs (2013)jlu.myweb.cs.uwindsor.ca/569/569graphSampling2013.pdf · Lots of...
Transcript of 569: Sampling Graphs (2013)jlu.myweb.cs.uwindsor.ca/569/569graphSampling2013.pdf · Lots of...
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
569: Sampling Graphs (2013)
Jianguo Lu
University of Windsor
November 18, 2013
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Outline
1 Introduction: Sampling
2 Size estimationBias correctionUniform Sampling
3 Average Degree
4 Weibo Sampling
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Why sampling
Applications�� ��DeepWeb
�� ��OSN�� ��SouceCode
�� ��SemanticWeb|
|Properties
�� ��Size�� ��Distribution
�� ��Ranking�� ��Community
�� ��Diameter�� ��CluterteringCoefficient
We need to estimate the properties for two reasons:
Data in its entirety is not available (e.g., Facebook), or without central control (e.g., WWW),or evolving.Data is big. Quadratic algorithms are not feasible.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Deep web graph model
d1
q7
d2
q1 q3 q6
d3 d4 d5
q2q4
d6d7 d8d9
q5
A: bipartite graph
d1
q7
d2
q1
q3
q6
d3
d4
d5
q2
q4
d6
d7
d8
d9
q5
B: same graph as (A) in spring model layout
Figure: Hidden data source as a bipartite graph
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
What to sample for
data size;
distributions;
clustering coefficient;
communities;
influential bloggers (degree, pageRank, Katz centralities, etc.)
Outliers (spammers, zombies, inflated followers)
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
How to sample
22
1
6
3
3
53 1
4
1
1
2
1
3
53
4
Original graph Random Node
6
3
3
53
2
6
3
3
5
Random Edge Random Walk
Sample by random node;Sample by random edge;Sample by random walk;Combinations and modifications. e.g., random walk with uniform random restart.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
What is different from traditional sampling
Most of the networks are scale-free. Degrees have very large variance. Uniformrandom sampling does not work.
Precise sampling: quantities are digitalized, making the sampling processprecise. e.g., know the exact degree, and can choose uniformly at random fromthe neighbouring nodes. Only possible for ONLINE social networks not real socialnetworks.
Access interface: provide interface APIs, many options. Can design new samplingschemes using APIs.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Size estimation
Applications
Size of web, search engines
Size of Online Social Network (Twitter, Weibo)
Number of bugs in programs
...
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Capture-Recapture Method
The estimator often usedWhen all the elements have equal probability of being sampled,
N =n1n2
d(1)
where n1 and n2 are the number of samples in two capture occasions, d is theduplicates.
Lawrence and C. Giles. Searching the world wide web. Science,280(5360):98-100, 1998.
A. Broder and et al. Estimating corpus size via queries. In CIKM, pages 594-603.ACM, 2006.
L. Katzir, E. Liberty, and O. Somekh. Estimating sizes of social networks viabiased sampling. In WWW, pages 597-606. ACM, 2011.
Petersson et al., Capture–recapture in software inspections after 10 yearsresearch—-theory, evaluation and application, Journal of Systems and Software,2004.
Lots of research on obtaining uniform random sampling, using methods such asMetropolis-Hasting Random Walk
Bar-Yossef et al. Random sampling from a search engine’s index, JACM 2008.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Multiple Capture-Recapture
Equal sampling probability
N =n2
2C(2)
where n is total number of sampled elements, C is the number of collisions
Unequal sampling probability
N =n2
2CΓ (3)
where Γ is the normalized variance of the degrees of the graph
How large is Γ?
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Γ in various datasets
Graph N(×103) γ or√
Γ− 1 Φ(×10−5)WikiTalk [?] 2,388 26.32 2,700
BerkStan [?] 654 14.51 5.3EmailEu [?] 224 13.66 13Stanford [?] 255 11.51 5.8
Skitter [?] 1,694 10.46 56Youtube [?] 1,134 9.64 440
NotreDame [?] 325 6.40 9.4Gowalla [?] 196 5.54 1,200Epinion [?] 75 4.02 610Google [?] 855 4.00 62
Slashdot [?] 82 3.35 1,900Facebook [?] 2,937 3.14 590
Flickr [?] 105 2.64 68IMDB [?] 374 2.05 130DBLP [?] 511 1.61 560
Amazon [?] 410 1.27 98Gnutella [?] 62 1.21 9,100
CitePatents [?] 3,764 1.20 1,100
Table: Statistics of the 18 real-world graphs, sorted in descending order of the coefficient of degreevariation γ. Φ is the conductance.
For Twitter data, Γ ≈ 1300.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Outline
1 Introduction: Sampling
2 Size estimationBias correctionUniform Sampling
3 Average Degree
4 Weibo Sampling
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Bias Correction
TheoremThe relative bias of N̂ can be approximated by the reciprocal of E(C), i.e.,
RB ≈1
E(C)(4)
Jianguo Lu, Dingding Li, Bias Correction in Small Sample from Big Data, TKDE, 2013.
5000 5500 6000 6500 7000 7500 8000 8500 9000 9500 100000.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
0.11
sample size n
mean e
st N
1/E(C)RB
Figure: RB and 1/E(C) against sample sizes in simulation study. It shows that N̂ is biased upwards,and the relative bias can be approximated by the reciprocal of E(C).
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Our bias corrected estimators
N =n2
2(C + 1)(5)
N =n2
2(C + 1)Γ (6)
5000 5500 6000 6500 7000 7500 8000 8500 9000 9500 100000.98
1
1.02
1.04
1.06
1.08
1.1
1.12x 10
6
sample size n
mean e
st N
hat Nhat N*
Figure: N̂ and N̂S over 104 runs for various sample size in simulation study. Red dotted line is thetrue value.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Twitter data
N=41million.
0 500 1000 1500 2000 2500 3000 3500 40000
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.161/E(C)RB
0 500 1000 1500 2000 2500 3000 3500 40004.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8x 10
7
sample size n
mean e
st N
hat Nhat N*
Expected bias Corrected estimation
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Outline
1 Introduction: Sampling
2 Size estimationBias correctionUniform Sampling
3 Average Degree
4 Weibo Sampling
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Uniform Random Sampling not Recommended
Lemma (Variance of N̂N )
The estimated variance of RN estimator N̂N is
v̂ar(N̂N ) ≈N2
E(C)≈
2N3
n2(7)
Lemma (Variance of N̂E )
The estimated variance of RE estimator N̂E is
v̂ar(N̂E ) ≈2N3
n2Γ
(1 +
nΓCV 2(Γ)
2N
), (8)
where CV (Γ) is the coefficient of variation of Γ.
Theorem (RN vs. RE)
To achieve the same variance of N̂E , N̂N needs to use at most√
Γ times moresamples.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Estimated vs. observed
0.05
0.1
0.15
0.2
RSE
(1)WikiTalk
0.05
0.1
0.15
RSE
(7)NotreDame
10 15 200.05
0.1
0.15
RSE
Sample Size
(13)Flickr
0.05
0.1
0.15
0.2(2)BerkStan
0.05
0.1
0.15
(8)Gowalla
10 15 200.05
0.1
0.15
Sample Size
(14)IMDB
0.05
0.1
0.15
0.2(3)EmailEu
0.05
0.1
0.15
(9)Epinions
10 15 200.05
0.1
0.15
Sample Size
(15)DBLP
0.05
0.1
0.15
0.2(4)Stanford
0.05
0.1
0.15
(10)Google
10 15 200.05
0.1
0.15
Sample Size
(16)Amazon
0.05
0.1
0.15
0.2(5)Skitter
0.05
0.1
0.15
(11)Slashdot
10 15 200.05
0.1
0.15
Sample Size
(17)Gnutella
0.05
0.1
0.15
0.2(6)Youtube
0.05
0.1
0.15
(12)Facebook
10 15 200.05
0.1
0.15
Sample Size
(18)Citation
Figure: Estimated and observed RSE of RE sampling with the growth of sample size over 18datasets. The sample size ranges between 10×
√2N/Γ and 20×
√2N/Γ, i.e., the expected
collisions are between 100 and 400.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
RN and RE Sampling on Facebook Data
10 15 200
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
Sample Size (×√
2N)
RS
E
Observed RN
Estimated RN
Observed RE
Estimated RE
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Comparison of RN, RE, and RW Sampling
Figure: Comparison of three sampling methods. The sample size n =√
2NC where√
C = 10. It showsthat for RN sampling (red solid bars), the relative standard error is equal to 1/
√C = 0.1 across all the
datasets. RE sampling is consistently smaller than RN sampling.RW sampling can approximate REsampling for some datasets. For NotreDame etc. that have low conductance, RW is grossly wrong.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Why RW Varies
100~+∞
10~992~91
Figure: Subgraphs obtained by RW sampling from Flickr, EmailEu, Stanford and Youtube. Eachsubgraph contains 60,000 nodes. Node colour represents its degree in the original graph.Green=1; Blue=2 ∼ 9; Orange= 10∼99; Red=100∼ ∞.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Sampling for average degree
Average degree is an important metrics in any network
In and out average degrees in Weibo are different.
Naive method–arithmetic sample mean
Problem– Variance is too large because of the power law
Solution– Use PPS (RE) sampling and harmonic mean estimator
On Twitter, PPS sampling can be hundreds of times better
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Average Degree Estimation
〈̂d〉RN =1n
n∑i=1
dxi (9)
〈̂d〉RE = 〈̂d〉RW = n
[ n∑i=1
1dxi
]−1
(10)
TheoremSuppose the degrees follow Zipf’s law with exponent one, i.e., di = A
α+i . The varianceof the random node estimator is
var(〈̂d〉RN ) ≈〈d〉2
n
(N[
(α+ 1) ln2 N + α
1 + α
]−1− 1
). (11)
var(〈̂d〉RE ) ≈〈d〉2
n
(12
lnN + α
1 + α− 1). (12)
Main conclusionRE is better than RN in many cases; RW depends on graph conductance.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Degree distribution
100
101
102
103
104
105
106
107
100
101
102
103
104
105
106
107
Fre
qu
en
cy
(1)Twitter
100
101
102
103
104
105
106
107
100
101
102
103
104
105
106
(2)WikiTalk
100
101
102
103
104
105
100
101
102
103
104
105
(3)BerkStan
100
101
102
103
104
105
106
100
101
102
103
104
(4)EmailEu
100
101
102
103
104
105
100
101
102
103
104
105
(5)Stanford
100
101
102
103
104
105
106
100
101
102
103
104
105
(6)Skitter
100
101
102
103
104
105
106
100 101 102 103 104 105
Fre
qu
en
cy
(7)Youtube
100
101
102
103
104
105
106
100
101
102
103
104
105
(8)NotreDame
100
101
102
103
104
105
100
101
102
103
104
105
(9)Gowalla
100
101
102
103
104
105
100
101
102
103
104
(10)Epinion
100
101
102
103
104
105
106
100
101
102
103
104
(11)Google
100
101
102
103
104
105
100
101
102
103
104
(12)Slashdot
100
101
102
103
104
105
106
107
100
101
102
103
104
Fre
qu
en
cy
degree
(13)Facebook-1
100
101
102
103
104
105
100
101
102
103
104
degree
(14)Flickr
100
101
102
103
104
100 101 102 103 104
degree
(15)Facebook-2
100
101
102
103
104
105
100
101
102
103
104
degree
(16)Amazon
100
101
102
103
104
105
106
100
101
102
103
degree
(17)Citation
100
101
102
103
104
105
106
100
101
102
degree
(18)RoadNet
Plots are sorted in decreasing order of coefficient of variation γ.
Most of them follow power-law, yet they are very different
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Comparison of Three Sampling methods
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Comparison of RN and RE
0 5 10 15 20 25 30 35 400
5
10
15
20
25
γ
Ra
tio
o
f R
N a
nd
R
E
0 2 4 6 80
1
2
3
4
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Comparison of RN, RE, and RW Samplings
0
2
4
6
8R
RM
SE
(1)Twitter
0
0.5
1
RR
MS
E
(7)Youtube
0 500 10000
0.5
1
Sample Size
RR
MS
E
(13)Facebook−1
0
1
2
3
4(2)WikiTalk
0
0.5
1
1.5
(8)NotreDame
0 500 10000
0.5
1
1.5
Sample Size
(14)Flickr
0
0.5
1
1.5
2
2.5(3)BerkStan
0
0.2
0.4
0.6
0.8
(9)Gowalla
0 500 10000
0.2
0.4
0.6
Sample Size
(15)Facebook−2
0
0.5
1
1.5
2(4)EmailEu
0
0.2
0.4
0.6
(10)Epinion
0 500 10000
0.1
0.2
0.3
Sample Size
(16)Amazon
0
0.5
1
1.5(5)Stanford
0
0.2
0.4
0.6
(11)Google
0 500 10000
0.2
0.4
0.6
Sample Size
(17)Citation
0
0.5
1
1.5(6)Skitter
0
0.2
0.4
0.6
(12)Slashdot
0 500 10000
0.05
0.1
0.15
Sample Size
(18)RoadNet
Figure: RRMSEs of RN, RE, and RW samplings as a function of sample size for 18 graphs.The dotted, dashed, and solid lines are for RN(. . . ), RE(−−), and RW(–) samplingsrespectively. It shows that in most cases the sample size does not change the relativepositions of the sampling methods. The exceptions are the web graphs 3 and 5 where RWsampling does not improve with the increase of sample size because of the random walktraps.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Sample distribution
100
105
100
102
104
(2)WikiTalk
100
105
100
101
102
103
(3)BerkStan
100
102
104
100
102
104
(4)EmailEu
100
105
100
101
102
103
(5)Stanford
100
105
100
101
102
103
(6)Skitter
100
105
100
101
102
103
Fre
quency
(7)Youtube
100
105
100
101
102
103
(8)NotreDame
100
105
100
101
102
103
(9)Gowalla
100
102
104
100
101
102
103
(10)Epinion
100
102
104
100
101
102
103
(11)Google
100
102
104
100
101
102
103
(12)Slashdot
100
102
104
100
101
102
103
Degree
Fre
quency
(13)Facebook−1
100
102
104
100
101
102
103
Degree
(14)Flickr
100
102
104
100
101
102
Degree
(15)Facebook−2
100
102
104
100
102
104
Degree
(16)Amazon
100
102
104
100
101
102
103
Degree
(17)Citation
100
100
102
104
Degree
(18)RoadNet
Figure: The degree distributions of the samples obtained from RE (Random Edge)samplings. n=8,000. The log-log plots in the first two rows exhibit a “V" shape, where thesampled small nodes resemble the distribution of the original graph, while the sampledlarge nodes have a tail pointing upwards. These plots in the first two rows indicate that bothsmall and large nodes are well represented in the sample. The plots in the last row indicatethat the sample distribution is similar to the original distribution, therefore the RRMSE of REsampling is similar to that of RN sampling.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Graph conductance and RW sampling
1.5 2 2.5 3 3.5 4 4.50
2
4
6
8
10
12
14
16
18
−log10(φ)
Ra
tio
of
RW
an
d R
E
Pearson Corr:0.68652Facebbok−2
Youtube
Amazon
Flickr
NotreDame
Stanford
BerkStan
Figure: Standard error ratio between RW and RE vs. graph conductance Φ for 18 datasets.Sample size is 400.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Network Structure
Flickr NotreDame Stanford Web
100~+∞
10~992~91
Amazon Facebook-2 Youtube
Figure: Random walks on six networks. Flickr, NotreDame and Stanford have looselyconnected components while Amazon, Facebook and Youtube are well enmeshed. Eachrandom walk contains 6× 104 nodes except NotreDame which has 15× 104 nodes. Nodecolour indicates the degree of the node. Green=1; Blue=2∼9; Yellow=10∼99; Red=100+.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Conductance
10-4
10-3
10-2
10-1
100
100
101
102
103
104
105
Conducta
nce
Number of nodes in the cluster
Flickr
10-5
10-4
10-3
10-2
10-1
100
100
101
102
103
104
105
106
Conducta
nce
Number of nodes in the cluster
NotreDame
10-5
10-4
10-3
10-2
10-1
100
100
101
102
103
104
105
106
Conducta
nce
Number of nodes in the cluster
Stanford
10-4
10-3
10-2
10-1
100
100
101
102
103
104
105
106
Conducta
nce
Number of nodes in the cluster
Amazon
10-2
10-1
100
100
101
102
103
104
105
Conducta
nce
Number of nodes in the cluster
Facebook-2
10-3
10-2
10-1
100
100
101
102
103
104
105
106
Conducta
nce
Number of nodes in the cluster
Youtube
Figure: Conductance Φ(S) over |S|, the size of the the components, for six networks. Plotsare drawn using SNAP API described in [?].
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Very important
We can access only partial informationWhat is the global picture?
SizeDistributionMost influentialOverall topology (e.g. clustering coefficient)Message diffusion, Critical nodesCommunities...
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Star Sampling
Select nodes uniformly at random(e.g., nodes 1 and 6);
Take all the neighbours as sample(nodes 2,3,4,5,5,7,9);
It approximate PPS (probabilityproportional to size) sampling;
More efficient than random walk bytaking all the neighbours instead arandom one;
We sampled around one millionWeibo stars;
1
2 3
4 5 6 7
89
10
11
12
1
2 3
4 5 6 7
89
10
11
12
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Degrees and Messages
Figure: Estimated out-degree, in-degree, and message distributions of Weibo.
Average in-degree and out-degree as 32.10 (CI 31.91, 32.29) and 54.39 (CI 49.02,59.76), respectively.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Estimation of followers
fi di 〈̂d〉i Difference Ratio1 85016 23,335,290 16,859,105 6,476,185 0.382 75243 15,945,306 14,921,069 1,024,237 0.063 71417 15,247,604 14,162,354 1,085,250 0.074 37914 13,394,620 7,518,539 5,876,081 0.785 61962 13,278,161 12,287,380 990,781 0.086 63308 13,153,177 12,554,298 598,879 0.047 59969 12,990,041 11,892,158 1,097,883 0.098 57100 12,604,270 11,323,220 1,281,050 0.119 59406 12,097,122 11,780,512 316,610 0.02
10 54264 12,003,137 10,760,827 1,242,310 0.11
Table: Estimation for the top 10 Weibo accounts. fi : capture frequency of account i ; di : claimedin-degree or number of followers; 〈̂d〉i : estimated number of followers; Ratio = (di − 〈̂d〉i )/〈̂d〉i .
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Estimated Followers
0 200 400 6000
0.5
1
1.5
2
2.5x 10
7
Accounts(top 500)
Follo
wer
s
D
Estimated
Claimed
100
102
104
104
105
106
107
Accounts(all)
Follo
wer
sE
Estimated
Claimed
0 5000 10000−2
0
2
4
6
8x 10
5
Diff
eren
ce
Accounts(all)
F
0 5000 10000−50
0
50
100
150A
Rat
io
Accounts0 5000 10000
−0.5
0
0.5
1
1.5
Sm
ooth
ed ra
tioAccounts
B
100
102
104
100
101
102
103
Rat
io
Accounts (large ratio)
C
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Estimated vs. Claimed Followers
0 0.5 1 1.5 2 2.5
x 107
0
0.5
1
1.5
2
2.5x 10
7
Claimed followers
Estim
ate
d follow
ers
Account
when X=Y
Figure: Estimated followers vs. claimed followers in log-log scale. The Pearson correlationcoefficient is 0.9797.
JianguoLu
Introduction:Sampling
SizeestimationBiascorrection
UniformSampling
AverageDegree
WeiboSampling
Whether is it correct ...
Relative standard deviation of the estimator is
RSD(d̂i ) = 1/√
fi . (13)
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Top Users
De
gre
es
EmailEu
True Value
Est. Average
Error Bound
0.5
1
1.5
2
2.5
3
x 104
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Top Users
De
gre
es
youtube
True Value
Est. Average
Error Bound
Figure: Sampling accuracy on existing data