Toward a Statistical Approach to Automatic Particle ...chuanhai/teaching/598M/Docs/detection.pdf˜...
Data and Data Reduction | Initial Estimation of Particle Locations | Particle Detection | Discussion
Data (a.eps and noise.eps generated by GIMP@linux) | Background noise | Data reduction via blocking
Toward a Statistical Approach to Automatic Particle Detection in Electron Microscopy
Chuanhai Liu
Chuanhai Liu Toward a Statistical Approach to Automatic Particle Detection
Image size = 2048 × 2048.
The data in the rectangular region (700, 300) : (1100, 700) is used to study the distribution of the background noise.
A block of background noise
The R-code generating this image:
    png("noise.png")
    par(mai=rep(0,4))
    image(700:1100, 300:700, X[700:1100,300:700])
    dev.off()
The marginal distribution
For alternative approximations, try, for example, qqnorm(X[700:1100,300:700]^0.6)?
The R-code for "QQ-fit":
    z = sort(c(X[700:1100,300:700]))
    qqfit.nbinom = function (y = z) {
      n = length(y); p = ((1:n)-.5)/n
      f = function(par){
        # parameters are optimized on an unconstrained scale
        par=exp(par)
        size = par[1]; prob = par[2]/(1+par[2])
        x = qnbinom(p, size=size, prob=prob)
        # mean absolute residual of the QQ regression line
        return(mean(abs(lm(y ~ x)$residuals)))
      }
      out = optim(par=c(log(200), 0), fn=f)
      par=exp(out$par)
      size = par[1]; prob = par[2]/(1+par[2])
      return(list(size = size, prob = prob))
    }
The noise distribution is a mixture of Poisson distributions with Gamma distributed means:
Z ∼ Gamma(shape = size)*(1-prob)/prob; Y ∼ Poisson(Z)
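$\dfrac{E[V]}{E[V] + V[E]} \approx 2.2\%$, where $E[V] = E[\mathrm{Var}(Y \mid Z)]$ and $V[E] = \mathrm{Var}(E[Y \mid Z])$ $\Longrightarrow$ model the noise by Gamma distributions (for simplicity?)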
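A quick numerical check of the variance decomposition for the Gamma-Poisson mixture, as a sketch in R. It assumes that the 2.2% figure is the share of Var(Y) contributed by the Poisson step, E[Var(Y|Z)] / (E[Var(Y|Z)] + Var(E[Y|Z])). The values of size and prob below are illustrative, not the fitted ones.

```r
# Illustrative (not fitted) negative-binomial parameters.
size <- 200
prob <- 0.022

# Z ~ Gamma(shape = size) * theta, with theta = (1 - prob) / prob; Y | Z ~ Poisson(Z).
theta <- (1 - prob) / prob
EV <- size * theta        # E[Var(Y | Z)] = E[Z]
VE <- size * theta^2      # Var(E[Y | Z]) = Var(Z)

poisson.share <- EV / (EV + VE)

# Cross-check: EV + VE should equal the negative-binomial variance.
nb.var <- size * (1 - prob) / prob^2
```

Under this parameterization the share simplifies algebraically to prob, so a fitted prob near 0.022 would correspond to the 2.2% quoted.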
The (spatial) dependence
Ignore the spatial trend for the selected block of noise and compute the (homogeneous/stationary) spatial correlation coefficients.
[Figure: correlation coefficient (0.0 to 1.0) versus distance (2 to 10).]
Does the pattern of the correlation coefficients, (1.00, 0.47, 0.10, 0.02, 0.02, 0.02, 0.01, 0.01, ...), suggest a moving-average model?
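For comparison, a minimal R sketch of what a first-order moving-average model would predict. The helper lag.cor, the restriction to the horizontal direction, and the coefficient theta = 0.7 are assumptions made for illustration; the slides do not say how the distances were computed.

```r
# Lag-d correlation along the rows of a matrix (horizontal direction only).
lag.cor <- function(X, d) {
  n2 <- ncol(X)
  cor(as.numeric(X[, 1:(n2 - d)]), as.numeric(X[, (1 + d):n2]))
}

set.seed(1)
E <- matrix(rnorm(200 * 201), 200, 201)   # white noise
theta <- 0.7                              # hypothetical MA(1) coefficient
X.ma <- E[, -1] + theta * E[, -201]       # MA(1) process along each row

rho <- sapply(1:4, function(d) lag.cor(X.ma, d))
rho1.theory <- theta / (1 + theta^2)      # MA(1): lag-1 correlation, zero beyond lag 1
```

An MA(1) model with theta near 0.7 gives a lag-1 correlation near 0.47 but zero correlation at distance 2, so the observed 0.10 at distance 2 may call for a higher-order or two-dimensional moving average.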
Specification of the (joint) sampling distribution
To do when it is needed....
Due to the small signal-to-noise ratio and the high image resolution with respect to particles of interest, data reduction for computational simplicity is possible without much loss of information for particle detection. Consider square blocks of size $2^m$ for some integer m; see R-functions named ParticleDetection/.RData/block and block.png.
Blocks of size $2^4 = 16$ are too large!
R-functions block and block.png
    block = function (X, block.size=8, plot.it=TRUE){
      m = block.size; n = dim(X) %/% m
      # x[[(i-1)*m+j]] holds the (i,j)-th pixel of every m x m block
      x = as.list(1:(m*m))
      for(i in 1:m){ for(j in 1:m){
        x[[(i-1)*m+j]] = X[(0:(n[1]-1))*m + i, (0:(n[2]-1))*m + j]
      }}
      # block averages
      A = 0
      for(i in 1:length(x)) A = A + x[[i]]
      A = A/(m*m)
      # within-block sample variances
      V = 0
      for(i in 1:length(x)) V = V + (x[[i]]-A)^2
      v = V/(m*m-1)
      if(plot.it){
        par(mfrow=c(1,2))
        image(A, main = paste("Average over every ",m,"x",m, " blocks", sep=""))
        image(log(v), main = "Associated variances in log scale")
      }
      return(list(a=A, v=v))
    }
    block.png = function (X, block.size, file, height, width) {
      png(file, width=width, height=height)
      b = block(X, block.size=block.size)
      dev.off()
    }
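As a cross-check on the block averages, the same reduction can be sketched with an array reshape. The function name block.mean is hypothetical and this is not the author's implementation; it only reproduces the averages, not the within-block variances.

```r
# Average over non-overlapping m x m blocks via a 4-way array reshape.
block.mean <- function(X, m) {
  n <- dim(X) %/% m
  X <- X[1:(n[1] * m), 1:(n[2] * m), drop = FALSE]   # drop incomplete blocks
  # Dimensions: (row within block, block row, column within block, block column).
  A <- array(X, dim = c(m, n[1], m, n[2]))
  apply(A, c(2, 4), mean)
}

X.small <- matrix(1:16, 4, 4)
B <- block.mean(X.small, 2)   # 2 x 2 matrix of block averages
```

For the 2048 × 2048 image, m = 8 would give a 256 × 256 matrix of averages.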
Blocks of size $2^3 = 8$ look satisfactory, but a formal statistical approach needs to be considered!
Reduced data and assumptions | Gradients | Initial estimation of particle boundary locations
In what follows, we use the reduced data via blocking with block size $2^3 = 8$, using the averages and ignoring the (within-block) variances. Thus, we have
◮ The reduced image size is 256 × 256.
◮ The (background) noise distribution is approximately that of white (Gaussian) noise.
The definition
Notice that the background noise intensities, especially those near the boundaries of particles of interest, are higher than those of particle images. Let
$I_{i,j}$ $(1 \le i, j \le 256)$
be the intensity at the pixel location $(i, j)$. Then, the difference vectors, called gradients,
$G_{i,j} = \left( I_{\min(i+1,n_1),j} - I_{\max(i-1,1),j},\; I_{i,\min(j+1,n_2)} - I_{i,\max(j-1,1)} \right)$
provide useful information for finding particle locations, where $(n_1, n_2) = (256, 256)$.
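The definition can be checked on a small example. The sketch below is a direct transcription of the clamped-index formula; the function name grad and the test image are made up for illustration.

```r
# Gradients with indices clamped at the image border, as in the definition.
grad <- function(I) {
  n <- dim(I)
  ip <- pmin(1:n[1] + 1, n[1]); im <- pmax(1:n[1] - 1, 1)
  jp <- pmin(1:n[2] + 1, n[2]); jm <- pmax(1:n[2] - 1, 1)
  Gx <- I[ip, ] - I[im, ]   # first component: difference across rows
  Gy <- I[, jp] - I[, jm]   # second component: difference across columns
  list(r = sqrt(Gx^2 + Gy^2), theta = atan2(Gy, Gx))
}

# A linear ramp I[i, j] = 3 i + 4 j: interior gradients are (6, 8).
I.ramp <- outer(1:5, 1:4, function(i, j) 3 * i + 4 * j)
g.ramp <- grad(I.ramp)
```

At interior pixels the differences span two pixels, so the ramp gives length 10; at the corner (1, 1) both differences span one pixel and the length is 5.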
R-function get.gradients
    get.gradients = function (I=block(X,block.size=8)$a){
      n = dim(I)
      # differences across columns (one-sided at the first and last column)
      Gy = cbind(I[,2]-I[,1], I[,3:n[2]]-I[,1:(n[2]-2)], I[,n[2]]-I[,n[2]-1])
      # differences across rows (one-sided at the first and last row)
      Gx = rbind(I[2,]-I[1,], I[3:n[1],]-I[1:(n[1]-2),], I[n[1],]-I[n[1]-1,])
      return(list(r = sqrt(Gx*Gx+Gy*Gy), theta=atan2(Gy,Gx)))
    }
The distribution
Most pixel values are noise. Thus, at most pixel locations
◮ The gradient directions and lengths are independent.
◮ The gradient directions are uniform.
◮ The gradient lengths have the marginal distribution
$\|G_{i,j}\|^2 \sim \mathrm{Exp}(\sigma^2)$
for some $\sigma^2 > 0$, i.e., a scaled chi-square with 2 degrees of freedom, with small spatial correlation (zero long-range (distance > 1) correlation, i.e., independence).
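These properties can be illustrated by simulation. The sketch below assumes pure white Gaussian noise with standard deviation tau = 1 (an assumption, not the fitted value) and uses interior pixels only, where each gradient component is a difference of two independent pixels.

```r
set.seed(1)
tau <- 1                                   # assumed noise standard deviation
I.noise <- matrix(rnorm(256 * 256, sd = tau), 256, 256)

i <- 2:255                                 # interior pixels only
Gx <- I.noise[i + 1, i] - I.noise[i - 1, i]
Gy <- I.noise[i, i + 1] - I.noise[i, i - 1]

r2 <- as.numeric(Gx^2 + Gy^2)              # squared gradient lengths
theta <- as.numeric(atan2(Gy, Gx))         # gradient directions

mean.r2 <- mean(r2)        # exponential with mean 4 * tau^2
median.r2 <- median(r2)    # exponential median: 4 * tau^2 * log(2)
```

Each component has variance 2 tau^2, so the squared length is 2 tau^2 times a chi-square with 2 degrees of freedom, which is exponential with mean 4 tau^2.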
The histograms
[Figure: histogram of g$r^(2/3) (density scale, 0 to 20) and histogram of g$theta (frequency scale, −3 to 3).]
The fitted density curve and histogram of g$r^(2/3) indicate the presence of outliers, as expected.
ML estimation of the null distribution from truncated data
    ml.tchi = function (x, df=2, par=log(3.8)) {
      x=sort(as.numeric(x))
      b=max(x)   # truncation point: the largest retained observation
      n = length(x)
      fn = function(par){
        scale = exp(par)
        y = x/scale
        P = pchisq(b/scale,df=df)
        # log-likelihood of the scaled chi-square truncated at b
        lf = sum(dchisq(y, df=df, log=TRUE)) - n*log(P*scale)
        return(-lf)
      }
      out = optim(par=par, fn=fn)
      scale = out$par = exp(out$par)
      hist(x, prob=TRUE, main="ML fit of truncated chi-square",
           breaks = seq(0, max(x), length=13),
           xlab="truncated observations")
      lines(x, dchisq(x/scale,df=df)/pchisq(b/scale,df=df)/scale, col="green")
      return(out)
    }
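A self-contained sketch of the same idea on simulated data, to show that the scale can be recovered from the lower part of the sample alone. The true scale of 81, the 80% cutting point, and the use of optimize in place of optim are choices made for this illustration.

```r
set.seed(2)
scale.true <- 81
x.all <- scale.true * rchisq(20000, df = 2)   # simulated null values

b <- unname(quantile(x.all, 0.8))             # cutting point (80th percentile)
xt <- x.all[x.all <= b]                       # truncated observations

# Negative log-likelihood of a scaled chi-square(2) truncated at b.
negloglik <- function(log.scale) {
  s <- exp(log.scale)
  -(sum(dchisq(xt / s, df = 2, log = TRUE)) -
      length(xt) * log(s * pchisq(b / s, df = 2)))
}

fit <- optimize(negloglik, interval = log(c(1, 1000)))
scale.hat <- exp(fit$minimum)
```

Repeating the fit over a range of cutting points gives the kind of stability check that a plot of the scale estimate against the cutting percentile would show.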
ML estimation of the null distribution from truncated data
[Figure: estimate of scale (80 to 120) versus cutting point (percentile, 50 to 100).]
$\sigma^2 \approx 81$, i.e., $\|G_{i,j}\|^2 / 81 \;\dot\sim\; \mathrm{Exp}(1)$
R function fdr
    fdr = function (v=g$r^2, df=2, scale=81, p=.60) {
      P = pchisq(as.numeric(v)/scale,df=df); dim(P) = dim(v)
      a = sort(as.numeric(v)); x = a^(1/3)
      hist(x,breaks=60, prob=TRUE, xlab="(g$r^2)^(1/3)")
      g = density(x)
      k = length(g$x[g$x <= quantile(x,p)]); x.p = g$x[k]
      # proportion of null values, matched at the p-th quantile
      lambda = p/pchisq(x.p^3/scale,df=df)
      # null density on the cube-root scale
      Z = lambda*dchisq(g$x^3/scale,df=df)*3*g$x^2/scale
      y = g$y[(k+1):length(Z)]
      x = g$x[k:length(Z)]
      z = Z[(k+1):length(Z)]
      # upper-tail areas of the observed and null densities
      Sy = rev(cumsum(rev(diff(x)*y)))
      Sz = rev(cumsum(rev(diff(x)*z)))
      FDR = data.frame(x=x[-length(x)]^3,fdr = Sz/Sy)
      polygon(c(x[-1],rev(x[-1])),c(y,rev(z)),col="red")
      abline(v=x.p, col="green",lwd=2)
      lines(g$x, g$y, col="blue", lwd=2, lty=2)
      lines(g$x, Z, col="green", lwd=2)
      text(9.2,0.028,paste(round((1-lambda)*100),"% outliers",sep=""),
           srt=-40,cex=1.2, col="white")
      return(list(scale=scale, df=df, P=P, fdr=FDR,
                  outliers=data.frame(x=x[-1],cdf=Sy-Sz)))
    }
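The tail-area ratio behind the FDR can be illustrated with a simplified sketch. It is not the author's density-based computation: it uses the empirical upper tail directly, treats the null proportion lambda as known, and simulates the outliers from a log-normal distribution chosen only for illustration.

```r
set.seed(3)
n <- 50000
scale <- 81
lambda <- 0.9                                 # assumed proportion of null values
is.null <- runif(n) < lambda

# Null: scaled chi-square(2); outliers: hypothetical log-normal values.
v <- ifelse(is.null,
            scale * rchisq(n, df = 2),
            exp(rnorm(n, mean = log(2000), sd = 0.5)))

# Estimated FDR for the rule "declare v > t an outlier".
tail.fdr <- function(t) {
  lambda * pchisq(t / scale, df = 2, lower.tail = FALSE) / mean(v > t)
}

# Realized proportion of null values among those declared outliers.
true.fdp <- function(t) mean(is.null[v > t])

fdr.400 <- tail.fdr(400)
fdr.1200 <- tail.fdr(1200)
```

Raising the threshold lowers the estimated FDR, and in this simulation the estimate tracks the realized proportion of false discoveries.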
Outlier identification
[Figure: histogram of x = (g$r^2)^(1/3) (density scale, 0 to 20), with the excess over the fitted null density labeled "10% outliers".]
False discovery rate (FDR)
[Figure: fdr (0.0 to 0.6) versus (g$r^2)^(1/3) (6 to 20).]
Boundary locations with fdr ≤ 5%
A comment: modeling outliers (if necessary)
[Figure: "ln-normal for the outliers?" Plot of ln(outliers.quantile) (1.8 to 2.8) against normal quantile (−2 to 2).]
Question: Does it make sense to consider a mixture of exponential and log-normal for $\|G_{i,j}\|^2$?
Local smoothing | Linear spatial trend removal | Segmentation | Particle Identification
The four-step procedure
1. (Feature-protected) local smoothing
2. (Linear) spatial trend removal
3. (Adaptive) segmentation: intensity thresholding with thresholds selected based on smoothed intensities at initial estimates of particle locations
4. Particle selection (convex shapes and sensible sizes)
   ◮ Trimming?
   ◮ Soft-thresholding, using difference quantiles?
   ◮ Fine tuning by using the original raw 2048 × 2048 image?
The method
◮ Specifying neighbors $\{N_{i,j}\}$ for each pixel $(i, j)$.
◮ Estimating the "mean" $\mu_{i,j}$ of $x_{i,j}$ as the smoothed value of $x_{i,j}$ based on a model that is simple both conceptually and computationally.
We consider the model
$x_{i,j} \sim N(\mu_{i,j}, \sigma^2)$ and $x_k \sim N(\mu_{i,j}, \sigma^2 + \delta^2)$ $(k \in N_{i,j} \setminus \{(i, j)\})$.
We call this local smoothing method Hanning and also consider Hanning on smoothed values for further smoothing.
The use of a mixture of two normals can be interesting, but can be computationally too expensive. Consider other alternative models?
An implementation in R
    Hanning = function (x, sigmasq) {
      Z = get.neighbors(x)
      Z0 = Z[[9]]   # presumably the pixel itself; Z[[1]], ..., Z[[8]] are its neighbors
      # mean and sample variance of the eight neighbors
      m = (Z[[1]]+Z[[2]]+Z[[3]]+Z[[4]]+Z[[5]]+Z[[6]]+Z[[7]]+Z[[8]])/8
      v = ( (Z[[1]]-m)^2+(Z[[2]]-m)^2+(Z[[3]]-m)^2 +(Z[[4]]-m)^2
           +(Z[[5]]-m)^2+(Z[[6]]-m)^2 +(Z[[7]]-m)^2+(Z[[8]]-m)^2)/7
      if(missing(sigmasq)) sigmasq = median(v)
      v[v<=sigmasq] = sigmasq
      # precision weight of the neighbor mean relative to the pixel itself
      w = sigmasq*8/v
      a = (Z0+w*m)/(1+w)
      return(a)
    }
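The helper get.neighbors is not shown in the slides. A hypothetical version consistent with how Hanning uses it (eight neighbor images in positions 1 to 8, the image itself in position 9) might look like the sketch below; the ordering of the neighbors and the edge replication at the border are assumptions.

```r
# Hypothetical helper: list of the 8 shifted neighbor images plus the image itself.
get.neighbors <- function(x) {
  n <- dim(x)
  # Pad by replicating the border rows and columns.
  xp <- x[c(1, 1:n[1], n[1]), c(1, 1:n[2], n[2]), drop = FALSE]
  shifts <- rbind(c(-1, -1), c(-1, 0), c(-1, 1),
                  c( 0, -1),           c( 0, 1),
                  c( 1, -1), c( 1, 0), c( 1, 1),
                  c( 0,  0))            # ninth entry: the pixel itself
  lapply(1:9, function(k) {
    xp[1 + shifts[k, 1] + 1:n[1], 1 + shifts[k, 2] + 1:n[2], drop = FALSE]
  })
}

x.demo <- matrix(1:12, 3, 4)
Z.demo <- get.neighbors(x.demo)
```

With v estimating $\sigma^2 + \delta^2$, the neighbor mean has variance v/8, so the weight w = 8 sigmasq / v in Hanning is the ratio of the two precisions and the smoothed value is a precision-weighted average.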
Performance
The method
◮ Find the boundary locations using an FDR (e.g., 5%) threshold.
◮ Remove isolated single pixels.
◮ Compute the sample mean, $\mu$, and variance, $\sigma^2$, of pixel intensities at the selected boundary pixels.
◮ Estimate the trend using row and column α-quantiles of pixel intensities, with the common proportion/probability α chosen by estimates of the number of noise pixels (?).
An implementation in R
    remove.trend = function(Z, mu, sigma, alpha = 0.5) {
      n = dim(Z)   # image dimensions (not defined in the original listing)
      # proportion of pixels above the threshold, by row and by column
      A = Z > mu+alpha*sigma
      a = as.numeric(A%*%rep(1/n[2],n[2]))
      b = as.numeric(rep(1/n[1],n[1])%*%A)
      p = mean(c(a,b))
      # row and column p-quantiles of the intensities
      a = as.numeric(apply(Z,1,function(x){return(quantile(x,probs=p))}))
      b = as.numeric(apply(Z,2,function(x){return(quantile(x,probs=p))}))
      data = data.frame(y=c(a,b), col=c(1:n[1], rep(0,n[2])),
                        row=c(rep(0,n[1]),1:n[2]))
      # linear trend fitted to the quantiles
      out = lm(y~col+row, data=data)
      y.hat = out$fitted; y.hat = y.hat-mean(y.hat)
      x.hat = y.hat[1:n[1]]; y.hat = y.hat[n[1]+(1:n[2])]
      Z = Z - matrix(x.hat,n[1],n[2]) - matrix(y.hat,n[1],n[2], byrow=TRUE)
      return(Z)
    }
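A simplified sketch of trend estimation from row and column quantiles, on a synthetic image with a known linear trend. It fits the row and column quantiles separately rather than jointly, and the image size, slopes, and the choice p = 0.5 are made up for illustration.

```r
set.seed(4)
n <- c(64, 64)
trend <- outer(1:n[1], 1:n[2], function(i, j) 0.05 * i - 0.03 * j)
Z.sim <- matrix(rnorm(prod(n)), n[1], n[2]) + trend

p <- 0.5
a <- apply(Z.sim, 1, quantile, probs = p)   # one quantile per row
b <- apply(Z.sim, 2, quantile, probs = p)   # one quantile per column

i.idx <- seq_len(n[1]); j.idx <- seq_len(n[2])
slope.row <- unname(coef(lm(a ~ i.idx))[2])
slope.col <- unname(coef(lm(b ~ j.idx))[2])

# Subtract the centered linear trend.
row.trend <- slope.row * (i.idx - mean(i.idx))
col.trend <- slope.col * (j.idx - mean(j.idx))
Z.detrended <- Z.sim - outer(row.trend, col.trend, "+")
```

Quantiles are used instead of means so that pixels belonging to particles have less influence on the estimated background trend.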
Performance
Images of pixel values that are less than $\mu_i + 0.5\,\sigma_i$ for i = "Before" and "After" (2 Hannings), respectively. Notice that a trend from top-right to bottom-left is apparent before trend removal.
Thresholding
◮ Compute a normal reference distribution based on smoothed pixel intensities at estimated particle boundary locations, the locations with large gradients, for specification of thresholds. Denote this reference distribution by $N(\mu, \sigma^2)$.
◮ Specify a sequence of thresholds, e.g.,
$\mu + z_k \sigma$ ($z_k = 0.5, 0.4, \ldots$)
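A minimal sketch of the threshold sequence and the resulting binary masks. The image, the reference values mu and sigma, and the end point 0.1 of the sequence are hypothetical; pixels below a threshold are marked as candidate particle pixels, on the assumption that particles are darker than the background.

```r
set.seed(5)
Z.img <- matrix(rnorm(100, mean = 10, sd = 2), 10, 10)   # stand-in smoothed image
mu <- 9; sigma <- 2                                      # hypothetical reference values

z <- seq(0.5, 0.1, by = -0.1)
thresholds <- mu + z * sigma

# One binary mask per threshold: TRUE marks candidate particle pixels.
masks <- lapply(thresholds, function(t) Z.img < t)
frac <- sapply(masks, mean)   # fraction of pixels selected at each threshold
```

Lowering the threshold can only shrink the selected region, so the masks are nested from the first threshold to the last.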
Classification: shapes and sizes of particle (2D) projections
...
...