Statistical Metrics for Quality Assessment of High Density


Statistical Metrics for Quality Assessment of High Density Tiling Array Data

This paper discusses high density tiling arrays, which are designed to cover a genomic region and are used mostly in biological applications. Because tiling arrays are used by many research labs, they need a quality control tool comparable to those already available for microarrays. We propose several statistical metrics, analogous to those used for microarrays, that are relevant to applications of tiling array data. We also develop a method that estimates the significance level of an observed quality measurement by means of a randomization test.

The number of probes placed on an array has grown into the millions, a trend similar to that of transistor counts on silicon chips in the semiconductor industry. For example, the Affymetrix Human Genome U133 microarray consists of 45000 probe sets that represent 33000 genes. Tiling arrays are widely used for transcription mapping, site mapping, DNA methylation studies, high density array CGH (comparative genomic hybridization), and many other purposes.

Several issues in these experiments pose challenging problems, and one of them is quality control and assessment. There are many sources of variability, including biological differences and noise introduced at different stages of the experiment, such as RNA isolation, chromatin sonication, amplification, and handling of DNA fragments. A single chip can cost thousands of dollars, which makes an experiment very expensive. It is therefore better to check the procedure before adopting it, to make sure that the samples are in good condition and that unwanted noise is reduced to a minimum.
Previous work has focused mainly on the design and analysis of tiling arrays, such as array design optimization, normalization and processing methods, and prediction programs, but none of it discusses quality checking and assessment in detail. Nadon and Shoemaker distinguish two types of measurement error: systematic and random. Random error reflects the degree of uncertainty in a measurement. It cannot be removed, but it can be estimated from the observed data with statistical tests. Systematic error is a consistent difference between the measured value and the actual value, and it can change the results. They recommend using multiple replicates of the same DNA sample, which helps to reduce both random and systematic errors.

Tiling arrays differ from expression microarrays in several ways. For example, the sample on a tiling array can be cRNA or cDNA, whereas on expression microarrays cRNA samples are labeled and hybridized. For expression microarrays it is assumed that most genes are not affected biologically, while for tiling arrays this assumption holds only in specific situations.

NUSE (normalized unscaled standard error) and RLE (relative log expression) are statistical tools that were developed for quality evaluation in expression microarrays. RLE measures how far the expression of a specific probe set on one chip deviates from that on the other chips. NUSE is a relative measure of the standard error of the expression estimate. For a good expression array, the median NUSE should be close to unity and the IQR of NUSE should be small relative to the other chips.

For n tiling arrays, RLE can be calculated by constructing a reference chip n+1 in which the value of each probe is the median of that probe over the n chips. The RLE of probe k on chip i is then the log ratio of its intensity to that of the reference chip: $\log(I_{ik} / I_{n+1,k})$.
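The RLE calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `rle` and `chip_summary`, the use of a base-2 logarithm (the text only says "log"), and the toy intensities are all hypothetical and are not taken from the paper.

```python
import math
from statistics import median, quantiles


def rle(intensities):
    """Return rle[i][k] = log2(I_ik / I_ref,k) for every chip i and probe k.

    intensities[i][k] is the positive intensity of probe k on chip i.
    The reference chip (chip n+1 in the text) holds, for each probe,
    the median of that probe over the n chips.
    Assumption: base-2 logarithm.
    """
    n_probes = len(intensities[0])
    reference = [median(chip[k] for chip in intensities) for k in range(n_probes)]
    return [
        [math.log2(chip[k] / reference[k]) for k in range(n_probes)]
        for chip in intensities
    ]


def chip_summary(rle_values):
    """Median and IQR of one chip's RLE values, as drawn in a boxplot."""
    q1, _, q3 = quantiles(rle_values, n=4)
    return median(rle_values), q3 - q1


# Hypothetical toy data: 3 chips x 4 probes; the third chip is twice as bright.
toy = [
    [100.0, 200.0, 400.0, 800.0],
    [100.0, 200.0, 400.0, 800.0],
    [200.0, 400.0, 800.0, 1600.0],
]
for i, row in enumerate(rle(toy), start=1):
    med, iqr = chip_summary(row)
    print(f"chip {i}: median RLE = {med:.2f}, IQR = {iqr:.2f}")
```

In this toy example the first two chips sit at the reference level, while the third chip is shifted away from zero, which is the kind of deviation a parallel boxplot of RLE is meant to expose.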
NUSE for n arrays is calculated by fitting the following model for each probe: $\log(I_{ij}) = u_i + a_j + e_{ij}$. NUSE is then defined by NUSE = [equation missing], where $W_i = \sum_j w_{ij}$.

In ChIP-chip studies, the following assumptions should be kept in mind: (1) most probe sets should not display signals, and (2) the number of probe sets with positive signals should equal the number of probe sets with negative signals. Under these assumptions both NUSE and RLE can be used, and NUSE is more sensitive than RLE in detecting array differences.

High density tiling array data have many applications, and four public data sets are used to illustrate the proposed methods. Figure 1 shows parallel boxplots of NUSE for a random sample of 2000 probe sets on six chips. The third control has a larger IQR and median than the others, which indicates that its noise level is higher than that of the remaining five samples. To assess the significance of the deviation of the median NUSE seen in Figure 1, we ran one thousand permutations of these probe sets across all six chips, computed the difference between the median NUSE and 1 for each permutation, and compared it with the observed value. The significance level for the third control is 0.001, meaning that none of the 1000 randomly permuted differences was larger than the observed value. The difference is therefore very unlikely to be due to chance, and the third control is an outlier.
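As a rough illustration of how a NUSE value could be obtained from the additive model $\log(I_{ij}) = u_i + a_j + e_{ij}$, the sketch below fits the model robustly and sums the resulting weights per chip. Everything beyond the model and $W_i = \sum_j w_{ij}$ is an assumption, because the defining formula is not available here: the fit uses Huber weights with simple backfitting, the unscaled standard error is taken as $1/\sqrt{W_i}$, and it is normalized by its median across chips, as is common for expression arrays. The function names and toy data are hypothetical.

```python
import math
from statistics import median


def huber_weight(residual, scale, k=1.345):
    """Huber weight: 1 for small residuals, k/|z| for large ones."""
    if scale == 0:
        return 1.0
    z = abs(residual) / scale
    return 1.0 if z <= k else k / z


def chip_weights(log_i, n_iter=20):
    """Fit log_i[i][j] = u_i + a_j + e_ij and return W_i = sum_j w_ij.

    i indexes chips and j indexes probes (assumed layout). The fit is an
    iteratively reweighted backfitting, not necessarily the paper's method.
    """
    n, m = len(log_i), len(log_i[0])
    w = [[1.0] * m for _ in range(n)]
    u, a = [0.0] * n, [0.0] * m
    for _ in range(n_iter):
        for i in range(n):  # weighted chip effects
            u[i] = sum(w[i][j] * (log_i[i][j] - a[j]) for j in range(m)) / sum(w[i])
        for j in range(m):  # weighted probe effects
            col = sum(w[i][j] for i in range(n))
            a[j] = sum(w[i][j] * (log_i[i][j] - u[i]) for i in range(n)) / col
        res = [[log_i[i][j] - u[i] - a[j] for j in range(m)] for i in range(n)]
        # Robust residual scale from the median absolute residual.
        scale = median(abs(r) for row in res for r in row) / 0.6745
        w = [[huber_weight(r, scale) for r in row] for row in res]
    return [sum(row) for row in w]


def nuse(log_i):
    """Assumed definition: SE_i = 1/sqrt(W_i), divided by the median SE."""
    se = [1.0 / math.sqrt(w_i) for w_i in chip_weights(log_i)]
    med = median(se)
    return [s / med for s in se]


# Hypothetical toy data: 5 chips x 8 probes, with extra noise on the last chip.
u_true = [8.0, 8.2, 7.9, 8.1, 8.0]
a_true = [0.0, 0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
toy = [[u_true[i] + a_true[j] + 0.02 * (((3 * i + 5 * j) % 7) - 3)
        for j in range(8)] for i in range(5)]
for j in range(8):
    toy[4][j] += 1.5 if j % 2 == 0 else -1.5
print([round(v, 2) for v in nuse(toy)])
```

A chip whose probes fit the model poorly receives small weights, so its $W_i$ shrinks and its NUSE rises above 1, which is the behaviour the boxplots in this section rely on.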

Figure 1: NUSE for random 2000 probe clusters: Array A
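The randomization test used to judge whether a chip's median NUSE differs from 1 can be sketched as follows. This is a minimal illustration under assumptions that are not confirmed by the text: chip labels are shuffled independently within each probe set, the statistic is the absolute difference between the median NUSE and 1, and the p-value is reported as (hits + 1) / (permutations + 1). The function names and example values are hypothetical.

```python
import random
from statistics import median


def median_deviation(nuse_table, chip):
    """|median NUSE - 1| for one chip; nuse_table[g][i] is probe set g on chip i."""
    return abs(median(row[chip] for row in nuse_table) - 1.0)


def randomization_p_value(nuse_table, chip, n_perm=1000, seed=0):
    """Estimate how often shuffled chip labels give a deviation at least
    as large as the observed one (assumed permutation scheme)."""
    rng = random.Random(seed)
    observed = median_deviation(nuse_table, chip)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the chip labels independently within every probe set.
        permuted = [rng.sample(row, len(row)) for row in nuse_table]
        if median_deviation(permuted, chip) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical NUSE values: 50 probe sets on 6 chips, third chip inflated.
table = [[1.0, 0.98, 1.3, 1.02, 1.0, 0.99] for _ in range(50)]
print(randomization_p_value(table, chip=2, n_perm=200))
```

With 1000 permutations and no permuted difference reaching the observed one, this estimate is about 0.001, which matches the order of magnitude quoted for the third control.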

The same outlier also appears in the RLE plot in Figure 2.

Figure 2: RLE of binding activity: Array A

A comparison of these two figures shows that RLE is less sensitive than NUSE in detecting outlier chips. Figure 3 shows the results for array type M01R as parallel boxplots of NUSE for 2000 probe sets. The third NHBE sample is an outlier, which is due to a protocol change: the first two samples were hybridized and labeled together, and the third sample was hybridized later. The significance level for the seven chips is 0.001, which shows that variability was also introduced in the NNK and Hypoxia samples, although it may not be detected just by looking at the parallel boxplots.

Figure 3: NUSE for random 2000 probe clusters: M01R

In conclusion, we developed a set of statistical metrics for the quality assessment of tiling arrays, similar to the tools used for expression microarrays. We also studied the effect of different window sizes on these metrics, and the results showed that they are robust. Each array platform has its own probe spacing.

Assessment and quality control are among the most widely studied topics for commercially manufactured microarrays, because these arrays have wide applications in biological and medical research. Different labs may adopt different protocols, depending on the disease, the cell type, and the type of treatment. There are two types of control array: real control and input control. A real control is the same as the treatment array, except that its cells are untreated. An input control, on the other hand, is simply genomic DNA. Input control is better than real control because it can reduce genomic noise. The amount of replication also plays an important role in the quality of the experiment, and adequate replication is a good way to reduce random errors. Usually three or four replicates per group are used in expression microarrays.