
In Proceedings of SPIE Conference on Human Vision and Electronic Imaging VI, San Jose, January 2001

    Recognition Methods for 3D Textured Surfaces

Oana G. Cula and Kristin J. Dana
Rutgers University
Piscataway, NJ 08854

    ABSTRACT

Texture as a surface representation is the subject of a wide body of computer vision and computer graphics literature. While texture is always associated with a form of repetition in the image, the repeating quantity may vary. The texture may be a color or albedo variation, as in a checkerboard, a paisley print or zebra stripes. Very often in real-world scenes, texture is instead due to a surface height variation, e.g. pebbles, gravel, foliage and any rough surface. Such surfaces are referred to here as 3D textured surfaces. Standard texture recognition algorithms are not appropriate for 3D textured surfaces because the appearance of 3D textured surfaces changes in a complex manner with viewing direction, illumination direction and scale. Recent methods have been developed for recognition of 3D textured surfaces using a database of surfaces observed under varied imaging parameters. One of these methods is based on 3D textons obtained using K-means clustering of multiscale feature vectors. Another method uses eigen-analysis originally developed for appearance-based object recognition.

In this work we develop a hybrid approach that employs both feature grouping and dimensionality reduction. The method is tested using the Columbia-Utrecht texture database (CUReT) and provides excellent recognition rates. The method is compared with existing recognition methods for 3D textured surfaces. A direct comparison is facilitated by empirical recognition rates from the same texture data set. The current method has key advantages over existing methods, including requiring less a priori information on both the training and novel images.

    1 Introduction

The appearance of ordinary surfaces in real world scenes is surprisingly difficult to model. The complex interactions of light with surfaces such as tree bark, a stucco wall or a sandy hill create our rich visual experience. These surfaces exhibit statistical variations that we loosely term texture. It is the key characteristics of the statistical variation that define the texture. The varying quantity may be surface reflectance, fine-scale surface geometry or both. In this work we refer to surfaces with fine-scale statistical height variations as 3D textured surfaces. The examples mentioned, the tree bark, stucco wall and sandy hill, can all be classified as 3D textured surfaces. Surfaces which are locally planar, but exhibit only a color or albedo variation, are termed 2D textured surfaces.

For 2D texture, recognition is an easier problem because the changes that occur with changes of viewing and illumination direction are straightforward and well-defined. For example, changes in illumination direction can be characterized by an overall scaling of the surface colors. Changes in viewing direction have been shown to be described as an affine transformation, as in [23]. Much of the texture recognition work in the literature is implicitly based on 2D texture, without regard to variations of viewing and illumination conditions, and a single image is used to represent texture. For 2D texture, numerous


techniques and representations have been developed since the early seventies, and are used in areas like texture mapping [3] [16] [18] [30] [2] [42] [31] [37] [13] [25] [7], texture synthesis [46] [29] [15] [26] [5] [27] [19] [4] [34], shape-from-texture [20] [35] [17] [23] [41], and texture classification and segmentation [47] [21] [1] [38] [36] [23] [45] [28] [4]. A complete survey of texture representations would be quite lengthy, as there are many styles and approaches.

To computationally characterize the spatial distribution of pixel intensities in image texture, histograms of image features have been used. That is, the statistics of image features are computed instead of the statistics of image intensities. A popular image feature used in these methods is the image edge or image gradient (e.g. [44]). To further characterize the spatial distribution of pixels, multidimensional histograms, or cooccurrence matrices, have been employed as a texture representation (e.g. [6][43]). A more recent development in the area of histogram representations of texture is the multiscale histogram [19][4][48][14]. The multiscale histogram has been used with marked success in synthesizing and recognizing texture due to color variation. To obtain the multiscale histogram, the texture image is decomposed into a multiresolution image pyramid. Then, the conditional joint probability of intensities and features at a particular scale, given the parent intensities and features at coarser scales, is computed. This leads to a non-parametric description that is based on a target texture image exemplifying the texture class.
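To make the construction concrete, the following minimal Python sketch computes a parent-conditioned multiscale histogram. It is an illustration of the general idea rather than any cited author's implementation: it assumes an 8-bit grayscale image, tracks intensities only (not general features), and substitutes 2x2 block averaging for a proper pyramid filter; the function name and default bin count are ours.

```python
import numpy as np

def conditional_multiscale_histogram(image, n_levels=3, n_bins=16):
    """For each pyramid level, count pairs (intensity bin at level l,
    parent intensity bin at level l+1) and normalize per parent bin."""
    levels = [np.asarray(image, dtype=float)]
    for _ in range(n_levels - 1):
        a = levels[-1]
        h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
        # 2x2 block averaging stands in for a proper pyramid filter.
        levels.append(a[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    hists = []
    for child, parent in zip(levels[:-1], levels[1:]):
        h, w = parent.shape[0] * 2, parent.shape[1] * 2
        cb = np.clip((child[:h, :w] / 256.0 * n_bins).astype(int), 0, n_bins - 1)
        pb = np.clip((parent / 256.0 * n_bins).astype(int), 0, n_bins - 1)
        # Upsample parent bins so each child pixel is paired with its parent.
        pb_up = np.repeat(np.repeat(pb, 2, axis=0), 2, axis=1)
        joint = np.zeros((n_bins, n_bins))
        np.add.at(joint, (cb.ravel(), pb_up.ravel()), 1)
        # Each column becomes the distribution of child bins given a parent bin.
        hists.append(joint / np.maximum(joint.sum(axis=0, keepdims=True), 1))
    return hists
```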

What 2D texture representations all have in common is that they do not account for the photometry and geometry of a 3D textured surface, which cause changes in appearance with illumination and viewing conditions due to effects such as shading, shadowing, foreshortening and occlusions. In the case of a 3D textured surface, a single image simply cannot capture appearance accurately. Therefore recognition methods built on single-image representations of texture are destined to be non-robust in real world conditions under varying imaging parameters. As such, there has been a recent surge of activity in producing more meaningful texture representations and recognition methods. The CUReT database [11] [12] provides a starting point in empirical studies of 3D textured surface appearance. This database contains BTF and BRDF measurements from over 60 different samples, each observed with over 200 different combinations of viewing and illumination directions. This work has been used in numerous recent studies [22][24][39][8][10][9][40]. Methods for texture recognition based on appearance-based representations derived from the image sets are discussed in [24], [39] and [9]. In [24], a 3D texton is created by using a multiscale filter bank applied to an image set for a particular sample. The filter responses as a function of viewing and illumination direction are clustered to form appearance features that are termed 3D textons. In [9], the texture representations are the conditional histograms of the response to a small multiscale filter bank. Principal components analysis is performed on the histograms of filter outputs, and recognition is done using the SLAM library [33][32] with the histogram vectors forming the appearance-based feature vector. In [39], the individual texture images are represented using multiband correlation functions that consider both within and between color band correlations. Principal components analysis is used to reduce the dimensionality of the representation, and this color information is used to aid in recognition. Unlike [24] and [9], [39] uses texture color as well as structure to recognize objects from the CUReT database, but this use of color assists the recognition greatly for a restricted dataset of varied color. For all of the 3D texture recognition methods described, surface appearance is represented by a collection of images taken from multiple viewing and illumination directions. As such, these techniques are image-based or appearance-based methods. In addition, each method described uses either feature grouping or dimensionality reduction to improve algorithm performance.

In this work we present another method for recognition of 3D textured surfaces that incorporates both feature grouping and dimensionality reduction. Specifically, within a single image of texture, image features are grouped to form image textons, or 2D textons, as described in [24]. The grouping is very similar to the method used to obtain 3D textons in [24]. The distribution of the image textons as a function of illumination and viewing direction is used as the surface descriptor. Dimensionality reduction via principal components analysis (PCA) is used to compress the set of image texton histograms, and a manifold is constructed in the reduced eigenspace to represent each sample. A novel texture image of unknown viewing and illumination direction is used to recognize the texture. We provide a detailed comparison of our method with the method of 3D textons [24], as this is the closest related method. The comparison is facilitated by using the same data set (CUReT) for our recognition tests. We outline some advantages of our method in terms of real world application. Most significantly, recognition with the image texton method is done in a simple manner, without the need for an iterative Monte Carlo algorithm. Using individual novel images from unknown viewing and illumination directions, the recognition rate is 98% with the image texton method.

[Figure 1 diagram: a texton library is created from samples 1..M using registered images from Nvl known viewing/illumination directions, filtered with the filter bank F ('*' denotes convolution) and k-means clustered on the (Nvl x Nfil)-dimensional feature vectors to yield texton labels. Training (samples 1..N): texton labeling of the Nvl registered images, then computation of a texton histogram as the texture representation. Recognition Method 1: texton labeling of a registered novel image set and comparison of texton histograms. Recognition Method 2: a single novel image is labeled via the Metropolis algorithm.]

Figure 1: Recognition using the 3D texton method of [24].

    2 Summary of 3D Texton Method

For comparison purposes, we briefly summarize the 3D texton method for recognition as described in [24]. A 3D texton is essentially an image feature as a function of illumination and viewing direction. As illustrated in Figure 1, a 3D texton vocabulary is created using M samples (which need not be the same set of samples that are used for training in the recognition stage). For each sample, registered images from Nvl viewing and illumination directions are filtered using a bank of Nfil filters. In Figure 1 the filter bank is denoted by F and the convolution of the images with this filter bank is denoted by the symbol '*'. These filters are a set of multiscale gaussians, derivatives of gaussians and center surround derivatives. For the experimental results presented in [24], Nvl = 20 and Nfil = 48. Recognition experiments for this method used images from the CUReT database. In the training stage of recognition, the input images for the sample to be recognized are assigned texton labels and then histograms of 3D texton labels are computed. Recognition is done with two different methods. In the first method, a set of registered novel images is required from known viewing and illumination directions (the same Nvl directions as those used in training). Textons are labelled, a texton histogram is computed, and the closest histogram from the training set is found using the chi-squared distance. The recognition rate reported for this method is 97%, based on a training set with 20 registered images for each sample and a test set of 20 registered images for each test sample. In the second recognition method, a single image from an unknown viewing and illumination direction is used. However, 3D texton labels cannot be found easily for the pixels of this image, since the variation with illumination and viewing direction is unknown. To obtain a texton labeling and a material classification, a Markov Chain Monte Carlo algorithm with Metropolis sampling is used. The recognition rate reported for this method is 87%, averaged over 40 samples and 5 images per sample.
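For concreteness, the histogram comparison in the first recognition method can be sketched as follows; this is a minimal numpy illustration rather than code from [24], and the function names and the small regularizing epsilon are ours.

```python
import numpy as np

def chi_squared_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two texton-label histograms."""
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    # eps avoids 0/0 for bins that are empty in both histograms.
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def classify(test_hist, train_hists):
    """Report the training sample whose texton histogram is closest."""
    # train_hists: dict mapping sample name -> histogram
    return min(train_hists, key=lambda s: chi_squared_distance(test_hist, train_hists[s]))
```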

[Figure 2 diagram: a 2D texton library is created from samples 1..M using Mvl unregistered images from unknown viewing/illumination directions, filtered with the filter bank F ('*' denotes convolution) and k-means clustered on the Nfil-dimensional feature vectors to yield texton labels. Training (samples 1..M, images 1..L): texton labeling, computation of the texton histogram as the texture representation, PCA, and projection to the eigenspace. Recognition Method: texton labeling, computation of the texton histogram, and finding the nearest manifold.]

Figure 2: The 2D texton method.

    3 Image Texton Method

For the image texton (2D texton) method of recognition, illustrated in Figure 2, we create the 2D texton vocabulary using M samples and Mvl unregistered images of different viewing and illumination directions for each of the samples. These images are filtered individually using a multiscale filter bank at 3 different scales. Each filter bank consists of Nfil filters: oriented gaussian derivative filters, center surround derivative filters and low-pass gaussian filters. In our approach we use the same Nfil filters at each of the scales. Therefore the texture image is characterized by 3 sets of Nfil-dimensional feature vectors, each feature vector corresponding to a pixel in the image. Note that Figure 2 illustrates filtering and the resultant feature vector at a single scale for clarity. These vectors characterize the local image properties for a particular pixel of the image at a particular viewing and illumination direction. Considering that texture has repetitive spatial properties, the data obtained as described above can be highly redundant. As suggested in [24], we reduce the size of the data set by determining representatives among the population. Therefore, for each texture image the key features are identified by performing clustering using the k-means algorithm. This clustering method is a greedy iterative algorithm that finds a fixed number of centers in the data space such that each data vector is associated with one of the centers and the sum of the squared distances to the centers is minimized. For each of the texture images, and for each of the scales, we reduce the feature vector set to a small number of Nfil-dimensional vectors, which represent the 2D textons for that particular image, at that scale. In this manner, for every sample we have a set of candidate 2D textons at each scale. The sets are merged for all samples to obtain a universal 2D texton vocabulary for each scale. Because different textures often exhibit similar local properties, the vocabularies of 2D textons obtained as above can be further reduced. Specifically, the sets of 2D textons are collapsed by eliminating the ones representing too few image feature vectors, and then again by clustering the remaining textons into a smaller set of representatives. The resulting sets of texton labels are the 2D texton libraries, one for each of the 3 scales.
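A minimal sketch of the per-image texton extraction described above, assuming the filter bank is supplied as a list of 2D kernels; a small k-means is hand-rolled so the snippet stays self-contained, and the function names, iteration count and seed are ours rather than the paper's.

```python
import numpy as np
from scipy.ndimage import convolve

def pixel_feature_vectors(image, filter_bank):
    """Stack filter responses so each pixel yields an Nfil-dimensional vector."""
    responses = [convolve(np.asarray(image, dtype=float), f, mode='nearest')
                 for f in filter_bank]
    return np.stack(responses, axis=-1).reshape(-1, len(filter_bank))

def kmeans(vectors, k, n_iter=50, seed=0):
    """Plain k-means; the k centers are the image's 2D textons at one scale."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each feature vector to its nearest center.
        d = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster empties out.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = vectors[labels == j].mean(axis=0)
    return centers

# e.g. 15 key features per image and per scale, as set in Section 4:
# textons = kmeans(pixel_feature_vectors(img, bank), k=15)
```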

For the training stage of recognition we obtain a surface descriptor for each sample of interest, as illustrated in Figure 2. For each sample, Mvl images of unknown viewing and illumination directions are used. For each of the images, and for each of the scales, the histogram of 2D texton labels is computed. The 3 histograms are concatenated into a single vector, of dimension three times the size of the per-scale texton library, for each of the Mvl images per sample. These Mvl vectors form the surface descriptor for each sample. That is, for each sample we have a bidirectional texton histogram, defining a hypersurface in this high-dimensional histogram space, parametrized by the viewing and illumination directions of the training images; this hypersurface is referred to as a manifold.
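Continuing the sketch, the per-image descriptor could be assembled as follows, where each per-scale 2D texton library is an array of cluster centers; the helper names are hypothetical.

```python
import numpy as np

def texton_histogram(features, library):
    """Label each pixel's feature vector with its nearest library texton
    and return the normalized histogram of labels."""
    d = ((features[:, None, :] - library[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(library)).astype(float)
    return hist / hist.sum()

def surface_descriptor(feature_sets, libraries):
    """Concatenate the three per-scale histograms into one vector."""
    return np.concatenate([texton_histogram(f, lib)
                           for f, lib in zip(feature_sets, libraries)])
```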

Due to the high dimension of the analysis space, a reduction in the dimensionality is performed by using principal components analysis (PCA). Using all training images, for all samples, a universal eigenspace is computed by employing the SLAM software library [33][32]. In the universal eigenspace each of the samples is represented by a manifold, a parametric eigenspace representation obtained by projecting all of the concatenated histogram vectors into the eigenspace and interpolating the projected points using a quadratic b-spline.
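The SLAM library itself is not reproduced here; as a stand-in, a plain SVD-based PCA conveys the dimensionality reduction step (a sketch, not the SLAM API).

```python
import numpy as np

def build_eigenspace(descriptors, n_dims):
    """PCA over all training descriptors (one per row).
    Returns the mean and the top n_dims principal directions."""
    X = np.asarray(descriptors)
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_dims]

def project(descriptor, mean, basis):
    """Map a concatenated histogram vector into the universal eigenspace."""
    return basis @ (descriptor - mean)
```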

During testing, a single novel image, of unknown view and illumination directions, is analyzed to obtain the 2D texton histogram. This novel image is outside the set of images used to create the 2D texton library and outside the set of images used to create the surface descriptor in the training stage. The resulting concatenated histogram vector is projected into the universal eigenspace. The sample corresponding to the closest manifold in the eigenspace is reported as the match.
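The test-time matching step, sketched under the same assumptions: the paper interpolates the projected training points with a quadratic b-spline, while for simplicity this sketch approximates the point-to-manifold distance by the distance to the nearest sampled manifold point.

```python
import numpy as np

def recognize(test_descriptor, mean, basis, manifolds):
    """manifolds: dict mapping sample name -> array of eigenspace points
    sampled along that sample's manifold. Returns the closest sample."""
    p = basis @ (test_descriptor - mean)
    dists = {name: np.min(np.linalg.norm(points - p, axis=1))
             for name, points in manifolds.items()}
    return min(dists, key=dists.get)
```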

    4 Results

In our experiments, we employ texture images from the CUReT database [12]. Among the 61 real-world surfaces available in the database, we consider 20 samples: sample 1 (felt), sample 4 (rough plastic), sample 6 (sandpaper), sample 10 (plaster a), sample 12 (rough paper), sample 14 (roofing shingle), sample 16 (cork), sample 18 (rug a), sample 20 (styrofoam), sample 22 (lambswool), sample 25 (quarry tile), sample 27 (insulation), sample 30 (plaster b zoomed), sample 33 (slate a), sample 35 (painted spheres), sample 41 (brick b), sample 45 (concrete a), sample 48 (brown bread), sample 50 (concrete c), sample 59 (cracker b). For each of the 20 samples, Mvl = 64 images of different viewing and illumination directions (see Table 1) were used for creating the 2D texton library, as well as for training the system. Both the view and illumination directions of these images are specified by the polar angles θv and θi, and azimuthal angles φv and φi, respectively. The images used during testing were from the same M = 20 samples, but of novel viewing and illumination directions. For each sample we used 19 images, so the total number of test images is 380. To ensure that only texture information is included during analysis, the images were manually presegmented. As discussed in Section 3, k-means clustering is used to group image features within each of the input images. The number of key features for each of the texture images, and for each scale, is empirically set to 15. This set of feature groups is concatenated and then reduced, first by eliminating the textons representing too few image feature vectors, and then by clustering the remaining textons into a final set of representatives per scale. The size of the universal 2D texton library was varied from 100 to 550, with a step of 50,

hence we experimented with 10 variants of the vocabulary. During training, different values for the number of dimensions of the eigenspace were used. Specifically, the number of eigenvectors was varied from 30 to 250, with a step of 10.

Two cases of correctly recognized texture images, along with the corresponding 2D texton histograms for all 3 scales, are depicted in Figures 7 and 8. The first example presents a pair of correctly matched instances of sample 45 (concrete), while the second illustrates two texture images of sample 4 (rough plastic), which are found as being of the same type, in spite of the difference in the view and illumination directions. This demonstrates the robustness of representing the surface descriptor as the distribution of the 2D textons over the texture image. In Figure 4 the set of mismatches is presented for a 2D texton library of size 450 and an eigenspace of dimension 300; the corresponding values for the polar and azimuthal angles of the view and illumination directions are given in Table 2. The algorithm fails in these cases due to the very oblique illumination direction for some of the textured surfaces, as in the case of mismatches 1, 4, and 7, or due to an oblique view direction, as in the case of mismatches 2 and 3. The 5th example of misclassification is further analyzed as an example of how mismatches occur. For this example, Figure 5 illustrates the 2D texton histograms at the 3 scales. We trace back the 2D textons that give the match on all 3 scales, and in Figure 6 we list the filters, ordered in decreasing order of the 2D texton components; as we expect, the highest responses correspond to the center surround derivative filters, due to the spot-like features that can be observed in both the test image and the matched training image. Analogous observations can be made about the 6th example of misclassification. The

[Figure 3 is a surface plot; axes: number of k-means centers (100 to 600), number of eigenvectors (0 to 250), recognition rate (0.92 to 0.99).]

Figure 3: Recognition rates (as the percentage of test images correctly recognized) as a function of the size of the 2D texton library, as well as of the number of dimensions of the eigenspace.

recognition rate as a function of both the size of the 2D texton library and the dimensionality of the eigenspace is given in Figure 3. The plotted surface clearly suggests that undersampling the 2D texton vocabulary negatively affects performance, but increasing the number of key features beyond 300 (for each of the 3 scales) produces excellent results, with the recognition rate increasing beyond 98%.

We also experimented with representing the textured surface using joint 2D texton histograms between the scales. The results were similar; that is, the peak recognition rate is again 98%.

The high recognition rates achieved using our algorithm demonstrate that the approach taken is appropriate for the problem at hand. The results are especially interesting given that the recognition is based on a single novel texture image, without any information on imaging conditions.

    5 Comparison of Methods

In this section we compare recognition using the 3D texton method [24] with the image texton method described in Section 3. Both methods use a representation of texture that can capture the complex variations of appearance of a 3D textured surface as a function of viewing and illumination direction. Every texture representation must capture a global spatial distribution of a local feature. For representations of 3D textured surfaces, variations with imaging conditions (view/illumination direction) must also be included. For the 3D texton method, variations with imaging conditions are accounted for in the local feature. The image texton representation instead describes variations of the global spatial distribution with imaging conditions. This difference has consequences in terms of the amount of information needed for recognition. For the creation of the 3D texton library, a set of registered images from known viewing and illumination directions is required. In the training stage, the 3D texton method also requires registered images from known viewing and illumination directions. For the image texton method, a set of unregistered images from unknown viewing and illumination directions is sufficient, both in the training stage and in the creation of the 2D texton library. For recognition, the 3D texton method has a direct technique which requires a set (typically 20) of registered images of the novel surface from known view/illumination directions. The 3D texton method also has an iterative Markov Chain Monte Carlo recognition technique requiring only a single image from an unknown viewing and illumination direction. However, the use of an iterative algorithm adds processing time and convergence issues. The image texton method uses a single image from unknown view/illumination without the need for an iterative algorithm. The peak recognition rate using the image texton method is 98%, while the 3D texton method recognition rates are 97% (direct method) and 87% (iterative method).

Figure 4: The misclassified test images (left column) and the corresponding wrong matches (right column); the type of the sample is given below each image. The pairs (test, match) are: (1) Sample 25, Sample 1; (2) Sample 30, Sample 45; (3) Sample 33, Sample 41; (4) Sample 33, Sample 6; (5) Sample 41, Sample 20; (6) Sample 41, Sample 20; (7) Sample 41, Sample 16.

[Figure 5 shows three histogram plots, one per scale, with texton label index 0-300 on the horizontal axes.]

Figure 5: The histograms on all three scales for the example of misclassification (the 5th example from the top of Figure 4). The symbol 'o' is used for the histograms of the misclassified test image. The symbol 'x' is used for the corresponding wrong match.

[Figure 6 shows three filter panels (a), (b), (c), one per scale.]

Figure 6: The histograms of Figure 5 can be used to interpret the mismatch of Figure 4 (5th example). A peak in the texton label histogram is identified at each scale. This peak is a key feature in both the training and test image and contributes significantly to the incorrect match. To interpret which texton is associated with the peak, this figure shows the filters at each scale ordered according to the filter response components for this particular texton. That is, the first filter shown is the filter with the highest absolute response for this texton.

[Figure 7 shows the matched image pair and three histogram plots, one per scale, with texton label index 0-300 on the horizontal axes.]

Figure 7: Example of a good match (sample 45, concrete a), and the corresponding histograms for the three scales (from finer to coarser scale).

[Figure 8 shows the matched image pair and three histogram plots, one per scale, with texton label index 0-300 on the horizontal axes.]

Figure 8: Example of another good match (sample 4, rough plastic), and the corresponding histograms for the three scales (from finer to coarser scale).

The image texton representation has some clear advantages in recognition, because it requires less a priori information. However, this representation also encodes less information about the textured surface. Consequently, it is difficult to use this representation for rendering 3D textured surfaces. The 3D texton method precisely encodes local feature appearance and the variation of appearance with imaging conditions. Therefore that method can be used more easily in texture rendering. The image texton method uses PCA for dimensionality reduction of the feature histogram set. A discussion in [24] comments on the validity of using PCA and its implicit assumption of Lambertian reflectance. It should be noted that our method does not use PCA directly on reflectance values. As such, there is no assumption that reflectance values are a linear combination of a basis set. PCA is used only for dimensionality reduction of a high dimensional histogram set, and no assumptions are made about Lambertian reflectance. The assumption is that the collection of histograms can be projected to a basis set while still retaining its descriptive capability. Indeed, we have shown empirically with our recognition results that such dimensionality reduction is valid.

    6 Conclusion and Future Work

We have presented a new method of recognizing 3D textured surfaces using the distribution of image textons as a function of viewing and illumination directions. Recognition experiments on the CUReT database have been compared with those using the 3D texton method. The comparison shows that the 2D texton method achieved equal or better recognition rates, and required less a priori information about the training and/or test images.

For the general problem of 3D textured surface recognition, significant work remains. The recognition results discussed so far are tested only with in-lab images obtained under a controlled setting. While illumination direction varies, the type of illumination is approximately constant and there is no attempt to vary the spectrum or polarization of the illumination. The collimated light source used in obtaining the database images does not approximate the complex illumination that occurs in a natural environment, where weather conditions, cloud cover and nearby objects contribute to the illumination of the object. The general problem of surface recognition must also deal with the issue of scale invariance, which has not been addressed here. Furthermore, segmenting the scene to determine the image area for texture analysis is an important open issue.

Other future work in this area includes integrating object and surface representations. Surface and object recognition are not separate issues but instead are clearly interrelated. Better surface models can assist the problem of object recognition. Also, object representations can provide global surface normals and enable scene segmentation in order to assist the problem of surface or material identification. Integration of the algorithms and techniques in both these areas is an important direction for future work.

Another element of future work is the classification of surfaces into more general descriptions, so that we can recognize surfaces as meaningful classes rather than just prelearned examples. For example, it may be useful to recognize that a surface belongs to a general class, e.g. stone, fur or carpet, regardless of the particular type within this class.

In general, computational representations for 3D textured surfaces are also important for the problem of texture synthesis. Feature histograms have been quite useful for the problem of 2D texture synthesis (where a single image is used to represent texture). Variations of the feature histogram representation used here have potential utility in a similar type of 3D texture synthesis based on exemplar texture sets.

Acknowledgements. This material is based upon work supported by the National Science Foundation under Grant No. 0092491 and Grant No. 0085864. We thank Thomas Leung for providing the filter bank used in his experiments.

Table 1: Viewing and Illumination Directions for the Training Images

No.  θv        φv         θi        φi
1    1.389566  -1.970402  1.374447  -1.570796
2    0.981748   0.000000  1.374447   0.000000
3    1.308097  -1.163336  1.370707  -0.764828
4    1.050598  -0.417031  1.357866  -0.153484
5    0.785399   0.000000  1.178098   0.000000
6    1.272499  -2.496589  1.154239  -2.096192
7    1.099841  -1.476736  1.154239  -1.045400
8    1.421060  -3.015733  1.130287  -2.738536
9    1.374447   3.141593  0.981748   3.141593
10   0.589049   0.000000  0.981748   0.000000
11   1.101194  -2.432726  0.955317  -2.005315
12   1.178097   3.141593  0.785398   3.141593
13   1.038285  -3.013835  0.662145  -2.860557
14   0.694806  -2.211453  0.589048  -1.570796
15   0.213063  -1.410017  0.472749  -0.430767
16   0.000001   3.141593  0.392699   0.000000
17   0.274014  -2.335623  0.284924  -0.764828
18   0.196350   3.141593  0.196349   0.000000
19   0.392699   3.141593  0.000000   0.000000
20   1.432405  -2.365895  1.374447  -1.570796
21   0.589049   0.000000  1.374447   0.000000
22   1.285872  -1.570796  1.370707  -0.764828
23   1.122109  -1.226629  1.360401  -0.430767
24   0.392700   0.000000  1.178098   0.000000
25   1.122109  -1.914964  1.154239  -1.045400
26   0.717004  -1.267733  1.130287  -0.403056
27   1.047198  -2.526113  0.785398  -1.570796
28   0.261800  -2.202711  0.662145  -0.281035
29   0.196350   3.141593  0.589048   0.000000
30   0.392700   3.141593  0.392699   0.000000
31   0.589049   3.141593  0.196349   0.000000
32   0.785398   3.141593  0.000000   0.000000
33   0.196350   0.000000  1.374447   0.000000
34   1.308097  -1.978257  1.370707  -0.764828
35   0.909333  -1.517790  1.357866  -0.281035
36   0.000001   0.000000  1.178098   0.000000
37   0.196349   3.141593  0.981748   0.000000
38   1.141555  -2.514072  0.955317  -1.136277
39   0.392699   3.141593  0.785398   0.000000
40   0.589049   3.141593  0.589048   0.000000
41   0.785399   3.141593  0.392699   0.000000
42   0.987375  -3.007326  0.284924  -0.764828
43   0.981748   3.141593  0.196349   0.000000
44   1.178097   3.141593  0.000000   0.000000
45   0.196349   3.141593  1.374447   0.000000
46   1.154239  -2.096192  1.360401  -0.430767
47   0.392699   3.141593  1.178098   0.000000
48   0.589048   3.141593  0.981748   0.000000
49   0.662145  -2.860557  0.955317  -0.153448
50   0.785398   3.141593  0.785398   0.000000
51   0.981748   3.141593  0.589048   0.000000
52   1.178098   3.141593  0.392699   0.000000
53   1.374447   3.141593  0.196349   0.000000
54   0.589048   3.141593  1.374447   0.000000
55   0.625671  -2.857075  1.370707  -0.041141
56   0.733813  -2.628785  1.360401  -0.094629
57   0.785398   3.141593  1.178098   0.000000
58   0.981747   3.141593  0.981748   0.000000
59   1.178097   3.141593  0.785398   0.000000
60   1.374447   3.141593  0.589048   0.000000
61   0.981747   3.141593  1.374447   0.000000
62   1.178097   3.141593  1.178098   0.000000
63   1.374446   3.141593  0.981748   0.000000
64   1.374447   3.141593  1.374447   0.000000

Table 2: The Set of Misclassified Pairs (Test Image, Training Image), with the Corresponding Viewing and Illumination Directions (specified by the polar angles θv, θi and azimuthal angles φv, φi), for a 2D texton library of size 450 and an eigenspace of dimension 300.

No.  Test sample  θv        φv         θi        φi         Matched sample  θv        φv         θi        φi
1    Sample 25    1.038285  -3.013835  0.955317  -0.153484  Sample 1        0.000001   3.141593  0.392699   0.000000
2    Sample 30    1.348016  -2.815465  1.150262  -2.456873  Sample 45       0.589049   0.000000  0.981748   0.000000
3    Sample 33    1.348016  -2.815465  1.150262  -2.456873  Sample 41       1.038285  -3.013835  0.662145  -2.860557
4    Sample 33    1.038285  -3.013835  0.955317  -0.153484  Sample 6        1.374447   3.141593  0.196349   0.000000
5    Sample 41    0.716526  -1.270836  0.876815  -0.764684  Sample 20       1.308097  -1.163336  1.370707  -0.764828
6    Sample 41    0.581425  -0.799895  0.876815  -0.403056  Sample 20       1.050598  -0.417031  1.357866  -0.153484
7    Sample 41    1.038285  -3.013835  0.955317  -0.153484  Sample 16       1.178097   3.141593  1.178098   0.000000

7 References

[1] J. Beck, A. Sutter, and R. Ivry. Spatial frequency channels and perceptual grouping in texture segmentation. Computer Vision, Graphics, and Image Processing, pages 299–325, 1987.

[2] E. A. Bier and K. R. Sloan. Two part texture mapping. IEEE Computer Graphics and Applications, 6(9):40–53, September 1986.

[3] J. F. Blinn and M. E. Newell. Texture and reflection in computer generated images. Communications of the ACM, 19(10), October 1976.

[4] J. S. De Bonet and P. Viola. Texture recognition using a non-parametric multi-scale statistical model. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 641–647, 1998.

[5] J. A. Cadzow. Image texture synthesis and analysis using moving average models. IEEE Transactions on Aerospace and Electronic Systems, 29(4):1110–1121, October 1993.

[6] P. C. Chen and T. Pavlidis. Segmentation by texture using a cooccurrence matrix and a split-and-merge algorithm. Computer Graphics and Image Processing, 10:172–182, 1979.

[7] W. T. Correa, R. J. Jensen, C. E. Thayer, and A. Finkelstein. Texture mapping for cel animation. ACM SIGGRAPH, pages 435–446, 1998.

[8] K. J. Dana and S. K. Nayar. Histogram model for 3D textures. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 618–624, June 1998.

[9] K. J. Dana and S. K. Nayar. 3D textured surface modeling. IEEE Workshop on the Integration of Appearance and Geometric Methods in Object Recognition, pages 46–56, June 1999.

[10] K. J. Dana and S. K. Nayar. Correlation model for 3D texture. International Conference on Computer Vision, pages 1061–1067, September 1999.

[11] K. J. Dana, B. van Ginneken, S. K. Nayar, and J. J. Koenderink. Reflectance and texture of real-world surfaces. Columbia University Technical Report CUCS-048-96, December 1996.

[12] K. J. Dana, B. van Ginneken, S. K. Nayar, and J. J. Koenderink. Reflectance and texture of real world surfaces. ACM Transactions on Graphics, 18(1):1–34, January 1999.

[13] P. E. Debevec, C. J. Taylor, J. Malik, G. Levin, G. Borshukov, and Y. Yu. Image-based modeling and rendering of architecture with interactive photogrammetry and view-dependent texture mapping. Proceedings of the 1998 IEEE International Symposium on Circuits and Systems (ISCAS '98), 5:514–517, 1998.

[14] A. A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. International Conference on Computer Vision, 2:1033–1038, 1999.

[15] J. M. Francos and A. Z. Meiri. A unified structural-stochastic model for texture analysis and synthesis. 9th International Conference on Pattern Recognition, pages 41–45, November 1988.

[16] M. Gangnet and D. Perny. Perspective mapping of planar textures. Eurographics, pages 57–71, 1982.

[17] J. Garding. Direct estimation of shape from texture. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11):1202–1208, November 1993.

[18] P. S. Heckbert. A survey of texture mapping. IEEE Computer Graphics and Applications, 6(11):56–67, 1986.

[19] D. J. Heeger and J. R. Bergen. Pyramid-based texture analysis/synthesis. ACM SIGGRAPH, pages 229–238, 1995.

[20] J. Y. Jau and R. T. Chin. Shape from texture using the Wigner distribution. Computer Vision, Graphics and Image Processing, 52(2):248–263, November 1990.

[21] R. L. Kashyap and A. Khotanzad. A model based method for rotation invariant texture classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8:472–481, 1986.

[22] J. J. Koenderink, A. J. van Doorn, K. J. Dana, and S. K. Nayar. Bidirectional reflection distribution function of thoroughly pitted surfaces. International Journal of Computer Vision, 31(2-3):129–144, 1999.

[23] J. Krumm and S. A. Shafer. Texture segmentation and shape in the same image. International Conference on Computer Vision, pages 121–127, 1995.

[24] T. Leung and J. Malik. Recognizing surfaces using three-dimensional textons. International Conference on Computer Vision, 2:1010–1017, 1999.

[25] B. Levy and J.-L. Mallet. Non-distorted texture mapping for sheared triangulated meshes. ACM SIGGRAPH, pages 343–352, 1998.

[26] J. P. Lewis. Algorithms for solid noise synthesis. ACM SIGGRAPH, 23(3):263–270, 1989.

[27] G. Lohmann. Cooccurrence-based analysis and synthesis of textures. International Conference on Pattern Recognition, 1:449–453, 1994.

[28] Chun-Shien Lu and Pau-Choo Chung. Wold features for unsupervised texture segmentation. Fourteenth International Conference on Pattern Recognition, 2:1689–1693, August 1998.

[29] G. A. Mastin, P. A. Watterberg, and J. F. Mareda. Fourier synthesis of ocean scenes. IEEE Computer Graphics and Applications, 7(3):16–23, March 1987.

[30] N. Max. Shadows for bump mapped surfaces. Advanced Computer Graphics, pages 145–156, 1986.

[31] Gavin S. P. Miller. The definition and rendering of terrain maps. ACM SIGGRAPH, 20(4):39–49, 1986.

[32] H. Murase and S. K. Nayar. Visual learning and recognition of 3-D objects from appearance. International Journal of Computer Vision, pages 5–24, 1995.

[33] S. A. Nene, S. K. Nayar, and H. Murase. SLAM: A software library for appearance matching. Technical Report CUCS-019-94, Proceedings of ARPA Image Understanding Workshop, November 1994.

[34] R. Paget and I. D. Longstaff. Texture synthesis and unsupervised recognition with a nonparametric multiscale Markov random field model. Fourteenth International Conference on Pattern Recognition, 2:1068–1070, 1998.

[35] M. A. S. Patel and F. S. Cohen. Shape from texture using Markov random field models and stereo-windows. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 290–295, 1992.

[36] T. R. Reed. A review of recent texture segmentation and feature extraction techniques. Computer Vision, Graphics and Image Processing, 57(3):359–372, 1993.

[37] M. Segal, C. Korobkin, R. van Widenfelt, J. Foran, and P. Haeberli. Fast shadows and lighting effects using texture mapping. ACM SIGGRAPH, 26(2):249–252, July 1992.

[38] C. Stewart. Fractal Brownian motion models for SAR imagery scene segmentation. Proceedings of the IEEE, 81(10):1511–1522, October 1993.

[39] P. Suen and G. Healey. Analyzing the bidirectional texture function. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 753–758, June 1998.

[40] P. Suen and G. Healey. The analysis and recognition of real-world textures in three dimensions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(5):491–503, May 2000.

[41] B. J. Super and A. C. Bovik. Shape from texture using local spectral moments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(4):333–343, April 1995.

[42] P. Y. Tso and B. A. Barsky. Modelling and rendering waves: Wave tracing using beta splines and reflective and refractive texture mapping. ACM Transactions on Graphics, 6(3):191–214, July 1987.

[43] K. Valkealahti and E. Oja. Reduced multidimensional co-occurrence histograms in texture classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):90–94, 1998.

[44] F. M. D. Vilnrotter. Structural analysis of natural textures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(1):76–89, 1986.

[45] L. Wang and G. Healey. Illumination and geometry invariant recognition of texture in color images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 419–424, 1996.

[46] J. Weil. The synthesis of cloth objects. ACM SIGGRAPH, 20(4):49–54, 1986.

[47] J. Weszka, C. R. Dyer, and A. Rosenfeld. A comparative study of texture measures for terrain classification. IEEE Transactions on Systems, Man and Cybernetics, SMC-6:269–285, 1976.

[48] S. C. Zhu, Y. N. Wu, and D. Mumford. Filters, random fields and maximum entropy: Towards a unified theory for texture modeling. International Journal of Computer Vision, 27(2):1–20, March/April 1998.