
Page 1: Content-Based Retrieval in Image Databases. References Ze-Nian Li & Mark S. Drew, Fundamentals of Multimedia, ISBN 0-13-061872-1, Prentice-Hall, 2004.

Content-Based Retrieval in Image Databases


References

• Ze-Nian Li & Mark S. Drew, Fundamentals of Multimedia, ISBN 0-13-061872-1, Prentice-Hall, 2004. Chapter 3.

• Wasfi Al-Khatib, Y. Francis Day, Arif Ghafoor, and P. Bruce Berra. Semantic modeling and knowledge representation in multimedia databases. IEEE Transactions on Knowledge and Data Engineering, 11(1):64-80, 1999.


Outline

• Image Processing Basics

• Semantic Modeling and Knowledge Representation in Image Databases


Image Processing Basics: Outline

• Binary and Grayscale Image Representations

• Image Dithering

• Image Segmentation

• Convolution

• Color Image Representation


1-bit Images

• Each pixel is stored as a single bit (0 or 1), so such an image is also referred to as a binary image.

• Such an image is also called a 1-bit monochrome image since it contains no color.

• Fig. 3.1 shows a 1-bit monochrome image (called “Lena” by multimedia scientists — this is a standard image used to illustrate many algorithms).


Fig. 3.1: Monochrome 1-bit Lena image.


8-bit Gray-level Images

• Each pixel has a gray-value between 0 and 255. Each pixel is represented by a single byte; e.g., a dark pixel might have a value of 10, and a bright one might be 230.

• Bitmap: The two-dimensional array of pixel values that represents the graphics/image data.

• Image resolution refers to the number of pixels in a digital image (higher resolution generally yields better quality).

– Fairly high resolution for such an image might be 1,600 × 1,200, whereas lower resolution might be 640 × 480.


• Frame buffer: Hardware used to store the bitmap.

– A video card (actually a graphics card) is used for this purpose.

– The resolution of the video card does not have to match the desired resolution of the image, but if not enough video card memory is available then the data has to be shifted around in RAM for display.

• 8-bit image can be thought of as a set of 1-bit bit-planes, where each plane consists of a 1-bit representation of the image at higher and higher levels of “elevation”: a bit is turned on if the image pixel has a nonzero value that is at or above that bit level.

• Fig. 3.2 displays the concept of bit-planes graphically.
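The bit-plane idea can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual binary reading (plane k holds bit k of every pixel value); the helper names `bit_planes` and `from_bit_planes` are hypothetical and are not taken from the slides.

```python
def bit_planes(image):
    """Split an 8-bit grayscale image (rows of ints in 0..255) into 8 binary planes.

    planes[k][y][x] is bit k of the pixel in row y, column x (k = 0 is the least
    significant bit, k = 7 the most significant one).
    """
    return [[[(pixel >> k) & 1 for pixel in row] for row in image] for k in range(8)]


def from_bit_planes(planes):
    """Rebuild the 8-bit image by weighting plane k with 2**k."""
    height, width = len(planes[0]), len(planes[0][0])
    return [[sum(planes[k][y][x] << k for k in range(8)) for x in range(width)]
            for y in range(height)]


image = [[10, 230], [128, 255]]     # a dark pixel, a bright pixel, mid-gray, white
planes = bit_planes(image)
print(planes[7])                    # most significant plane
print(from_bit_planes(planes) == image)
```

The most significant plane alone already gives a coarse 1-bit version of the image, which is why the planes are often pictured as stacked levels.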


Fig. 3.2: Bit-planes for 8-bit grayscale image.


Multimedia Presentation

• Each pixel is usually stored as a byte (a value between 0 and 255), so a 640 × 480 grayscale image requires 300 kB of storage (640 × 480 = 307,200 bytes).

• Fig. 3.3 shows the Lena image again, but this time in grayscale.

• When an image is printed, the basic strategy of dithering is used, which trades intensity resolution for spatial resolution to provide ability to print multi-level images on 2-level (1-bit) printers.


Fig. 3.3: Grayscale image of Lena.


Dithering

• Dithering is used to calculate patterns of dots such that values from 0 to 255 correspond to patterns that are more and more filled at darker pixel values, for printing on a 1-bit printer.

• The main strategy is to replace a pixel value by a larger pattern, say 2 × 2 or 4 × 4, such that the number of printed dots approximates the varying-sized disks of ink used in analog halftone printing (e.g., for newspaper photos).

1. Half-tone printing is an analog process that uses smaller or larger filled circles of black ink to represent shading, for newspaper printing.

2. For example, if we use a 2 × 2 dither matrix, we can first re-map image values in 0..255 into the new range 0..4 by (integer) dividing by 256/5. Then, e.g., if the pixel value is 0 we print nothing in a 2 × 2 area of printer output, but if the pixel value is 4 we print all four dots.

• The rule is:

If the intensity is > the dither matrix entry then print an on dot at that entry location: replace each pixel by an n × n matrix of dots.

• Note that the image size may be much larger for a dithered image, since replacing each pixel by a 4 × 4 array of dots makes an image 16 times as large.


• A clever trick can get around this problem. Suppose we wish to use a larger, 4 × 4 dither matrix.

• An ordered dither consists of turning on the printer output bit for a pixel if the intensity level is greater than the particular matrix element just at that pixel position.

• Fig. 3.4 (a) shows a grayscale image of “Lena”. The ordered-dither version is shown as Fig. 3.4 (b), with a detail of Lena's right eye in Fig. 3.4 (c).


• An algorithm for ordered dither, with an n × n dither matrix, is as follows:

BEGIN
  for x = 0 to xmax          // columns
    for y = 0 to ymax        // rows
      i = x mod n
      j = y mod n
      // I(x, y) is the input, O(x, y) is the output,
      // D is the dither matrix.
      if I(x, y) > D(i, j)
        O(x, y) = 1;
      else
        O(x, y) = 0;
END
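The ordered-dither pseudocode translates almost line for line into Python. The sketch below is illustrative only: the 4 × 4 matrix is a commonly used ordered-dither pattern and is an assumption here, since the matrix shown on the slide did not survive in this transcript. The re-mapping of 0..255 into 0..n² generalizes the "divide by 256/5" step described for the 2 × 2 case.

```python
# Assumed 4 x 4 ordered-dither matrix (a common pattern, not confirmed by the slides).
DITHER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def ordered_dither(image, dither):
    """Return a 1-bit image of the same size as `image` (rows of ints in 0..255)."""
    n = len(dither)
    levels = n * n + 1                 # matrix entries are 0..n*n-1, intensities 0..n*n
    output = []
    for y, row in enumerate(image):
        out_row = []
        for x, value in enumerate(row):
            level = value * levels // 256          # integer re-map of 0..255 to 0..n*n
            i, j = x % n, y % n                    # same indices as the pseudocode
            out_row.append(1 if level > dither[i][j] else 0)
        output.append(out_row)
    return output


mid_gray = [[128] * 4 for _ in range(4)]
for row in ordered_dither(mid_gray, DITHER_4X4):
    print(row)
```

Because each pixel is compared with a single matrix entry instead of being expanded into an n × n block, the output has the same size as the input, which is the point of the trick.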


Fig. 3.4: Dithering of grayscale images.

(a): 8-bit grey image “lenagray.bmp”. (b): Dithered version of the image. (c): Detail of dithered version.


Image Segmentation

• Assigning a unique number to “object” pixels based on different intensities or colors in the foreground and the background regions of an image

– Can be used in the object recognition process, but it is not object recognition on its own

• Segmentation Methods

– Pixel oriented methods

– Edge oriented methods

– Region oriented methods

– ....


Pixel-Oriented Segmentation

• Gray-values of pixels are studied in isolation

• Looks at the gray-level histogram of an image and finds one or more thresholds in the histogram

– Ideally, the histogram has a region without pixels, which is set as the threshold, and hence the image is divided into a foreground and a background based on that (bimodal distribution)

• Major drawback of this approach is that object and background histograms often overlap.

– A cleanly bimodal distribution rarely occurs in nature.
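As a rough sketch of the ideal case described above, the following Python code builds the gray-level histogram, looks for the widest run of empty bins between occupied gray levels, and places the threshold in its middle. The function names and the "widest gap" rule are assumptions made for illustration; practical systems usually need a more robust threshold choice precisely because such a gap rarely exists.

```python
def histogram(image, levels=256):
    """Count how many pixels take each gray level."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts


def gap_threshold(counts):
    """Return the middle of the widest empty run between occupied bins, or None."""
    occupied = [gray for gray, count in enumerate(counts) if count > 0]
    best, best_width = None, 0
    for low, high in zip(occupied, occupied[1:]):
        width = high - low - 1            # empty bins strictly between low and high
        if width > best_width:
            best, best_width = (low + high) // 2, width
    return best


def segment(image, threshold):
    """Label pixels above the threshold 1 (foreground) and the rest 0 (background)."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


image = [[10, 12, 200], [11, 210, 205]]
t = gap_threshold(histogram(image))
print(t, segment(image, t))
```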


Edge-Oriented Segmentation

• Segmentation is carried out as follows

– Edges of an image are extracted (e.g., using Canny operators)

– Edges are connected to form closed contours around the objects.

• Hough Transform

– Usually very expensive

– Works well with regular curves (application in manufactured parts)

– May work in presence of noise


Region-Oriented Segmentation

• A major disadvantage of the previous approaches is the lack of “spatial” relationship considerations of pixels.

– Neighboring pixels normally have similar properties

• The segmentation (region-growing) is carried out as follows

– Start with a “seed” pixel.

– Pixel’s neighbors are included if they have some similarity to the seed pixel, otherwise they are not.

• Homogeneity condition

• Uses an eight-neighborhood (8-nbd) model


Region-Oriented Segmentation

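• Homogeneity criterion: Gray-level mean value of a region is usually used

$$m_k = \frac{1}{n^2} \sum_{i=1}^{N} \sum_{j=1}^{N} P(i, j)$$

• With standard deviation

$$\sigma_k = \sqrt{\frac{1}{n^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \bigl(P(i, j) - m_k\bigr)^2}$$

• Drawbacks: Computationally expensive.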
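Region growing with a mean-based homogeneity test can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `grow_region`, the `tolerance` parameter, and the choice to compare each candidate with the running region mean (rather than with the seed alone, or with a standard-deviation bound) are hypothetical and not prescribed by the slides.

```python
from collections import deque


def grow_region(image, seed, tolerance):
    """Grow a region from seed = (row, col) using the 8-neighborhood.

    A neighbor joins the region when its gray value differs from the current
    region mean by at most `tolerance`.
    """
    height, width = len(image), len(image[0])
    region = {seed}
    total = image[seed[0]][seed[1]]            # running sum of gray values in the region
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < height and 0 <= nc < width):
                    continue
                if (nr, nc) in region:
                    continue
                mean = total / len(region)
                if abs(image[nr][nc] - mean) <= tolerance:
                    region.add((nr, nc))
                    total += image[nr][nc]
                    queue.append((nr, nc))
    return region


image = [[10, 11, 200],
         [12, 10, 210],
         [205, 200, 208]]
print(sorted(grow_region(image, (0, 0), tolerance=5)))
```

Every candidate pixel requires a fresh comparison against region statistics, which hints at why the approach is described as computationally expensive.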


Convolution

• Convolution is a simple mathematical operation which is fundamental to many common image processing operators.

• Convolution provides a way of `multiplying together' two arrays of numbers, generally of different sizes, but of the same dimensionality, to produce a third array of numbers of the same dimensionality.

• This can be used in image processing to implement operators whose output pixel values are simple linear combinations of certain input pixel values.

• The convolution is performed by sliding the kernel over the image, generally starting at the top left corner, so as to move the kernel through all the positions where the kernel fits entirely within the boundaries of the image.


Convolution Computation

• If the image E has M rows and N columns, and the kernel K has m rows and n columns, then the output image A has M − m + 1 rows and N − n + 1 columns and is given by:

$$A(i, j) = \sum_{k=1}^{m} \sum_{l=1}^{n} E(i + k - 1,\, j + l - 1)\, K(k, l)$$
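A small worked example may help here. The sketch below evaluates the sum of products E(i + k − 1, j + l − 1) · K(k, l) over all positions where the kernel fits entirely inside the image, using 0-based indices in place of the 1-based ones in the formula. The function name `convolve_valid` is an assumption; note that, like the formula as written, it does not flip the kernel.

```python
def convolve_valid(E, K):
    """Slide kernel K over image E and return the (M - m + 1) x (N - n + 1) output."""
    M, N = len(E), len(E[0])
    m, n = len(K), len(K[0])
    A = [[0] * (N - n + 1) for _ in range(M - m + 1)]
    for i in range(M - m + 1):
        for j in range(N - n + 1):
            # Sum of products of kernel entries with the underlying image pixels.
            A[i][j] = sum(E[i + k][j + l] * K[k][l]
                          for k in range(m) for l in range(n))
    return A


E = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
K = [[1, 0],
     [0, 1]]
print(convolve_valid(E, K))   # a 3 x 3 image and a 2 x 2 kernel give a 2 x 2 output
```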


Image Data Types

• The most common data types for graphics and image file formats are 24-bit color and 8-bit color.

• Some formats are restricted to particular hardware / operating system platforms, while others are “cross-platform” formats.

• Even if some formats are not cross-platform, there are conversion applications that will recognize and translate formats from one system to another.

• Most image formats incorporate some variation of a compression technique due to the large storage size of image files. Compression techniques can be classified into either lossless or lossy.


24-bit Color Images

• In a color 24-bit image, each pixel is represented by three bytes, usually representing RGB.

– This format supports 256 × 256 × 256 possible combined colors, or a total of 16,777,216 possible colors.

– However, such flexibility does result in a storage penalty: a 640 × 480 24-bit color image would require 921.6 kB (921,600 bytes) of storage without any compression.

• An important point: many 24-bit color images are actually stored as 32-bit images, with the extra byte of data for each pixel used to store an alpha value representing special effect information (e.g., transparency).

• Fig. 3.5 shows the image forestfire.bmp, a 24-bit image in Microsoft Windows BMP format. Also shown are the grayscale images for just the Red, Green, and Blue channels, for this image.


Fig. 3.5: High-resolution color and separate R, G, B color channel images. (a): Example of 24-bit color image “forestfire.bmp”. (b, c, d): R, G, and B color channels for this image


8-bit Color Images

• Many systems can make use of 8 bits of color information (the so-called “256 colors”) in producing a screen image.

• Such image files use the concept of a lookup table to store color information.

– Basically, the image stores not color, but instead just a set of bytes, each of which is actually an index into a table with 3-byte values that specify the color for a pixel with that lookup table index.

• Fig. 3.6 shows a 3D histogram of the RGB values of the pixels in “forestfire.bmp”.


Fig. 3.6: 3-dimensional histogram of RGB colors in “forestfire.bmp”.


• Fig. 3.7 shows the resulting 8-bit image, in GIF format.

Fig. 3.7: Example of 8-bit color image.

• Note the great savings in space for 8-bit images over 24-bit ones: a 640 × 480 8-bit color image only requires 300 kB (307,200 bytes) of storage, compared to 921.6 kB (921,600 bytes) for a 24-bit color image (again, without any compression applied).
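The storage figures quoted for 640 × 480 images can be checked with a short calculation. The helper `raw_size_bytes` below is a hypothetical name, and it counts pixel data only (no file header or palette). Printing both unit conventions shows that the 300 kB figure corresponds to 1,024 bytes per kB, while the 921.6 kB figure corresponds to 1,000 bytes per kB.

```python
def raw_size_bytes(width, height, bits_per_pixel):
    """Uncompressed pixel-data size in bytes for a width x height image."""
    return width * height * bits_per_pixel // 8


for label, bpp in [("1-bit", 1), ("8-bit", 8), ("24-bit", 24), ("32-bit", 32)]:
    size = raw_size_bytes(640, 480, bpp)
    print(f"{label}: {size} bytes = {size / 1000:.1f} kB (1000 B) = {size / 1024:.1f} kB (1024 B)")
```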


Color Look-up Tables (LUTs)

• The idea used in 8-bit color images is to store only the index, or code value, for each pixel. Then, e.g., if a pixel stores the value 25, the meaning is to go to row 25 in a color look-up table (LUT).

Fig. 3.8: Color LUT for 8-bit color images.


• A Color-picker consists of an array of fairly large blocks of color (or a semi-continuous range of colors) such that a mouse-click will select the color indicated.

– In reality, a color-picker displays the palette colors associated with index values from 0 to 255.

– Fig. 3.9 displays the concept of a color-picker: if the user selects the color block with index value 2, then the color meant is cyan, with RGB values (0, 255, 255).

• A very simple animation process is possible via simply changing the color table: this is called color cycling or palette animation.
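The look-up and color-cycling ideas can be illustrated with a few lines of Python. This is only a sketch: the function names are hypothetical, and apart from index 2 (cyan, as in the color-picker example) the palette entries are made-up placeholder colors.

```python
def decode_indexed(indices, lut):
    """Turn an image of 8-bit indices into RGB triples by LUT look-up."""
    return [[lut[index] for index in row] for row in indices]


def cycle_palette(lut, shift):
    """Rotate the palette so that entry i takes the color formerly at i + shift."""
    shift %= len(lut)
    return lut[shift:] + lut[:shift]


lut = [(0, 0, 0)] * 256            # placeholder palette, all black
lut[2] = (0, 255, 255)             # index 2 is cyan, as in the color-picker example
lut[25] = (255, 128, 0)            # hypothetical color stored in row 25

indexed_image = [[2, 25], [25, 2]]
print(decode_indexed(indexed_image, lut))

# Palette animation: the stored indices stay the same, only the table changes.
print(decode_indexed(indexed_image, cycle_palette(lut, 23)))
```

Since only the 256-entry table is rewritten, the displayed colors change without touching any pixel data, which is what makes palette animation so cheap.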


Fig. 3.9: Color-picker for 8-bit color: each block of the color-picker corresponds to one row of the color LUT


• Fig. 3.10 (a) shows a 24-bit color image of “Lena”, and Fig. 3.10 (b) shows the same image reduced to only 5 bits via dithering. A detail of the left eye is shown in Fig. 3.10 (c).


Fig. 3.10: (a): 24-bit color image “lena.bmp”. (b): Version with color dithering. (c): Detail of dithered version.


How to devise a color look-up table

• The most straightforward way to make 8-bit look-up color out of 24-bit color would be to divide the RGB cube into equal slices in each dimension.

(a) The centers of each of the resulting cubes would serve as the entries in the color LUT, while simply scaling the RGB ranges 0..255 into the appropriate ranges would generate the 8-bit codes.

(b) Since humans are more sensitive to R and G than to B, we could shrink the R range and G range 0..255 into the 3-bit range 0..7 and shrink the B range down to the 2-bit range 0..3, thus making up a total of 8 bits.

(c) To shrink R and G, we could simply divide the R or G byte value by (256/8)=32 and then truncate. Then each pixel in the image gets replaced by its 8-bit index and the color LUT serves to generate 24-bit color.

• Alternative solutions that do a better job on this color reduction problem also exist, e.g., the median-cut algorithm.
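The straightforward 3-3-2 scheme described in (a) to (c) can be sketched as follows. The bit layout (R in the top three bits, then G, then B) and the function names are assumptions made for this illustration; the slides only fix the number of bits per channel.

```python
def rgb_to_index(r, g, b):
    """Pack 3 bits of R, 3 bits of G and 2 bits of B into one 8-bit code."""
    return ((r // 32) << 5) | ((g // 32) << 2) | (b // 64)


def build_lut():
    """Row `code` holds the center of the RGB sub-box that maps to `code`."""
    lut = []
    for code in range(256):
        r3, g3, b2 = code >> 5, (code >> 2) & 0b111, code & 0b11
        lut.append((r3 * 32 + 16, g3 * 32 + 16, b2 * 64 + 32))
    return lut


lut = build_lut()
code = rgb_to_index(200, 100, 50)
print(code, lut[code])     # 8-bit code and the 24-bit color the LUT restores
```

The restored color is only the center of the box the original color fell into, so nearby colors collapse to the same entry; this is the loss that adaptive methods such as median-cut try to reduce.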


Semantic Modeling and Knowledge Representation in Image Databases: Outline

• Multilayer Abstraction

• Feature Extraction

• Salient Object Identification

• Content-Based Indexing and Retrieval


Multi-Level Abstraction

[Figure: three-layer abstraction of an image database, from bottom to top]

• Feature Extraction Layer: Multimedia Data (Image Data, Still Video Frames), Feature Extraction Process, Feature Specification

• Object Recognition Layer: Object Recognition Process, Object Models

• Semantic Modeling and Knowledge Representation Layer: Semantic Identification Process, Knowledge Base, Semantic Specification


Feature Extraction Layer

• Image features: Colors, Textures, Shapes, Edges, etc.

• Features are mapped into a multi-dimensional feature space allowing similarity-based retrieval.

• Features can be classified into two types: Global and Local.


Global Features

• Generally emphasize coarse-grained pattern matching techniques.

• Transform the whole image into a functional representation.

• Finer details within individual parts of the image are ignored.

• Examples: Color histograms and coherence vectors, Texture, Fast Fourier Transform, Hough Transform, and Eigenvalues.

• What are some of the example queries?


Color Histogram

• How many pixels of the image take a specific color

– In order to control the number of colors, the domain is discretized, e.g., by considering the value of the two leftmost bits in each color channel (RGB).

• In this case, the number of different colors is equal to __________

• How can we determine whether two images are similar using the color histogram?
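A discretized color histogram of the kind described above might be computed like this. The sketch keeps the leftmost `bits` bits of each channel; the function name and the way the three channel codes are packed into one bin index are assumptions. The length of the returned list is the number of distinct quantized colors.

```python
def color_histogram(pixels, bits=2):
    """Count pixels per quantized color, keeping the `bits` leftmost bits per channel."""
    shift = 8 - bits
    bins = [0] * (1 << (3 * bits))
    for r, g, b in pixels:
        # Concatenate the quantized R, G and B codes into a single bin index.
        index = ((r >> shift) << (2 * bits)) | ((g >> shift) << bits) | (b >> shift)
        bins[index] += 1
    return bins


pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]   # two reds and one blue
hist = color_histogram(pixels)
print(len(hist), hist[48], hist[3])
```

Two images can then be compared bin by bin, for instance with one of the distance measures listed under Similarity Metrics.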


Color Coherence Vector

• Based on the color histogram

• Each pixel is checked as to whether it is within a sufficiently large one-color environment or not.

– i.e., in a region connected by a path of pixels of the same color

• If so, the pixel is called coherent, otherwise incoherent

• For each color j, compute the number of coherent and incoherent pixels $(\alpha_j, \beta_j)$, j = 1, ..., J

• When comparing two images with color coherence vectors $(\alpha_j, \beta_j)$ and $(\alpha'_j, \beta'_j)$, j = 1, ..., J, we may use the expression

$$\sum_{j=1}^{J} \left( \frac{|\alpha_j - \alpha'_j|}{\alpha_j + \alpha'_j + 1} + \frac{|\beta_j - \beta'_j|}{\beta_j + \beta'_j + 1} \right)$$
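The comparison of two color coherence vectors can be sketched in Python as below. This assumes the expression is read as a sum, over all colors, of the normalized differences between coherent counts and between incoherent counts, with 1 added to each denominator to avoid division by zero; the symbols on the slide were lost in extraction, so this reading and the name `ccv_distance` are assumptions.

```python
def ccv_distance(ccv_a, ccv_b):
    """Compare two color coherence vectors given as lists of (coherent, incoherent) counts."""
    total = 0.0
    for (alpha, beta), (alpha2, beta2) in zip(ccv_a, ccv_b):
        total += abs(alpha - alpha2) / (alpha + alpha2 + 1)
        total += abs(beta - beta2) / (beta + beta2 + 1)
    return total


image_a = [(10, 2), (0, 5)]     # hypothetical counts for J = 2 colors
image_b = [(4, 2), (0, 0)]
print(ccv_distance(image_a, image_b))
print(ccv_distance(image_a, image_a))   # identical vectors give 0.0
```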


Texture

• Texture is a small surface structure

– Natural or artificial

– Regular or irregular

• Examples include

– Wood bark

– Knitting patterns

– The surface of a sponge


Texture Examples

– Artificial/periodic

– Artificial/non-periodic

– Photographic/pseudo-periodic

– Photographic/random

– Photographic/structured

– Inhomogeneous (non-texture)


Texture

• Two basic approaches to study texture

– Structural analysis searches for small basic components and an arrangement rule

– Statistical analysis describes the texture as a whole based on specific attributes (local gray-level variance, regularity, coarseness, orientation, and contrast).

• Either done in the spatial domain or the spatial frequency domain


Global Features

• Advantages:

– Simple.

– Low computational complexity.

• Disadvantages:

– Low accuracy


Local Features

• Images are segmented into a collection of smaller regions, with each region representing a potential object of interest (fine-grained).

• An object of interest may represent a simple semantic object (e.g. a round object).

• Choice of features is domain specific:

– X-ray imaging, GIS, etc. require spatial features (e.g., shapes [may be calculated through edges] and dimensions).

– Paintings, MMR imaging, etc. may use color features in specific regions of the image.


Edge Detection

• A given input image E is used to gradually compute a (zero-initialized) output image A.

– A convolution mask M runs across E pixel by pixel and links the entries in the mask, at each position that M occupies in E, with the gray values of the underlying image pixels.

– The result of the linkage (the sum across all products of a mask entry and the gray value of the underlying image pixel) is written to the output image A.


Edge Detection Using Sobel Operators

• $M_{horiz}$ and $M_{vert}$ are used to compute output images $A_{horiz}$ and $A_{vert}$

– This provides the partial derivatives of E in the column and row directions

• $A_{grad}$ is computed using the formula

$$a_{grad}(x, y) = \sqrt{a_{horiz}(x, y)^2 + a_{vert}(x, y)^2}$$
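A compact sketch of Sobel edge detection follows. The two 3 × 3 masks are the standard Sobel kernels; which of them the slides call "horiz" and which "vert" is an assumption, as are the function names. Each mask is applied wherever it fits inside the image, and the two responses are combined into the gradient magnitude.

```python
import math

# Standard Sobel masks (the naming relative to the slides is assumed).
M_HORIZ = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to left-right intensity change
M_VERT = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]     # responds to top-bottom intensity change


def apply_mask(E, mask):
    """Sum of products of the 3 x 3 mask with the underlying pixels at every valid position."""
    rows, cols = len(E) - 2, len(E[0]) - 2
    return [[sum(E[i + k][j + l] * mask[k][l] for k in range(3) for l in range(3))
             for j in range(cols)] for i in range(rows)]


def sobel_gradient(E):
    """Combine the two directional responses into the gradient magnitude."""
    a_horiz = apply_mask(E, M_HORIZ)
    a_vert = apply_mask(E, M_VERT)
    return [[math.hypot(h, v) for h, v in zip(row_h, row_v)]
            for row_h, row_v in zip(a_horiz, a_vert)]


# A vertical step edge: dark on the left, bright on the right.
E = [[0, 0, 10, 10],
     [0, 0, 10, 10],
     [0, 0, 10, 10]]
print(sobel_gradient(E))
```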


Similarity Metrics

• Minkowski Distance

$$\left( \sum_{i=1}^{F} \bigl| x[i] - y[i] \bigr|^{r} \right)^{1/r}$$

• Weighted Distance

– Average Distance

• Color Histogram Intersection
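Two of the listed metrics can be sketched in Python. The Minkowski distance follows the formula for feature vectors x and y of length F. For histogram intersection, the sketch divides the sum of bin-wise minima by the smaller histogram total, which is one common normalization and an assumption here, since the slides do not give the formula.

```python
def minkowski(x, y, r):
    """Minkowski distance of order r between two equal-length feature vectors."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1 / r)


def histogram_intersection(h, g):
    """Similarity in [0, 1]: shared histogram mass relative to the smaller histogram."""
    return sum(min(a, b) for a, b in zip(h, g)) / min(sum(h), sum(g))


x, y = [0, 0], [3, 4]
print(minkowski(x, y, 1))      # r = 1: city-block distance
print(minkowski(x, y, 2))      # r = 2: Euclidean distance
print(histogram_intersection([2, 0, 2], [1, 1, 2]))
```

Note the difference in direction: a smaller Minkowski distance means more similar feature vectors, whereas a larger intersection value means more similar histograms.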


Prototype Systems

• QBIC (http://www.hermitagemuseum.org)

– Uses color, shape, and texture features

– Allows queries by sketching features and providing color information

• Chabot (Cypress)

– Uses color and textual annotation.

– Improved performance due to textual annotation (Concept Query)

• KMeD

– Uses shapes and contours as features.

– Features are extracted automatically in some cases and manually in other cases.


Demo (Andrew Berman & Linda G. Shapiro )

• http://www.cs.washington.edu/research/imagedatabase/demo/seg/

• http://www.cs.washington.edu/research/imagedatabase/demo/edge/

• http://www.cs.washington.edu/research/imagedatabase/demo/fids/


Object Recognition Layer

• Features are analyzed to recognize objects and faces in an image database.

– Features are matched with object models stored in a knowledge base.

– Each template is inspected to find the closest match.

– Exact matches are usually impossible and generally computationally expensive.

– Occlusion of objects and the existence of spurious features in the image can further diminish the success of matching strategies.


Template Matching Techniques

• Fixed Template Matching

– Useful if object shapes do not change with respect to the viewing angle of the camera.

• Deformable Template Matching

– More suitable for cases where objects in the database may vary due to rigid and non-rigid deformations.


Fixed Template Matching

• Image Subtraction:

– Difference in intensity levels between the image and the template is used in object recognition.

– Performs well in restricted environments where imaging conditions (such as image intensity) between the image and the template are the same.

• Matching by correlation:

– Utilizes the position of the normalized cross-correlation peak between a template and image.

– Generally immune to noise and illumination effects in the image.

– Suffers from high computational complexity caused by summations over the entire template.
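Both fixed-template techniques can be sketched briefly. The code below is illustrative only and all names are hypothetical: `sad` stands for image subtraction (sum of absolute intensity differences), `ncc` for the normalized cross-correlation of a window with the template, and `best_match` for the search for the correlation peak. The test image contains a brighter, higher-contrast copy of the template, which correlation still locates while the subtraction score stays large.

```python
import math


def window(image, top, left, h, w):
    """Flatten the h x w patch of `image` whose top-left corner is (top, left)."""
    return [image[top + r][left + c] for r in range(h) for c in range(w)]


def sad(a, b):
    """Image subtraction score: sum of absolute intensity differences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def ncc(a, b):
    """Normalized cross-correlation of two flattened patches (1.0 is a perfect match)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(p * q for p, q in zip(da, db)) / denom if denom else 0.0


def best_match(image, template):
    """Return the (top, left) position of the correlation peak."""
    h, w = len(template), len(template[0])
    flat = [v for row in template for v in row]
    scores = {(top, left): ncc(window(image, top, left, h, w), flat)
              for top in range(len(image) - h + 1)
              for left in range(len(image[0]) - w + 1)}
    return max(scores, key=scores.get)


template = [[1, 2], [3, 4]]
image = [[5, 5, 5, 5],
         [5, 60, 70, 5],      # the template scaled by 10 and offset by 50
         [5, 80, 90, 5],
         [5, 5, 5, 5]]
print(best_match(image, template))
print(sad(window(image, 1, 1, 2, 2), [1, 2, 3, 4]))
```

The nested loops over every position and every template pixel also make the computational cost of correlation visible.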


Deformable Template Matching

• Template is represented as a bitmap describing the characteristic contour/edges of an object shape.

• An objective function with transformation parameters which alter the shape of the template is formulated reflecting the cost of such transformations.

• The objective function is minimized by iteratively updating the transformation parameters to best match the object.

• Applications include handwritten character recognition and motion detection of objects in video frames.


Prototype System: KMeD

• Medical objects belonging only to patients in a small age group are identified automatically in KMeD.

– Such objects have high contrast with respect to their background and have relatively simple shapes, large sizes, and little or no overlap with other objects.

• KMeD resorts to a human-assisted object recognition process otherwise.


Demo

• http://www.cs.washington.edu/research/imagedatabase/demo/cars/ (check car214)


Spatial Modeling and Knowledge Representation Layer (1)

• Maintain the domain knowledge for representing spatial semantics associated with image databases.

• At this level, queries are generally descriptive in nature, and focus mostly on semantics and concepts present in image databases.

• Semantics at this level are based on “spatial events” describing the relative locations of multiple objects.

– An example involving such semantics is a range query which involves spatial concepts such as close by, in the vicinity, larger than (e.g., retrieve all images that contain a large tumor in the brain).


Spatial Modeling and Knowledge Representation Layer (2)

• Identify spatial relationships among objects, once they are recognized and marked by the lower layer using bounding boxes or volumes.

• Several techniques have been proposed to formally represent spatial knowledge at this layer.

– Semantic networks

– Mathematical logic

– Constraints

– Inclusion hierarchies

– Frames


Semantic Networks

• First introduced to represent the meanings of English sentences in terms of words and relationships between them.

• Semantic networks are graphs of nodes representing concepts that are linked together by arcs representing relationships between these concepts.

• Efficiency in semantic networks is gained by representing each concept or object once and using pointers for cross references rather than naming an object explicitly every time it is involved in a relation.

• Example: Type Abstraction Hierarchies (KMeD)
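The "represent each concept once and point to it" idea can be sketched as a small graph structure. Everything here is hypothetical (class names, relation labels, and the example concepts); it is not the Type Abstraction Hierarchy used by KMeD, only an illustration of nodes linked by labeled arcs.

```python
class Concept:
    """A node of the network; it is created once and shared by reference."""

    def __init__(self, name):
        self.name = name
        self.arcs = []                 # outgoing (relation, target Concept) pairs


class SemanticNetwork:
    def __init__(self):
        self._concepts = {}

    def concept(self, name):
        """Return the single node for `name`, creating it on first use."""
        if name not in self._concepts:
            self._concepts[name] = Concept(name)
        return self._concepts[name]

    def relate(self, source, relation, target):
        """Add a labeled arc from `source` to `target`."""
        self.concept(source).arcs.append((relation, self.concept(target)))

    def related(self, source, relation):
        """Names of all concepts reachable from `source` via `relation`."""
        return [t.name for r, t in self.concept(source).arcs if r == relation]


net = SemanticNetwork()
net.relate("tumor", "located_in", "brain")
net.relate("lesion", "located_in", "brain")
net.relate("tumor", "is_a", "lesion")
print(net.related("tumor", "located_in"))
```

Both arcs that mention "brain" point to the same node object, so the concept is stored only once regardless of how many relations involve it.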


Brain Lesions Representation


TAH Example


Constraints-based Methodology

• Domain knowledge is represented using a set of constraints in conjunction with formal expressions such as predicate calculus or graphs.

• A constraint is a relationship between two or more objects that needs to be satisfied.


Example: PICTION system

• Its architecture consists of a natural language processing module (NLP), an image understanding module (IU), and a control module.

• A set of constraints is derived by the NLP module from the picture captions. These constraints (called Visual Semantics by the author) are used with the faces recognized in the picture by the IU module to identify the spatial relationships among people.

• The control module maintains the constraints generated by the NLP module and acts as a knowledge-base for the IU module to perform face recognition functions.


Mathematical Logic

• Iconic Indexing by 2D strings: Uses projections of salient objects in a coordinate system.

• These projections are expressed in the form of 2D strings to form a partial ordering of object projections in 2D.

• For query processing, 2D subsequence matching is performed to allow similarity-based retrieval.

• Binary Spatial Relations: Uses Allen's 13 temporal relations to represent spatial relationships.
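A simplified flavor of 2D strings can be sketched as follows. The code orders named objects by their projections on the x and y axes and joins them with "<" (strictly before) or "=" (same position). Reducing each object to a single point and the function name `two_d_string` are assumptions made for brevity; a full 2D-string scheme works with richer projections and operators.

```python
def two_d_string(objects):
    """Return the (x-axis, y-axis) order strings for a non-empty dict name -> (x, y)."""
    def axis_string(axis):
        ordered = sorted(objects.items(), key=lambda item: (item[1][axis], item[0]))
        parts = [ordered[0][0]]
        for (_, prev), (name, pos) in zip(ordered, ordered[1:]):
            parts.append("=" if pos[axis] == prev[axis] else "<")
            parts.append(name)
        return " ".join(parts)

    return axis_string(0), axis_string(1)


objects = {"a": (1, 3), "b": (2, 1), "c": (2, 2)}   # hypothetical object positions
u, v = two_d_string(objects)
print(u)
print(v)
```

A query picture can be encoded the same way, and retrieval then amounts to checking whether the query's strings occur as subsequences of the stored ones.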


Inclusion Hierarchies

• The approach is object-oriented and uses concept classes and attributes to represent domain knowledge.

• These concepts may represent image features, high-level semantics, semantic operators and conditions.


Frames

• A frame usually consists of a name and a list of attribute-value pairs.

• A frame can be associated with a class of objects or with a class of concepts.

• Frame abstractions allow encapsulation of file names, features, and relevant attributes of image objects.
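A frame as "a name plus a list of attribute-value pairs" maps naturally onto a small data class. The sketch below is purely illustrative: the class, the slot names, and the example values are hypothetical and do not come from any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A named frame holding attribute-value pairs (slots)."""
    name: str
    slots: dict = field(default_factory=dict)

    def matches(self, **conditions):
        """True when every given attribute is present with the given value."""
        return all(self.slots.get(key) == value for key, value in conditions.items())


# A frame encapsulating a file name, features, and attributes of one image object.
lesion = Frame("lesion_017", {
    "file_name": "scan_017.png",
    "shape": "round",
    "mean_gray": 182,
    "region": "brain",
})
print(lesion.matches(shape="round", region="brain"))
```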