Image Formats - skynet.beusers.skynet.be/blaton/optische/Image_Formats.doc · Web viewLossless...

Image Formats

· Most popular compression techniques

· Most popular image formats

· GIF

· JPG

· BMP

· TIFF

· Graphic file formats: survey of extensions

· Special Effects

· GIF animation

· Transparent GIF

· Interlaced GIF

· Progressive JPG

Almost every computer program has its own way of storing information for later retrieval. The way a program organizes this information determines the file format. For graphic files alone, there are probably over 100 different file formats. Some of these are unique to specific programs, while others are general formats used across several platforms.

Many image file formats use compression techniques to reduce the storage space (file size). Uncompressed bitmap images are large and take a lot of disk space. An even more important reason for compression is the transmission speed over data links (LAN, WAN, intranet, internet). A small price must be paid for the more efficient storage and the gain in transmission time: the time needed to execute the compression and decompression process.

There are two types of compression methods: lossy and lossless compression. Lossless techniques compress image data without removing detail and colour from the image;  lossy compression usually produces smaller file sizes than lossless ones, but some details may be lost.

In the chapter Image Compression Techniques, related topics are worked out.

Most popular compression techniques

The following are commonly used compression techniques (source: Adobe Photoshop 5). More details will be explained in the chapter Image Compression Techniques.

• Run Length Encoding (RLE) is a lossless compression technique supported by the Photoshop and TIFF file formats and some common Windows file formats.

• Lempel-Ziv-Welch (LZW) is a lossless compression technique supported by TIFF, PDF, GIF, and PostScript language file formats. This technique is most useful in compressing images that contain large areas of single color, such as screenshots or simple paint images.

• Joint Photographic Experts Group (JPEG) is a lossy compression technique supported by JPEG, PDF, and PostScript language file formats. JPEG compression provides the best results with continuous-tone images, such as photographs.

• CCITT encoding is a family of lossless compression techniques for black-and-white images that is supported by the PDF and PostScript language file formats. (CCITT is an abbreviation for the French spelling of International Telegraph and Telephone Consultative Committee.)

• ZIP encoding is a lossless compression technique supported by the PDF file format. Like LZW, ZIP compression is most effective for images that contain large areas of single color.
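
To make the idea behind run length encoding concrete, here is a minimal Python sketch. It is an illustration only: the function names `rle_encode` and `rle_decode` and the (count, value) pair representation are assumptions made for this example, not the byte layout that Photoshop, TIFF or BMP actually use.

```python
def rle_encode(data: bytes) -> list:
    """Collapse each run of identical bytes into a (count, value) pair."""
    runs = []
    for value in data:
        if runs and runs[-1][1] == value:
            # Same value as the previous byte: extend the current run.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            # A new value starts a new run of length 1.
            runs.append((1, value))
    return runs


def rle_decode(runs: list) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for count, value in runs)


# One scan line with large flat areas: 20 pixels shrink to 3 runs.
row = bytes([255] * 12 + [0] * 3 + [255] * 5)
encoded = rle_encode(row)
print(encoded)                      # [(12, 255), (3, 0), (5, 255)]
assert rle_decode(encoded) == row   # lossless: the round trip is exact
```

The sketch also shows why the technique suits images with large single-colour areas: a row in which every pixel differs from its neighbour would produce one pair per pixel and therefore grow instead of shrink.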

The most widely used image formats are:

· GIF

· JPG

· BMP

· TIFF

GIF

The Graphics Interchange Format is a very popular image format for displaying indexed-colour graphs and images in html documents. GIF is an LZW-compressed format (a lossless compression method) designed to minimize file size and electronic transfer time; it also allows rapid decoding for on-line viewing.
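
The LZW principle can be sketched in a few lines of Python. This is a simplified illustration, not the GIF codec: real GIF files pack the codes into variable-width bit fields and add special clear and end-of-information codes, all of which are left out here. The names `lzw_compress` and `lzw_decompress` are chosen for this example.

```python
def lzw_compress(data: bytes) -> list:
    """Return the LZW code list for data (codes 0-255 stand for single bytes)."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep growing the known string
        else:
            codes.append(table[current])     # emit the longest known string
            table[candidate] = next_code     # and learn the new, longer one
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list) -> bytes:
    """Rebuild the data; the decoder reconstructs the same table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the string that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


flat_area = bytes([7] * 64)          # 64 pixels of one palette index
codes = lzw_compress(flat_area)
assert lzw_decompress(codes) == flat_area
print(len(flat_area), "bytes ->", len(codes), "codes")
```

Because the table learns ever longer strings, a flat area is covered by a handful of codes, which is why the technique favours images with large single-colour regions.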

The GIF89a standard includes the features transparent GIF, interlaced GIF and animated GIF. GIF supports multiple images in a single file [you can use the GIF format to create animated images]. GIF is supported by most browsers and preserves transparency. It does not support alpha channels [alpha channels allow you to store selections as grayscale images, called masks].

The main drawback is that it works with only 256 colours out of a 16-million-colour palette, or with 256 grayscales for black and white images. Thus optimising an original 24-bit image as an 8-bit GIF can result in the loss of colour information. The format has also been encumbered by a patent: the LZW algorithm it relies on was patented by Unisys, while the GIF specification itself was published by CompuServe.

JPG

JPEG (pronounced "jay-peg") is a standardized image compression mechanism. JPEG stands for Joint Photographic Experts Group, the original name of the committee that wrote the standard.

JPEG is designed for compressing either full-color or gray-scale continuous-tone images, e.g. photographs of natural, real-world scenes and artwork. It does not work so well on lettering, simple cartoons, or line drawings. The format is commonly used to display continuous-tone images in html documents and is supported by all browsers.

JPEG reduces file size by selectively discarding data. Because it discards data, JPEG compression is referred to as lossy. JPEG is designed to exploit known limitations of the human eye, notably the fact that small color changes are perceived less accurately than small changes in brightness. JPEG preserves the broad range and subtle variations in brightness and hue found in photographs. Thus, to the human eye, JPEG (unlike the GIF format) appears to retain all colour information. But if you plan to machine-analyze your images, the small errors introduced by JPEG may be a problem for you, even if they are invisible to the eye.
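
One common way in which JPEG encoders exploit the eye's lower sensitivity to colour changes is to separate brightness from colour and then store the colour at reduced resolution. The Python sketch below illustrates only that step, under stated assumptions: it uses the RGB to YCbCr conversion of the JFIF file format and a simple 2 x 2 averaging of the colour channels. The function names are made up for this example, and the later stages of JPEG (block transform, quantization, entropy coding) are not shown.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to luminance (Y) and two chroma values (Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_2x2(channel):
    """Average each 2 x 2 block of a channel (width and height assumed even)."""
    height, width = len(channel), len(channel[0])
    return [
        [
            (channel[y][x] + channel[y][x + 1]
             + channel[y + 1][x] + channel[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# A 4 x 4 test image: every pixel is a slightly different orange.
image = [[(200 + x, 100 + y, 30) for x in range(4)] for y in range(4)]
ycbcr = [[rgb_to_ycbcr(*pixel) for pixel in row] for row in image]

luma = [[p[0] for p in row] for row in ycbcr]                 # kept at full size
cb_small = subsample_2x2([[p[1] for p in row] for row in ycbcr])
cr_small = subsample_2x2([[p[2] for p in row] for row in ycbcr])

stored = 16 + len(cb_small) * len(cb_small[0]) + len(cr_small) * len(cr_small[0])
print("values stored:", stored, "instead of", 16 * 3)          # 24 instead of 48
```

Half of the values are gone before any further compression takes place, yet the brightness channel, to which the eye is most sensitive, is untouched.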

An interesting feature of this format is that you can vary the degree of compression. This means that the image maker can trade off file size against output image quality: you can decide what level of data retention vs. space savings is best for your needs. JPEG achieves compression ratios of up to 100 to 1, far better than the 10 to 1 that may be the best most other compression methods produce. Too low a quality setting degrades sharp detail in an image and creates artifacts such as wave-like patterns or blocky areas of banding.

This format supports grayscale and True Colour data types, but does not support alpha channels.  The JPEG format does not support transparency. Because it uses a lossy compression method, indexed and black and white data types do not reproduce well and are not supported. 

You can create a progressive JPEG file, in which a low-resolution version of the image appears in a browser while the full image is downloading.

BMP (bitmap)

BMP is a standard Windows image format for DOS and Windows-compatible computers. The BMP format supports several data types ranging from Grayscale, Indexed Colour to 24-bit true colour modes. BMP does not support alpha channels.

Although the BMP format supports RLE compression, most programs do not take advantage of it.  [You can specify either Microsoft® Windows or OS/2® format and a bit depth (1, 4, 8, or 24 bit)  for the image. For 4-bit and 8-bit images using Windows format, you can also specify RLE compression].

The BMP file size is always rather large - RLE compression is weak. It is used for Windows wallpaper images or for pictograms on your desktop.
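
The size of an uncompressed BMP file can be estimated with a little arithmetic. The sketch below assumes the common Windows layout (a 14-byte file header followed by a 40-byte info header), a full colour table for bit depths of 8 or less, and rows padded to a multiple of 4 bytes; the function name `bmp_file_size` is chosen for this example.

```python
def bmp_file_size(width, height, bits_per_pixel):
    """Estimated size in bytes of an uncompressed Windows BMP file."""
    # Each row is padded so that its length is a multiple of 4 bytes.
    row_bytes = (width * bits_per_pixel + 31) // 32 * 4
    # Indexed images carry a colour table with 4 bytes per entry (assumed full).
    palette_bytes = 4 * (2 ** bits_per_pixel) if bits_per_pixel <= 8 else 0
    return 14 + 40 + palette_bytes + row_bytes * height


for depth in (1, 4, 8, 24):
    print(depth, "bit:", bmp_file_size(640, 480, depth), "bytes")
# A 640 x 480 image at 24 bit takes 921654 bytes, i.e. about 900 KB.
```

The result depends only on the dimensions and the bit depth, not on the image content, which is what makes uncompressed BMP files large compared with GIF or JPEG files of the same picture.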

TIF (Tagged Image File Format)

This is a standard file format for most imaging programs. It supports all data types from monochrome up to 24-bit true colour modes, as well as many color models and compression schemes. An even more powerful aspect of TIF is that its files can move easily between platforms, making it an ideal format for storing image data.

Graphics File Formats: survey of extensions

001      DRW      ICO      PCD      PSD      SHG
BMP      DXF      IFF      PCT      PXR      TGA
CGM      EPS      IMG      PCX      RAS      TIF
CLP      FAX      JPG      PIC      RLE      UFO
CUR      GIF      MAC      PLT      SCI      WMF
DCS      HGL      MSP      PNG      SCT      WPG
DCX                        PPM

· Hayes JT Fax (001): A black and white file format used by Hayes-compatible digital faxes. This format uses CCITT compression to minimize file size.

· Bitmap (BMP): This is a widely recognized format made popular by Microsoft Windows and IBM OS/2. It supports several data types ranging from black and white all the way up to 24-bit true colour modes. It is used for Windows wallpaper images or for pictograms on your desktop. Although the BMP format supports RLE compression, most programs do not take advantage of it. Therefore, BMP files tend to be large.

· Computer Graphics Metafile (CGM): This versatile metafile format was developed through the joint efforts of the International Organization for Standardization (ISO) and the American National Standards Institute (ANSI) as a common format that could be used on several independent systems. Although it can reproduce many types of images very well, it is best suited for graphic arts, technical illustration, and electronic publishing. It uses RLE and CCITT compression methods.

· Microsoft Windows Clipboard (CLP): When you cut or copy data from any Windows program to the clipboard, it takes this format. Depending on the nature of the data, it may be a device-dependent, device-independent, or metafile format. The clipboard format supports all data types.

· Microsoft Windows Cursor (CUR): Your mouse pointer likely uses (or at one time used) the .CUR format. The features that set it apart from most other file formats are:
1. It can only be 32 x 32 pixels in size.
2. It is a 2-bit data type, using black, white, transparent, and inverse.
When Ulead programs open .CUR files, the transparent color appears as white and the inverse color as black.

· Desktop Color Separation (DCS): The Desktop Color Separation format was developed by Quark, Inc. as an enhancement to the EPS format. It uses the CMYK color model to produce images for four-color printing. One of the unique aspects of this format is that the image data consists of five files. Four files contain the information for the individual color channels and the fifth coordinates them.

· Intel SatisFaxtion (DCX): A black and white file format used by digital fax software created by Intel Corporation.

· Micrografx Designer Draw (DRW): A vector-based format developed by Micrografx for their Designer graphics program. It supports 24-bit color and is widely used for graphic design work.

· AutoCAD Drawing Exchange (DXF): One of the first CAD (Computer Aided Design) file formats to gain wide recognition on the PC platform. It supports 2-dimensional CAD drawings and is widely used by many professional designers to create architectural, structural, and technical drawings.

· Encapsulated Postscript (EPS): EPS is a very popular device-independent file format among desktop publishers and computer illustrators. The format is used to transfer PostScript-language artwork between applications, e.g. for output to a printer. The EPS format supports both bitmap (Lab, CMYK, RGB, Indexed Colour, Duotone, Grayscale, 24-bit true colour modes) and vector data, but does not support alpha channels. It is supported by virtually all graphic, illustration, and page-layout programs. For viewing, EPS files contain preview information in their headers. EPS files can be saved in ASCII and binary formats. Saving in binary often results in as much as 50% savings in space requirements, but many popular programs are not able to interpret it. ASCII files, while larger, are more widely accepted. Ulead programs can read both ASCII and binary formats with some limitations:
1. Ulead programs may not be able to read EPS files containing non-bitmap data.
2. Not all programs can display EPS files without the header. While adding a header may add to the space requirements, it makes the file more flexible.

· Generic Fax (G3 1-D Encoded) (FAX): A black and white file format used by generic digital fax software. This format uses CCITT compression to minimize file size.

· Graphics Interchange Format (GIF): This format was developed by CompuServe in 1987 as a device-independent format for exchanging image data. Due to the quality of the images and the relatively small file size, it became a popular format for images stored on electronic bulletin boards and other on-line services. GIF was designed to support image dimensions of up to 64000 pixels, 256 colours out of a 16-million-colour palette, multiple images in a single file, rapid decoding for on-line viewing, efficient LZW compression, and hardware independence. The Lempel-Ziv-Welch algorithm efficiently compresses solid areas while preserving sharp details, such as in line art, logos or illustrations with text.

There are two types of GIF files: 

GIF87a allows for the following features:

· LZW compressed images

· multiple images encoded within a single file

· positioning of the images on a logical screen area

· interlacing

This means that it is possible to do simple animation with GIFs by encoding multiple images, what we will refer to as "frames", in a single file.

GIF89a is an extension of the 87a specification. GIF89a added:

· how many 100ths of a second to wait before displaying the next frame

· wait for user input

· specify transparent color

· include unprintable comments

· display lines of text

· indicate how the frame should be removed after it has been displayed

· application-specific extensions encoded inside the file
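
Which of the two versions a file uses can be read from its first bytes. The following Python sketch parses the 6-byte signature and the 7-byte logical screen descriptor that follows it; the function name `parse_gif_header` and the keys of the returned dictionary are assumptions made for this example, and the sample bytes are constructed by hand, not taken from a real image.

```python
import struct


def parse_gif_header(data: bytes) -> dict:
    """Decode the signature and logical screen descriptor at the start of a GIF file."""
    signature, version = data[:3], data[3:6]
    if signature != b"GIF" or version not in (b"87a", b"89a"):
        raise ValueError("not a GIF file")
    # Two little-endian 16-bit sizes, then three single-byte fields.
    width, height, packed, background_index, aspect = struct.unpack(
        "<HHBBB", data[6:13]
    )
    has_global_table = bool(packed & 0x80)          # highest bit of the packed byte
    table_entries = 2 ** ((packed & 0x07) + 1) if has_global_table else 0
    return {
        "version": version.decode("ascii"),
        "width": width,
        "height": height,
        "global_colour_table_entries": table_entries,
        "background_index": background_index,
    }


# Hand-built header: GIF89a, 320 x 200 logical screen, 256-entry global colour table.
sample = b"GIF89a" + struct.pack("<HHBBB", 320, 200, 0xF7, 0, 0)
print(parse_gif_header(sample))
```

The width and height are stored as 16-bit values, so each dimension of the logical screen is limited to 65535 pixels, and the colour table size is limited to 256 entries.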

· Hewlett Packard Graphics Language (HGL or PLT): This is a vector-based language developed by Hewlett Packard Inc. for their family of laser, bubble jet, and plotter printers. It is a black and white file format primarily used for producing accurate line drawings such as those used for CAD. As the HP family of printers has gained wide acceptance in the PC marketplace, virtually every word processing and graphics program produced today is able to interpret files using this format.

· Microsoft Windows Icon (ICO): Many of the icons on your desktop are ICO images. This 2- or 16-color file format is unique in several ways. First, besides the standard color palette, it adds two more colors: inverse and transparent. The transparent color allows you to see the Windows desktop behind the icon. The inverse color shows the complementary color to the desktop behind the icon. Also, icons have fixed dimensions. Finally, this format allows you to store several images in one file. For example, some Windows programs have several icons to account for different screen resolutions or display types. Others may have several icons to offer users choices. One thing to remember when using Ulead programs with icon files is that Ulead does not support the transparent and inverse features. When Ulead programs see a transparent pixel, they change it to white. An inverse pixel becomes black.

· Interchange File Format (IFF): The Interchange File Format is most popular on the Amiga computer for use as a DTP, image processing, and painting format. It was originally developed by Electronic Arts for their Deluxe Paint program. It supports anything from black and white to True Color using RLE compression.

· GEM Image (IMG): Developed by Digital Research Corporation for programs in the GEM environment, this format gained popularity due to its use with Ventura Publisher. Using RLE compression, you can save monochrome, grayscale, and indexed 256 color images as IMG files.

· JPEG File Interchange Format (JPG): JPEG is a standardized image compression mechanism. JPEG stands for Joint Photographic Experts Group, the original name of the committee that wrote the standard. JPEG is designed for compressing either full-color or gray-scale images of natural, real-world scenes. It works well on photographs, naturalistic artwork, and similar material; not so well on lettering, simple cartoons, or line drawings. This is because some "averaging" takes place during compression, and edges may be blurred. In photographs, this is not so noticeable because such sharp edges are rare. JPEG handles only still images, but there is a related standard called MPEG for motion pictures. JPEG is "lossy," meaning that the decompressed image isn't quite the same as the one you started with. JPEG is designed to exploit known limitations of the human eye, notably the fact that small color changes are perceived less accurately than small changes in brightness. Thus, JPEG is intended for compressing images that will be looked at by humans. If you plan to machine-analyze your images, the small errors introduced by JPEG may be a problem for you, even if they are invisible to the eye. JPEG achieves compression ratios of up to 100 to 1, far better than the 10 to 1 that may be the best most other compression methods produce. This format supports grayscale and True Color data types. Because it uses a lossy compression method, indexed and black and white data types do not reproduce well and are not supported. One interesting feature of this format is that you can vary the degree of compression. This means that the image maker can trade off file size against output image quality: you can decide what level of data retention vs. space savings is best for your needs. In Adobe Photoshop, the compression quality can be chosen between 1 (small file, low quality) and 10 (large file, high quality).

· MacPaint (MAC): This is the original graphics format used by the Apple Macintosh. It is a 1-bit black and white format that supports a maximum image size of 720 x 756 pixels. It uses RLE compression and is still widely used for black and white or dithered grayscale images by many programs on the Apple platform.

· Microsoft Paint (MSP): The Microsoft Paint file format is a 1-bit black and white format for creating clip art and other black-and-white images. Using RLE compression, the format compresses well. Still, this format is quickly losing popularity to the Microsoft Windows Bitmap format.

· Kodak Photo CD (PCD): When the prospect of using computers to save images became a reality, Eastman Kodak developed this format to store photographs digitally. This proprietary format is used for the images on Kodak Photo CDs. To support multiple display modes, each PCD file contains image data in 5 resolutions, and allows you to choose between viewing the image in True Color, indexed 256 color, or grayscale.

· Quick Draw Picture (PCT): The PICT format was developed by Apple. It is widely used among Mac OS graphics and page-layout applications as an intermediary file format for transferring images between applications (both bitmap and vector data exchange). There have been two major releases of this format: Quick Draw I is a monochrome bitmap format that allows images up to 32 KB in size with a fixed resolution of 72 dpi. Quick Draw II supports all data types up to True Colour: RGB images with a single alpha channel, and indexed-colour, grayscale, and 24-bit true colour images without alpha channels. The PICT format is especially effective at compressing images with large areas of solid colour. However, its complex nature makes it less popular on other platforms.

· PC Paintbrush (PCX): Originally developed by ZSoft Corporation for their PC Paintbrush program. Shortly after its development, ZSoft entered into several OEM agreements with early fax board and scanner manufacturers. As a result, the PCX format is commonly used with IBM PC-compatible computers. It supports RGB, Indexed Colour, Grayscale, and 24-bit true colour modes, and does not support alpha channels. PCX supports the RLE compression method. Images can have a bit depth of 1, 4, 8, or 24. Some PCX images do not include resolution information in their headers. If a Ulead program opens such an image, it will automatically match the image resolution with the current display resolution.

· Lotus Picture (PIC): The Lotus Picture format is a vector format used by Lotus to create charts and graphs for their early word processing and spreadsheet programs.

· Portable Network Graphics (PNG): PNG, pronounced "Ping," was developed as a patent-free alternative to GIF. It is designed primarily for sharing image data on the www. Nowadays, most Web browsers support PNG images.

Perhaps one of the most fascinating aspects of this format is how it opens images. Unlike most other file formats, this one is designed to show a representation of the image as fast as possible. When an interlaced PNG image is opened, it first appears out of focus and gradually becomes clearer. In this way, you gain an idea of the entire image faster (and can cancel a download sooner if you see that you don't want it). Another feature, especially designed for on-line use, is easy file checking for transmission accuracy and against file corruption.

PNG supports indexed 256 color, true color, and grayscale data types. Unlike GIF, PNG supports 24-bit images: the format supports grayscale and RGB (true colour) images with an optional alpha channel, and indexed-colour images without alpha channels. PNG preserves transparency in grayscale and RGB images. It produces background transparency without jagged edges.

The PNG-8 format uses 8-bit colour. Like the GIF format, PNG-8 efficiently compresses solid areas of colour while preserving sharp detail, such as that in line art, logos, or illustrations containing type. PNG-8 files use more advanced compression schemes than GIF, and can be 10-30% smaller than GIF files of the same image, depending on the image's colour patterns. However, with certain images, especially those with very few colours and very simple patterns, GIF compression can create a smaller file than PNG-8 compression. As with the GIF format, you can reduce the number of colours in the image and choose options to control the way colours dither in the application or in the browser. The PNG-8 format supports background transparency and background matting, in which you blend the edges of the image with a Web page background colour.

The PNG-24 format supports 24-bit colour. Like the JPEG format, PNG-24 preserves the broad range and subtle variations in brightness and hue found in photographs. Like the GIF and PNG-8 formats, PNG-24 preserves sharp detail, such as that in line art, logos, or illustrations with type. The PNG-24 format uses the same lossless compression method as the PNG-8 format. For that reason, PNG-24 files are usually larger than JPEG files of the same image. PNG-24 browser support is similar to that for PNG-8. In addition to background transparency and background matting, the PNG-24 format supports multilevel transparency. Multilevel transparency allows you to preserve up to 256 levels of transparency to blend the edges of an image smoothly with any background colour.

· PPM: The portable pixmap format is the lowest common denominator colour image file format. It is a part of the Extended Portable Bitmap Utilities (PBMPLUS). PPM is used as an intermediate format for storing colour bitmap information generated by the PBMPLUS toolkit. PPM files may be either binary or ASCII and store pixel values of up to 24 bits.

· Adobe PhotoShop (PSD): Photoshop format is the default file format in Adobe Photoshop for newly created images. It is the only format which supports all available image modes (Bitmap, Grayscale, Duotone, Indexed Colour, RGB, CMYK, Lab, and Multichannel), guides, alpha channels, spot channels, and layers (including adjustment layers, type layers, and layer effects). It is one of the most popular formats for professional photographers who use desktop computers to touch up their work.

· Pixar Paint (PXR): PXR is the picture storage standard created by Pixar for Pixar Pixel Paint. This format encodes and compresses grayscale and True Color images.

· Sun Raster (RAS): Developed by Sun Microsystems, this format is widely accepted by UNIX-based imaging programs. It uses an RLE compression method to compress images. Sun Raster images can be anything from black and white to True Color.

· Run Length Encoded (RLE): This device-independent format is especially well suited for simple images containing long strings of repeated information stored in Pack Bit form. Many paint programs, including MacPaint, support this format. Ulead programs support indexed 16- and 256-color RLE files.

· SCIFax (SCI): A black and white file format used by SCIFax fax software. This format uses CCITT compression to minimize file size.

· Scitex CT (SCT): Developed by Scitex as a format for image processing programs, SCT supports grayscale and CMYK true color (4-2-4) images.

· Segmented Hotspot Graphic (SHG): This is an essential file format for people who write on-line help such as this. It allows you to assign multiple jumps or pop-ups to an image, depending on where the mouse clicks on the image in a Windows Help file. Several of the images in this help were created as .SHG files.

· Targa (TGA): This format, one of the principal true color image formats, can store image data with up to 32 bits per pixel. Targa format supports 24-bit RGB images (8 bits x 3 colour channels) and 32-bit RGB images (8 bits x 3 colour channels plus a single 8-bit alpha channel). Compared to TIF and JPEG, which are other options for true color images, TGA is relatively simple and therefore widely used in imaging programs. The only drawback of this format is that it lacks a good compression scheme. The Targa® format is designed for systems using the Truevision® video board and is commonly supported by MS-DOS colour applications.

· Tagged Image File Format (TIF): This is a flexible bitmap image format supported by virtually all paint, image-editing, and page-layout applications. It supports all data types from monochrome up to 24-bit true colour, as well as many color models and compression schemes. Almost all desktop scanners can produce TIFF images. An even more powerful aspect of TIF is that its files can move easily between platforms; it is therefore mainly used to exchange files between applications and computer platforms. TIFF format supports CMYK, RGB, Lab, indexed-colour, and grayscale images with alpha channels and Bitmap-mode images without alpha channels. Photoshop can save layers in a TIFF file; however, if you open the file in another application, only the flattened image is visible. It is also possible to save annotations, transparency, and multiresolution pyramid data in TIFF format.

· Ulead File for Objects (UFO): This proprietary file format was designed by Ulead Systems for use in PhotoImpact. It is the only format supported by Ulead programs that allows objects that have not been merged into the base image to be retained. UFO files use RLE or no compression when saved.

· Windows Metafile (WMF): The Windows Metafile, developed by Microsoft, is a basic format for graphics programs and is widely supported. Although it is specific to Microsoft Windows, numerous programs that are not Windows-based also support it in order to interchange image information with Windows programs.

· WordPerfect Graphics (WPG): Developed by WordPerfect for use with its line of products, this format uses a simple 2-byte packet scheme to enhance data compression performance. Although WPG is known as a general-purpose vector graphics format, it can also store bitmap graphics. WPG is well supported on many platforms, including MS-DOS, UNIX, and Apple.

Special Effects

· Animated GIF

· Transparent GIF

· Interlaced GIF

· Progressive JPG

Animated GIF

Basically, when using a GIF Animation Program (e.g. "Animation Shop" in Paint Shop Pro), you compile a series of images or "cells" (much like cartoon animation) adding "control blocks" which hold the instructions (such as how long before each image is replaced and whether or not to loop the animation repetitively). 

Our first example uses only two images:

First Image

Second Image

Using one of the GIF Animation programs, we have one image following the other, we add a "looping" command (to repeat the animation over and over again), save the file and ....ready!

Transparent GIF

Transparent GIFs are useful because they appear to blend in smoothly with the user's display, even if the user has set a background colour that differs from the one the developer expected. They do this by assigning one colour to be transparent. If the web browser supports transparency, that color will be replaced by the browser's background color, whatever it may be.
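
What a viewer does with the transparent colour can be sketched as follows. The example assumes an indexed image stored as rows of palette indices; the function name `composite_on_background` and the tiny 3 x 3 test image are invented for illustration.

```python
def composite_on_background(indices, palette, transparent_index, background_rgb):
    """Replace every pixel that uses the transparent palette entry by the background colour."""
    return [
        [background_rgb if i == transparent_index else palette[i] for i in row]
        for row in indices
    ]


palette = [(255, 255, 255), (200, 0, 0)]     # entry 0: white, entry 1: red
image = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]                                            # a red cross drawn on entry 0

# Entry 0 is declared transparent, so the page's grey shows through around the cross.
page_grey = (192, 192, 192)
for row in composite_on_background(image, palette, 0, page_grey):
    print(row)
```

Because transparency is tied to a single palette entry, a pixel is either fully transparent or fully opaque; there are no intermediate levels in a GIF.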

Example :

non-transparent GIF

transparent GIF

 

Interlaced GIF

Non-interlaced GIFs arrive linearly from the top row to the bottom row. If data transfer is slow, it will take some time before the user can see the bottom part of the image. Interlaced GIFs appear first with poor resolution and then improve in resolution until the entire image has arrived. Interlacing gives a quick idea of what the entire image will look like while waiting for the rest. It is recommended for large images only, because the total download time is longer. Interlaced GIFs show no progressive effect if your browser doesn't support progressive display as the image is downloaded, but such browsers will still display interlaced GIFs once they have arrived in their entirety.
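
The order in which the rows of an interlaced GIF are stored follows four passes: every 8th row starting at row 0, every 8th row starting at row 4, every 4th row starting at row 2, and finally every 2nd row starting at row 1. A short Python sketch (the function name is chosen for this example) lists that order:

```python
def gif_interlace_row_order(height):
    """Return the row numbers of an interlaced GIF in the order they are stored."""
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]    # (first row, step) for each pass
    order = []
    for start, step in passes:
        order.extend(range(start, height, step))
    return order


print(gif_interlace_row_order(10))   # [0, 8, 4, 2, 6, 1, 3, 5, 7, 9]
```

After the first pass only one row in eight is known, so a viewer can fill the gaps with copies of the rows it already has, which gives the coarse first impression described above.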

Interlaced

Non interlaced 

Progressive JPEG

A simple or "baseline" JPEG file is stored as one top-to-bottom scan of the image. Progressive JPEG divides the file into a series of scans. The first scan shows the image at the equivalent of a very low quality setting, and therefore it takes very little space. Following scans gradually improve the quality. Each scan adds to the data already provided, so that the total storage requirement is about the same as for a baseline JPEG image of the same quality as the final scan. [Basically, progressive JPEG is just a rearrangement of the same data into a more complicated order.]

The advantage of progressive JPEG is that if an image is being viewed on-the-fly as it is transmitted, one can see an approximation to the whole image very quickly, with gradual improvement of quality as one waits longer; this is much nicer than a slow top-to-bottom display of the image. The disadvantage is that each scan takes about the same amount of computation to display as a whole baseline JPEG file would. So progressive JPEG only makes sense if one has a decoder that's fast compared to the communication link. [If the data arrives quickly, a progressive-JPEG decoder can adapt by skipping some display passes. Hence, those of you fortunate enough to have ISDN, CaTV, ADSL or other fast net links may not see any difference between progressive and regular JPEG; but on a modem-speed link, progressive JPEG is great.]

Until recently, there weren't many applications in which progressive JPEG looked attractive, so it hasn't been widely implemented. But with the popularity of World Wide Web browsers running over slow modem links, and with the ever-increasing horsepower of personal computers, progressive JPEG can become a win for WWW use.

Characteristics of Image Operations

· Types of operations

· Types of neighborhoods

There is a variety of ways to classify and characterize image operations. It is important to understand:
- what results can be expected from a given type of operation
- what computational burden is associated with a given operation.

Types of operations

The types of operations that can be applied to digital images can be classified into three categories. An input image a[m,n]  will be transformed into an output image b[m,n] (or another representation)  as shown in Table 1.

Operation | Characterization | Generic Complexity/Pixel

Point | the output value at a specific coordinate is dependent only on the input value at that same coordinate | constant

Local | the output value at a specific coordinate is dependent on the input values in the neighborhood of that same coordinate | P^2

Global | the output value at a specific coordinate is dependent on all the values in the input image | N^2

Table 1: Types of image operations. Image size = N x N; neighborhood size = P x P. Note that the complexity is specified in operations per pixel.

This is shown graphically in figure 1.

Figure 1: Illustration of various types of image operations

Point operations

Point operations are algorithms to modify images on a pixel by pixel basis. Each pixel is replaced by a new pixel whose value depends only on the current pixel's value, and / or its location. Point processes are the simplest form of image processing algorithm, requiring only a single pass over the image data. Some point process functions are better implemented using a lookup table. The point processes which we wish to describe include: image negation, thresholding, brightness adjustment, contrast stretching, and histogram equalisation.
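The lookup-table idea can be sketched as follows. This is a minimal illustration, assuming an 8-bit grayscale image stored as a list of rows; the helper names `build_lut` and `apply_point_op` and the threshold value 128 are choices made for this example only.

```python
# Sketch: point operations on an 8-bit grayscale image via a lookup table (LUT).
# Assumption: the image is a list of rows of integers in 0..255.

def build_lut(func):
    """Precompute func(v) for every 8-bit value, clamped to 0..255."""
    return [max(0, min(255, int(func(v)))) for v in range(256)]

def apply_point_op(image, lut):
    """Replace each pixel by lut[pixel]; a single pass over the image."""
    return [[lut[p] for p in row] for row in image]

# Image negation and thresholding expressed as lookup tables.
negate_lut = build_lut(lambda v: 255 - v)
threshold_lut = build_lut(lambda v: 255 if v >= 128 else 0)  # 128 is arbitrary

image = [[0, 100, 200],
         [50, 128, 255]]
negated = apply_point_op(image, negate_lut)
binary = apply_point_op(image, threshold_lut)
```

The function is evaluated only 256 times, however large the image is, which is why a lookup table tends to pay off for costly point functions.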

Local operations = area operations

Local operations or Area processes transform an image, on a pixel by pixel basis, giving each pixel a new value depending on the values, and / or positions, of its neighbors. The neighborhood of a pixel forms a P by P grid around the pixel, where P is an odd number. Area processes require only a single pass over the image data, but are still more computationally expensive than point processes, as the number of pixels read to calculate each new pixel value is $P^2$ times greater than for an equivalent point process.
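A neighborhood average is perhaps the simplest area process. The sketch below assumes a list-of-rows image and leaves the border pixels unchanged; that border rule is a simplification chosen for this example, not something prescribed by the text.

```python
# Sketch: P x P neighborhood averaging (an area process).
# Assumption: pixels closer than P // 2 to the border are copied unchanged.

def mean_filter(image, P=3):
    half = P // 2
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for m in range(half, rows - half):
        for n in range(half, cols - half):
            total = 0
            # P * P input pixels are read for every output pixel.
            for dm in range(-half, half + 1):
                for dn in range(-half, half + 1):
                    total += image[m + dm][n + dn]
            out[m][n] = total / (P * P)
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
smoothed = mean_filter(image)
```

The two inner loops make the per-pixel cost proportional to the square of P, in contrast to the constant cost of a point operation.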

The pixel neighborhood provides brightness trend, or spatial frequency information. Spatial frequency is the rate at which pixel intensity changes. A low spatial frequency denotes large regions of unchanging pixel intensity, whilst a high spatial frequency comes from regions of rapidly changing pixel intensity. Local spatial filters can be used for noise removal, image smoothing, image sharpening, edge enhancement and edge detection.

Global operations = frequency operations

The frequency domain contains spatial frequency information, i.e. information about the rate of change of pixel intensity in the whole image. Of the area processing methods listed above, those that explicitly use information from this domain are low and high pass filters, edge detection and edge enhancement filters. This is because the convolution process can also be thought of as a conversion from the spatial domain to the frequency domain, and back again. Strictly speaking, this is only true when the size of the filter matches the size of the image. For many applications it may be less computationally expensive to transform the image using the Fourier technique, and work on data directly in the frequency domain. One can use filters for enhancing or removing certain frequency components of an image. These global spatial filters are based on the same mathematical principles as analogue filters in electronics. 

Remark

It should be noted that there is some blurring of the boundaries between point, area and frequency operations. Certain area processes (e.g. median filtering) can be thought of as a particular type of point process, and certain frequency domain methods can be implemented as an area process. In the set of area processes mentioned above, only neighborhood averaging and median filtering need not be thought of as frequency domain methods.

 

Types of neighborhoods for local operations.

Neighborhood operations play a key role in modern digital image processing. It is therefore important to understand how images can be sampled and how that relates to the various neighborhoods that can be used to process an image.

* Rectangular sampling - In most cases, images are sampled by laying a rectangular grid over an image as illustrated in figure 1 in section Digitalisation. This results in the type of sampling shown in figure 2a and 2b.

* Hexagonal sampling - An alternative sampling scheme is shown in figure 2c and is termed hexagonal sampling.

Both sampling schemes have been studied extensively and both represent a possible periodic tiling of the continuous image space. We will restrict our attention, however, to only rectangular sampling as it remains, due to hardware and software considerations, the method of choice.

Local operations produce an output pixel value $b[m=m_o, n=n_o]$ based upon the pixel values in the neighborhood of $a[m=m_o, n=n_o]$. Some of the most common neighborhoods are the 4-connected neighborhood and the 8-connected neighborhood in the case of rectangular sampling and the 6-connected neighborhood in the case of hexagonal sampling illustrated in figure 2c.

Figure 2: (a) Rectangular sampling, 4-connected. (b) Rectangular sampling, 8-connected. (c) Hexagonal sampling, 6-connected.
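For rectangular sampling, the 4-connected and 8-connected neighborhoods can be written as lists of coordinate offsets. The sketch below is illustrative only; the names `N4`, `N8` and `neighbors` are assumptions of this example, and neighbors that fall outside the image are simply dropped.

```python
# Sketch: 4- and 8-connected neighborhoods on a rectangular grid.
# Offsets are (row, column) displacements from the central pixel.

N4 = [(-1, 0), (0, -1), (0, 1), (1, 0)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def neighbors(m0, n0, rows, cols, offsets):
    """Return the in-bounds neighbor coordinates of pixel (m0, n0)."""
    return [(m0 + dm, n0 + dn) for dm, dn in offsets
            if 0 <= m0 + dm < rows and 0 <= n0 + dn < cols]

inner4 = neighbors(1, 1, 3, 3, N4)   # interior pixel: 4 neighbors
inner8 = neighbors(1, 1, 3, 3, N8)   # interior pixel: 8 neighbors
corner4 = neighbors(0, 0, 3, 3, N4)  # corner pixel: only 2 remain
```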

Mathematical Tools

Certain tools are central to the processing of digital images. These include mathematical tools such as convolution, Fourier analysis, and statistical descriptions, and manipulative tools such as chain codes and run codes. We will present these tools without any specific motivation. The motivation will follow in later sections.

· Convolution

· Properties of Convolution

· Fourier Transforms

· Properties of Fourier Transforms

· Statistics

· Contour Representations

Convolution

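There are several possible notations to indicate the convolution of two (multi-dimensional) signals to produce an output signal. The most common are:

$$c = a \otimes b = a * b$$

We shall use the first form, $c = a \otimes b$, with the following formal definitions.

In 2D continuous space:

$$c(x,y) = a(x,y) \otimes b(x,y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} a(\chi,\zeta)\, b(x-\chi,\, y-\zeta)\, d\chi\, d\zeta$$

In 2D discrete space:

$$c[m,n] = a[m,n] \otimes b[m,n] = \sum_{j=-\infty}^{+\infty}\sum_{k=-\infty}^{+\infty} a[j,k]\, b[m-j,\, n-k]$$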
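For images of finite size the infinite sums reduce to finite ones. The following sketch assumes that both images are zero outside their stored samples and returns the full result; the function name `convolve2d` is chosen for this example.

```python
# Sketch: direct 2D discrete convolution,
#   c[m][n] = sum over j, k of a[j][k] * b[m - j][n - k],
# assuming a and b are zero outside their stored samples.

def convolve2d(a, b):
    ra, ca = len(a), len(a[0])
    rb, cb = len(b), len(b[0])
    c = [[0] * (ca + cb - 1) for _ in range(ra + rb - 1)]
    for j in range(ra):
        for k in range(ca):
            for p in range(rb):
                for q in range(cb):
                    # Here m = j + p and n = k + q, so b[p][q] = b[m - j][n - k].
                    c[j + p][k + q] += a[j][k] * b[p][q]
    return c

a = [[1, 2],
     [3, 4]]
b = [[1, 1]]
c = convolve2d(a, b)
```

Swapping the two arguments gives the same result, which is a quick numerical check on the implementation.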

Properties of Convolution

There are a number of important mathematical properties associated with convolution.

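* Convolution is commutative.

$$c = a \otimes b = b \otimes a$$

* Convolution is associative.

$$c = a \otimes (b \otimes d) = (a \otimes b) \otimes d = a \otimes b \otimes d$$

* Convolution is distributive.

$$c = a \otimes (b + d) = (a \otimes b) + (a \otimes d)$$

where a, b, c, and d are all images, either continuous or discrete.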

Fourier Transforms

The Fourier transform produces another representation of a signal, specifically a representation as a weighted sum of complex exponentials. Because of Euler's formula:

$$e^{jq} = \cos(q) + j\sin(q)$$

where $j^2 = -1$, we can say that the Fourier transform produces a representation of a signal as a weighted sum of sines and cosines. The defining formulas for the forward Fourier and the inverse Fourier transforms are as follows. Given an image a and its Fourier transform A, then the forward transform goes from the spatial domain (either continuous or discrete) to the frequency domain which is always continuous.

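Forward: $A = \mathcal{F}\{a\}$

The inverse Fourier transform goes from the frequency domain back to the spatial domain.

Inverse: $a = \mathcal{F}^{-1}\{A\}$

The Fourier transform is a unique and invertible operation so that:

$$a = \mathcal{F}^{-1}\{\mathcal{F}\{a\}\} \quad\text{and}\quad A = \mathcal{F}\{\mathcal{F}^{-1}\{A\}\}$$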

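The specific formulas for transforming back and forth between the spatial domain and the frequency domain are given below.

In 2D continuous space:

Forward: $$A(u,v) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} a(x,y)\, e^{-j(ux+vy)}\, dx\, dy$$

Inverse: $$a(x,y) = \frac{1}{4\pi^2}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} A(u,v)\, e^{+j(ux+vy)}\, du\, dv$$

In 2D discrete space:

Forward: $$A(\Omega,\Psi) = \sum_{m=-\infty}^{+\infty}\sum_{n=-\infty}^{+\infty} a[m,n]\, e^{-j(\Omega m+\Psi n)}$$

Inverse: $$a[m,n] = \frac{1}{4\pi^2}\int_{-\pi}^{+\pi}\int_{-\pi}^{+\pi} A(\Omega,\Psi)\, e^{+j(\Omega m+\Psi n)}\, d\Omega\, d\Psi$$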
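The forward transform in discrete space can be evaluated directly at any pair of frequencies when the image has finite size. The sketch below assumes the usual form of the transform, a sum of samples weighted by exp(-j(omega*m + psi*n)), with the image zero outside its stored samples; the name `dtft2` is chosen for this example.

```python
# Sketch: forward Fourier transform of a finite discrete image, evaluated at
# one frequency pair (omega, psi) in radians per sample.
import cmath
import math

def dtft2(a, omega, psi):
    """Sum of a[m][n] * exp(-j * (omega * m + psi * n)) over all samples."""
    return sum(a[m][n] * cmath.exp(-1j * (omega * m + psi * n))
               for m in range(len(a)) for n in range(len(a[0])))

a = [[1, 2],
     [3, 4]]
dc = dtft2(a, 0.0, 0.0)                         # at zero frequency: sum of samples
here = dtft2(a, 0.3, 0.7)
shifted = dtft2(a, 0.3 + 2 * math.pi, 0.7 - 2 * math.pi)
```

Because m and n are integers, shifting either frequency by a multiple of 2*pi leaves every exponential unchanged, so `here` and `shifted` agree up to rounding.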

Properties of Fourier Transforms

· Properties

· Circularly symmetric signals

· Examples of 2D signals and transforms

Properties

There are a variety of properties associated with the Fourier transform and the inverse Fourier transform. The following are some of the most relevant for digital image processing.

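* The Fourier transform is, in general, a complex function of the real frequency variables. As such the transform can be written in terms of its magnitude and phase.

$$A(u,v) = |A(u,v)|\, e^{j\varphi(u,v)} \qquad A(\Omega,\Psi) = |A(\Omega,\Psi)|\, e^{j\varphi(\Omega,\Psi)}$$

* A 2D signal can also be complex and thus written in terms of its magnitude and phase.

$$a(x,y) = |a(x,y)|\, e^{j\vartheta(x,y)} \qquad a[m,n] = |a[m,n]|\, e^{j\vartheta[m,n]}$$

* If a 2D signal is real, then the Fourier transform has certain symmetries.

$$A(u,v) = A^*(-u,-v) \qquad A(\Omega,\Psi) = A^*(-\Omega,-\Psi)$$

The symbol (*) indicates complex conjugation. For real signals, the equation above leads directly to:

$$|A(u,v)| = |A(-u,-v)| \qquad \varphi(u,v) = -\varphi(-u,-v)$$

$$|A(\Omega,\Psi)| = |A(-\Omega,-\Psi)| \qquad \varphi(\Omega,\Psi) = -\varphi(-\Omega,-\Psi)$$

* If a 2D signal is real and even, then the Fourier transform is real and even.

$$A(u,v) = A(-u,-v) \qquad A(\Omega,\Psi) = A(-\Omega,-\Psi)$$

* The Fourier and the inverse Fourier transforms are linear operations.

$$\mathcal{F}\{w_1 a + w_2 b\} = w_1 A + w_2 B \qquad \mathcal{F}^{-1}\{w_1 A + w_2 B\} = w_1 a + w_2 b$$

where a and b are 2D signals (images) and $w_1$ and $w_2$ are arbitrary, complex constants.

* The Fourier transform in discrete space, $A(\Omega,\Psi)$, is periodic in both $\Omega$ and $\Psi$. Both periods are $2\pi$.

$$A(\Omega + 2\pi j,\ \Psi + 2\pi k) = A(\Omega,\Psi) \qquad j, k \text{ integers}$$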

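* The energy, E, in a signal can be measured either in the spatial domain or the frequency domain. For a signal with finite energy:

Parseval's theorem (2D continuous space):

$$E = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} |a(x,y)|^2\, dx\, dy = \frac{1}{4\pi^2}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} |A(u,v)|^2\, du\, dv$$

Parseval's theorem (2D discrete space):

$$E = \sum_{m=-\infty}^{+\infty}\sum_{n=-\infty}^{+\infty} |a[m,n]|^2 = \frac{1}{4\pi^2}\int_{-\pi}^{+\pi}\int_{-\pi}^{+\pi} |A(\Omega,\Psi)|^2\, d\Omega\, d\Psi$$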
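Parseval's theorem in discrete space can be checked numerically for a small image. The sketch below assumes the frequency-domain energy is 1/(4*pi^2) times the integral of the squared transform magnitude over one period, and replaces that integral by a Riemann sum on a K x K frequency grid; for a finite image no larger than K x K samples this sum is exact up to rounding. All names are illustrative.

```python
# Sketch: numerical check of Parseval's theorem in 2D discrete space.
import cmath
import math

def dtft2(a, omega, psi):
    """Forward transform of a finite image at frequency (omega, psi)."""
    return sum(a[m][n] * cmath.exp(-1j * (omega * m + psi * n))
               for m in range(len(a)) for n in range(len(a[0])))

def energy_spatial(a):
    return sum(abs(v) ** 2 for row in a for v in row)

def energy_frequency(a, K=8):
    """Riemann sum of |A|^2 over one period [0, 2*pi)^2, scaled by 1/(4*pi^2).

    By periodicity this covers the same area as the interval [-pi, pi)^2.
    """
    step = 2 * math.pi / K
    total = sum(abs(dtft2(a, k * step, l * step)) ** 2
                for k in range(K) for l in range(K))
    return total * step * step / (4 * math.pi ** 2)

a = [[1, 2],
     [3, 4]]
e_space = energy_spatial(a)    # 1 + 4 + 9 + 16
e_freq = energy_frequency(a)
```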

This "signal energy" is not to be confused with the physical energy in the phenomenon that produced the signal. If, for example, the value a[m,n] represents a photon count, then the physical energy is proportional to the amplitude, a, and not the square of the amplitude. This is generally the case in video imaging.

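* Given three, multi-dimensional signals a, b, and c and their Fourier transforms A, B, and C:

$$c = a \otimes b \ \leftrightarrow\ C = A \cdot B$$

$$c = a \cdot b \ \leftrightarrow\ C = \frac{1}{4\pi^2}\, A \otimes B$$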

In words, convolution in the spatial domain is equivalent to multiplication in the Fourier (frequency) domain and vice-versa. This is a central result which provides not only a methodology for the implementation of a convolution but also insight into how two signals interact with each other - under convolution - to produce a third signal. We shall make extensive use of this result later.
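The first half of this result, convolution in the spatial domain corresponding to multiplication in the frequency domain, can be verified numerically for small discrete images. The sketch assumes finite images that are zero outside their stored samples and evaluates the transforms at one arbitrarily chosen frequency pair; the helper names are illustrative.

```python
# Sketch: check that the transform of (a convolved with b) equals A times B.
import cmath

def convolve2d(a, b):
    """Full 2D discrete convolution of two finite images."""
    c = [[0] * (len(a[0]) + len(b[0]) - 1) for _ in range(len(a) + len(b) - 1)]
    for j, row_a in enumerate(a):
        for k, va in enumerate(row_a):
            for p, row_b in enumerate(b):
                for q, vb in enumerate(row_b):
                    c[j + p][k + q] += va * vb
    return c

def dtft2(a, omega, psi):
    """Forward transform of a finite image at frequency (omega, psi)."""
    return sum(a[m][n] * cmath.exp(-1j * (omega * m + psi * n))
               for m in range(len(a)) for n in range(len(a[0])))

a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 1]]
c = convolve2d(a, b)

omega, psi = 0.4, 1.1  # arbitrary test frequency
lhs = dtft2(c, omega, psi)
rhs = dtft2(a, omega, psi) * dtft2(b, omega, psi)
```

The two values `lhs` and `rhs` agree up to floating-point rounding, whichever frequency pair is chosen.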

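* If a two-dimensional signal a(x,y) is scaled in its spatial coordinates then:

$$a(M_x x,\ M_y y) \ \leftrightarrow\ \frac{1}{|M_x M_y|}\, A\!\left(\frac{u}{M_x},\ \frac{v}{M_y}\right)$$

* If a two-dimensional signal a(x,y) has the Fourier spectrum A(u,v) then:

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} a(x,y)\, dx\, dy = A(u=0,\ v=0) \qquad a(x=0,\ y=0) = \frac{1}{4\pi^2}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} A(u,v)\, du\, dv$$

* If a two-dimensional signal a(x,y) has the Fourier spectrum A(u,v) then:

$$\frac{\partial a(x,y)}{\partial x} \leftrightarrow ju\,A(u,v) \qquad \frac{\partial a(x,y)}{\partial y} \leftrightarrow jv\,A(u,v)$$

$$\frac{\partial^2 a(x,y)}{\partial x^2} \leftrightarrow -u^2 A(u,v) \qquad \frac{\partial^2 a(x,y)}{\partial y^2} \leftrightarrow -v^2 A(u,v)$$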

Circularly symmetric signals

An arbitrary 2D signal a(x,y) can always be written in a polar coordinate system as $a(r,\theta)$. When the 2D signal exhibits a circular symmetry this means that:

$$a(x,y) = a(r,\theta) = a(r)$$

where $r^2 = x^2 + y^2$ and $\tan\theta = y/x$. As a number of physical systems such as lenses exhibit circular symmetry, it is useful to be able to compute an appropriate Fourier representation.

The Fourier transform A(u, v) can be written in polar coordinates $A(\omega_r,\xi)$ and then, for a circularly symmetric signal, rewritten as a Hankel transform:

$$A(u,v) = \mathcal{F}\{a(x,y)\} = 2\pi \int_0^{\infty} a(r)\, J_0(\omega_r r)\, r\, dr = A(\omega_r)$$

where $\omega_r^2 = u^2 + v^2$ and $\tan\xi = v/u$, and $J_0(\cdot)$ is a Bessel function of the first kind of order zero.

The inverse Hankel transform is given by:

$$a(r) = \frac{1}{2\pi}\int_0^{\infty} A(\omega_r)\, J_0(\omega_r r)\, \omega_r\, d\omega_r$$

The Fourier transform of a circularly symmetric 2D signal is a function of only the radial frequency, $\omega_r$. The dependence on the angular frequency, $\xi$, has vanished. Further, if a(x,y) = a(r) is real, then it is automatically even due to the circular symmetry. According to the property for real and even signals given above, $A(\omega_r)$ will then be real and even.

Examples of 2D signals and transforms

Table 1 shows some basic and useful signals and their 2D Fourier transforms. Two standard signals used in this table are $u(\cdot)$, the unit step function, and $J_1(\cdot)$, the Bessel function of the first kind. Circularly symmetric signals are treated as functions of r.

T.1 Rectangle

T.2 Pyramid

T.3 Pill Box

T.4 Cone

T.5 Airy PSF

T.6 Gaussian

T.7 Peak

T.8 Exponential Decay

Table 1: 2D Images and their Fourier Transforms

In the remainder of this chapter we will refer to a spatial domain term as the point spread function (PSF) or the 2D impulse response and to its Fourier transform as the optical transfer function (OTF) or simply transfer function.

 

Statistics

· Probability distribution function of the brightnesses

· Probability density function of the brightnesses

· Average

· Standard deviation

· Coefficient-of-variation

· Percentiles

· Mode

· Signal-to-Noise ratio

In image processing it is quite common to use simple statistical descriptions of images and sub-images. The notion of a statistic is intimately connected to the concept of a probability distribution, generally the distribution of signal amplitudes. For a given region--which could conceivably be an entire image--we can define the probability distribution function of the brightnesses in that region and the probability density function of the brightnesses in that region. We will assume in the discussion that follows that we are dealing with a digitized image a[m,n].

Probability distribution function of the brightnesses

The probability distribution function, P(a), is the probability that a brightness chosen from the region is less than or equal to a given brightness value a. As a increases from $-\infty$ to $+\infty$, P(a) increases from 0 to 1. P(a) is monotonic, non-decreasing in a and thus $dP/da \ge 0$.

Probability density function of the brightnesses

The probability that a brightness in a region falls between a and $a + \Delta a$, given the probability distribution function P(a), can be expressed as $p(a)\,\Delta a$ where p(a) is the probability density function:

$$p(a)\,\Delta a = \left(\frac{dP(a)}{da}\right)\Delta a$$

Because of the monotonic, non-decreasing character of P(a) we have that:

$$p(a) \ge 0 \quad\text{and}\quad \int_{-\infty}^{+\infty} p(a)\, da = 1$$

For an image with quantized (integer) brightness amplitudes, the interpretation of $\Delta a$ is the width of a brightness interval. We assume constant width intervals. The brightness probability density function is frequently estimated by counting the number of times that each brightness occurs in the region to generate a histogram, h[a]. The histogram can then be normalized so that the total area under the histogram is 1 (see eq. above). Said another way, the p[a] for a region is the normalized count of the number of pixels, $\Lambda$, in a region that have quantized brightness a:

$$p[a] = \frac{1}{\Lambda}\, h[a] \quad\text{with}\quad \Lambda = \sum_a h[a]$$
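The histogram, its normalized form and the resulting distribution function can be sketched in a few lines. The example assumes integer brightness values in the range 0..levels-1, supplied as a flat list of the pixels in the region; the function names are illustrative.

```python
# Sketch: histogram h[a], estimated density p[a] and distribution P[a]
# for a region with quantized (integer) brightness values.

def histogram(pixels, levels=256):
    h = [0] * levels
    for a in pixels:
        h[a] += 1
    return h

def density(h):
    """Normalize the histogram by the total pixel count so it sums to 1."""
    total = sum(h)
    return [count / total for count in h]

def distribution(p):
    """Running sum of p: probability that a brightness is <= a."""
    P, running = [], 0.0
    for value in p:
        running += value
        P.append(running)
    return P

region = [0, 1, 1, 3]
h = histogram(region, levels=4)
p = density(h)
P = distribution(p)
```

The last entry of `P` is 1 and the list never decreases, matching the monotonic, non-decreasing character of the distribution function.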

The brightness probability distribution function for the image in figure 1a below is shown in figure 1b. The (unnormalized) brightness histogram of the image in figure 1a, which is proportional to the estimated brightness probability density function, is shown in figure 1c. The height in this histogram corresponds to the number of pixels with a given brightness.

 

 

Figure 1:  (a) Original image. (b) Brightness distribution function with minimum, median, and maximum indicated. (c) Brightness histogram.

Both the distribution function and the histogram as measured from a region are a statistical description of that region. It must be emphasized that both P[a] and p[a] should be viewed as estimates of true distributions when they are computed from a specific region. That is, we view an image and a specific region as one realization of the various random processes involved in the formation of that image and that region. In the same context, the statistics defined below must be viewed as estimates of the underlying parameters.

Average

The average brightness of a region is defined as the sample mean of the pixel brightnesses within that region. The average, $m_a$, of the brightnesses over the $\Lambda$ pixels within a region ($\Re$) is given by:

$$m_a = \frac{1}{\Lambda}\sum_{(m,n)\in\Re} a[m,n]$$

Alternatively, we can use a formulation based upon the (unnormalized) brightness histogram, $h(a) = \Lambda \cdot p(a)$, with discrete brightness values a. This gives:

$$m_a = \frac{1}{\Lambda}\sum_a a \cdot h[a]$$

The average brightness, $m_a$, is an estimate of the mean brightness, $\mu_a$, of the underlying brightness probability distribution.
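Both routes to the average can be compared on a small region. The sketch assumes a flat list of integer brightness values and a histogram indexed by brightness; the names are chosen for this example.

```python
# Sketch: average brightness computed from the pixels and from the histogram.

def mean_from_pixels(pixels):
    return sum(pixels) / len(pixels)

def mean_from_histogram(h):
    """h[a] is the number of pixels with brightness a."""
    total = sum(h)  # number of pixels in the region
    return sum(a * count for a, count in enumerate(h)) / total

region = [0, 1, 1, 3]
h = [1, 2, 0, 1]  # histogram of the region above
m_pixels = mean_from_pixels(region)
m_hist = mean_from_histogram(h)
```

The histogram form needs one multiplication per brightness level instead of one addition per pixel, which can help when the region is large and the number of levels is small.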

Standard deviation

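The unbiased estimate of the standard deviation, $s_a$, of the brightnesses within a region ($\Re$) with $\Lambda$ pixels is called the sample standard deviation and is given by:

$$s_a = \sqrt{\frac{1}{\Lambda-1}\sum_{(m,n)\in\Re}\left(a[m,n]-m_a\right)^2} = \sqrt{\frac{\sum_{(m,n)\in\Re} a^2[m,n] - \Lambda\, m_a^2}{\Lambda-1}}$$

Using the histogram formulation gives:

$$s_a = \sqrt{\frac{\left(\sum_a a^2 \cdot h[a]\right) - \Lambda\, m_a^2}{\Lambda-1}}$$

The standard deviation, $s_a$, is an estimate of $\sigma_a$ of the underlying brightness probability distribution.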
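A sketch of the sample standard deviation, computed once from the pixels and once from the histogram, is given below. It assumes the usual definition with a divisor of (number of pixels - 1); the function names are illustrative.

```python
# Sketch: sample standard deviation of the brightnesses in a region.
import math

def sample_std(pixels):
    n = len(pixels)
    mean = sum(pixels) / n
    return math.sqrt(sum((a - mean) ** 2 for a in pixels) / (n - 1))

def sample_std_from_histogram(h):
    """h[a] is the number of pixels with brightness a."""
    n = sum(h)
    mean = sum(a * count for a, count in enumerate(h)) / n
    sum_sq = sum(a * a * count for a, count in enumerate(h))
    return math.sqrt((sum_sq - n * mean * mean) / (n - 1))

region = [0, 1, 1, 3]
h = [1, 2, 0, 1]  # histogram of the region above
s_pixels = sample_std(region)
s_hist = sample_std_from_histogram(h)
```

Both functions need at least two pixels; with a single pixel the divisor is zero.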

Coefficient-of-variation

The dimensionless coefficient-of-variation, CV, is defined as:

$$CV = \frac{s_a}{m_a} \times 100\%$$
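As a small illustration, the coefficient-of-variation can be computed with Python's standard `statistics` module, assuming it is the sample standard deviation divided by the average and expressed as a percentage. The function name is chosen for this example.

```python
# Sketch: coefficient-of-variation of the brightnesses in a region, in percent.
import statistics

def coefficient_of_variation(pixels):
    # statistics.stdev uses the (n - 1) divisor, i.e. the sample standard deviation.
    # The average must be non-zero, otherwise the ratio is undefined.
    return statistics.stdev(pixels) / statistics.mean(pixels) * 100.0

region = [0, 1, 1, 3]
cv = coefficient_of_variation(region)
```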

Percentiles

The percentile, p%, of an unquantized brightness distribution is defined as that value of the brightness a such that:

P(a) = p%

or equivalently

$$\int_{-\infty}^{a} p(\alpha)\, d\alpha = p\%$$

Three special cases are frequently used in digital image processing:

* 0% the minimum value in the region
* 50% the median value in the region
* 100% the maximum value in the region

All three of these values can be determined from figure 1b.
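For a finite set of pixels, a percentile has to be picked from the sorted brightness values. The sketch below uses the nearest-rank convention, which is one of several possible conventions and an assumption of this example; with an even number of pixels it returns the lower of the two middle values as the median.

```python
# Sketch: percentiles of the brightnesses in a region (nearest-rank convention).
import math

def percentile(pixels, p):
    """Smallest brightness in the region whose cumulative share reaches p percent."""
    ordered = sorted(pixels)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]  # p = 0 maps to the minimum

region = [56, 200, 141, 241, 100]  # made-up brightness values
minimum = percentile(region, 0)
median = percentile(region, 50)
maximum = percentile(region, 100)
```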

Mode

The mode of the distribution is the most frequent brightness value. There is no guarantee that a mode exists or that it is unique.
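Because the mode need not be unique, a practical routine can return every brightness value that reaches the highest count. The following sketch does so with `collections.Counter`; the function name `modes` is chosen for this example.

```python
# Sketch: all modes of the brightness values in a region.
from collections import Counter

def modes(pixels):
    """Return every brightness that occurs most often; empty list if no pixels."""
    counts = Counter(pixels)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(a for a, count in counts.items() if count == top)

single = modes([5, 7, 7, 9])     # one mode
tied = modes([1, 2, 2, 3, 3])    # two values share the highest count
```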

Signal-to-Noise ratio

The signal-to-noise ratio, SNR, can have several definitions. The noise is characterized by its standard deviation, $s_n$. The characterization of the signal can differ. If the signal is known to lie between two boundaries, $a_{min} \le a \le a_{max}$, then the SNR is defined as:

Bounded signal - $$SNR = 20\log_{10}\left(\frac{a_{max} - a_{min}}{s_n}\right)\ \text{dB}$$

If the signal is not bounded but has a statistical distribution then two other definitions are known:

Stochastic signal - S & N inter-dependent: $$SNR = 20\log_{10}\left(\frac{m_a}{s_n}\right)\ \text{dB}$$

S & N independent: $$SNR = 20\log_{10}\left(\frac{s_a}{s_n}\right)\ \text{dB}$$

where $m_a$ and $s_a$ are defined above.

The various statistics are given in Table 1 for the image and the region shown in figure 2.

Figure 2: Region is the interior of the circle. Table 1: Statistics from figure 2.

An SNR calculation for the entire image based on the definitions above is not directly available. The variations in the image brightnesses that lead to the large value of s (=49.5) are not, in general, due to noise but to the variation in local information. With the help of the region there is a way to estimate the SNR. We can use the s of the region (=4.0) and the dynamic range, $a_{max} - a_{min}$, for the image (=241-56) to calculate a global SNR (=33.3 dB). The underlying assumptions are that:
1) the signal is approximately constant in that region and the variation in the region is therefore due to noise.
2) the noise is the same over the entire image with a standard deviation given by $s_n$ = s of the region.
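The quoted figure can be reproduced with a short calculation. The sketch assumes the bounded-signal definition, 20 times the base-10 logarithm of the dynamic range divided by the noise standard deviation; the function name is illustrative, and the input values are those given in the text.

```python
# Sketch: global SNR estimate for a bounded signal, in dB.
import math

def snr_bounded_db(a_min, a_max, s_n):
    """20 * log10((a_max - a_min) / s_n), assuming the signal lies in [a_min, a_max]."""
    return 20 * math.log10((a_max - a_min) / s_n)

# Dynamic range of the image (241 - 56) and standard deviation of the region (4.0).
snr = snr_bounded_db(56, 241, 4.0)
print(round(snr, 1))  # 33.3
```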

Contour Representations

· Chain code

· Chain code properties

· Crack code

· Run codes

When dealing with a region or object, several compact representations are available that can facilitate manipulation of and measurements on the object. In each case we assume that we begin with an image representation of the object as shown in figure 1a and 1b. Several techniques exist to represent the region or object by describing its contour.

Chain code

This representation is based upon the work of Freeman. We follow the contour in a clockwise manner and keep track of the directions as we go from one contour pixel to the next. For the standard implementation of the chain code we consider a contour pixel to be an object pixel that has a background (non-object) pixel as one or more of its 4-connected neighbors (fig. 1c).

The codes associated with eight possible directions are the chain codes and, with x as the current contour pixel position, the codes are generally defined as:

3 2 1
4 x 0
5 6 7

Figure 1: Region (shaded) as it is transformed from (a) continuous to (b) discrete form and then considered as a (c) contour or (d) run lengths illustrated in alternating colours.

Chain code properties

* Even codes {0,2,4,6} correspond to horizontal and vertical directions; odd codes {1,3,5,7} correspond to the diagonal directions.

* Each code can be considered as the angular direction, in multiples of 45°, that we must move to go from one contour pixel to the next.

* The absolute coordinates [m,n] of the first contour pixel (e.g. top, leftmost) together with the chain code of the contour represent a complete description of the discrete region contour.

* When there is a change between two consecutive chain codes, then the contour has changed direction. This point is defined as a corner.
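These properties can be illustrated by decoding a chain code back into contour coordinates. The sketch assumes the common Freeman convention in which code 0 points right and the codes increase counter-clockwise in steps of 45 degrees, and it assumes (row, column) coordinates with rows increasing downward; the start pixel and function names are made up for this example.

```python
# Sketch: decode a chain code and locate corners.
# Assumption: code 0 = right, codes increase counter-clockwise; rows grow downward.
OFFSETS = {0: (0, 1), 1: (-1, 1), 2: (-1, 0), 3: (-1, -1),
           4: (0, -1), 5: (1, -1), 6: (1, 0), 7: (1, 1)}

def decode_chain(start, codes):
    """Return the contour pixel coordinates, starting at `start`."""
    path = [start]
    row, col = start
    for code in codes:
        d_row, d_col = OFFSETS[code]
        row, col = row + d_row, col + d_col
        path.append((row, col))
    return path

def corner_indices(codes):
    """Indices at which the code differs from the previous one."""
    return [i for i in range(1, len(codes)) if codes[i] != codes[i - 1]]

codes = [5, 6, 7, 7, 0]
path = decode_chain((0, 2), codes)  # (0, 2) is an arbitrary start pixel
corners = corner_indices(codes)
```

The start coordinates together with the list of codes are enough to rebuild every contour pixel, which is the sense in which they form a complete description of the contour.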

Crack code

An alternative to the chain code for contour encoding is to use neither the contour pixels associated with the object nor the contour pixels associated with background but rather the line, the "crack", in between. This is illustrated in figure 2, with an enlargement of a portion of figure 1.

The "crack" code can be viewed as a chain code with four possible directions instead of eight.

 

Figure 2: (a) Object including part to be studied. (b) Contour pixels as used in the chain code are diagonally shaded. The "crack" is shown with the thick black line.

The chain code for the enlarged section of figure 2b, from top to bottom, is {5,6,7,7,0}. The crack code is {3,2,3,3,0,3,0,0}.

Run codes

A third representation is based on coding the consecutive pixels along a row--a run--that belong to an object by giving the starting position of the run and the ending position of the run. Such runs are illustrated in figure 1d. There are a number of alternatives for the precise definition of the positions. Which alternative should be used depends upon the application and thus will not be discussed here.
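One possible definition of the positions is sketched below: each run is stored as (row, first column, last column), with both columns inclusive. This convention is an assumption made for the example, as are the binary list-of-rows image and the function name.

```python
# Sketch: run codes of a binary image.
# Assumption: a run is stored as (row, first column, last column), inclusive.

def run_codes(image):
    runs = []
    for row_index, row in enumerate(image):
        start = None
        for col, value in enumerate(row):
            if value and start is None:
                start = col                               # a run begins
            elif not value and start is not None:
                runs.append((row_index, start, col - 1))  # the run has ended
                start = None
        if start is not None:                             # run reaches the row end
            runs.append((row_index, start, len(row) - 1))
    return runs

image = [[0, 1, 1, 0],
         [1, 0, 0, 1],
         [0, 0, 0, 0]]
runs = run_codes(image)
```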