
MAJOR PROJECT REPORT ON

AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN

RECOGNITION

A dissertation work submitted in partial fulfilment of the requirement for the award of the degree of

BACHELOR OF TECHNOLOGY

IN

(ELECTRONICS AND COMMUNICATION ENGINEERING)

BY

ADLA KIRANMAYI

ANNABATHULA SRILATHA

M. AYESHA MUBEEN

Under the guidance of

MS SYEDA SANA FATIMA

Assistant professor

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING

SHADAN WOMEN'S COLLEGE OF ENGINEERING & TECHNOLOGY

(Affiliated To Jawaharlal Nehru Technological University Hyderabad)

2011-2015

CERTIFICATE

This is to certify that the project report entitled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION", being submitted by A. KIRANMAYI, A. SRILATHA and M. AYESHA MUBEEN to Jawaharlal Nehru Technological University Hyderabad for the award of the degree of Bachelor of Technology in Electronics and Communication Engineering, is a record of bonafide work carried out by them under my supervision and guidance.

The matter contained in this report has not been submitted to any other university or institute for the award of any degree or diploma.

MS. SYEDA SANA FATIMA (INTERNAL GUIDE)

MS. S. SUNEETA (HEAD OF THE DEPARTMENT)

EXTERNAL GUIDE

ACKNOWLEDGEMENT

This report gives the details of our project work titled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION". Through it, an attempt has been made to present the description of all the theoretical and practical aspects of our project to the extent possible.

We take this opportunity to express our sincere appreciation to Professor Ms. S. Suneeta, Head of the Department, and the staff of the Bachelor of Technology programme for the invaluable suggestions and keen interest they have shown in the successful completion of this project.

We express our deep gratitude to our guide Ms. Syeda Sana Fatima, whose invaluable references, suggestions and encouragement have immensely helped in the successful completion of the project. This project will be an asset to our academic profile.

It is with a profound sense of gratitude that we acknowledge our project guide Ms. Syeda Sana Fatima for providing us with the live specification and her valuable suggestions, which encouraged us to complete the project successfully.

We are happy to express our gratitude to one and all who helped us in completing the project successfully.

We are thankful to our principal Dr. MAZHER SALEEM, Shadan Women's College of Engineering and Technology, for encouraging us to do the project.

ADLA KIRANMAYI

ANNABATHULA SRILATHA

M. AYESHA MUBEEN

DECLARATION

We hereby declare that the work being presented in this project, entitled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION", submitted towards the partial fulfilment of the requirement for the award of the Degree of Bachelor of Technology in "Electronics and Communication Engineering", is an authentic record of our work under the supervision of Ms. Syeda Sana Fatima, Assistant Professor, and Ms. S. Suneeta, Head of the Department of Electronics and Communication Engineering, SHADAN WOMEN'S COLLEGE OF ENGINEERING AND TECHNOLOGY, affiliated to Jawaharlal Nehru Technological University Hyderabad.

The matter embodied in this report has not been submitted for the award of any other degree.

ADLA KIRANMAYI

ANNABATHULA SRILATHA

M. AYESHA MUBEEN

INDEX

ABSTRACT

CHAPTER 1: INTRODUCTION

1.1 GENERAL
1.2 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
1.2.1 PREPROCESSING
1.2.2 IMAGE ENHANCEMENT
1.2.3 IMAGE RESTORATION
1.2.4 IMAGE COMPRESSION
1.2.5 SEGMENTATION
1.2.6 IMAGE RESTORATION
1.2.7 FUNDAMENTAL STEPS
1.3 A SIMPLE IMAGE MODEL
1.4 IMAGE FILE FORMATS
1.5 TYPES OF IMAGES
1.5.1 BINARY IMAGES
1.5.2 GRAY SCALE IMAGE
1.5.3 COLOR IMAGE
1.5.4 INDEXED IMAGE
1.6 APPLICATIONS OF IMAGE PROCESSING
1.7 EXISTING SYSTEM
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1.8 LITERATURE SURVEY
1.9 PROPOSED SYSTEM
1.9.1 ADVANTAGES

CHAPTER 2: PROJECT DESCRIPTION

2.1 INTRODUCTION
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
2.2.2 SCLERA SEGMENTATION
2.2.3 IRIS AND EYELID REFINEMENT
2.2.4 OCULAR SURFACE VASCULATURE
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
2.3 EVOLUTION OF GPU ARCHITECTURE
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
2.5 MAPPING THE SUBTASKS TO CUDA
2.5.1 MAPPING ALGORITHM TO BLOCKS
2.5.2 MAPPING INSIDE BLOCK
2.5.3 MEMORY MANAGEMENT
2.6 HISTOGRAM OF ORIENTED GRADIENTS

CHAPTER 3: SOFTWARE SPECIFICATION

3.1 GENERAL
3.2 SOFTWARE REQUIREMENTS
3.3 INTRODUCTION
3.4 FEATURES OF MATLAB
3.4.1 INTERFACING WITH OTHER LANGUAGES
3.5 THE MATLAB SYSTEM
3.5.1 DESKTOP TOOLS
3.5.2 ANALYZING AND ACCESSING DATA
3.5.3 PERFORMING NUMERIC COMPUTATION

CHAPTER 4: IMPLEMENTATION

4.1 GENERAL
4.2 CODING IMPLEMENTATION
4.3 SNAPSHOTS

CHAPTER 5

CHAPTER 6: CONCLUSION & FUTURE SCOPE

6.1 CONCLUSION
6.2 REFERENCES

APPLICATION

LIST OF FIGURES

FIG NO.  FIG NAME

1.1 Fundamental blocks of digital image processing
1.2 Gray scale image
1.3 The additive model of RGB
1.4 The colors created by the subtractive model of CMYK
2.1 The diagram of a typical sclera vein recognition approach
2.2 Steps of segmentation
2.3 Glare area detection
2.4 Detection of the sclera area
2.5 Pattern of veins
2.6 Sclera region and its vein patterns
2.7 Filtering can take place simultaneously on different parts of the iris image
2.8 The sketch of parameters of segment descriptor
2.9 The weighting image
2.10 The module of sclera template matching
2.11 The Y shape vessel branch in sclera
2.12 The rotation and scale invariant character of Y shape vessel branch
2.13 The line descriptor of the sclera vessel pattern
2.14 The key elements of descriptor vector
2.15 Simplified sclera matching steps on GPU
2.16 Two-stage matching scheme
2.17 Example image from the UBIRIS database
2.18 Occupancy on various thread numbers per block
2.19 The task assignment inside and outside the GPU
2.20 HOG features
4.1 Original sclera image
4.2 Binarised sclera image
4.3 Edge map subtracted image
4.4 Cropping ROI
4.5 ROI mask
4.6 ROI finger sclera image
4.7 Enhanced sclera image
4.8 Feature extracted sclera image
4.9 Matching with images in database
4.10 Result

ABSTRACT

Sclera vein recognition has been shown to be a promising method for human identification. However, its matching speed is slow, which limits its use in real-time applications. To improve the matching efficiency, we propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching. First, we designed a rotation- and scale-invariant Y shape descriptor-based feature extraction method to efficiently eliminate most unlikely matches. Second, we developed a weighted polar line (WPL) sclera descriptor structure that incorporates mask information to reduce GPU memory cost. Third, we designed a coarse-to-fine two-stage matching method. Finally, we developed a mapping scheme to map the subtasks to GPU processing units. The experimental results show that the proposed method can achieve a dramatic improvement in processing speed without compromising recognition accuracy.

CHAPTER 1

INTRODUCTION

1.1 GENERAL

Digital image processing is the use of computer algorithms to perform image processing on digital images. The 2D continuous image is divided into N rows and M columns; the intersection of a row and a column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.

1.2 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING

The field of "Digital Image Processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be considered as the processing of any two-dimensional data, where any image (optical information) is represented as an array of real or complex numbers represented by a definite number of bits. An image is represented as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates and the amplitude of f at any pair of coordinates (x, y) represents the intensity, or gray level, of the image at that point.

A digital image is one for which both the coordinates and the amplitude values of f are finite, discrete quantities. Hence a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, and it can be considered as a matrix whose row and column indices identify a point on the image and whose corresponding matrix element value identifies the gray level at that point.
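As a concrete illustration, the short MATLAB sketch below reads an image, converts it to a gray-level matrix, and inspects one pixel; the file name and pixel indices are placeholders, not part of the report's own code.

% Minimal sketch: a digital image as a matrix of gray levels.
% 'eye.jpg' is a placeholder file name and (50, 120) an arbitrary pixel.
I = imread('eye.jpg');          % read the image from disk
if size(I, 3) == 3              % collapse RGB to a single gray-level plane
    I = rgb2gray(I);
end
[rows, cols] = size(I);         % N rows and M columns of the sampled image
g = I(50, 120);                 % gray level f(x, y) at row 50, column 120
fprintf('Image is %d x %d; gray level at (50,120) = %d\n', rows, cols, g);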

One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. Introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours.

[Figure: Fundamental blocks of digital image processing]

1.2.1 PREPROCESSING

In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible; this section is about general techniques that apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.

Image processing refers to the processing of a 2D picture by a computer. Some basic definitions follow.

An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:

Image processing (image in -> image out)

Image analysis (image in -> measurements out)

Image understanding (image in -> high-level description out)

An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur, while another part might be processed to improve colour rendition.

Most usually, image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample, or pixel, is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, out of which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).

In modern sciences and technologies, images also gain much broader scope due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research and real-time multi-asset portfolio trading in finance. Before processing, an image is converted into a digital form; digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing may be image enhancement, image restoration, or image compression.

1.2.2 IMAGE ENHANCEMENT

Image enhancement refers to the accentuation or sharpening of image features such as boundaries or contrast to make a graphic display more useful for display and analysis. This process does not increase the inherent information content in the data. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.
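A minimal MATLAB sketch of a few of the enhancement operations listed above is given below; the file name is a placeholder, and adapthisteq, medfilt2 and imsharpen assume the Image Processing Toolbox is available.

% Illustrative enhancement steps on a grayscale image.
I = rgb2gray(imread('eye.jpg'));     % placeholder input image
stretched = imadjust(I);             % gray-level / contrast manipulation
equalized = adapthisteq(I);          % contrast-limited histogram equalization (CLAHE)
denoised  = medfilt2(I, [3 3]);      % simple noise reduction
sharpened = imsharpen(I);            % edge crispening via unsharp masking
figure; imshow(equalized); title('CLAHE-enhanced image');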

1.2.3 IMAGE RESTORATION

Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned more with the extraction or accentuation of image features.

1.2.4 IMAGE COMPRESSION

Image compression is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression - CCITT Group 3 and Group 4

Still image compression - JPEG

Video image compression - MPEG

1.2.5 SEGMENTATION

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture, while adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.

1.2.6 IMAGE RESTORATION

Image restoration, like enhancement, improves the quality of an image, but all the operations are mainly based on known or measured degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion, i.e. to correct images for known degradations.

1.2.7 FUNDAMENTAL STEPS

Image acquisition: to acquire a digital image.

Image preprocessing: to improve the image in ways that increase the chances for success of the other processes.

Image segmentation: to partition an input image into its constituent parts or objects.

Image representation: to convert the input data to a form suitable for computer processing.

Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.

Image recognition: to assign a label to an object based on the information provided by its descriptors.

Image interpretation: to assign meaning to an ensemble of recognized objects.

Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.

1.3 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling, and amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.

Example: a 256 gray-level image of size 256x256 occupies 64K bytes of memory.
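The arithmetic behind this figure, written out as a small MATLAB check:

% 256 gray levels need log2(256) = 8 bits per pixel,
% so a 256 x 256 image takes 256*256*8/8 = 65536 bytes = 64 KB.
bitsPerPixel = log2(256);
bytes = 256 * 256 * bitsPerPixel / 8;
fprintf('%d bytes (%.0f KB)\n', bytes, bytes / 1024);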

Images of very low spatial resolution produce a checkerboard effect, and the use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.

1.4 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:

GIF - Graphics Interchange Format: an 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. It has several sub-standards, one of which is the animated GIF.

JPEG - Joint Photographic Experts Group: a very efficient (i.e., much information per byte), destructively compressed 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).

TIFF - Tagged Image File Format: the standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS - PostScript: a standard vector format. It has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD - Adobe Photoshop Document: a dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP - Bitmap file format.

1.5 TYPES OF IMAGES

There are four types of images:

1. Binary image

2. Gray scale image

3. Color image

4. Indexed image

1.5.1 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e., a 0 or 1; such images are also referred to as black-and-white (B&W) images.

1.5.2 GRAY SCALE IMAGE

In an (8-bit) grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.

[Figure: Gray scale image]

1.5.3 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive colour with the R, G and B receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB - cyan, magenta, and yellow - are formed by mixing two of the primary colours (red, green or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green and blue at full intensity makes white.

In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

[Figure: The additive model of RGB]

CMYK: the 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M) and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.

1.5.4 INDEXED IMAGE

[Figure: The colors created by the subtractive model of CMYK]

An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements in which every element, a color, is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer, described in 1975 by Kajiya, Sutherland and Cheadle. This supported a palette of 256 36-bit RGB colors.
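The MATLAB sketch below illustrates the index/palette split described above; peppers.png is one of the demo images bundled with the Image Processing Toolbox, and the 64-color palette size is an arbitrary choice.

% An indexed image: X holds palette indices, map holds the RGB colors.
RGB = imread('peppers.png');
[X, map] = rgb2ind(RGB, 64);    % quantize the image to a 64-color palette
disp(size(map));                % 64 x 3 table of RGB color entries
imshow(X, map);                 % display the array through its color map
approxRGB = ind2rgb(X, map);    % rebuild an approximate true-color image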

1.6 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation.

2) Processing of scene data for autonomous machine perception.

In the second application area, interest focuses on procedures for extracting information from an image in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

1.7 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches - a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching - for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

1.7.1 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size, occupy GPU memory, and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

1.8 LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such big data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements, such as a personal computer. However, a parallel processing alternative using field programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the means of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person; thus the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are completely eliminated in the proposed system by using two feature extraction techniques: the Histogram of Oriented Gradients (HOG), and converting the image into polar form using the bilinear interpolation technique. These two features help the proposed system to become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature is extracted using the 1-D log polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

1.9 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method, implemented on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by taking care of the different syntax for various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor - the Y shape sclera feature-based efficient registration method - to speed up the mapping scheme, introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing, to mitigate the mask size issue, and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

1.9.1 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, we propose a new descriptor - the Y shape descriptor - which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

2.1 INTRODUCTION

The sclera is the opaque and white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches - a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching - for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme would need different strategies for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for our sequential line-descriptor sclera vein recognition method using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and can help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size, occupy GPU memory, and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by taking care of the different syntax for various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor - the Y shape sclera feature-based efficient registration method - to speed up the mapping scheme (Section 4), introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing, to mitigate the mask size issue (Section 5), and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we developed implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition, in Section 8 we report experiments using the proposed system, and in Section 9 we draw conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

[Figure: The diagram of a typical sclera vein recognition approach]

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation; they used a clustering algorithm to classify the color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's thresholding-based method for grayscale images.

After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is performed in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in the existing studies. In the proposed method, two features of an image are extracted.
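A hedged MATLAB sketch of these two feature types is shown below. extractHOGFeatures assumes the Computer Vision Toolbox is available, the input file name is a placeholder, and the pupil is assumed to sit at the image center; none of this is the report's actual implementation.

% Two illustrative features for a segmented grayscale sclera region S.
S = im2double(rgb2gray(imread('sclera_roi.png')));   % placeholder file name
hogFeat = extractHOGFeatures(S, 'CellSize', [8 8]);  % gradient/edge orientation histogram

% Cartesian-to-polar unwrapping using bilinear interpolation (interp2);
% the pupil center is assumed to be at the middle of the image.
cx = size(S, 2) / 2;  cy = size(S, 1) / 2;
radii  = 1:floor(min(size(S)) / 2) - 1;
angles = linspace(0, 2 * pi, 360);
[R, T] = meshgrid(radii, angles);
Xq = cx + R .* cos(T);
Yq = cy + R .* sin(T);
polarS = interp2(S, Xq, Yq, 'linear', 0);            % rows: angle, columns: radius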

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure below shows the steps of segmentation.

[Figure: Steps of segmentation]

Glare area detection: the glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images, so if the image is in color it must first be converted to grayscale before the Sobel filter is applied. The figure below shows the result of the glare area detection.
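A minimal MATLAB sketch of this step is given below; the file name, brightness threshold, and dilation radius are assumptions added for illustration, since the report only states that a Sobel filter is used.

% Glare detection: Sobel edge map combined with a simple brightness test.
eyeRGB  = imread('eye.jpg');                   % placeholder file name
eyeGray = rgb2gray(eyeRGB);                    % Sobel works on grayscale only
edges   = edge(eyeGray, 'sobel');              % Sobel edge map
bright  = eyeGray > 240;                       % very bright pixels (candidate glare)
glare   = edges & imdilate(bright, strel('disk', 3));   % edges around bright spots
imshow(glare); title('Detected glare area');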

[Figure: Glare area detection]

Sclera area estimation: for the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be located at the right and center positions, and the correct right sclera area should be located at the left and center. In this way, non-sclera areas are eliminated.
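The sketch below shows the core of this step in MATLAB, assuming a grayscale eye image and a hand-picked ROI to the left of the iris; the file name and coordinates are placeholders.

% Sclera area estimation with Otsu's threshold on a region of interest.
eyeGray = rgb2gray(imread('eye.jpg'));     % placeholder file name
roi     = eyeGray(100:300, 1:200);         % assumed ROI to the left of the iris
level   = graythresh(roi);                 % Otsu's threshold (0..1)
sclera  = imbinarize(roi, level);          % bright pixels: candidate sclera
sclera  = bwareafilt(sclera, 1);           % keep the largest connected region
imshow(sclera); title('Estimated sclera area');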

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid, and iris boundaries are then refined, since these are unwanted portions for recognition. To eliminate their effects, refinement is performed after the detection of the sclera area. The figure below shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way.

[Figure: Detection of the sclera area]

In the segmentation process, not all images are perfectly segmented; hence feature extraction and matching are needed to reduce the segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (see the figure below). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

[Figure: Pattern of veins]

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.

Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

[Figure: Sclera region and its vein patterns]

[Figure: Filtering can take place simultaneously on different parts of the iris image]

2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as

[Equations 6-8: definitions of θ, r, and ɸ]

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points - one from the test template and one from the target template - and also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.
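Since the equations themselves are not reproduced in this transcript, the MATLAB sketch below follows only the verbal definition above to compute one descriptor; the segment pixel list seg (an N-by-2 array of [x y] points), the iris center values, and the first-order line fit are all assumptions.

% One segment descriptor S = (theta, r, phi) from skeleton pixels 'seg'.
xi = 320;  yi = 240;                          % placeholder iris center
xl = mean(seg(:, 1));  yl = mean(seg(:, 2));  % center point of the line segment
theta = atan2(yl - yi, xl - xi);              % angle w.r.t. the reference at the iris center
r     = hypot(xl - xi, yl - yi);              % distance from segment center to iris center
p     = polyfit(seg(:, 1), seg(:, 2), 1);     % first-order fit standing in for fline(x)
phi   = atan(p(1));                           % dominant orientation of the segment
S = [theta; r; phi];                          % descriptor column vector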

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score m(Si, Sj) between two segment descriptors Si and Sj is computed from d(Si, Sj), the Euclidean distance between the segment descriptor center points (from Eqs. 6-8), the matching distance threshold Dmatch, and the matching angle threshold ɸmatch. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, whichever of the test or target templates has fewer points, the sum of its descriptor weights sets the maximum score that can be attained.
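A hedged MATLAB sketch of this matching rule is shown below: a pair of segments is counted as a match when their centers are within Dmatch and their orientations differ by less than the angle threshold, and each matched pair contributes a weighted score. The struct fields, the threshold values, and the exact form of the pair score are assumptions, not the report's tuned parameters.

% test and target are struct arrays with fields x, y, phi (orientation)
% and w (WPL weight). Threshold values below are placeholders.
Dmatch   = 5;         % matching distance threshold (pixels)
PHImatch = pi / 18;   % matching angle threshold (10 degrees)
score = 0;  maxScore = 0;
for i = 1:numel(test)
    best = 0;
    for j = 1:numel(target)
        d  = hypot(test(i).x - target(j).x, test(i).y - target(j).y);
        da = test(i).phi - target(j).phi;
        da = abs(atan2(sin(da), cos(da)));             % wrapped angle difference
        if d <= Dmatch && da <= PHImatch
            best = max(best, test(i).w * target(j).w); % weighted pair score
        end
    end
    score    = score + best;
    maxScore = maxScore + test(i).w;   % assumes 'test' is the smaller template
end
M = score / maxScore;                  % total matching score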

[Figures]

Even with the movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.
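The sketch below illustrates this detection idea in MATLAB under simple assumptions: segs is a struct array of line segments with center coordinates and orientations, and the neighborhood radius and angle tolerance are illustrative values rather than the report's settings.

% Mark segments whose neighborhood contains two distinct orientation groups.
R = 20;           % assumed neighborhood radius (pixels)
angTol = pi / 12; % assumed tolerance separating orientation groups
isYBranch = false(1, numel(segs));
for i = 1:numel(segs)
    d  = hypot([segs.x] - segs(i).x, [segs.y] - segs(i).y);
    nb = find(d > 0 & d < R);                 % nearby line segments
    if numel(nb) < 2, continue; end
    ang  = sort([segs(nb).phi, segs(i).phi]); % orientations in the neighborhood
    gaps = diff(ang) > angTol;                % boundaries between angle groups
    if nnz(gaps) == 1                         % exactly two angle groups -> Y shape
        isYBranch(i) = true;
    end
end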

There are two ways to measure both the orientation and the relationship of every branch of a Y shape vessel: one is to use the angle of every branch with respect to the x axis; the other is to use the angle between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms at the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y shape branch as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments, and for each segment a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes.

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size, occupy GPU memory, and slow down the data transfer. When matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weight of the transformed descriptors. This results in too many convolution operations in the processing unit.

To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, and only once.

The calculated result is saved as a component of the descriptor. The sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. It would be faster if the two templates had a similar reference point. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line passing through the iris center, denoted as θ; the distance between the segment center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.
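A minimal MATLAB sketch of building one WPL descriptor s(x, y, r, θ, ɸ, w) from a segment center, its orientation, the iris center and the sclera mask; the boundary-distance test used to assign the 0.5 weight and the use of bwdist (Image Processing Toolbox) are assumptions made for illustration.

function s = wplDescriptor(center, phi, irisCenter, mask, borderDist)
    % center: [x y] of a line segment, phi: its orientation
    % mask: binary sclera mask, borderDist: pixels treated as "near the boundary"
    v = center - irisCenter;
    r = norm(v);                                 % distance to the iris/pupil center
    theta = atan2(v(2), v(1));                   % angle to the reference line through the iris center
    distToEdge = bwdist(~mask);                  % distance of every sclera pixel to the boundary
                                                 % (in practice computed once per mask, not per descriptor)
    inside = mask(round(center(2)), round(center(1)));
    if ~inside
        w = 0;                                   % descriptor outside the sclera
    elseif distToEdge(round(center(2)), round(center(1))) < borderDist
        w = 0.5;                                 % descriptor near the sclera boundary
    else
        w = 1;                                   % interior descriptor
    end
    s = [center, r, theta, phi, w];              % s(x, y, r, theta, phi, w)
end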

The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched. In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses. This meets the requirement for coalesced memory access on the GPU.

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

GENERAL-PURPOSE COMPUTING ON THE GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology is different between graphics and general-purpose use. Harris provides an excellent description of this mapping process.

We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline described above, concentrating on its programmable aspects.

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as a texture on future passes through the graphics pipeline.

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors (see the sketch after this list).

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of the fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step when computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
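As a rough illustration of the gather pattern above, the following MATLAB sketch updates every interior grid point of a toy diffusion-style simulation from its four neighbors; the update rule and the parameter alpha are illustrative assumptions, not part of the report.

% one time step of a toy grid simulation: each cell gathers from its four neighbors
function next = stepGrid(state, alpha)
    next = state;
    for i = 2:size(state,1)-1
        for j = 2:size(state,2)-1
            nbrs = state(i-1,j) + state(i+1,j) + state(i,j-1) + state(i,j+1);
            next(i,j) = state(i,j) + alpha * (nbrs - 4*state(i,j));
        end
    end
end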

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. Programming environments such as CUDA solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, to the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both gather (read) accesses from and scatter (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity; after this step, some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and the final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance between the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance between the two descriptor centers, defined as (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar line from the pupil center in descriptor i.

The number of matched pairs ni and the distance di between the centers of the Y shape branches are stored as the matching result. We fuse the number of matched branches and the average distance between the centers of the matched branches as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed on to the next, more precise matching process. A rough sketch of this coarse matching stage is given below.
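A minimal MATLAB sketch of Stage I matching, assuming each Y-shape descriptor is a row y(ϕ1, ϕ2, ϕ3, x, y) and using the thresholds tϕ and txy from the text; because the exact fusion formula (2) is not reproduced in the text, the final score below (match ratio penalized by the α-scaled average center distance) is only an illustrative stand-in.

function score = matchYShape(Yte, Yta, tphi, txy, alpha)
    % Yte, Yta: N x 5 matrices whose rows are y(phi1, phi2, phi3, x, y)
    matched = 0; distSum = 0;
    for i = 1:size(Yte,1)
        for j = 1:size(Yta,1)
            dxy  = norm(Yte(i,4:5) - Yta(j,4:5));   % center distance, cf. Eq. (4)
            dphi = norm(Yte(i,1:3) - Yta(j,1:3));   % angle distance, cf. Eq. (3)
            if dxy < txy && dphi < tphi             % restrict the search area and angles
                matched = matched + 1;
                distSum = distSum + dxy;
                break;
            end
        end
    end
    if matched == 0
        score = 0;
        return;
    end
    % illustrative fusion of match count and average center distance (cf. Eq. (2))
    score = 2*matched / (size(Yte,1) + size(Yta,1)) - (distSum/matched) / alpha;
end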

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers: episclera, stroma, lamina fusca and endothelium. There are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte, Tta is the target template and stai is the ith WPL descriptor of Tta, d(stek, staj) is the Euclidean distance between descriptors stek and staj, and Δsk is the shift value between the two descriptors (the offset between their center positions).

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quadrant and find each one's nearest neighbor staj* in the target template Tta. Their offset is recorded as a possible registration shift factor Δsk. The final registration offset Δsoptim is the candidate with the smallest standard deviation among these candidate offsets.
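A minimal MATLAB sketch of this shift search under stated assumptions: descriptor centers are taken relative to the iris center, nPerQuad samples are drawn per quadrant, and "smallest standard deviation" is interpreted as the candidate offset closest to the consensus of all candidates; these details are illustrative rather than taken from Algorithm 2 itself.

function dsOpt = shiftSearch(teCenters, taCenters, nPerQuad)
    % teCenters, taCenters: N x 2 descriptor centers (x, y) relative to the iris center
    quad = 1 + (teCenters(:,1) >= 0) + 2*(teCenters(:,2) >= 0);   % quadrant index 1..4
    cand = [];                                                    % candidate shift offsets
    for q = 1:4
        idx = find(quad == q);
        idx = idx(randperm(numel(idx), min(nPerQuad, numel(idx))));
        for k = idx(:)'
            d = sqrt(sum((taCenters - teCenters(k,:)).^2, 2));    % distance to every target center
            [~, j] = min(d);                                      % nearest neighbor in the target
            cand(end+1, :) = taCenters(j,:) - teCenters(k,:);     % possible shift offset
        end
    end
    dev = sqrt(sum((cand - mean(cand,1)).^2, 2));                 % deviation from the consensus shift
    [~, best] = min(dev);
    dsOpt = cand(best, :);
end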

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei(it) and calculating the offset from its nearest neighbor staj* in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters; in our experiment we set the number of iterations to 512.
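A minimal MATLAB sketch of this random parameter search, with β = 20 pixels as stated in the text; since Eq. (7) is not reproduced here, the sketch simply composes rotation, scale and shift as a 2-D similarity transform, which is an assumption about the exact matrix form.

function best = affineSearch(te, ta, rotRange, scaleRange, nIter, beta)
    % te, ta: N x 2 descriptor centers of the test and target templates
    best = struct('count', -1, 'theta', 0, 'scale', 1, 'shift', [0 0]);
    for it = 1:nIter
        k = randi(size(te,1));
        d = sqrt(sum((ta - te(k,:)).^2, 2));
        [~, j] = min(d);
        shift = ta(j,:) - te(k,:);                    % shift from a random nearest-neighbor pair
        th = rotRange(1)   + diff(rotRange)*rand;     % random rotation parameter
        sc = scaleRange(1) + diff(scaleRange)*rand;   % random scale parameter
        R  = [cos(th) -sin(th); sin(th) cos(th)];
        mapped = (sc * (R * te'))' + shift;           % transformed test template
        count = 0;
        for i = 1:size(mapped,1)                      % count pairs matched within beta pixels
            if min(sqrt(sum((ta - mapped(i,:)).^2, 2))) < beta
                count = count + 1;
            end
        end
        if count > best.count
            best = struct('count', count, 'theta', th, 'scale', sc, 'shift', shift);
        end
    end
end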

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program must be partitioned into threads by the programmer and mapped onto those processors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Local memory, global memory, constant memory and texture memory reside in off-chip device memory, and accessing them is much more time consuming; global, constant and texture memory are accessible by all threads.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, we should preferentially use on-chip memory rather than global memory. When global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of the descriptors. As the lower portion of the middle column in Figure 11 shows, we assign a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result. (A sequential sketch of this reduction pattern is given below.)
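The reduction just described can be emulated sequentially in MATLAB as follows; on the GPU each pass of the while-loop is executed by the surviving threads in parallel, so this sketch only illustrates the access pattern (a tree reduction to the total sum of the intermediate results), not the CUDA implementation itself.

function total = treeReduce(vals)
    % pad to a power of two so the pairwise reduction stays regular
    n = 2^nextpow2(numel(vals));
    vals(end+1:n) = 0;
    stride = 1;
    while stride < n
        for i = 1:2*stride:n                 % the "first" thread of each group
            vals(i) = vals(i) + vals(i + stride);
        end
        stride = stride * 2;
    end
    total = vals(1);                         % final result stored at the first address
end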

252 MAPPING INSIDE BLOCK

In the shift parameter search, there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all the threads compute independently and only the final results need to be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are assigned across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It was primarily developed for target detection; in this project it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating an intensity measure over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and orientation θ(x, y) = arctan(dy(x, y) / dx(x, y)) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Figure 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
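A minimal MATLAB sketch of the two steps just described (gradient computation and magnitude-weighted orientation binning over 0-180 degrees) using only base MATLAB functions; the 8 × 8 cell size and 9 bins are common HOG choices assumed here, not values specified in the report.

I = rand(64);                                % stand-in for a grayscale sclera image (double)
dx = conv2(I, [-1 0 1],  'same');            % x-direction gradient
dy = conv2(I, [-1 0 1]', 'same');            % y-direction gradient
m  = sqrt(dx.^2 + dy.^2);                    % gradient magnitude m(x, y)
theta = mod(atan2d(dy, dx), 180);            % orientation folded into [0, 180)

% 9-bin orientation histogram of one 8 x 8 cell, weighted by gradient magnitude
cellM = m(1:8, 1:8);
cellT = theta(1:8, 1:8);
bins  = discretize(cellT(:), 0:20:180);      % which 20-degree bin each pixel votes into
hist9 = accumarray(bins, cellM(:), [9 1]);   % the cell histogram used to build the HOG descriptor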

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises.

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

32 FEATURES OF MATLAB

High-level language for technical computing.

Development environment for managing code, files and data.

Interactive tools for iterative exploration, design and problem solving.

Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization and numerical integration.

2-D and 3-D graphics functions for visualizing data.

Tools for building custom graphical user interfaces.

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM and Microsoft Excel.

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required. And the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
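For example, a generic illustration (not code from this project) of a loop being replaced by a single vectorized statement:

x = 0:0.1:1;              % sample data
y = zeros(size(x));       % loop version: scale and offset every element
for k = 1:numel(x)
    y(k) = 2*x(k) + 1;
end
y = 2*x + 1;              % equivalent vectorized one-liner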

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP) and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.

Development Tools

MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency, file differences, file dependencies and code coverage.

Designing Graphical User Interfaces

GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing and editing user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including:

Interpolating and decimating

Extracting sections of data, scaling and averaging

Thresholding and smoothing

Correlation, Fourier analysis and filtering

1-D peak, valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotation, LaTeX equations and legends; and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effect, light source location and transparency.

3-D plotting functions include:

Surface, contour and mesh

Image plots

Cone, slice, stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW Discrete Fourier Transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.

MATLAB provides the following types of functions for performing mathematical operations and analyzing data:

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles and integers.

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
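The snapshots above correspond to the following minimal MATLAB sketch of the implementation pipeline. The file names, the Gabor kernel parameters, the correlation-based matching rule and the 0.8 decision threshold are illustrative assumptions, and the sketch assumes the Image Processing Toolbox.

rgb  = imread('eye.jpg');                  % original sclera image (illustrative file name)
gray = rgb2gray(rgb);                      % grey scale image
bw   = imbinarize(gray, graythresh(gray)); % binary image via Otsu's threshold
edges = edge(gray, 'canny');               % edge map used to locate the sclera boundary
roiMask = roipoly(gray);                   % interactively select the sclera (ROI) part
roi  = gray;  roi(~roiMask) = 0;           % keep only the ROI
enh  = adapthisteq(mat2gray(roi));         % enhancement of the sclera image

% Gabor feature extraction (a single, illustrative orientation and wavelength)
[x, y] = meshgrid(-15:15);
lambda = 8; theta = pi/4; sigma = 4;
xp =  x*cos(theta) + y*sin(theta);
yp = -x*sin(theta) + y*cos(theta);
gab  = exp(-(xp.^2 + yp.^2)/(2*sigma^2)) .* cos(2*pi*xp/lambda);
feat = conv2(enh, gab, 'same');            % feature-extracted sclera image

% matching against a stored template by normalized correlation (illustrative rule;
% the stored feature map is assumed to have the same size as feat)
stored = load('template.mat');             % assumed to contain a field 'feat'
score  = corr2(feat, stored.feat);
if score > 0.8
    disp('MATCHED');
else
    disp('NOT MATCHED');
end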

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this project we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of the GPU structure. We developed the WPL descriptor to incorporate the mask information and make the matching more suitable for parallel computing, which dramatically reduces data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.




CHAPTER 1

INTRODUCTION

11GENERAL

Digital image processing is the use of computer algorithms to

perform image processing on digital images The 2D continuous image is

divided into N rows and M columns The intersection of a row and a

column is called a pixel The image can also be a function other variables

including depth color and time An image given in the form of a

transparency slide photograph or an X-ray is first digitized and stored as a

matrix of binary digits in computer memory This digitized image can then

be processed andor displayed on a high-resolution television monitor For

display the image is stored in a rapid-access buffer memory which

refreshes the monitor at a rate of 25 frames per second to produce a visually

continuous display

12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING

The field of ldquoDigital Image Processingrdquo refers to processing the digital

images by means of a digital computer In a broader sense it can be

considered as a processing of any two dimensional data where any image

(optical information) is represented as an array of real or complex numbers

represented by a definite number of bits An image is represented as a two

dimensional function f(xy) where lsquoxrsquo and lsquoyrsquo are spatial (plane)

coordinates and the amplitude of f at any pair of coordinates (xy)

represents the intensity or gray level of the image at that point

A digital image is one for which both the co-ordinates and the

amplitude values of f are all finite discrete quantities Hence a digital

image is composed of a finite number of elements each of which has a

particular location value These elements are called ldquopixelsrdquo A digital

image is discrete in both spatial coordinates and brightness and it can be

considered as a matrix whose rows and column indices identify a point on

the image and the corresponding matrix element value identifies the gray

level at that point

One of the first applications of digital images was in the newspaper

industry when pictures were first sent by submarine cable between London

and New York Introduction of the Bartlane cable picture transmission

system in the early 1920s reduced the time required to transport a picture

across the Atlantic from more than a week to less than three hours

FIG

121 PREPROCESSING

In imaging science image processing is any form of signal

processing for which the input is an image such as a photograph or video

frame the output of image processing may be either an image or a set of

characteristics or parameters related to the image Most image-processing

techniques involve treating the image as a two-dimensional signal and

applying standard signal-processing techniques to it Image processing

usually refers to digital image processing but optical and analog image

processing also are possible This article is about general techniques that

apply to all of them The acquisition of images (producing the input image

in the first place) is referred to as imaging

Image processing refers to processing of a 2D picture by a

computer Basic definitions

An image defined in the ldquoreal worldrdquo is considered to be a function

of two real variables for example a(xy) with a as the amplitude (eg

brightness) of the image at the real coordinate position (xy) Modern digital

technology has made it possible to manipulate multi-dimensional signals

with systems that range from simple digital circuits to advanced parallel

computers The goal of this manipulation can be divided into three

categories

Image processing (image in -gt image out)

Image Analysis (image in -gt measurements out)

Image Understanding (image in -gt high-level description out)

An image may be considered to contain sub-images sometimes referred

to as regions-of-interest ROIs or simply regions This concept reflects the

fact that images frequently contain collections of objects each of which can

be the basis for a region In a sophisticated image processing system it

should be possible to apply specific image processing operations to selected

regions Thus one part of an image (region) might be processed to suppress

motion blur while another part might be processed to improve colour

rendition

Most usually image processing systems require that the images be

available in digitized form that is arrays of finite length binary words For

digitization the given Image is sampled on a discrete grid and each sample

or pixel is quantized using a finite number of bits The digitized image is

processed by a computer To display a digital image it is first converted

into analog signal which is scanned onto a display Closely related to

image processing are computer graphics and computer vision In computer

graphics images are manually made from physical models of objects

environments and lighting instead of being acquired (via imaging devices

such as cameras) from natural scenes as in most animated movies

Computer vision on the other hand is often considered high-level image

processing out of which a machinecomputersoftware intends to decipher

the physical contents of an image or a sequence of images (eg videos or

3D full-body magnetic resonance scans)

In modern sciences and technologies images also gain much

broader scopes due to the ever growing importance of scientific

visualization (of often large-scale complex scientificexperimental data)

Examples include microarray data in genetic research or real-time multi-

asset portfolio trading in finance Before going to processing an image it is

converted into a digital form Digitization includes sampling of image and

quantization of sampled values After converting the image into bit

information processing is performed This processing technique may be

Image enhancement Image restoration and Image compression

122 IMAGE ENHANCEMENT

Image enhancement refers to accentuation or sharpening of image features such as boundaries or contrast to make a graphic display more useful for display and analysis. This process does not increase the inherent information content in the data. It includes gray-level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-colouring, and so on.
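As a simple illustration of gray-level manipulation, the CUDA kernel sketch below applies linear contrast stretching to an 8-bit grayscale image. The kernel name and parameters are assumptions made for illustration only, not part of the project's code.

__global__ void contrastStretch(const unsigned char *in, unsigned char *out,
                                int numPixels, unsigned char inMin, unsigned char inMax)
{
    /* Each thread rescales one pixel from [inMin, inMax] to the full [0, 255] range. */
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numPixels) return;
    float range = (float)(inMax - inMin);
    float v = (range > 0.0f) ? (in[idx] - inMin) * 255.0f / range : (float)in[idx];
    v = fminf(fmaxf(v, 0.0f), 255.0f);
    out[idx] = (unsigned char)v;
}

/* Possible host-side launch, assuming d_in and d_out are device buffers:
   int threads = 256, blocks = (numPixels + threads - 1) / threads;
   contrastStretch<<<blocks, threads>>>(d_in, d_out, numPixels, gMin, gMax);      */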

123 IMAGE RESTORATION

Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with the extraction or accentuation of image features.

124 IMAGE COMPRESSION

Image compression is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computed tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression - CCITT GROUP 3 & GROUP 4

Still image compression - JPEG

Video image compression - MPEG

125 SEGMENTATION

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms such as marching cubes.
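As a minimal sketch of threshold-based segmentation (assuming a precomputed global threshold, for example one obtained from Otsu's method; all names are illustrative), each pixel is labeled as foreground or background:

__global__ void thresholdSegment(const unsigned char *in, unsigned char *label,
                                 int numPixels, unsigned char t)
{
    /* Pixels brighter than the threshold t form one segment, the rest the other. */
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numPixels)
        label[idx] = (in[idx] > t) ? 1 : 0;
}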

126 IMAGE RESTORATION

Image restoration, like enhancement, improves the quality of an image, but all the operations are mainly based on known, measured, or estimated degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion, and to correct images for known degradations.

127 FUNDAMENTAL STEPS

Image acquisition: to acquire a digital image.

Image preprocessing: to improve the image in ways that increase the chances for success of the other processes.

Image segmentation: to partition an input image into its constituent parts or objects.

Image representation: to convert the input data to a form suitable for computer processing.

Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.

Image recognition: to assign a label to an object based on the information provided by its descriptors.

Image interpretation: to assign meaning to an ensemble of recognized objects.

Knowledge about a problem domain is coded into an image-processing system in the form of a knowledge database.

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling; amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.

Example: a 256 gray-level image of size 256x256 occupies 64K bytes of memory (256 x 256 pixels at 1 byte per pixel = 65,536 bytes).

Images of very low spatial resolution produce a checkerboard effect. The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.

14 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:

GIF - Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG - Joint Photographic Experts Group. A very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).

TIFF - Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS - PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD - Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP - bitmap file format.

15 TYPE OF IMAGES

There are four types of images:

1. Binary image

2. Gray scale image

3. Color image

4. Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. Each pixel is stored as a single bit, i.e., a 0 or 1. The names black-and-white and B&W are also often used for this concept.

152 GRAY SCALE IMAGE

In an 8-bit grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grayscale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive colour with the r, g, and b receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB (cyan, magenta, and yellow) are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green, and blue at full intensity makes white.

In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK: The 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage the colors of digital images in a limited fashion in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index into the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.
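A hedged sketch of the palette lookup described above (kernel name is an illustrative assumption): each pixel stores only an index, and the display color is fetched from the color map.

__global__ void indexedToRGB(const unsigned char *index, const uchar3 *palette,
                             uchar3 *rgb, int numPixels)
{
    /* palette holds up to 256 colors; each pixel value selects one entry */
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numPixels)
        rgb[idx] = palette[index[idx]];
}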

Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle. This supported a palette of 256 36-bit RGB colors.

16 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation.

2) Processing of scene data for autonomous machine perception.

In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size and will preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that have been actively researched in recent years. It is well known that many state-of-the-art still face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. The recognition performance of a subset of existing still image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallelizable portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye. The vein pattern seen in the sclera region is unique to each person; thus the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are completely eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using bilinear interpolation. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied to the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

18 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor (the Y-shape sclera feature-based efficient registration method) to speed up the mapping scheme, introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue, and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make parallel processing possible and efficient.

191 PROPOSED SYSTEM ADVANTAGES

1. To improve efficiency, in this research we propose a new descriptor, the Y-shape descriptor, which can greatly help improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. evaluated the performance of image processing algorithms such as linear feature extraction and multi-view stereo matching on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform which outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies would be needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition using our sequential line-descriptor method on the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size and will preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor (the Y-shape sclera feature-based efficient registration method) to speed up the mapping scheme (Section 4), introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue (Section 5), and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition, in Section 8 we present experiments using the proposed system, and in Section 9 we draw some conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition. Several methods have been designed for sclera segmentation. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin-tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's-thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region-growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches, including Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly used for circular or quasi-circular object shapes. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out.

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.

FIG

Glare area detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images, so a color image must first be converted to grayscale and then passed through the Sobel filter to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. When the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.

223 IRIS AND EYELID REFINEMENT

The top and underside of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; these are all unwanted portions for recognition. In order to eliminate their effects, refinement is done following the detection of the sclera area. The figure shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence feature extraction and matching are needed to reduce the segmentation fault. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of the fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of the proposed parallelization is outside the scope of this report.

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching stage of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration using these parameters.
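Since the equations themselves are not reproduced above, the sketch below shows, under the definitions just given, how the three components of S = (θ, r, ɸ) could be computed for one segment from its center (xl, yl), its dominant orientation, and the iris center (xi, yi). Function and type names are illustrative assumptions, not the report's code.

#include <math.h>

typedef struct { float theta, r, phi; } LineDescriptor;

LineDescriptor makeLineDescriptor(float xl, float yl, float phi, float xi, float yi)
{
    LineDescriptor s;
    s.theta = atan2f(yl - yi, xl - xi);   /* segment-center angle relative to the iris center */
    s.r     = sqrtf((xl - xi) * (xl - xi) + (yl - yi) * (yl - yi));  /* distance to the iris center */
    s.phi   = phi;                        /* dominant angular orientation of the segment */
    return s;
}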

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score, M, is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights therefore sets the maximum score that can be attained.
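A hedged, serial sketch of this matching rule follows; the exact formula is not reproduced in this report, so the thresholding and weighting below only follow the description above, and all names are illustrative.

#include <math.h>

typedef struct { float x, y, phi, w; } Seg;   /* segment center, orientation, mask weight */

float templateMatchScore(const Seg *test, int nTest, const Seg *target, int nTarget,
                         float Dmatch, float phiMatch)
{
    float score = 0.0f, wTest = 0.0f, wTarget = 0.0f;
    for (int i = 0; i < nTest; ++i) {
        wTest += test[i].w;
        for (int j = 0; j < nTarget; ++j) {
            float dx = test[i].x - target[j].x, dy = test[i].y - target[j].y;
            if (sqrtf(dx * dx + dy * dy) <= Dmatch &&
                fabsf(test[i].phi - target[j].phi) <= phiMatch) {
                score += test[i].w * target[j].w;   /* weighted score of a matched pair */
                break;
            }
        }
    }
    for (int j = 0; j < nTarget; ++j) wTarget += target[j].w;
    float maxScore = fminf(wTest, wTarget);         /* maximum score of the minimal set */
    return (maxScore > 0.0f) ? score / maxScore : 0.0f;
}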

FIG

FIG

FIG

FIG

Under movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angles of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.
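A minimal sketch of how such a Y-shape feature vector might be stored (field names are illustrative assumptions):

typedef struct {
    float phi1, phi2, phi3;   /* angles between each branch and the radius from the pupil center */
    float x, y;               /* center position of the Y-shape branch point (auxiliary data) */
} YShapeDescriptor;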

226 WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such error, the mask file

FIG

Figure: The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel patterns in the sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line segments of the vessel patterns.

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy the GPU memory and slow down the data transfer. During matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weights of the transformed descriptors. This results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.

The calculated result is saved as a component of the descriptor. The sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted θ; the distance between the segment center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in the CPU preprocessing.

The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
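A minimal sketch of the WPL descriptor record implied by this description (field names are illustrative assumptions; the polar values and the weight are precomputed on the CPU so that no mask file is needed on the GPU):

typedef struct {
    float x, y;     /* segment center in rectangular coordinates, precomputed on the CPU */
    float r, theta; /* distance and angle of the center relative to the pupil/iris center */
    float phi;      /* dominant angular orientation of the segment */
    float w;        /* mask weight: 0 outside the sclera, 0.5 near the boundary, 1 inside */
} WPLDescriptor;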

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-purpose computing on the GPU: Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.

We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as a texture on future passes through the graphics pipeline.

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.
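A minimal CUDA sketch of this model, assuming a simple neighbor-averaging update over a 2D grid (the kernel name and the stencil are illustrative, not taken from the report): each thread in the structured grid computes one output value by gathering from the previous buffer and writing into the new one.

__global__ void updateGrid(const float *prev, float *next, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;
    int i = y * width + x;
    /* gather reads from the previous state; write of the new value for this grid point */
    next[i] = 0.25f * (prev[i - 1] + prev[i + 1] + prev[i - width] + prev[i + width]);
}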

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities. After this step, some false positive matches may remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of the two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distance, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the searching area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the searching range and time. The distance of two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar ray from the pupil center in descriptor i.

The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. Scleras with high matching scores are passed to the next, more precise matching process.
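The sketch below gives a serial, hedged version of this coarse Stage-I matching. The exact fusion formula (2) is not reproduced in this report, so the final combination of the matched count and the average center distance shown here is only illustrative, as are all the names.

#include <math.h>

typedef struct { float phi1, phi2, phi3, x, y; } YDesc;

float coarseYShapeScore(const YDesc *te, int Ni, const YDesc *ta, int Nj,
                        float t_phi, float t_xy, float alpha)
{
    int n = 0;            /* number of matched Y-shape descriptor pairs */
    float dist = 0.0f;    /* accumulated center distance of matched pairs */
    for (int i = 0; i < Ni; ++i) {
        for (int j = 0; j < Nj; ++j) {
            float dphi = sqrtf((te[i].phi1 - ta[j].phi1) * (te[i].phi1 - ta[j].phi1) +
                               (te[i].phi2 - ta[j].phi2) * (te[i].phi2 - ta[j].phi2) +
                               (te[i].phi3 - ta[j].phi3) * (te[i].phi3 - ta[j].phi3));
            float dxy  = sqrtf((te[i].x - ta[j].x) * (te[i].x - ta[j].x) +
                               (te[i].y - ta[j].y) * (te[i].y - ta[j].y));
            if (dphi < t_phi && dxy < t_xy) { ++n; dist += dxy; break; }
        }
    }
    if (n == 0) return 0.0f;
    /* illustrative fusion: more matched pairs and a smaller average distance give a higher score */
    float avg = dist / (float)n;
    int nMin = (Ni < Nj) ? Ni : Nj;
    return ((float)n / (float)nMin) * (alpha / (alpha + avg));
}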

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more of the vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at different gaze angles, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers: the episclera, stroma, lamina fusca, and endothelium, and there are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte; Tta is the target template and staj is the j-th WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek and staj.

Δsk is the shift value of two descriptors, defined as

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quadrant and find the nearest neighbor staj of each in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
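A serial, hedged sketch of this shift search follows. Keeping the candidate whose offset deviates least from the mean of all candidates is one reading of the standard-deviation criterion; the names and the sampling scheme are illustrative assumptions.

#include <math.h>
#include <stdlib.h>

typedef struct { float x, y; } Shift;

Shift searchShift(const float *teX, const float *teY, int nTe,
                  const float *taX, const float *taY, int nTa, int nSamples)
{
    enum { MAX_SAMPLES = 256 };
    Shift cand[MAX_SAMPLES];
    if (nSamples > MAX_SAMPLES) nSamples = MAX_SAMPLES;
    /* collect candidate offsets: random test descriptor -> its nearest target descriptor */
    for (int s = 0; s < nSamples; ++s) {
        int i = rand() % nTe;
        float best = 1e30f; Shift d = {0.0f, 0.0f};
        for (int j = 0; j < nTa; ++j) {
            float dx = taX[j] - teX[i], dy = taY[j] - teY[i];
            float dd = dx * dx + dy * dy;
            if (dd < best) { best = dd; d.x = dx; d.y = dy; }
        }
        cand[s] = d;
    }
    /* keep the candidate closest to the mean offset (smallest deviation) */
    float mx = 0.0f, my = 0.0f;
    for (int s = 0; s < nSamples; ++s) { mx += cand[s].x; my += cand[s].y; }
    mx /= (float)nSamples; my /= (float)nSamples;
    int bestIdx = 0; float bestDev = 1e30f;
    for (int s = 0; s < nSamples; ++s) {
        float dev = (cand[s].x - mx) * (cand[s].x - mx) + (cand[s].y - my) * (cand[s].y - my);
        if (dev < bestDev) { bestDev = dev; bestIdx = s; }
    }
    return cand[bestIdx];
}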

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste_(it) and calculating the distance to its nearest neighbor sta_j in Tta. We transform the test template by the matrix in (7). At the end of each iteration we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum number of matches m(it). Here ste_i, Tte, sta_j, and Tta are defined as in Algorithm 2; tr_shift(it), θ(it), and tr_scale(it) are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr_shift(it)), and S(tr_scale(it)) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters. In our experiment we set the number of iterations to 512.
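The sketch below illustrates this random parameter search in MATLAB. The rotation and scale ranges, the function name searchAffine, and the use of standard 2-D homogeneous transform matrices in place of equation (7) are assumptions; beta and N correspond to the 20-pixel tolerance and 512 iterations mentioned above.

% Affine-parameter search (illustrative sketch of Algorithm 3).
% testD, targetD: descriptors with positions (x, y) in columns 1:2.
% N: number of random iterations (512 in the text); beta: match tolerance (20 pixels).
function best = searchAffine(testD, targetD, N, beta)
    best = struct('theta', 0, 'shift', [0 0], 'scale', 1, 'matches', -1);
    for it = 1:N
        theta = (rand - 0.5) * pi / 6;          % assumed small random rotation
        scale = 0.9 + 0.2 * rand;               % assumed small random scaling
        p     = testD(randi(size(testD, 1)), 1:2);
        d     = sqrt(sum(bsxfun(@minus, targetD(:, 1:2), p).^2, 2));
        [~, j] = min(d);
        shift = targetD(j, 1:2) - p;            % shift taken from a nearest-neighbour pair
        % Standard 2-D homogeneous transforms standing in for Eq. (7).
        R = [cos(theta) -sin(theta) 0; sin(theta) cos(theta) 0; 0 0 1];
        S = [scale 0 0; 0 scale 0; 0 0 1];
        T = [1 0 shift(1); 0 1 shift(2); 0 0 1];
        P = [testD(:, 1:2), ones(size(testD, 1), 1)] * (T * R * S)';   % transform test template
        % Count descriptor pairs that land within beta pixels of a target descriptor.
        m = 0;
        for i = 1:size(P, 1)
            di = sqrt(sum(bsxfun(@minus, targetD(:, 1:2), P(i, 1:2)).^2, 2));
            m  = m + (min(di) < beta);
        end
        if m > best.matches
            best = struct('theta', theta, 'shift', shift, 'scale', scale, 'matches', m);
        end
    end
end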

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here ste_i, Tte, sta_j, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm), and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)) T(tr_shift(optm)) S(tr_scale(optm)) is the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.
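For completeness, the hedged sketch below applies an already-selected parameter set (for example the struct returned by the previous sketch) and scores the match, including the orientation check with α and the descriptor weights w. The exact per-pair scoring of Algorithm 4 is simplified, and the descriptor column layout is an assumption.

% Registration and matching (illustrative sketch of Algorithm 4).
% Descriptor columns assumed: 1:2 position (x, y), 3 orientation phi, 4 weight w.
% prm: parameter struct from the searches above; alphaPhi: orientation tolerance (5 in the text).
function score = registerAndMatch(testD, targetD, prm, beta, alphaPhi)
    R = [cos(prm.theta) -sin(prm.theta) 0; sin(prm.theta) cos(prm.theta) 0; 0 0 1];
    S = [prm.scale 0 0; 0 prm.scale 0; 0 0 1];
    T = [1 0 prm.shift(1); 0 1 prm.shift(2); 0 0 1];
    P = [testD(:, 1:2), ones(size(testD, 1), 1)] * (T * R * S)';
    total = 0;
    for i = 1:size(P, 1)
        d = sqrt(sum(bsxfun(@minus, targetD(:, 1:2), P(i, 1:2)).^2, 2));
        [dmin, j] = min(d);
        samePhi = abs(testD(i, 3) - targetD(j, 3)) < alphaPhi;    % similar orientation
        if dmin < beta && samePhi
            total = total + testD(i, 4) * targetD(j, 4);          % edge descriptors weighted by w
        end
    end
    score = total / min(size(testD, 1), size(targetD, 1));        % assumed normalization
end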

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program must be partitioned into threads by the programmer and mapped onto those processors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and it takes little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.

Constant memory and texture memory are read-only, cached memories. Mapping algorithms onto CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming. If threads in a warp take different control paths, all the branches are executed serially; to improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency as completely as possible, we should use on-chip memory in preference to global memory. When global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall into the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted into a different kernel on the GPU. These kernels differ in computational density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024 x 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, and no data exchange is required between different blocks. When all threads complete their respective descriptor fractions, the intermediate results need to be summed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sums of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
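The same pairwise reduction can be written out sequentially to show the pattern; in the MATLAB sketch below, each pass of the outer loop corresponds to one synchronized parallel step on the GPU.

% Pairwise (tree) reduction illustrating the parallel sum of intermediate results.
% On the GPU, every iteration of the outer loop is one synchronized parallel step.
function total = treeReduceSum(vals)
    n = numel(vals);
    stride = 1;
    while stride < n
        for i = 1:2*stride:n                  % "first of every i-th group" of threads
            j = i + stride;
            if j <= n
                vals(i) = vals(i) + vals(j);  % partial sums accumulate at the first address
            end
        end
        stride = stride * 2;
    end
    total = vals(1);                          % final result stored at the first address
end

For example, treeReduceSum([1 2 3 4 5]) returns 15 after three passes, mirroring how the first thread in each group accumulates its neighbours' partial sums.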

252 MAPPING INSIDE BLOCK

In the shift-parameter search there are two schemes we can choose from to map the task: either map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads, or assign a single candidate shift offset to each thread, so that all threads compute independently and only the final results need to be compared across candidate offsets.

Because of the great number of sum and synchronization operations in every nearest-neighbor search step, we choose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to one thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed among all threads. The challenge in this step is that the randomly generated numbers might be correlated among threads. For generating the rotation and scale registration parameters we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag


variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.
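The idea of giving every parallel worker its own uncorrelated generator can be illustrated from MATLAB with independent random streams; this is only a conceptual analogue of the per-thread Mersenne Twister parameterization described above, and the worker count and drawn quantities are arbitrary.

% Independent random streams per parallel worker (conceptual analogue of the
% per-thread Mersenne Twister parameterization described above).
nWorkers = 8;                                   % assumed number of parallel workers
streams  = RandStream.create('mrg32k3a', 'NumStreams', nWorkers);  % mutually independent streams
candidates = zeros(nWorkers, 3);
for w = 1:nWorkers                              % a parfor over a pool would work the same way
    s = streams{w};
    % Each worker draws its own rotation, scale, and shift-index trial values.
    candidates(w, :) = [(rand(s) - 0.5) * pi/6, 0.9 + 0.2*rand(s), randi(s, 100)];
end
disp(candidates);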

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when it will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
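The keep-data-on-the-device principle can also be illustrated from MATLAB with the Parallel Computing Toolbox: the whole target set is moved to GPU memory once, and only the final scores come back to the host. This is a conceptual sketch with made-up array sizes, not the CUDA kernel layout described above.

% Upload all target descriptors to GPU memory once, before any matching starts,
% so no host-to-device transfer happens inside the matching loop.
% targetX, targetY: columns of descriptor positions for the whole database,
% stored separately so device accesses hit successive addresses.
targetX = gpuArray(single(rand(4000, 1024)));   % assumed layout: descriptors x templates
targetY = gpuArray(single(rand(4000, 1024)));
testX   = gpuArray(single(rand(4000, 1)));
testY   = gpuArray(single(rand(4000, 1)));

% A simple device-side computation: squared distance of the test descriptors
% to every template column; only the final per-template scores leave the GPU.
dist2  = bsxfun(@minus, targetX, testX).^2 + bsxfun(@minus, targetY, testY).^2;
scores = gather(sum(dist2, 1));                 % single device-to-host transfer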

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied to object detection. In this paper it is applied as the feature for human recognition: in the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating an intensity measure over a block and then using this value to normalize all cells within the block. This normalization makes the result more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientations are binned over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination or contrast changes, the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks; these blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window within each block.
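A compact MATLAB sketch of these HOG steps (gradient computation, magnitude-weighted binning over 0-180 degrees, and a simple normalization) is given below. The cell size, bin count, and filter kernels are illustrative choices; a toolbox routine such as extractHOGFeatures could be used instead.

% Histogram of oriented gradients (illustrative sketch).
function feat = simpleHOG(gray, cellSize, nBins)
    gray = double(gray);
    dx = imfilter(gray, [-1 0 1], 'replicate');            % x-direction gradient
    dy = imfilter(gray, [-1 0 1]', 'replicate');           % y-direction gradient
    mag = sqrt(dx.^2 + dy.^2);
    ang = mod(atan2(dy, dx) * 180 / pi, 180);               % orientations folded into 0-180 degrees
    [rows, cols] = size(gray);
    nCy = floor(rows / cellSize); nCx = floor(cols / cellSize);
    cellHist = zeros(nCy, nCx, nBins);
    binW = 180 / nBins;
    for cy = 1:nCy
        for cx = 1:nCx
            rr = (cy-1)*cellSize + (1:cellSize);
            cc = (cx-1)*cellSize + (1:cellSize);
            b  = min(floor(ang(rr, cc) / binW) + 1, nBins);  % orientation bin per pixel
            m  = mag(rr, cc);
            for k = 1:nBins
                cellHist(cy, cx, k) = sum(m(b == k));        % magnitude-weighted votes
            end
        end
    end
    % Simple normalization of the concatenated histograms; overlapping 2x2 R-HOG
    % blocks would concatenate and normalize neighbouring cells instead.
    feat = reshape(cellHist, [], 1);
    feat = feat / (norm(feat) + eps);
end

Calling feat = simpleHOG(grayImage, 8, 9) would return the concatenated 9-bin histograms of 8 x 8 cells for a grayscale image.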

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java™, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As an alternative to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
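As a small illustration of this point, the snippet below computes the same result with an explicit loop and with one vectorized line; the data and operation are arbitrary.

% Loop version: scale and threshold every pixel of an image explicitly.
img  = rand(256);                % example data
out1 = zeros(size(img));
for r = 1:size(img, 1)
    for c = 1:size(img, 2)
        out1(r, c) = min(1, 2 * img(r, c));
    end
end

% Vectorized version: one line, no explicit loop, no preallocation needed.
out2 = min(1, 2 * img);
isequal(out1, out2)              % returns true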

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveX® controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML
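For illustration, a few of these input paths look like the following; the file names are placeholders.

% Reading data from common sources (file names below are placeholders).
T   = xlsread('measurements.xls');          % numeric data from a Microsoft Excel sheet
txt = fileread('notes.txt');                % raw ASCII text
img = imread('eye_image.jpg');              % image file
fid = fopen('samples.bin', 'r');            % low-level binary I/O
raw = fread(fid, [256 256], 'uint8');
fclose(fid);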

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter and bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
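The snapshots above correspond roughly to the MATLAB sketch below. File names, thresholds, the single Gabor kernel, and the correlation-based matching score are illustrative assumptions and not the exact project code.

% Illustrative sclera processing chain matching the snapshots above.
rgb   = imread('eye_image.jpg');                     % placeholder input image
gray  = rgb2gray(rgb);                               % original -> grey scale
bw    = im2bw(gray, graythresh(gray));               % grey scale -> binary (Otsu threshold)
edges = edge(gray, 'canny');                         % edge map of the eye image

% Select the sclera region of interest (interactive polygon here; the project
% could equally use an automatic segmentation mask).
roiMask  = roipoly(gray);
roiImage = gray; roiImage(~roiMask) = 0;

% Enhance the vessel contrast inside the ROI.
enhanced = adapthisteq(roiImage);

% Feature extraction with a single hand-built Gabor kernel (one orientation shown).
lambda = 8; theta = pi/4; sigma = 4; [x, y] = meshgrid(-15:15);
xt = x*cos(theta) + y*sin(theta); yt = -x*sin(theta) + y*cos(theta);
g  = exp(-(xt.^2 + yt.^2) / (2*sigma^2)) .* cos(2*pi*xt/lambda);
features = imfilter(double(enhanced), g, 'symmetric');

% Match against stored templates (normalized correlation as a stand-in metric).
load('scleraTemplates.mat', 'templates');            % placeholder database file (cell array)
scores = cellfun(@(t) corr2(imresize(features, size(t)), t), templates);
[bestScore, bestId] = max(scores);
if bestScore > 0.5
    disp(['MATCHED with template ' num2str(bestId)]);
else
    disp('NOT MATCHED');
end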

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


DECLARATION

We hereby declare that the work which is being presented in this project

entitled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION", submitted towards the partial fulfilment of the requirement for the award of the Degree of Bachelor of Technology in "Electronics and Communication Engineering", is an authentic record of

our work under the supervision of Ms Syeda Sana Fatima Assistant

professor and Ms SSuneeta Head of the Department of Electronics and

Communication Engineering SHADAN WOMENS COLLEGE OF

ENGINEERING AND TECHNOLOGY affiliated to Jawaharlal Nehru

Technology University Hyderabad

The matter embodied in this report has not been submitted for the award of

any other degree

ADLA KIRANMAYI

ANNABATHULA SRILATHA

MAYESHA MUBEEN

INDEX

ABSTRACT ................................................................ i

CHAPTER 1 INTRODUCTION ............................................... 1-16

11 GENERAL
12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
121 PREPROCESSING
122 IMAGE ENHANCEMENT
123 IMAGE RESTORATION
124 IMAGE COMPRESSION
125 SEGMENTATION
126 IMAGE RESTORATION
127 FUNDAMENTAL STEPS
13 A SIMPLE IMAGE MODEL
14 IMAGE FILE FORMATS
15 TYPE OF IMAGES
151 BINARY IMAGES
152 GRAY SCALE IMAGE
153 COLOR IMAGE
154 INDEXED IMAGE
16 APPLICATIONS OF IMAGE PROCESSING
17 EXISTING SYSTEM
171 DISADVANTAGES OF EXISTING SYSTEM
18 LITERATURE SURVEY
19 PROPOSED SYSTEM
191 ADVANTAGES

CHAPTER 2 PROJECT DESCRIPTION ........................................ 17-46

21 INTRODUCTION
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
222 SCLERA SEGMENTATION
223 IRIS AND EYELID REFINEMENT
224 OCULAR SURFACE VASCULATURE
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
23 EVOLUTION OF GPU ARCHITECTURE
231 PROGRAMMING A GPU FOR GRAPHICS
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
25 MAPPING THE SUBTASKS TO CUDA
251 MAPPING ALGORITHM TO BLOCKS
252 MAPPING INSIDE BLOCK
253 MEMORY MANAGEMENT
26 HISTOGRAM OF ORIENTED GRADIENTS

CHAPTER 3 SOFTWARE SPECIFICATION ..................................... 47-53

31 GENERAL
32 SOFTWARE REQUIREMENTS
33 INTRODUCTION
34 FEATURES OF MATLAB
341 INTERFACING WITH OTHER LANGUAGES
35 THE MATLAB SYSTEM
351 DESKTOP TOOLS
352 ANALYZING AND ACCESSING DATA
353 PERFORMING NUMERIC COMPUTATION

CHAPTER 4 IMPLEMENTATION ............................................. 54-69

41 GENERAL
42 CODING IMPLEMENTATION
43 SNAPSHOTS

CHAPTER 5 APPLICATIONS ............................................... 70

CHAPTER 6 CONCLUSION & FUTURE SCOPE .................................. 71-72

61 CONCLUSION
62 REFERENCES

LIST OF FIGURES

FIG NO    FIG NAME                                                              PG NO

11        Fundamental blocks of digital image processing                        2
12        Gray scale image                                                      8
13        The additive model of RGB                                             9
14        The colors created by the subtractive model of CMYK                   9
21        The diagram of a typical sclera vein recognition approach             19
22        Steps of segmentation                                                 21
23        Glare area detection                                                  21
24        Detection of the sclera area                                          22
25        Pattern of veins                                                      23
26        Sclera region and its vein patterns                                   25
27        Filtering can take place simultaneously on different parts of the iris image   25
28        The sketch of parameters of segment descriptor                        26
29        The weighting image                                                   28
210       The module of sclera template matching                                28
211       The Y shape vessel branch in sclera                                   28
212       The rotation and scale invariant character of Y shape vessel branch   29
213       The line descriptor of the sclera vessel pattern                      30
214       The key elements of descriptor vector                                 31
215       Simplified sclera matching steps on GPU                               32
216       Two-stage matching scheme                                             35
217       Example image from the UBIRIS database                                42
218       Occupancy on various thread numbers per block                         43
219       The task assignment inside and outside the GPU                        44
220       HOG features                                                          46
41        Original sclera image                                                 65
42        Binarised sclera image                                                65
43        Edge map subtracted image                                             66
44        Cropping roi                                                          66
45        Roi mask                                                              67
46        Roi finger sclera image                                               67
47        Enhanced sclera image                                                 68
48        Feature extracted sclera image                                        68
49        Matching with images in database                                      69
410       Result                                                                69

ABSTRACT

Sclera vein recognition is shown to be a promising method for human identification. However, its matching speed is slow, which could limit its use in real-time applications. To improve the matching efficiency, we proposed a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching. First, we designed a rotation- and scale-invariant Y-shape descriptor based feature extraction method to efficiently eliminate most unlikely matches. Second, we developed a weighted polar line (WPL) sclera descriptor structure that incorporates mask information to reduce GPU memory cost. Third, we designed a coarse-to-fine two-stage matching method. Finally, we developed a mapping scheme to map the subtasks to the GPU processing units. The experimental results show that our proposed method can achieve a dramatic processing speed improvement without compromising the recognition accuracy.

CHAPTER 1

INTRODUCTION

11 GENERAL

Digital image processing is the use of computer algorithms to perform image processing on digital images. The 2D continuous image is divided into N rows and M columns; the intersection of a row and a column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.

12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING

The field of "digital image processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be considered the processing of any two-dimensional data, where an image (optical information) is represented as an array of real or complex numbers described by a definite number of bits. An image is represented as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) represents the intensity or gray level of the image at that point.

A digital image is one for which both the coordinates and the amplitude values of f are finite, discrete quantities. Hence a digital image is composed of a finite number of elements, each of which has a particular location and value; these elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, and it can be considered as a matrix whose row and column indices identify a point on the image and whose corresponding matrix element value identifies the gray level at that point.

One of the first applications of digital images was in the newspaper

industry when pictures were first sent by submarine cable between London

and New York Introduction of the Bartlane cable picture transmission

system in the early 1920s reduced the time required to transport a picture

across the Atlantic from more than a week to less than three hours

FIG

121 PREPROCESSING

In imaging science image processing is any form of signal

processing for which the input is an image such as a photograph or video

frame the output of image processing may be either an image or a set of

characteristics or parameters related to the image Most image-processing

techniques involve treating the image as a two-dimensional signal and

applying standard signal-processing techniques to it Image processing

usually refers to digital image processing but optical and analog image

processing also are possible This article is about general techniques that

apply to all of them The acquisition of images (producing the input image

in the first place) is referred to as imaging

Image processing refers to processing of a 2D picture by a

computer Basic definitions

An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital

technology has made it possible to manipulate multi-dimensional signals

with systems that range from simple digital circuits to advanced parallel

computers The goal of this manipulation can be divided into three

categories

Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)

An image may be considered to contain sub-images sometimes referred

to as regions-of-interest ROIs or simply regions This concept reflects the

fact that images frequently contain collections of objects each of which can

be the basis for a region In a sophisticated image processing system it

should be possible to apply specific image processing operations to selected

regions Thus one part of an image (region) might be processed to suppress

motion blur while another part might be processed to improve colour

rendition

Most usually image processing systems require that the images be

available in digitized form that is arrays of finite length binary words For

digitization the given Image is sampled on a discrete grid and each sample

or pixel is quantized using a finite number of bits The digitized image is

processed by a computer To display a digital image it is first converted

into analog signal which is scanned onto a display Closely related to

image processing are computer graphics and computer vision In computer

graphics images are manually made from physical models of objects

environments and lighting instead of being acquired (via imaging devices

such as cameras) from natural scenes as in most animated movies

Computer vision on the other hand is often considered high-level image

processing out of which a machinecomputersoftware intends to decipher

the physical contents of an image or a sequence of images (e.g., videos or

3D full-body magnetic resonance scans)

In modern sciences and technologies images also gain much

broader scopes due to the ever growing importance of scientific

visualization (of often large-scale complex scientificexperimental data)

Examples include microarray data in genetic research or real-time multi-

asset portfolio trading in finance Before going to processing an image it is

converted into a digital form Digitization includes sampling of image and

quantization of sampled values After converting the image into bit

information processing is performed This processing technique may be

Image enhancement Image restoration and Image compression

122 IMAGE ENHANCEMENT

It refers to accentuation or sharpening of image features such as boundaries or contrast to make a graphic display more useful for display and analysis. This process does not increase the inherent information content of the data. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.

123 IMAGE RESTORATION

It is concerned with filtering the observed image to minimize the

effect of degradations Effectiveness of image restoration depends on the

extent and accuracy of the knowledge of degradation process as well as on

filter design Image restoration differs from image enhancement in that the

latter is concerned with more extraction or accentuation of image features

124 IMAGE COMPRESSION

It is concerned with minimizing the number of bits required to represent an image. Applications of compression are in broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression - CCITT GROUP3 & GROUP4
Still image compression - JPEG
Video image compression - MPEG

125 SEGMENTATION

In computer vision image segmentation is the process of

partitioning a digital image into multiple segments (sets of pixels also

known as super pixels) The goal of segmentation is to simplify andor

change the representation of an image into something that is more

meaningful and easier to analyze Image segmentation is typically used to

locate objects and boundaries (lines curves etc) in images More precisely

image segmentation is the process of assigning a label to every pixel in an

image such that pixels with the same label share certain visual

characteristics

The result of image segmentation is a set of segments that

collectively cover the entire image or a set of contours extracted from the

image (see edge detection) Each of the pixels in a region are similar with

respect to some characteristic or computed property such as

colour intensity or texture Adjacent regions are significantly different

with respect to the same characteristic(s) When applied to a stack of

images typical in medical imaging the resulting contours after image

segmentation can be used to create 3D reconstructions with the help of

interpolation algorithms like marching cubes

126 IMAGE RESTORATION

Image restoration like enhancement improves the qualities of image

but all the operations are mainly based on known measured or

degradations of the original image Image restorations are used to restore

images with problems such as geometric distortion improper focus

repetitive noise and camera motion It is used to correct images for known

degradations

127 FUNDAMENTAL STEPS

Image acquisition to acquire a digital image

Image preprocessing to improve the image in ways that increases the

chances for success of the other processes

Image segmentation to partitions an input image into its constituent parts or

objects

Image representation to convert the input data to a form suitable for

computer processing

Image description to extract features that result in some quantitative

information of interest or features that are basic for differentiating one

class of objects from another

Image recognition to assign a label to an object based on the

information provided by its descriptors

Image interpretation to assign meaning to an ensemble of recognized

objects

Knowledge about a problem domain is coded into an image processing

system in the form of a Knowledge database

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing an image f(xy) must be digitalized

both spatially and in amplitude

Digitization of the spatial coordinates (xy) is called image sampling

Amplitude digitization is called gray-level quantization

The storage and processing requirements increase rapidly with the spatial

resolution and the number of gray levels

Example: A 256 gray-level image of size 256 x 256 occupies 64K bytes of memory (see the quick check after this list).

Images of very low spatial resolution produce a checkerboard effect

The use of insufficient number of gray levels in smooth areas of a digital

image results in false contouring
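The quick check below confirms the storage example given in this list (written in MATLAB for consistency with the rest of the report).

% Memory needed for a 256 gray-level (8 bits/pixel) image of size 256 x 256.
bytes = 256 * 256 * 1;        % 65536 bytes, i.e., 64K bytes as stated above
kib   = bytes / 1024          % displays 64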

14 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:

GIF - Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG - Joint Photographic Experts Group. A very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).

TIFF - Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS - PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD - Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP - bitmap file format.

15 TYPE OF IMAGES

Images are 4 types

1 Binary image

2 Gray scale image

3 Color image

4 Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level; this means that each pixel is stored as a single bit, i.e., a 0 or 1. The names black-and-white and B&W are often used for this type of image.

152 GRAY SCALE IMAGE

In a (8-bit) grayscale image each picture element has an assigned intensity

that ranges from 0 to 255 A grey scale image is what people normally call

a black and white image but the name emphasizes that such an image will

also include many shades of grey

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive

colour with the r g and b receptors in our retinas RGB uses additive colour

mixing and is the basic colour model used in television or any other

medium that projects colour with light It is the basic colour model used in

computers and for web graphics but it cannot be used for print production

The secondary colours of RGB - cyan, magenta, and yellow - are formed

by mixing two of the primary colours (red green or blue) and excluding the

third colour Red and green combine to make yellow green and blue to

make cyan and blue and red form magenta The combination of red green

and blue in full intensity makes white

In Photoshop, using the "screen" mode for the different layers in an

image will make the intensities mix together according to the additive

colour mixing model This is analogous to stacking slide images on top of

each other and shining light through them

FIG

CMYK The 4-colour CMYK model used in printing lays down

overlapping layers of varying percentages of transparent cyan (C) magenta

(M) and yellow (Y) inks In addition a layer of black (K) ink can be added

The CMYK model uses the subtractive colour model

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix The

pixel values in the array are direct indices into a color map By convention

this documentation uses the variable name X to refer to the array and map

to refer to the color map In computing indexed color is a technique to

manage digital images colors in a limited fashion in order to save

computer memory and file storage while speeding up display refresh and

file transfers It is a form of vector quantization compression

When an image is encoded in this way color information is not

directly carried by the image pixel data but is stored in a separate piece of

data called a palette an array of color elements in which every element a

color is indexed by its position within the array The image pixels do not

contain the full specification of its color but only its index in the palette

This technique is sometimes referred as pseudocolor or indirect color as

colors are addressed indirectly

Perhaps the first device that supported palette colors was a random-

access frame buffer described in 1975 by Kajiya Sutherland and Cheadle

This supported a palette of 256 36-bit RGB colors

16 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from 2 principal

application areas

1) Improvement of pictorial information for human interpretation

2) Processing of scene data for autonomous machine perception

In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches - Speed Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching - for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching with the SURF method. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in that sclera vein recognition system, costing about 1.2 seconds for a one-to-one matching. Both speeds were measured using a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1 Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer Also some of processing on the mask files will

involve convolution which is difficult to improve its performance on the

scalar process unit on CUDA

2 The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3 When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance

18 LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the

most challenging problems that has been actively researched in recent

years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp, and full) face images are acquired. However, their

performance degrades significantly when the test images contain variations

that are not present in the training images In this paper we highlight some

of the key issues in remote face recognition We define the remote face

recognition as one where faces are several tens of meters (10-250m) from

the cameras We then describe a remote face database which has been

acquired in an unconstrained outdoor maritime environment Recognition

performance of a subset of existing still image-based face recognition

algorithms is evaluated on the remote face data set Further we define the

remote re-identification problem as matching a subject at one location with

candidate sets acquired at a different location and over time in remote

conditions We provide preliminary experimental results on remote re-

identification It is demonstrated that in addition to applying a good

classification algorithm finding features that are robust to variations

mentioned above and developing statistical models which can account for

these variations are very important for remote face recognition

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanded biometric data collected by various sectors

of government and industry for identification and verification purposes

how to manage and process such Big Data draws great concern Even

though modern processors are equipped with more cores and memory

capacity it still requires careful design in order to utilize the hardware

resource effectively and the power consumption efficiently This research

addresses this issue by investigating the workload characteristics of

biometric applications. Taking Daugman's iris matching algorithm, which

has been proven to be the most reliable iris matching method as a case

study we conduct performance profiling and binary instrumentation on the

benchmark to capture its execution behavior The results show that data

loading and memory access incurs great performance overhead and

motivates us to move the biometrics computation to high-performance

architecture

Modern iris recognition algorithms can be computationally intensive

yet are designed for traditional sequential processing elements such as a

personal computer However a parallel processing alternative using field

programmable gate arrays (FPGAs) offers an opportunity to speed up iris

recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallelizable portion of a modern iris recognition algorithm, is parallelized on an FPGA

system We will present a performance comparison of the parallelized

algorithm on the FPGA system to a traditional CPU-based version The

parallelized template generation outperforms an optimized C++ code

version determining the information content of an iris approximately 324

times faster

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure.

Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person. Thus, the sclera vein pattern is a well-suited biometric for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are completely eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using bilinear interpolation. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature is extracted using the 1-D log polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

1.8 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method, built on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in a data-parallel model.

In this research, we first discuss why a naïve parallel approach would not work. We then propose a new sclera descriptor, the Y-shape sclera feature, for an efficient registration method that speeds up the mapping scheme; introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue; and develop a coarse-to-fine two-stage matching process that dramatically improves the matching speed. These new approaches make parallel processing possible and efficient.

1.9.1 PROPOSED SYSTEM ADVANTAGES

1. To improve efficiency, we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

2.1 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching with the SURF method. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured using a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods, so they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform and outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. Developing an efficient parallel computing scheme would require different strategies for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition, based on our sequential line-descriptor method, using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and can help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size, preoccupy the GPU memory, and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in a data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop the implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation: they used a clustering algorithm to classify color eye images into three clusters (sclera, iris, and background). Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly used for circular or quasi-circular shapes of objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out.

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure below shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images, so if the image is in color it must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera Area Estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.

2.2.3 IRIS AND EYELID REFINEMENT

The top and underside of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined, since all of these are unwanted portions for recognition. To eliminate these effects, refinement is done following the detection of the sclera area. The figure below shows the result after Otsu's thresholding and the iris and eyelid refinement used to detect the right sclera area; the left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, φ. Thus the descriptor is S = (θ, r, φ)^T. The individual components of the line descriptor are calculated as

FIG
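A plausible reconstruction of the components referenced above, assuming the standard line-descriptor formulation (the exact equations are not reproduced in this copy of the report):

    \theta = \tan^{-1}\!\left(\frac{y_l - y_i}{x_l - x_i}\right), \qquad
    r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
    \phi = \tan^{-1}\!\left(\left.\frac{d f_{line}(x)}{dx}\right|_{x = x_l}\right)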

Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as sketched below, where S_i and S_j are two segment descriptors, m(S_i, S_j) is the matching score between segments S_i and S_j, d(S_i, S_j) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), D_match is the matching distance threshold, and Φ_match is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
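A plausible reconstruction of this score computation, stated as an assumption consistent with the description above (the original equations are not reproduced in this copy):

    m(S_i, S_j) = \begin{cases} w_i\, w_j, & d(S_i, S_j) \le D_{match} \ \text{and}\ |\phi_i - \phi_j| \le \Phi_{match} \\ 0, & \text{otherwise} \end{cases}

    M = \frac{\sum_{(i,j)\ \text{matched}} m(S_i, S_j)}{\min\!\left(\sum_i w_i^{test},\ \sum_j w_j^{target}\right)}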

FIG

FIG

FIG

FIG

Even with movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, φ1, φ2, and φ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, φ1, φ2, and φ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(φ1, φ2, φ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line

descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, φ), where (x, y) is the position of the center and φ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not.

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.

However, in a GPU application, using the mask is challenging, since the mask files are large in size, occupy GPU memory, and slow down the data transfer. During matching and registration, a RANSAC-type algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, only once.

The calculated result is saved as a component of the descriptor, which becomes s(x, y, φ, w), where w denotes the weight of the point and takes the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. This is faster if the two templates have similar reference points: if we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they share the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted φ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector becomes s(x, y, r, θ, φ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them in contiguous addresses; this meets the requirement for coalesced memory access on the GPU, as sketched below.

FIG

FIG
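A minimal sketch of such a layout, assuming a structure-of-arrays organization; the struct and kernel names are illustrative assumptions, not the report's code:

    // Each WPL descriptor component s(x, y, r, theta, phi, w) is stored in its own
    // contiguous array, left-half descriptors first and right-half descriptors after,
    // so consecutive threads in a warp read consecutive addresses (coalesced access).
    struct WPLTemplateSoA {
        float *x, *y, *r, *theta, *phi, *w;
        int count;   // number of descriptors in the template
    };

    __global__ void touchDescriptors(WPLTemplateSoA t, float *out) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < t.count)
            out[i] = t.x[i] * t.w[i];   // neighboring threads access neighboring elements
    }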

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus, the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65 k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-Purpose Computing on the GPU: Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline described in Section II, concentrating on its programmable aspects:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as a texture on future passes through the graphics pipeline.

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way (a minimal CUDA sketch follows the list):

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.
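A minimal CUDA sketch of this model, assuming a simple one-dimensional grid update; the kernel and its names are illustrative, not from the report:

    // Each thread in a structured grid computes one output element: it gathers its
    // neighbors from global memory, combines them, and writes the result back.
    __global__ void updateGrid(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float left  = (i > 0)     ? in[i - 1] : in[i];
        float right = (i < n - 1) ? in[i + 1] : in[i];
        out[i] = 0.5f * in[i] + 0.25f * (left + right);
    }
    // Launch over the whole domain, e.g.: updateGrid<<<(n + 255) / 256, 256>>>(d_in, d_out, n);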

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed; the matching result in this stage helps filter out image pairs with low similarities. After this step, some false positive matches may remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score. A host-side sketch of this pipeline follows.
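The sketch below is a hedged, host-side view of the four-kernel flow described above; the kernel names, signatures, and empty bodies are illustrative assumptions only, not the report's code:

    // Stage I: coarse Y-shape matching (no registration needed).
    __global__ void coarseYShapeMatch(const float *testY, const float *targetY, float *coarseScore) { /* sketch only */ }
    // Stage II: shift search (Algorithm 2), affine parameter search (Algorithm 3),
    // and final WPL matching (Algorithm 4).
    __global__ void shiftSearch(const float *testWPL, const float *targetWPL, float *shift) { /* sketch only */ }
    __global__ void affineSearch(const float *testWPL, const float *targetWPL, const float *shift, float *affine) { /* sketch only */ }
    __global__ void fineWPLMatch(const float *testWPL, const float *targetWPL, const float *affine, float *score) { /* sketch only */ }

    void matchAgainstDatabase(const float *dTest, const float *dTargets, float *dShift,
                              float *dAffine, float *dScores, int numTemplates) {
        coarseYShapeMatch<<<(numTemplates + 1023) / 1024, 1024>>>(dTest, dTargets, dScores);
        // ...discard targets whose coarse score falls below the threshold t, then:
        shiftSearch<<<numTemplates, 256>>>(dTest, dTargets, dShift);
        affineSearch<<<numTemplates, 512>>>(dTest, dTargets, dShift, dAffine);
        fineWPLMatch<<<numTemplates, 256>>>(dTest, dTargets, dAffine, dScores);
        cudaDeviceSynchronize();
    }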

2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here y_te,i and y_ta,j are the Y-shape descriptors of the test template T_te and the target template T_ta, respectively; d_φ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance of the two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and the distance between their centers, respectively; t_φ is a distance threshold; and t_xy is the threshold that restricts the search area. We set t_φ to 30 and t_xy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where φ_i,j is the angle between the jth branch and the polar line from the pupil center in descriptor i. A plausible form of these distances is sketched below.

The number of matched pairs n_i and the distance between the matched Y-shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed on to the next, more precise matching process.

2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR

The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because, when acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, since the eyeball is spherical in shape, and because the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium) whose movements differ slightly. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.

1) SHIFT PARAMETER SEARCH: As discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te,i is the ith WPL descriptor of T_te; T_ta is the target template and s_ta,i is the ith WPL descriptor of T_ta; d(s_te,k, s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j; and Δs_k is the shift between the two descriptors, defined as
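A plausible reconstruction of this definition (an assumption based on the surrounding text, where (x, y) are the descriptor centers):

    \Delta s_k = \left(x_{ta,j} - x_{te,k},\; y_{ta,j} - y_{te,k}\right)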

We first randomly select an equal number of segment descriptors s_te,k in the test template T_te from each quad and find each one's nearest neighbor s_ta,j in the target template T_ta. The shift offset between them is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the offset to its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithm 2; tr_shift^(it), θ^(it), and tr_scale^(it) are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment, we set the iteration count to 512.
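One plausible form of the transform in (7), assuming a descriptor center p = (x, y)^T is scaled, rotated, and then shifted; the exact composition order in (7) is not recoverable from this copy, so this is only an assumption:

    p' = S\!\left(tr_{scale}\right)\, R(\theta)\, p + tr_{shift}, \qquad
    R(\theta) = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}, \qquad
    S(tr_{scale}) = \begin{pmatrix} tr_{scale} & 0\\ 0 & tr_{scale}\end{pmatrix}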

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm), and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr_shift(optm)), and S(tr_scale(optm)) form the descriptor transform matrix defined in Algorithm 3. φ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two φ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.

2.5 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program must be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, only a limited amount of shared memory is available. Global memory, constant memory, and texture memory are off-chip memories that are accessible by all threads, but accessing them is very time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms onto CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be used preferentially rather than global memory, and when global memory accesses do occur, threads in the same warp should access consecutive words to achieve coalescing.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled so as to minimize bank conflicts.

2.5.1 MAPPING THE ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partitioning among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks: we create a number of threads equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work, we use an NVIDIA C2070 as our GPU, and the numbers of blocks and threads are both set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time, as in the kernel sketch below.
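A minimal sketch of this one-thread-per-target-template mapping, assuming five floats per Y-shape descriptor, y(φ1, φ2, φ3, x, y); the kernel body is a placeholder that only illustrates the indexing, not the report's Algorithm 1:

    __global__ void coarseMatchKernel(const float *testY, const float *targetY,
                                      int descPerTemplate, int numTemplates, float *scores) {
        int t = blockIdx.x * blockDim.x + threadIdx.x;   // one target template per thread
        if (t >= numTemplates) return;
        const float *target = targetY + (size_t)t * descPerTemplate * 5;
        float score = 0.0f;
        for (int k = 0; k < descPerTemplate; ++k) {
            // The Algorithm 1 comparison of the test Y-shape descriptors against
            // target[5*k .. 5*k+4] would go here; this placeholder only shows the access pattern.
            score += 0.0f * (testY[0] - target[5 * k]);
        }
        scores[t] = score;
    }
    // Launch, matching up to 1024 x 1024 targets at once:
    // coarseMatchKernel<<<1024, 1024>>>(dTest, dTargets, nDesc, nTemplates, dScores);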

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their respective descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new result. The final result is saved at the first address, which has the same variable name as the first intermediate result. A sketch of such an in-block reduction follows.
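A minimal CUDA sketch of this in-block tree summation, assuming the block size is a power of two; the kernel and variable names are illustrative, not the report's code:

    // Each thread contributes one partial result; pairs are summed, then recursively
    // combined, and the block's total ends up in element 0 of shared memory.
    __global__ void blockSum(const float *partial, float *blockTotals) {
        extern __shared__ float s[];
        int tid = threadIdx.x;
        s[tid] = partial[blockIdx.x * blockDim.x + tid];
        __syncthreads();
        for (int stride = 1; stride < blockDim.x; stride *= 2) {
            if (tid % (2 * stride) == 0)
                s[tid] += s[tid + stride];
            __syncthreads();
        }
        if (tid == 0) blockTotals[blockIdx.x] = s[0];   // result stays at the first address
    }
    // Launch with dynamic shared memory:
    // blockSum<<<numBlocks, blockSize, blockSize * sizeof(float)>>>(d_partial, d_totals);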

2.5.2 MAPPING INSIDE A BLOCK

In the shift parameter search, there are two schemes we can choose for mapping the task:

Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assign a single possible shift offset to each thread, so that all the threads compute independently and only the final results for the different offsets need to be compared.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states.

But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, adapted from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
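As an aside, a commonly used alternative for obtaining uncorrelated per-thread streams in CUDA is the cuRAND device API; the sketch below is an illustrative assumption, not the DCMT approach the report uses:

    #include <curand_kernel.h>

    // One state per thread; the same seed with a distinct subsequence index per thread
    // yields statistically independent streams.
    __global__ void initStates(curandState *states, unsigned long long seed) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        curand_init(seed, tid, 0, &states[tid]);
    }

    // Each thread draws its own candidate rotation and scale parameters.
    __global__ void randomAffineParams(curandState *states, float *angle, float *scale) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        curandState local = states[tid];
        angle[tid] = curand_uniform(&local) * 360.0f;        // illustrative range
        scale[tid] = 0.9f + 0.2f * curand_uniform(&local);   // illustrative range [0.9, 1.1)
        states[tid] = local;
    }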

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether a line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.

FIG

FIG

2.5.3 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(φ1, φ2, φ3, x, y) and s(x, y, r, θ, φ, w) are stored separately. This guarantees that the contiguous kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.

FIG

2.6 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper, it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients d_x(x, y) and d_y(x, y), as shown below.
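The standard formulas, stated here for completeness:

    m(x, y) = \sqrt{d_x(x, y)^2 + d_y(x, y)^2}, \qquad
    \theta(x, y) = \tan^{-1}\!\left(\frac{d_y(x, y)}{d_x(x, y)}\right)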

Orientation binning is the second step of HOG; this step creates the cell histograms. Each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strengths must be locally normalized, which requires grouping the cells together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

3.1 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran. Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

3.2 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages, such as C, C++, FORTRAN, Java™, COM,

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

3.2.1 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java technology that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations.

Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data.

Readability and navigation improvements to warning and error messages in the MATLAB command window.

Automatic variable and function renaming in the MATLAB Editor.

Developing Algorithms and Applications

MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
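For instance (an illustrative sketch only), the explicit loop and the vectorized statement below compute the same result, and the one-line version is the idiomatic MATLAB form:

    x = 0:0.01:2*pi;
    % loop version
    y = zeros(size(x));
    for k = 1:numel(x)
        y(k) = sin(x(k))^2;
    end
    % vectorized version: one line, no loop
    y = sin(x).^2;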

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology.

This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.

Development Tools

MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:

MATLAB Editor

Provides standard editing and debugging features, such as setting breakpoints and single stepping.

Code Analyzer

Checks your code for problems and recommends modifications to maximize performance and maintainability.

MATLAB Profiler

Records the time spent executing each line of code.

Directory Reports

Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.

Designing Graphical User Interfaces

You can use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
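A minimal programmatic GUI might look like the sketch below (generic example code, not part of the project); figure and uicontrol are standard MATLAB functions:

    f = figure('Name', 'Demo GUI', 'NumberTitle', 'off');
    uicontrol(f, 'Style', 'text', 'Position', [20 60 120 20], ...
              'String', 'Press the button:');
    uicontrol(f, 'Style', 'pushbutton', 'Position', [20 20 120 30], ...
              'String', 'Say hello', ...
              'Callback', @(src, evt) disp('Hello from MATLAB'));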

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including:

Interpolating and decimating

Extracting sections of data, scaling, and averaging

Thresholding and smoothing

Correlation, Fourier analysis, and filtering

1-D peak, valley, and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
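A few representative calls are sketched below (illustrative only; the file names are placeholders, and the functions shown are standard MATLAB I/O functions):

    T   = readtable('results.csv');      % mixed textual and numeric data
    img = imread('eye.jpg');             % image file
    fid = fopen('template.bin', 'r');    % low-level binary I/O
    raw = fread(fid, inf, 'uint8');
    fclose(fid);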

Visualizing Data

All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create:

Line, area, bar, and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations
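The following sketch (generic demonstration code) produces a line plot and a histogram with standard plotting functions:

    x = linspace(0, 2*pi, 200);
    subplot(1, 2, 1);
    plot(x, sin(x), 'LineWidth', 1.5);    % line chart
    title('sin(x)'); xlabel('x'); ylabel('amplitude');
    subplot(1, 2, 2);
    hist(randn(1, 1000), 30);             % histogram of random data
    title('Histogram');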

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include:

Surface, contour, and mesh

Image plots

Cone, slice, stream, and isosurface
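A short example (illustrative only) of surface and contour plots over a standard test function:

    [X, Y] = meshgrid(-2:0.1:2, -2:0.1:2);
    Z = X .* exp(-X.^2 - Y.^2);
    figure;
    surf(X, Y, Z);                % 3-D surface
    shading interp;
    figure;
    contour(X, Y, Z, 20);         % 2-D contour view of the same data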

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW Discrete Fourier Transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.

MATLAB provides the following types of functions for performing mathematical operations and analyzing data:

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
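A few of these operations in one illustrative sketch (generic code, not taken from the project):

    A = magic(4);
    [V, D] = eig(A);                          % linear algebra: eigen-decomposition
    p = polyfit(1:10, (1:10).^2, 2);          % polynomial fitting
    F = fft(sin(2*pi*5*(0:0.001:1)));         % Fourier analysis via FFTW
    q = integral(@(t) exp(-t.^2), 0, Inf);    % numerical integration
    [t, y] = ode45(@(t, y) -2*y, [0 5], 1);   % solve the ODE y' = -2y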

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

4.2 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
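The snapshots above follow the processing pipeline described in Chapter 2. A simplified sketch of such a pipeline in MATLAB is given below; it is an assumption-laden illustration (the file name, the ROI box, the filter settings, and the use of Image Processing Toolbox functions such as graythresh, edge, adapthisteq, and imgaborfilt are our choices, not the project's exact code):

    rgb   = imread('eye.jpg');                 % input eye image (placeholder name)
    gray  = rgb2gray(rgb);                     % grayscale conversion
    level = graythresh(gray);                  % Otsu's threshold
    bw    = im2bw(gray, level);                % binary image
    edges = edge(gray, 'sobel');               % edge map
    roi   = imcrop(gray, [50 80 200 120]);     % region of interest (example box)
    enh   = adapthisteq(roi);                  % vessel enhancement (CLAHE)
    feat  = imgaborfilt(enh, 4, 90);           % Gabor filtering (wavelength 4, 90 deg)
    % the feature vectors of the probe and each gallery image would then be
    % compared, e.g. with a distance measure, to decide matched / not matched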

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the matching more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, was used to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


DECLARATION

We hereby declare that the work which is being presented in this project, entitled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION", submitted towards the partial fulfilment of the requirement for the award of the Degree of Bachelor of Technology in "Electronics and Communication Engineering", is an authentic record of our work under the supervision of Ms. Syeda Sana Fatima, Assistant Professor, and Ms. S. Suneeta, Head of the Department of Electronics and Communication Engineering, SHADAN WOMENS COLLEGE OF ENGINEERING AND TECHNOLOGY, affiliated to Jawaharlal Nehru Technological University, Hyderabad.

The matter embodied in this report has not been submitted for the award of any other degree.

ADLA KIRANMAYI

ANNABATHULA SRILATHA

M. AYESHA MUBEEN

INDEX

ABSTRACT

CHAPTER 1 INTRODUCTION

1.1 GENERAL
1.2 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
1.2.1 PREPROCESSING
1.2.2 IMAGE ENHANCEMENT
1.2.3 IMAGE RESTORATION
1.2.4 IMAGE COMPRESSION
1.2.5 SEGMENTATION
1.2.6 IMAGE RESTORATION
1.2.7 FUNDAMENTAL STEPS
1.3 A SIMPLE IMAGE MODEL
1.4 IMAGE FILE FORMATS
1.5 TYPES OF IMAGES
1.5.1 BINARY IMAGES
1.5.2 GRAY SCALE IMAGE
1.5.3 COLOR IMAGE
1.5.4 INDEXED IMAGE
1.6 APPLICATIONS OF IMAGE PROCESSING
1.7 EXISTING SYSTEM
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1.8 LITERATURE SURVEY
1.9 PROPOSED SYSTEM
1.9.1 ADVANTAGES

CHAPTER 2 PROJECT DESCRIPTION

2.1 INTRODUCTION
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
2.2.2 SCLERA SEGMENTATION
2.2.3 IRIS AND EYELID REFINEMENT
2.2.4 OCULAR SURFACE VASCULATURE
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
2.3 EVOLUTION OF GPU ARCHITECTURE
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
2.5 MAPPING THE SUBTASKS TO CUDA
2.5.1 MAPPING ALGORITHM TO BLOCKS
2.5.2 MAPPING INSIDE BLOCK
2.5.3 MEMORY MANAGEMENT
2.6 HISTOGRAM OF ORIENTED GRADIENTS

CHAPTER 3 SOFTWARE SPECIFICATION

3.1 GENERAL
3.2 SOFTWARE REQUIREMENTS
3.3 INTRODUCTION
3.4 FEATURES OF MATLAB
3.4.1 INTERFACING WITH OTHER LANGUAGES
3.5 THE MATLAB SYSTEM
3.5.1 DESKTOP TOOLS
3.5.2 ANALYZING AND ACCESSING DATA
3.5.3 PERFORMING NUMERIC COMPUTATION

CHAPTER 4 IMPLEMENTATION

4.1 GENERAL
4.2 CODING IMPLEMENTATION
4.3 SNAPSHOTS

CHAPTER 5 APPLICATIONS

CHAPTER 6 CONCLUSION & FUTURE SCOPE

6.1 CONCLUSION
6.2 REFERENCES

LIST OF FIGURES

Fig 1.1 Fundamental blocks of digital image processing
Fig 1.2 Gray scale image
Fig 1.3 The additive model of RGB
Fig 1.4 The colors created by the subtractive model of CMYK
Fig 2.1 The diagram of a typical sclera vein recognition approach
Fig 2.2 Steps of segmentation
Fig 2.3 Glare area detection
Fig 2.4 Detection of the sclera area
Fig 2.5 Pattern of veins
Fig 2.6 Sclera region and its vein patterns
Fig 2.7 Filtering can take place simultaneously on different parts of the iris image
Fig 2.8 The sketch of parameters of the segment descriptor
Fig 2.9 The weighting image
Fig 2.10 The module of sclera template matching
Fig 2.11 The Y shape vessel branch in the sclera
Fig 2.12 The rotation and scale invariant character of the Y shape vessel branch
Fig 2.13 The line descriptor of the sclera vessel pattern
Fig 2.14 The key elements of the descriptor vector
Fig 2.15 Simplified sclera matching steps on GPU
Fig 2.16 Two-stage matching scheme
Fig 2.17 Example image from the UBIRIS database
Fig 2.18 Occupancy at various thread numbers per block
Fig 2.19 The task assignment inside and outside the GPU
Fig 2.20 HOG features
Fig 4.1 Original sclera image
Fig 4.2 Binarised sclera image
Fig 4.3 Edge map subtracted image
Fig 4.4 Cropping ROI
Fig 4.5 ROI mask
Fig 4.6 ROI finger sclera image
Fig 4.7 Enhanced sclera image
Fig 4.8 Feature extracted sclera image
Fig 4.9 Matching with images in database
Fig 4.10 Result

ABSTRACT

Sclera vein recognition is shown to be a promising method for human identification. However, its matching speed is slow, which could limit its use in real-time applications. To improve the matching efficiency, we proposed a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching. First, we designed a rotation- and scale-invariant Y shape descriptor based feature extraction method to efficiently eliminate most unlikely matches. Second, we developed a weighted polar line sclera descriptor structure to incorporate mask information and reduce the GPU memory cost. Third, we designed a coarse-to-fine two-stage matching method. Finally, we developed a mapping scheme to map the subtasks to the GPU processing units. The experimental results show that our proposed method can achieve a dramatic processing speed improvement without compromising the recognition accuracy.

CHAPTER 1

INTRODUCTION

1.1 GENERAL

Digital image processing is the use of computer algorithms to perform image processing on digital images. The 2-D continuous image is divided into N rows and M columns. The intersection of a row and a column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.

1.2 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING

The field of "digital image processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be considered as the processing of any two-dimensional data, where an image (optical information) is represented as an array of real or complex numbers represented by a definite number of bits. An image is represented as a two-dimensional function f(x, y), where 'x' and 'y' are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) represents the intensity or gray level of the image at that point.

A digital image is one for which both the coordinates and the amplitude values of f are all finite, discrete quantities. Hence, a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, and it can be considered as a matrix whose row and column indices identify a point on the image and whose corresponding matrix element value identifies the gray level at that point.

One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. The introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours.

FIG

1.2.1 PREPROCESSING

In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible; the general techniques described here apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.

Image processing refers to the processing of a 2-D picture by a computer. Basic definitions: an image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:

Image processing (image in -> image out)

Image analysis (image in -> measurements out)

Image understanding (image in -> high-level description out)

An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system, it should be possible to apply specific image processing operations to selected regions. Thus, one part of an image (region) might be processed to suppress motion blur while another part might be processed to improve colour rendition.

Most usually, image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample, or pixel, is quantized using a finite number of bits. The digitized image is processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, out of which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3-D full-body magnetic resonance scans).

In modern sciences and technologies, images also gain much broader scope due to the ever growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research and real-time multi-asset portfolio trading in finance. Before processing, an image is converted into a digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing technique may be image enhancement, image restoration, or image compression.

1.2.2 IMAGE ENHANCEMENT

It refers to the accentuation or sharpening of image features, such as boundaries or contrast, to make a graphic display more useful for display and analysis. This process does not increase the inherent information content in the data. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo coloring, and so on.

1.2.3 IMAGE RESTORATION

It is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with more extraction or accentuation of image features.

1.2.4 IMAGE COMPRESSION

It is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression - CCITT GROUP3 & GROUP4

Still image compression - JPEG

Video image compression - MPEG

1.2.5 SEGMENTATION

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3-D reconstructions with the help of interpolation algorithms like marching cubes.

1.2.6 IMAGE RESTORATION

Image restoration, like enhancement, improves the qualities of an image, but all the operations are mainly based on known or measured degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion. It is used to correct images for known degradations.

1.2.7 FUNDAMENTAL STEPS

Image acquisition: to acquire a digital image.

Image preprocessing: to improve the image in ways that increase the chances of success for the other processes.

Image segmentation: to partition an input image into its constituent parts or objects.

Image representation: to convert the input data to a form suitable for computer processing.

Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.

Image recognition: to assign a label to an object based on the information provided by its descriptors.

Image interpretation: to assign meaning to an ensemble of recognized objects.

Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.

1.3 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude.

Digitization of the spatial coordinates (x, y) is called image sampling.

Amplitude digitization is called gray-level quantization.

The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.

Example: a 256 gray-level image of size 256x256 occupies 64K bytes of memory (256 x 256 pixels x 1 byte per pixel).

Images of very low spatial resolution produce a checkerboard effect.

The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.

1.4 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:

GIF - Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG - Joint Photographic Experts Group. A very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).

TIFF - Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS - PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD - Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP - Bitmap file format.

1.5 TYPES OF IMAGES

Images are of 4 types:

1. Binary image

2. Gray scale image

3. Color image

4. Indexed image

1.5.1 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically, the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e., a 0 or 1. Such images are also known as black-and-white (B&W) images.

1.5.2 GRAY SCALE IMAGE

In an (8-bit) grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black and white image, but the name emphasizes that such an image will also include many shades of grey.

FIG

1.5.3 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive colour, with the r, g, and b receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB (cyan, magenta, and yellow) are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue make cyan, and blue and red form magenta. The combination of red, green, and blue at full intensity makes white.

In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK: The 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.

1.5.4 INDEXED IMAGE

FIG

An indexed image consists of an array and a colormap matrix. The pixel values in the array are direct indices into the colormap. By convention, this documentation uses the variable name X to refer to the array and map to refer to the colormap. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle. This supported a palette of 256 36-bit RGB colors.
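The relationship between the index array X and the colormap can be seen with a short sketch (generic example; 'trees.tif' is one of the sample indexed images shipped with MATLAB, used here only for illustration):

    [X, map] = imread('trees.tif');   % X holds palette indices, map holds RGB triples
    rgb = ind2rgb(X, map);            % expand indices into a truecolor image
    imshow(X, map);                   % display using the palette directly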

1.6 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation

2) Processing of scene data for autonomous machine perception

In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

1.7 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Of these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with Intel Core 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

1.7.1 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

1.8 LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. The recognition performance of a subset of existing still image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements, such as a personal computer. However, a parallel processing alternative using field programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallelizable portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye. The vein pattern seen in the sclera region is unique to each person. Thus, the sclera vein pattern is a well suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

1.9 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naive parallel approach would not work. We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing, to mitigate the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

1.9.1 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, in this research we propose a new descriptor, the Y shape descriptor, which can greatly help improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

2.1 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Of these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with Intel Core 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve the computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU was used to extract the features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. designed a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods. Therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform which outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies are needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition using our sequential line-descriptor method and the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naive parallel implementation of the algorithms would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naive parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing, to mitigate the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we developed implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition. Several methods have been designed for sclera segmentation. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify the color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and the interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly used for circular or quasi-circular shapes of objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, the human identification is more accurate than in the existing studies. In the proposed method, two features of an image are extracted, as sketched below.
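A minimal illustration of these two feature types is given below. It assumes the Computer Vision Toolbox function extractHOGFeatures; the cell size, the image name, and the polar sampling grid are our own assumptions, not the project's exact parameters:

    sclera = imread('sclera_roi.png');          % segmented sclera region (placeholder)
    gray   = im2double(rgb2gray(sclera));

    % Feature 1: Histogram of Oriented Gradients
    hogFeat = extractHOGFeatures(gray, 'CellSize', [8 8]);

    % Feature 2: Cartesian-to-polar resampling with bilinear interpolation
    [h, w] = size(gray);
    cx = w/2;  cy = h/2;
    [theta, r] = meshgrid(linspace(0, 2*pi, 180), linspace(1, min(h, w)/2 - 1, 60));
    polarFeat = interp2(gray, cx + r.*cos(theta), cy + r.*sin(theta), 'linear', 0);

    % Matching: compare the probe features against each gallery template,
    % e.g. with a Euclidean distance, and pick the smallest distance.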

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images; if the image is in color, it must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG
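As an illustration of this step (a generic sketch; the brightness threshold, dilation radius, and file name are our assumptions), the Sobel edge map can be combined with a brightness test to flag the glare region:

    eyeImg = imread('eye.jpg');                 % input eye image (placeholder)
    gray   = rgb2gray(eyeImg);                  % Sobel works on grayscale images
    edges  = edge(gray, 'sobel');               % Sobel edge map
    glare  = gray > 240 & imdilate(edges, strel('disk', 3));  % bright pixels near strong edges
    imshow(glare);                              % candidate glare area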

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.
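A compact sketch of this estimation step is shown below (illustrative only; graythresh implements Otsu's method, and the file name, ROI rectangle, and blob-size limit are made-up example values):

    gray   = rgb2gray(imread('eye.jpg'));     % grayscale eye image (placeholder)
    roi    = imcrop(gray, [30 60 220 140]);   % region of interest around the sclera
    level  = graythresh(roi);                 % Otsu's threshold on the ROI
    sclera = im2bw(roi, level);               % bright (sclera) vs dark (iris/skin) pixels
    sclera = bwareaopen(sclera, 50);          % drop small non-sclera blobs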

223 IRIS AND EYELID REFINEMENT

The top and underside of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid, and iris boundaries are then refined, since these are all unwanted portions for recognition. In order to eliminate their effects, refinement is performed following the detection of the sclera area. Fig shows the result after Otsu's thresholding and the iris and eyelid refinement used to detect the right sclera area. In the same way, the left sclera area is detected using this method.

FIG

In the segmentation process, not all images are perfectly segmented; hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after the segmentation process, so vein pattern enhancement is performed to make them more visible.
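A minimal sketch of such an enhancement step is shown below, using a small bank of directional Gabor filters followed by thinning; the wavelength and the set of orientations are assumptions, not the parameters used in this work.

% Sketch: vein pattern enhancement with directional Gabor filters and thinning.
enhanced = zeros(size(grayEye));
for ang = 0:30:150                                    % directional filter bank (assumed orientations)
    mag = imgaborfilt(grayEye, 4, ang);               % wavelength of 4 px is an assumed value
    enhanced = max(enhanced, mag);                    % keep the strongest directional response
end
veins    = imbinarize(mat2gray(enhanced));            % binary vessel map
skeleton = bwmorph(veins, 'thin', Inf);               % one-pixel-wide skeleton
skeleton = skeleton & ~bwmorph(skeleton, 'branchpoints');   % remove branch points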

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill 1999). Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this

modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of Filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor based method is a

bottleneck with regard to matching speed In this section we briefly

describe the Line Descriptor-based sclera vein recognition method After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as

FIG

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the

vascular patterns a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns. For the registration, the algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration

using these parameters

After sclera template registration each line segment in the test

template is compared to the line segments in the target template for

matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as shown below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained.
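The exact matching formula is given in the equation referred to above; the following MATLAB sketch is one hedged reading of it, in which a pair of segments contributes the product of their weights when both the distance and the angle thresholds are satisfied.

% Sketch: weighted matching of two sclera templates (test, target are structs
% with fields c = Nx2 centers, phi = Nx1 orientations, w = Nx1 weights).
function M = matchTemplates(test, target, Dmatch, phiMatch)
    score = 0;
    for i = 1:numel(test.w)
        d  = hypot(target.c(:,1) - test.c(i,1), target.c(:,2) - test.c(i,2));
        ok = (d <= Dmatch) & (abs(target.phi - test.phi(i)) <= phiMatch);
        j  = find(ok, 1);                              % first matching target segment, if any
        if ~isempty(j)
            score = score + test.w(i) * target.w(j);   % assumed form of m(Si, Sj)
        end
    end
    M = score / min(sum(test.w), sum(target.w));       % normalize by the minimal set's weight sum
end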

FIG

FIG

FIG

FIG

Y shape branches are observed to be a stable feature under movement of the eye and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, this set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y shape vessel: one is to use the angle of every branch to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6

shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when head tilt, eye movement, or camera zoom occurs at the image acquisition step, ϕ1, ϕ2, and ϕ3 remain quite stable.
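A rough MATLAB sketch of these branch angles is given below; the branch center (bx, by), the branch end points, and the pupil center (px, py) are assumed to come from the skeleton analysis described above.

% Sketch: angles phi1..phi3 between each Y-shape branch and the radial
% direction from the pupil center.
radial = atan2(by - py, bx - px);                       % radial direction at the branch center
phi = zeros(1, 3);
for k = 1:3
    branchDir = atan2(branchEnds(k,2) - by, branchEnds(k,1) - bx);
    d = branchDir - radial;
    phi(k) = rad2deg(atan2(sin(d), cos(d)));            % wrapped angle difference in degrees
end
yFeature = [phi, bx, by];                               % y(phi1, phi2, phi3, x, y)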

To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 22, the line

descriptor is extracted from the skeleton of vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment a line descriptor is created to record the center and orientation of

the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not.

FIG: The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel patterns in the sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line segments of the vessel patterns.

However, in GPU applications, using the mask is challenging since the mask files are large in size; they occupy GPU memory and slow down the data transfer. During matching and registration, a RANSAC-

type algorithm was used to randomly select the corresponding descriptors

and the transform parameter between them was used to generate the

template transform affine matrix After every templates transform the mask

data should also be transformed and new boundary should be calculated to

evaluate the weight of the transformed descriptor This results in too many

convolutions in processor unit

To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extract the geometric relationships of the descriptors and store them as a new descriptor. We use a weighted image created by setting different weight values according to position: the weights of descriptors that lie beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, the descriptors' weights were calculated on their own mask by the CPU only once.

The calculated result is saved as a component of the descriptor. The sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This would be faster if the two templates had similar reference points. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted as θ; the distance between the segment center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
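Before turning to the parallel mapping, the construction of a single WPL descriptor can be sketched as follows; the segment center (cx, cy), its orientation phi, the iris center (ix, iy), the binary sclera mask, and the near-boundary band width are all illustrative assumptions.

% Sketch: assembling one WPL descriptor s(x, y, r, theta, phi, w).
r     = hypot(cx - ix, cy - iy);                  % distance from segment center to iris center
theta = atan2(cy - iy, cx - ix);                  % angle to the reference line through the iris center
[x, y] = pol2cart(theta, r);                      % rectangular form precomputed on the CPU
distToEdge = bwdist(~scleraMask);                 % distance of every pixel to the sclera boundary
if ~scleraMask(round(cy), round(cx))
    w = 0;                                        % descriptor lies outside the sclera
elseif distToEdge(round(cy), round(cx)) < 5       % assumed near-boundary band width
    w = 0.5;
else
    w = 1;                                        % interior descriptor
end
s = [x, y, r, theta, phi, w];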

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses, which meets the requirement for coalesced memory access on the GPU.

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because it does not need to re-register the templates every time after shifting. Thus, the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 4.0 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline, today's GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that today's GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation
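As a small illustration of this thread-grid model, the following MATLAB sketch (using the Parallel Computing Toolbox rather than a raw CUDA kernel) runs one element-wise "program" over every element of a data grid on the GPU and reads the result buffer back.

% Sketch: GPU computing in the "grid of threads" style from MATLAB.
A = gpuArray(rand(4096));            % buffer resident in GPU global memory
B = arrayfun(@(v) v.^2 + 1, A);      % the same small program runs per element
C = gather(B);                       % copy the resulting buffer back to the host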

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities. After this step, some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster

and

achieve a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar line from the pupil center in descriptor i.

The number of matched pairs ni and the distance di between the Y shape branch centers are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded, while a sclera with a high matching score is passed to the next, more precise, matching process.
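A hedged MATLAB sketch of this coarse stage is given below; the thresholds follow the text (tϕ = 30, txy = 675), while the data layout of the Y-shape descriptors and the final fusion of the two quantities (Eq. (2)) are illustrative assumptions.

% Sketch: Stage-I coarse matching on Y-shape descriptors.
% Yte, Yta are assumed Nx5 arrays with rows [phi1 phi2 phi3 x y].
ni = 0;  di = 0;
for a = 1:size(Yte, 1)
    dxy  = hypot(Yta(:,4) - Yte(a,4), Yta(:,5) - Yte(a,5));
    cand = find(dxy < 675);                               % restrict the search area (t_xy)
    if isempty(cand), continue; end
    dphi = vecnorm(Yta(cand,1:3) - Yte(a,1:3), 2, 2);     % distance of the angle elements
    [dmin, j] = min(dphi);
    if dmin < 30                                          % angle distance threshold (t_phi)
        ni = ni + 1;
        di = di + dxy(cand(j));
    end
end
avgDist = di / max(ni, 1);
% ni and avgDist are then fused into the Stage-I score as in Eq. (2),
% and templates scoring below the threshold t are discarded.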

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte, Tta is the target template and stai is the ith WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.

Δsk is the shift value between the two descriptors, defined in the corresponding equation. We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. Their shift offsets are recorded as the possible registration shift factors Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
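A rough MATLAB sketch of this search follows; testCenters and targetCenters are assumed Nx2 arrays of WPL descriptor centers, sampleIdx holds the randomly selected test descriptors, and the way the final offset is chosen (the candidate closest to the consensus) is one reading of the smallest-standard-deviation criterion.

% Sketch: shift-parameter search (Algorithm 2).
offsets = zeros(numel(sampleIdx), 2);
for k = 1:numel(sampleIdx)
    ste = testCenters(sampleIdx(k), :);                                  % sampled test descriptor center
    d   = hypot(targetCenters(:,1) - ste(1), targetCenters(:,2) - ste(2));
    [~, j] = min(d);                                                     % nearest target descriptor
    offsets(k,:) = targetCenters(j,:) - ste;                             % candidate shift
end
dev = vecnorm(offsets - mean(offsets, 1), 2, 2);      % deviation of each candidate from the consensus
[~, best] = min(dev);
sOptim = offsets(best, :);                            % chosen registration shift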

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale transform parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment, we set the iteration count to 512.
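The random parameter search can be sketched in MATLAB as below. The matching tolerance (β = 20 pixels) and the iteration count (512) follow the text; the rotation and scale ranges, the data layout, and the use of pdist2 (Statistics and Machine Learning Toolbox) are illustrative assumptions.

% Sketch: affine transform parameter search (Algorithm 3).
bestCount = -1;
for it = 1:512
    theta = (rand - 0.5) * 20;                        % rotation in degrees (assumed range)
    scale = 0.9 + 0.2 * rand;                         % scale factor (assumed range)
    k = randi(size(testCenters, 1));                  % random test descriptor
    d = hypot(targetCenters(:,1) - testCenters(k,1), targetCenters(:,2) - testCenters(k,2));
    [~, j] = min(d);
    shift = targetCenters(j,:) - testCenters(k,:);    % shift from its nearest neighbor
    R = [cosd(theta) -sind(theta); sind(theta) cosd(theta)];
    warped = scale * (testCenters * R') + shift;      % transformed test template centers
    count  = sum(min(pdist2(warped, targetCenters), [], 2) < 20);   % beta = 20 px
    if count > bestCount
        bestCount  = count;
        bestParams = struct('theta', theta, 'scale', scale, 'shift', shift);
    end
end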

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radial direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values. In our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip, and accessing these memories takes little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is much more time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access. To completely hide the latency of small instruction sets, we should use on-chip memory preferentially rather than global memory. When global memory accesses occur, threads in the same warp should access the words in sequence to achieve coalescing.

Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks which are equal in size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU; the numbers of threads and blocks are set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, each of which processes a section of descriptors in one thread. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there

are no data exchange requirements between different blocks When all

threads complete their corresponding descriptor fractions, the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of the intermediate results, which is shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new result. The final result is saved in the first address, which has the same variable name as the first intermediate result.
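The idea behind this reduction can be sketched serially in MATLAB as a log-step (Hillis-Steele style) scan over the per-thread partial results; on the GPU each addition step is of course executed by the threads in parallel.

% Sketch: log-step prefix sum over per-thread intermediate results.
x = partialResults(:);                 % assumed vector of per-thread partial scores
n = numel(x);  step = 1;
while step < n
    x = x + [zeros(step, 1); x(1:n-step)];   % each "thread" adds the value step positions away
    step = step * 2;
end
totalScore = x(n);                     % the last element now holds the total sum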

252 MAPPING INSIDE BLOCK

In the shift argument search, there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to each thread, so that all the threads compute independently, except that the final results must be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread, and every thread randomly generated a set of parameters and tried them independently. The generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when

searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data. In our kernels, we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied to target detection. In this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
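Using the standard definitions, the gradient magnitude and orientation can be sketched in MATLAB as below (the simple [-1 0 1] derivative kernels are the usual HOG choice; the Computer Vision Toolbox function extractHOGFeatures could be used instead for the complete descriptor).

% Sketch: gradient magnitude m(x,y) and orientation theta(x,y) for HOG.
I  = double(grayEye);
dx = imfilter(I, [-1 0 1],  'replicate');       % horizontal gradient d_x(x,y)
dy = imfilter(I, [-1 0 1]', 'replicate');       % vertical gradient d_y(x,y)
m     = sqrt(dx.^2 + dy.^2);                    % gradient magnitude
theta = mod(atan2d(dy, dx), 180);               % unsigned orientation in [0, 180)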

Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell gives a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counting as the same. Fig 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, then the gradient strengths must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are arranged mainly in square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations,

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)
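For example, a C routine can be compiled and called as shown below; the file name mysum.c is hypothetical and stands for any C source that defines a mexFunction entry point.

% Sketch: building and calling a MEX-file from the MATLAB prompt.
mex mysum.c          % compiles the C source into mysum.<mexext>
s = mysum(1:10);     % afterwards it is callable like any MATLAB function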

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with the MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases, MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code
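As a small illustration, the following single vectorized statement replaces the nested loops that an equivalent C implementation would need.

% Example: one vectorized line instead of explicit element-by-element loops.
A = rand(1000);            % 1000-by-1000 matrix of random numbers
B = sqrt(A) + 2 * A.^2;    % applied to every element at once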

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter and bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employs a two-stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


INDEX

ABSTRACT

CHAPTER 1 INTRODUCTION
11 GENERAL
12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
121 PREPROCESSING
122 IMAGE ENHANCEMENT
123 IMAGE RESTORATION
124 IMAGE COMPRESSION
125 SEGMENTATION
126 IMAGE RESTORATION
127 FUNDAMENTAL STEPS
13 A SIMPLE IMAGE MODEL
14 IMAGE FILE FORMATS
15 TYPE OF IMAGES
151 BINARY IMAGES
152 GRAY SCALE IMAGE
153 COLOR IMAGE
154 INDEXED IMAGE
16 APPLICATIONS OF IMAGE PROCESSING
17 EXISTING SYSTEM
171 DISADVANTAGES OF EXISTING SYSTEM
18 LITERATURE SURVEY
19 PROPOSED SYSTEM
191 ADVANTAGES

CHAPTER 2 PROJECT DESCRIPTION
21 INTRODUCTION
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
222 SCLERA SEGMENTATION
223 IRIS AND EYELID REFINEMENT
224 OCULAR SURFACE VASCULATURE
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
23 EVOLUTION OF GPU ARCHITECTURE
231 PROGRAMMING A GPU FOR GRAPHICS
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
25 MAPPING THE SUBTASKS TO CUDA
251 MAPPING ALGORITHM TO BLOCKS
252 MAPPING INSIDE BLOCK
253 MEMORY MANAGEMENT
26 HISTOGRAM OF ORIENTED GRADIENTS

CHAPTER 3 SOFTWARE SPECIFICATION
31 GENERAL
32 SOFTWARE REQUIREMENTS
33 INTRODUCTION
34 FEATURES OF MATLAB
341 INTERFACING WITH OTHER LANGUAGES
35 THE MATLAB SYSTEM
351 DESKTOP TOOLS
352 ANALYZING AND ACCESSING DATA
353 PERFORMING NUMERIC COMPUTATION

CHAPTER 4 IMPLEMENTATION
41 GENERAL
42 CODING IMPLEMENTATION
43 SNAPSHOTS

CHAPTER 5 APPLICATIONS

CHAPTER 6 CONCLUSION & FUTURE SCOPE
61 CONCLUSION
62 REFERENCES

APPLICATION

LIST OF FIGURES

FIG NO    FIG NAME
11        Fundamental blocks of digital image processing
12        Gray scale image
13        The additive model of RGB
14        The colors created by the subtractive model of CMYK
21        The diagram of a typical sclera vein recognition approach
22        Steps of segmentation
23        Glare area detection
24        Detection of the sclera area
25        Pattern of veins
26        Sclera region and its vein patterns
27        Filtering can take place simultaneously on different parts of the iris image
28        The sketch of parameters of segment descriptor
29        The weighting image
210       The module of sclera template matching
211       The Y shape vessel branch in sclera
212       The rotation and scale invariant character of Y shape vessel branch
213       The line descriptor of the sclera vessel pattern
214       The key elements of descriptor vector
215       Simplified sclera matching steps on GPU
216       Two-stage matching scheme
217       Example image from the UBIRIS database
218       Occupancy on various thread numbers per block
219       The task assignment inside and outside the GPU
220       HOG features
41        Original sclera image
42        Binarised sclera image
43        Edge map subtracted image
44        Cropping ROI
45        ROI mask
46        ROI finger sclera image
47        Enhanced sclera image
48        Feature extracted sclera image
49        Matching with images in database
410       Result

ABSTRACT

Sclera vein recognition is shown to be a promising method for human

identification. However, its matching speed is slow, which could limit its use in real-time applications. To improve the matching efficiency,

we proposed a new parallel sclera vein recognition method using a two-

stage parallel approach for registration and matching First we designed a

rotation- and scale-invariant Y shape descriptor based feature extraction

method to efficiently eliminate most unlikely matches Second we

developed a weighted polar line sclera descriptor structure to incorporate

mask information to reduce GPU memory cost Third we designed a

coarse-to-fine two-stage matching method Finally we developed a

mapping scheme to map the subtasks to GPU processing units The

experimental results show that our proposed method can achieve dramatic

processing speed improvement without compromising the recognition

accuracy

CHAPTER 1

INTRODUCTION

11 GENERAL

Digital image processing is the use of computer algorithms to perform image processing on digital images. The 2D continuous image is divided into N rows and M columns; the intersection of a row and a column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.

12 OVERVIEW OF DIGITAL IMAGE PROCESSING

The field of "Digital Image Processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be considered as the processing of any two-dimensional data, where an image (optical information) is represented as an array of real or complex numbers represented by a definite number of bits. An image is represented as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates and the amplitude of f at any pair of coordinates (x, y) represents the intensity or gray level of the image at that point.

A digital image is one for which both the coordinates and the amplitude values of f are finite, discrete quantities. Hence a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, and it can be considered as a matrix whose row and column indices identify a point on the image and whose corresponding element value identifies the gray level at that point.

One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. Introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours.

FIG

121 PREPROCESSING

In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible; the techniques described here apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.

Image processing refers to the processing of a 2D picture by a computer. Basic definitions:

An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:

Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)

An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur while another part might be processed to improve colour rendition.

Most image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample or pixel is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, in which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).

In modern sciences and technologies, images also gain much broader scope due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research or real-time multi-asset portfolio trading in finance. Before processing, an image is converted into a digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing may be image enhancement, image restoration, or image compression.

122 IMAGE ENHANCEMENT

Image enhancement refers to the accentuation or sharpening of image features such as boundaries or contrast to make a graphic display more useful for display and analysis. This process does not increase the inherent information content of the data. It includes gray-level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.

123 IMAGE RESTORATION

Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with the extraction or accentuation of image features.

124 IMAGE COMPRESSION

Image compression is concerned with minimizing the number of bits required to represent an image. Applications of compression are in broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression – CCITT GROUP3 & GROUP4
Still image compression – JPEG
Video image compression – MPEG

125 SEGMENTATION

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.

126 IMAGE RESTORATION

Image restoration, like enhancement, improves the quality of an image, but all the operations are based on known, measured, or estimated degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion, and to correct images for known degradations.

127 FUNDAMENTAL STEPS

Image acquisition: to acquire a digital image.

Image preprocessing: to improve the image in ways that increase the chances for success of the other processes.

Image segmentation: to partition an input image into its constituent parts or objects.

Image representation: to convert the input data to a form suitable for computer processing.

Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.

Image recognition: to assign a label to an object based on the information provided by its descriptors.

Image interpretation: to assign meaning to an ensemble of recognized objects.

Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling, and amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels. For example, a 256 gray-level image of size 256x256 occupies 64K bytes of memory. Images of very low spatial resolution produce a checkerboard effect, and the use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.
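The storage figure quoted above can be checked with a quick calculation. The short Python sketch below was written for this report as an illustration only; the image sizes are arbitrary examples.

import math

def image_storage_bytes(rows, cols, gray_levels):
    # Bytes needed for an uncompressed image with the given number of gray levels.
    bits_per_pixel = math.ceil(math.log2(gray_levels))
    return rows * cols * bits_per_pixel // 8

# A 256 gray-level (8 bits per pixel) image of size 256 x 256:
print(image_storage_bytes(256, 256, 256))   # 65536 bytes = 64K
# Doubling the resolution quadruples the storage:
print(image_storage_bytes(512, 512, 256))   # 262144 bytes = 256K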

14 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based images). Some of the most common file formats are:

GIF – Graphics Interchange Format: an 8-bit (256 colour), non-destructively compressed bitmap format, mostly used for the web. It has several sub-standards, one of which is the animated GIF.

JPEG – Joint Photographic Experts Group: a very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format, widely used, especially for the web and Internet (bandwidth-limited applications).

TIFF – Tagged Image File Format: the standard 24-bit publication bitmap format; compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS – PostScript: a standard vector format with numerous sub-standards; it can be difficult to transport across platforms and operating systems.

PSD – Adobe Photoshop Document: a dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP – bitmap file format.

15 TYPES OF IMAGES

Images are of four types:

1 Binary image
2 Gray scale image
3 Color image
4 Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically, the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e., a 0 or 1; such images are also known as black-and-white (B&W) images.

152 GRAY SCALE IMAGE

In an 8-bit grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive colour, with the R, G, and B receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB – cyan, magenta, and yellow – are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue make cyan, and blue and red form magenta. The combination of red, green, and blue in full intensity makes white.

In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK: The 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a colormap matrix. The pixel values in the array are direct indices into the colormap. By convention, this documentation uses the variable name X to refer to the array and map to refer to the colormap. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer, described in 1975 by Kajiya, Sutherland, and Cheadle. This supported a palette of 256 36-bit RGB colors.

16 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation.
2) Processing of scene data for autonomous machine perception.

In the second application area, interest focuses on procedures for extracting information from an image in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches – Speeded Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching – for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were measured using a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as the case where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions, and we provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models that can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such big data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person; thus the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: the Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a non-ideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, while an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

18 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose a new sclera descriptor – the Y-shape sclera feature-based efficient registration method – to speed up the mapping scheme, introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue, and develop a coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make parallel processing possible and efficient.

191 PROPOSED SYSTEM ADVANTAGES

1. To improve efficiency, we propose a new descriptor – the Y-shape descriptor – which greatly improves the efficiency of the coarse registration of two images and can be used to filter out non-matching pairs before refined matching.

2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches – Speeded Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching – for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were measured using a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. designed a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform, which outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies are needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition using our sequential line-descriptor method on the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and can help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor – the Y-shape sclera feature-based efficient registration method – to speed up the mapping scheme (Section 4), introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue (Section 5), and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we developed implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition; in Section 8, we present experiments using the proposed system; and in Section 9, we draw conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation: they used a clustering algorithm to classify the color eye images into three clusters – sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's thresholding-based method for grayscale images.

After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region-growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out, as sketched below.
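As a rough illustration of these two feature types, the following Python sketch computes HOG features with scikit-image and resamples the image onto a polar grid with bilinear interpolation. It is a minimal example written for this report, not the project's Chapter 4 code; the file name, grid sizes, and the assumed center location are placeholders.

import numpy as np
from scipy.ndimage import map_coordinates
from skimage import io, color
from skimage.feature import hog

def to_polar(gray, center, n_radii=64, n_angles=180, max_radius=None):
    # Resample a grayscale image onto an (angle, radius) grid with bilinear interpolation.
    cy, cx = center
    if max_radius is None:
        max_radius = min(gray.shape) / 2.0
    radii = np.linspace(0, max_radius, n_radii)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    r, a = np.meshgrid(radii, angles)            # shape (n_angles, n_radii)
    rows = cy + r * np.sin(a)
    cols = cx + r * np.cos(a)
    # order=1 selects bilinear interpolation at the sampled (row, col) positions.
    return map_coordinates(gray, [rows, cols], order=1, mode='constant', cval=0.0)

gray = color.rgb2gray(io.imread('eye_image.jpg'))     # placeholder file name
hog_vector = hog(gray, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
polar_img = to_polar(gray, center=(gray.shape[0] // 2, gray.shape[1] // 2))
print(hog_vector.shape, polar_img.shape)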

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. 3 shows the steps of segmentation.

FIG

Glare area detection: the glare area is a small bright region near the pupil or iris and is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images: if the image is in color, it is first converted to grayscale and then passed through the Sobel filter to detect the glare area. Fig. 4 shows the result of glare area detection.

FIG

Sclera area estimation: for the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be located in the right and center positions, and the correct right sclera area should be located in the left and center. In this way, non-sclera areas are wiped out; a minimal sketch of these two steps follows.
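The fragment below is a simplified, hedged sketch of glare detection and sclera area estimation using scikit-image: a Sobel filter highlights strong edges around the bright glare spot, and Otsu's threshold on a region of interest selects candidate sclera pixels. The thresholds, ROI bounds, and file name are illustrative assumptions, not the project's tuned values.

import numpy as np
from skimage import io, color
from skimage.filters import sobel, threshold_otsu

rgb = io.imread('eye_image.jpg')              # placeholder file name
gray = color.rgb2gray(rgb)                    # the Sobel filter works on grayscale only

# Glare detection: strong gradients combined with very bright pixels near the pupil/iris.
edges = sobel(gray)
glare_mask = (edges > 0.2) & (gray > 0.9)     # illustrative thresholds

# Sclera area estimation: Otsu's threshold inside a manually chosen ROI.
roi = gray[100:300, 50:550]                   # illustrative ROI bounds
t = threshold_otsu(roi)
sclera_candidates = roi > t                   # bright (sclera-like) pixels
print(glare_mask.sum(), sclera_candidates.mean())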

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; these are all unwanted portions for recognition. In order to eliminate their effects, refinement is performed after the detection of the sclera area. Fig. 5 shows the result after Otsu's thresholding and after iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.

Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be carried out in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of the proposed parallelization is outside the scope of this report; a sketch of tile-parallel filtering is given after the figures.

FIG

FIG
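To make the idea concrete, the sketch below applies a small bank of directional Gabor filters to separate tiles of an image using parallel worker processes. It only illustrates the "independent elements" idea from the figure; the tile layout, filter frequency, and orientations are assumptions, and tile borders are ignored in this simplified version.

import numpy as np
from concurrent.futures import ProcessPoolExecutor
from skimage import io, color
from skimage.filters import gabor

ORIENTATIONS = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]   # assumed filter bank

def enhance_tile(tile):
    # Apply the directional filter bank to one tile and keep the strongest response.
    responses = [np.abs(gabor(tile, frequency=0.15, theta=t)[0]) for t in ORIENTATIONS]
    return np.max(responses, axis=0)

def parallel_enhance(gray, tile_rows=2, tile_cols=2):
    # Split the image into independent tiles; each tile is filtered by a separate worker.
    row_parts = np.array_split(gray, tile_rows, axis=0)
    tiles = [t for part in row_parts for t in np.array_split(part, tile_cols, axis=1)]
    with ProcessPoolExecutor() as pool:
        filtered = list(pool.map(enhance_tile, tiles))
    # Reassemble the filtered tiles into a full image.
    rows = [np.hstack(filtered[i * tile_cols:(i + 1) * tile_cols]) for i in range(tile_rows)]
    return np.vstack(rows)

if __name__ == '__main__':
    gray = color.rgb2gray(io.imread('eye_image.jpg'))     # placeholder file name
    print(parallel_enhance(gray).shape)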

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA

VEIN

2251 RECOGNITION METHOD

The matching stage of the line-descriptor based method is a bottleneck with regard to matching speed. In this section we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as follows.

FIG

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points – one from the test template and one from the target template – along with a scaling factor and a rotation value based on a priori knowledge of the database, and calculates a fitness value for the registration under these parameters. A hedged sketch of the descriptor computation follows.
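The exact equations appear in the figure above; the sketch below reconstructs them from the textual definitions only (angle and distance of the segment center relative to the iris center, and the orientation of the fitted polynomial at that center), so it is an interpretation written for this report and should be checked against the original figure.

import numpy as np

def line_descriptor(seg_x, seg_y, iris_center, poly_degree=2):
    # Build S = (theta, r, phi) for one vessel segment given its pixel coordinates.
    xi, yi = iris_center
    xl, yl = np.mean(seg_x), np.mean(seg_y)          # center of the line segment
    # Angle of the segment center with respect to the reference direction at the iris center.
    theta = np.arctan2(yl - yi, xl - xi)
    # Distance from the segment center to the iris center.
    r = np.hypot(xl - xi, yl - yi)
    # Dominant orientation: slope of the polynomial approximation f_line at the center.
    f_line = np.polyfit(seg_x, seg_y, poly_degree)
    slope = np.polyval(np.polyder(f_line), xl)
    phi = np.arctan(slope)
    return np.array([theta, r, phi])

# Toy segment: a short diagonal vessel piece with the iris center at (120, 100).
xs = np.array([150.0, 152, 154, 156, 158])
ys = np.array([90.0, 91, 93, 94, 96])
print(line_descriptor(xs, ys, iris_center=(120.0, 100.0)))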

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), Dmatch is the matching distance threshold, and a matching angle threshold is applied to the orientation difference. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained. A hedged sketch of this scoring rule is given below.
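The sketch below is one plausible reading of this scoring rule written for this report: descriptors carry a precomputed weight w from the weighting image, placeholder thresholds gate a match, and the weight combination is only one reasonable choice. The real method also relies on the RANSAC registration described above.

import numpy as np

D_MATCH = 5.0      # matching distance threshold (placeholder value)
PHI_MATCH = 0.2    # matching angle threshold in radians (placeholder value)

def pair_score(si, sj):
    # Weighted match score for two descriptors si, sj = (x, y, phi, w).
    d = np.hypot(si[0] - sj[0], si[1] - sj[1])
    angle_diff = abs(si[2] - sj[2])
    if d <= D_MATCH and angle_diff <= PHI_MATCH:
        return si[3] * sj[3]   # one plausible way to combine the two boundary weights
    return 0.0

def template_score(test, target):
    # Sum of best pair scores, normalized by the smaller template's total weight.
    total = sum(max(pair_score(si, sj) for sj in target) for si in test)
    max_possible = min(sum(s[3] for s in test), sum(s[3] for s in target))
    return total / max_possible if max_possible > 0 else 0.0

test = [np.array([10.0, 12.0, 0.3, 1.0]), np.array([40.0, 8.0, -0.1, 0.5])]
target = [np.array([11.0, 12.5, 0.32, 1.0]), np.array([80.0, 30.0, 1.2, 1.0])]
print(template_score(test, target))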

FIG

FIG

FIG

FIG

With movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred as a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angles of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employ the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 remain quite stable.

To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branch as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line

descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitations of segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file

FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of vessel patterns.

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy the GPU memory and slow down the data transfer. During matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weights of the transformed descriptors. This results in too many convolutions on the processing unit.

To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extract the geometric relationships of the descriptors and store them as a new descriptor. We use a weighting image created by setting weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.

The result is saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points. If we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganize the descriptors from the same side and save

FIG

FIG

them at continuous addresses. This meets the requirement of coalesced memory access on the GPU.

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast, because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9. A hedged sketch of WPL descriptor construction is given below.
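The sketch below shows one way such a WPL descriptor could be assembled in a CPU preprocessing pass: the weight is looked up from a distance-to-boundary map of the sclera mask (0 outside, 0.5 near the edge, 1 inside), and the polar values (r, θ) are stored together with the Cartesian form. The border width and the left/right grouping rule are assumptions made for illustration.

import numpy as np
from scipy.ndimage import distance_transform_edt

def boundary_weights(mask, border=5):
    # Per-pixel weight map: 0 outside the sclera mask, 0.5 near its boundary, 1 inside.
    dist_to_edge = distance_transform_edt(mask)     # distance to the nearest zero pixel
    weights = np.where(dist_to_edge > border, 1.0, 0.5)
    weights[~mask.astype(bool)] = 0.0
    return weights

def wpl_descriptor(x, y, phi, pupil_center, weight_map):
    # Build s = (x, y, r, theta, phi, w) relative to the pupil/iris center.
    cy, cx = pupil_center
    r = np.hypot(x - cx, y - cy)
    theta = np.arctan2(y - cy, x - cx)
    w = weight_map[int(round(y)), int(round(x))]
    return np.array([x, y, r, theta, phi, w])

def split_by_side(descriptors, pupil_center):
    # Group descriptors by left/right half so each side can deform independently.
    cx = pupil_center[1]
    descriptors = np.asarray(descriptors)
    return descriptors[descriptors[:, 0] < cx], descriptors[descriptors[:, 0] >= cx]

mask = np.zeros((200, 300), dtype=np.uint8)
mask[60:140, 40:260] = 1                            # toy sclera mask
wmap = boundary_weights(mask)
d = wpl_descriptor(120.0, 100.0, 0.4, pupil_center=(100.0, 150.0), weight_map=wmap)
print(d)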

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-purpose computing on the GPU: mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline described earlier, concentrating on its programmable aspects:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as a texture on future passes through the graphics pipeline.

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of the fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, to the programmable units. Today, GPU computing applications are structured in the following way.

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.
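
To make this thread-grid model concrete, the following minimal CUDA sketch (an illustrative example, not code from this project) performs one update step over a one-dimensional grid: each thread gathers its neighbors' values from global memory and writes its result to an output buffer that can serve as the input of the next step. The kernel name, block size, and update rule are assumptions chosen only for illustration.

// Illustrative sketch of the thread-grid model: one thread per grid point,
// "gather" reads of the neighbors, "scatter" write of the result.
__global__ void relaxStep(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // thread index = grid point index
    if (i > 0 && i < n - 1)
        out[i] = 0.5f * in[i] + 0.25f * (in[i - 1] + in[i + 1]);
}

// Host side (buffers assumed to be allocated with cudaMalloc):
//   relaxStep<<<(n + 255) / 256, 256>>>(d_state, d_next, n);
//   std::swap(d_state, d_next);   // the result becomes the input of the next time step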

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we matched two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity. After this step, some false positive matches may still remain. In the second stage, we used the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.

2.4.1 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here y_te,i and y_ta,j are the Y shape descriptors of the test template T_te and the target template T_ta, respectively; d_phi is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance of the two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and the distance between their centers, respectively; t_phi is a distance threshold, and t_xy is the threshold that restricts the search area. We set t_phi to 30 and t_xy to 675 in our experiment.

To match two sclera templates, we searched the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where phi_i,j is the angle between the j-th branch and the polar ray from the pupil center in descriptor i.

The number of matched pairs n_i and the distance between Y shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here alpha is a factor used to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score will be passed to the next, more precise matching process.
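
The following CUDA-style sketch illustrates this coarse matching idea for a single template pair. The YDesc layout, the form of the angle and center distances, and the score fusion are our own illustrative assumptions (the report's equations (2)-(4) are not reproduced here), and the restriction of the search to the same half of the sclera is noted only as a comment.

// Illustrative sketch of Stage-I coarse matching for one template pair.
// Data layout and fusion formula are assumptions for illustration only.
#include <math.h>

struct YDesc { float phi1, phi2, phi3, x, y; };   // three branch angles and the branch center

__host__ __device__
float coarseScore(const YDesc *te, int nTe, const YDesc *ta, int nTa,
                  float tPhi, float tXy, float alpha)
{
    int   matched = 0;       // number of matched descriptor pairs
    float distSum = 0.0f;    // accumulated center distance of the matched pairs
    for (int i = 0; i < nTe; ++i)
        for (int j = 0; j < nTa; ++j) {            // the real algorithm searches only the same half of the sclera
            float a1 = te[i].phi1 - ta[j].phi1;
            float a2 = te[i].phi2 - ta[j].phi2;
            float a3 = te[i].phi3 - ta[j].phi3;
            float dPhi = sqrtf(a1 * a1 + a2 * a2 + a3 * a3);     // angle distance, cf. (3)
            float dx = te[i].x - ta[j].x, dy = te[i].y - ta[j].y;
            float dXy = sqrtf(dx * dx + dy * dy);                // center distance, cf. (4)
            if (dPhi < tPhi && dXy < tXy) { ++matched; distSum += dXy; }
        }
    if (matched == 0) return 0.0f;
    int denom = (nTe < nTa) ? nTe : nTa;
    float avgDist = distSum / matched;
    // fuse the match ratio with the average center distance (alpha controls the distance penalty)
    return ((float)matched / denom) / (1.0f + avgDist / alpha);
}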

2.4.2 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly because the eyeball is spherical in shape.

The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers. Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate. As a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te,i is the i-th WPL descriptor of T_te; T_ta is the target template and s_ta,i is the i-th WPL descriptor of T_ta; and d(s_te,k, s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j.

Delta_s_k is the shift value of the two descriptors, defined as follows.

We first randomly select an equal number of segment descriptors s_te,k in the test template T_te from each quad and find their nearest neighbors s_ta,j in the target template T_ta. Their shift offset is recorded as a possible registration shift factor Delta_s_k. The final offset registration factor is Delta_s_optim, which has the smallest standard deviation among these candidate offsets.
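
A small host-side sketch of this selection step is given below, under our reading that the final offset is the candidate whose deviation from the other candidate offsets is smallest. The Offset type and helper names are illustrative assumptions, and the nearest-neighbour search that produces the candidates is omitted.

// Illustrative sketch: pick the candidate shift whose offsets cluster most tightly.
#include <algorithm>
#include <cmath>
#include <vector>

struct Offset { float dx, dy; };

static float spreadAround(const std::vector<Offset> &offs, size_t k)
{
    // standard deviation of the distances from candidate k to every recorded offset
    double sum = 0.0, sumSq = 0.0;
    for (const Offset &o : offs) {
        double d = std::hypot(o.dx - offs[k].dx, o.dy - offs[k].dy);
        sum += d; sumSq += d * d;
    }
    double n = static_cast<double>(offs.size()), mean = sum / n;
    return static_cast<float>(std::sqrt(std::max(0.0, sumSq / n - mean * mean)));
}

Offset selectShift(const std::vector<Offset> &candidates)   // candidates = Delta_s_k values
{
    Offset best{0.0f, 0.0f};
    if (candidates.empty()) return best;
    size_t bestIdx = 0;
    for (size_t k = 1; k < candidates.size(); ++k)
        if (spreadAround(candidates, k) < spreadAround(candidates, bestIdx))
            bestIdx = k;
    return candidates[bestIdx];                              // Delta_s_optim
}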

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te,(it) and calculating the distance from its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs from the transformed template and the target template. The factor beta is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te,i, T_te, s_ta,j, and T_ta are defined the same as in Algorithm 2; tr_shift(it), theta(it), and tr_scale(it) are the shift, rotation, and scale transform parameters generated in the it-th iteration; and R(theta(it)), T(tr_shift(it)), and S(tr_scale(it)) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterated N times to generate these parameters. In our experiment, we set the iteration count to 512.
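
The random parameter search can be pictured with the sequential sketch below; on the GPU each iteration is handled by one thread, as described in Section 2.5.2. The Point type, the parameter ranges, and the use of a randomly sampled target point in place of the true nearest neighbour are illustrative assumptions, not the project's code.

// Illustrative, sequential sketch of the random affine-parameter search.
#include <cmath>
#include <cstdlib>
#include <vector>

struct Point  { float x, y; };
struct Params { float dx, dy, theta, scale; int matches; };

static int countMatches(const std::vector<Point> &te, const std::vector<Point> &ta,
                        const Params &p, float beta)
{
    int m = 0;
    for (const Point &q : te) {
        // apply scale and rotation, then the shift, to the test descriptor center
        float x = p.scale * (q.x * std::cos(p.theta) - q.y * std::sin(p.theta)) + p.dx;
        float y = p.scale * (q.x * std::sin(p.theta) + q.y * std::cos(p.theta)) + p.dy;
        for (const Point &r : ta)
            if (std::hypot(x - r.x, y - r.y) < beta) { ++m; break; }   // matched within beta pixels
    }
    return m;
}

Params searchAffine(const std::vector<Point> &te, const std::vector<Point> &ta,
                    int iterations /* e.g. 512 */, float beta /* e.g. 20 */)
{
    Params best{0.0f, 0.0f, 0.0f, 1.0f, -1};
    if (te.empty() || ta.empty()) return best;
    for (int it = 0; it < iterations; ++it) {
        Params p;
        const Point &s = te[std::rand() % te.size()];
        const Point &n = ta[std::rand() % ta.size()];       // stand-in for the true nearest neighbour
        p.dx = n.x - s.x;  p.dy = n.y - s.y;                 // shift from the sampled pair
        p.theta = ((std::rand() % 21) - 10) * 0.01f;         // small random rotation (assumed range)
        p.scale = 1.0f + ((std::rand() % 11) - 5) * 0.01f;   // small random scale (assumed range)
        p.matches = countMatches(te, ta, p, beta);
        if (p.matches > best.matches) best = p;              // keep the parameter set with the most matches
    }
    return best;
}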

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j, and T_ta are defined the same as in Algorithms 2 and 3; theta(optm), tr_shift(optm), tr_scale(optm), and Delta_s_optim are the registration parameters attained from Algorithms 2 and 3; R(theta(optm)) T(tr_shift(optm)) S(tr_scale(optm)) is the descriptor transform matrix defined in Algorithm 3; phi is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we used a constant factor alpha to check the absolute difference of the two phi values; in our experiment we set alpha to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.
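
The following sketch illustrates how the optimized parameters can be applied while matching, with the orientation check governed by alpha and the edge weights w. The WplDesc layout and the final score normalisation are illustrative assumptions rather than the exact Algorithm 4.

// Illustrative sketch of registration-and-matching with the optimized parameters.
#include <cmath>
#include <vector>

struct WplDesc { float x, y, phi, w; };   // segment center, orientation, and weight

float registerAndMatch(const std::vector<WplDesc> &te, const std::vector<WplDesc> &ta,
                       float dx, float dy, float theta, float scale,
                       float alpha /* orientation tolerance, e.g. 5 */,
                       float beta  /* distance tolerance, e.g. 20 pixels */)
{
    float matchedWeight = 0.0f, totalWeight = 0.0f;
    for (const WplDesc &s : te) {
        // register: apply the optimized scale, rotation, and shift to the test descriptor
        float x = scale * (s.x * std::cos(theta) - s.y * std::sin(theta)) + dx;
        float y = scale * (s.x * std::sin(theta) + s.y * std::cos(theta)) + dy;
        totalWeight += s.w;
        for (const WplDesc &t : ta)
            if (std::hypot(x - t.x, y - t.y) < beta &&
                std::fabs(s.phi - t.phi) < alpha) {          // nearest descriptor must have a similar orientation
                matchedWeight += s.w * t.w;                  // weights down-rank descriptors at the sclera edge
                break;
            }
    }
    return (totalWeight > 0.0f) ? matchedWeight / totalWeight : 0.0f;
}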

2.5 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program must be partitioned into threads by the programmer and mapped onto those processors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories that are accessible by all threads, but accessing them is very time consuming. Constant memory and texture memory are read-only, cacheable memories.

Mapping algorithms to CUDA to achieve efficient processing is not a trivial task. There are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access. To completely hide this latency with the small instruction set, we should use on-chip memory preferentially rather than global memory. When global memory accesses occur, threads in the same warp should access consecutive words to achieve coalescing.

Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses will be serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
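
The small kernel below (an illustrative fragment, not project code) shows the access patterns these guidelines call for: threads of a warp touch consecutive global-memory words so the transactions coalesce, each thread uses its own shared-memory slot so no two threads of a warp hit the same bank, and the only branch is the uniform bounds check.

// Illustrative fragment: coalesced global access and conflict-free shared memory.
// Launch with 256 threads per block.
__global__ void scaleKernel(const float *in, float *out, int n, float factor)
{
    __shared__ float stage[256];                       // one word per thread: distinct banks
    int i = blockIdx.x * blockDim.x + threadIdx.x;     // consecutive threads -> consecutive addresses
    if (i < n) {                                       // bounds check only; no divergent work inside a warp
        stage[threadIdx.x] = in[i];                    // coalesced load
        out[i] = stage[threadIdx.x] * factor;          // coalesced store
    }
}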

2.5.1 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density; thus, we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024 x 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assigned a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there is no data exchange requirement between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
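
A standard shared-memory reduction, shown below, captures the spirit of this pairwise summation scheme; the exact first-of-i indexing used in Figure 11 is replaced here by the usual halving strides, and the kernel is an illustrative sketch rather than the project's code.

// Illustrative sketch: summing per-thread intermediate results inside one block.
// Launch with 1024 threads per block (a power of two).
__global__ void blockSum(const float *partial, float *blockTotals, int n)
{
    __shared__ float buf[1024];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    buf[tid] = (i < n) ? partial[i] : 0.0f;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];             // pairs are summed in place
        __syncthreads();
    }
    if (tid == 0)
        blockTotals[blockIdx.x] = buf[0];              // the sum ends up in the first slot
}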

2.5.2 MAPPING INSIDE BLOCK

In the shift parameter search, there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final results must be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generated a set of parameters and tried them independently, and the generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to process different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
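
As an aside, the correlation concern discussed above can also be addressed with the cuRAND device API, sketched below: every thread is initialized on its own sub-sequence of a common generator, so the per-thread streams are statistically independent. This is a substitute illustration, not the dynamically created Mersenne Twister actually used in this work, and the parameter ranges are assumptions.

#include <curand_kernel.h>

// Illustrative substitute: per-thread independent random streams with cuRAND.
__global__ void initRng(curandState *states, unsigned long long seed)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    curand_init(seed, id, 0, &states[id]);             // same seed, distinct sub-sequence per thread
}

__global__ void randomParams(curandState *states, float *theta, float *scale, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;
    curandState local = states[id];                    // work on a register copy of the state
    theta[id] = (curand_uniform(&local) - 0.5f) * 0.2f;         // assumed rotation range
    scale[id] = 1.0f + (curand_uniform(&local) - 0.5f) * 0.1f;  // assumed scale range
    states[id] = local;                                // save the state for the next call
}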

2.5.3 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(phi1, phi2, phi3, x, y) and s(x, y, r, theta, phi, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
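
The layout decisions above can be summarized with the sketch below: descriptor components live in separate arrays (structure of arrays) in global memory, the whole target set is transferred once before matching, and each block stages the test template in shared memory. The names, the use of dynamic shared memory, and the omitted matching body are illustrative assumptions, not the project's code.

// Illustrative sketch of the memory layout and staging described above.
struct DeviceTemplates {                 // structure-of-arrays layout in global memory
    float *x, *y, *r, *theta, *phi, *w;
    int    count;
};

__global__ void wplMatchKernel(DeviceTemplates ta, const float *testX, const float *testY,
                               int testCount, float *scores)
{
    extern __shared__ float sTest[];                   // dynamic shared memory: 2 * testCount floats
    float *sX = sTest, *sY = sTest + testCount;
    for (int i = threadIdx.x; i < testCount; i += blockDim.x) {
        sX[i] = testX[i];                              // the test template is loaded once per block
        sY[i] = testY[i];
    }
    __syncthreads();
    // ... each thread would now match its share of target descriptors against sX/sY ...
}

// Host side: a single transfer before matching; shared memory is sized at launch time.
//   cudaMemcpy(d_testX, h_testX, testCount * sizeof(float), cudaMemcpyHostToDevice);
//   wplMatchKernel<<<numTemplates, 256, 2 * testCount * sizeof(float)>>>(d_ta, d_testX, d_testY, testCount, d_scores);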

FIG

2.6 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied to object detection. In this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation theta(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG. This method is used to create the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has any illumination or contrast changes, then the gradient strengths must be locally normalized; for this, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly in square grids. The performance of HOG is improved by applying a Gaussian window to each block.
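
A compact sketch of the per-cell computation is given below: central differences give dx and dy, the magnitude m(x, y) = sqrt(dx^2 + dy^2) and orientation theta(x, y) = arctan(dy/dx) follow, and each pixel casts a magnitude-weighted vote into one of nine unsigned orientation bins over 0-180 degrees. The cell size and the nine-bin choice are common defaults assumed only for illustration.

// Illustrative sketch of one HOG cell histogram (9 unsigned orientation bins).
#include <cmath>

void cellHistogram(const float *img, int width, int height,
                   int cx, int cy, int cellSize, float hist[9])
{
    for (int b = 0; b < 9; ++b) hist[b] = 0.0f;
    for (int y = cy; y < cy + cellSize; ++y)
        for (int x = cx; x < cx + cellSize; ++x) {
            if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) continue;
            float dx = img[y * width + (x + 1)] - img[y * width + (x - 1)];  // horizontal gradient
            float dy = img[(y + 1) * width + x] - img[(y - 1) * width + x];  // vertical gradient
            float m  = std::sqrt(dx * dx + dy * dy);                         // gradient magnitude
            float theta = std::atan2(dy, dx) * 180.0f / 3.14159265f;         // orientation in degrees
            if (theta < 0.0f) theta += 180.0f;        // opposite directions count as the same
            int bin = static_cast<int>(theta / 20.0f);
            if (bin > 8) bin = 8;                     // guard the 180-degree edge case
            hist[bin] += m;                           // magnitude-weighted vote
        }
}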

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

3.1 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

3.2 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB

proves to be an extremely efficient language for both communication and

implementation

3.2.1 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases, MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

You can use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

4.2 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step to achieve parallel computation efficiency on a GPU. A work flow with high arithmetic intensity to hide the memory access latency was designed to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


1.6 APPLICATIONS OF IMAGE PROCESSING
1.7 EXISTING SYSTEM
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1.8 LITERATURE SURVEY
1.9 PROPOSED SYSTEM
1.9.1 ADVANTAGES

CHAPTER 2 PROJECT DESCRIPTION .................................................. 17-46
2.1 INTRODUCTION
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
2.2.2 SCLERA SEGMENTATION
2.2.3 IRIS AND EYELID REFINEMENT
2.2.4 OCULAR SURFACE VASCULATURE
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
2.3 EVOLUTION OF GPU ARCHITECTURE
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
2.4.1 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
2.4.2 STAGE II FINE MATCHING USING WPL DESCRIPTOR
2.5 MAPPING THE SUBTASKS TO CUDA
2.5.1 MAPPING ALGORITHM TO BLOCKS
2.5.2 MAPPING INSIDE BLOCK
2.5.3 MEMORY MANAGEMENT
2.6 HISTOGRAM OF ORIENTED GRADIENTS

CHAPTER 3 SOFTWARE SPECIFICATION .................................................. 47-53
3.1 GENERAL
3.2 SOFTWARE REQUIREMENTS
3.3 INTRODUCTION
3.4 FEATURES OF MATLAB
3.4.1 INTERFACING WITH OTHER LANGUAGES
3.5 THE MATLAB SYSTEM
3.5.1 DESKTOP TOOLS
3.5.2 ANALYZING AND ACCESSING DATA
3.5.3 PERFORMING NUMERIC COMPUTATION

CHAPTER 4 IMPLEMENTATION .................................................. 54-69
4.1 GENERAL
4.2 CODING IMPLEMENTATION
4.3 SNAPSHOTS

CHAPTER 5 .................................................. 70

CHAPTER 6 CONCLUSION & FUTURE SCOPE .................................................. 71-72
6.1 CONCLUSION
6.2 REFERENCES

APPLICATION

LIST OF FIGURES

FIG NO   FIG NAME                                                                        PG NO
1.1      Fundamental blocks of digital image processing                                  2
1.2      Gray scale image                                                                8
1.3      The additive model of RGB                                                       9
1.4      The colors created by the subtractive model of CMYK                             9
2.1      The diagram of a typical sclera vein recognition approach                       19
2.2      Steps of segmentation                                                           21
2.3      Glare area detection                                                            21
2.4      Detection of the sclera area                                                    22
2.5      Pattern of veins                                                                23
2.6      Sclera region and its vein patterns                                             25
2.7      Filtering can take place simultaneously on different parts of the iris image    25
2.8      The sketch of parameters of segment descriptor                                  26
2.9      The weighting image                                                             28
2.10     The module of sclera template matching                                          28
2.11     The Y shape vessel branch in sclera                                             28
2.12     The rotation and scale invariant character of Y shape vessel branch             29
2.13     The line descriptor of the sclera vessel pattern                                30
2.14     The key elements of descriptor vector                                           31
2.15     Simplified sclera matching steps on GPU                                         32
2.16     Two-stage matching scheme                                                       35
2.17     Example image from the UBIRIS database                                          42
2.18     Occupancy on various thread numbers per block                                   43
2.19     The task assignment inside and outside the GPU                                  44
2.20     HOG features                                                                    46
4.1      Original sclera image                                                           65
4.2      Binarised sclera image                                                          65
4.3      Edge map subtracted image                                                       66
4.4      Cropping ROI                                                                    66
4.5      ROI mask                                                                        67
4.6      ROI finger sclera image                                                         67
4.7      Enhanced sclera image                                                           68
4.8      Feature extracted sclera image                                                  68
4.9      Matching with images in database                                                69
4.10     Result                                                                          69

ABSTRACT

Sclera vein recognition is shown to be a promising method for human

identification However its matching speed is slow which could impact its

application for real-time applications To improve the matching efficiency

we proposed a new parallel sclera vein recognition method using a two-

stage parallel approach for registration and matching First we designed a

rotation- and scale-invariant Y shape descriptor based feature extraction

method to efficiently eliminate most unlikely matches Second we

developed a weighted polar line sclera descriptor structure to incorporate

mask information to reduce GPU memory cost Third we designed a

coarse-to-fine two-stage matching method Finally we developed a

mapping scheme to map the subtasks to GPU processing units The

experimental results show that our proposed method can achieve dramatic

processing speed improvement without compromising the recognition

accuracy

CHAPTER 1

INTRODUCTION

1.1 GENERAL

Digital image processing is the use of computer algorithms to

perform image processing on digital images The 2D continuous image is

divided into N rows and M columns The intersection of a row and a

column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or an X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For

display the image is stored in a rapid-access buffer memory which

refreshes the monitor at a rate of 25 frames per second to produce a visually

continuous display

12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING

The field of "Digital Image Processing" refers to processing digital

images by means of a digital computer In a broader sense it can be

considered as a processing of any two dimensional data where any image

(optical information) is represented as an array of real or complex numbers

represented by a definite number of bits An image is represented as a two

dimensional function f(x, y), where 'x' and 'y' are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y)

represents the intensity or gray level of the image at that point

A digital image is one for which both the co-ordinates and the

amplitude values of f are all finite discrete quantities Hence a digital

image is composed of a finite number of elements each of which has a

particular location and value. These elements are called "pixels". A digital

image is discrete in both spatial coordinates and brightness and it can be

considered as a matrix whose rows and column indices identify a point on

the image and the corresponding matrix element value identifies the gray

level at that point

One of the first applications of digital images was in the newspaper

industry when pictures were first sent by submarine cable between London

and New York Introduction of the Bartlane cable picture transmission

system in the early 1920s reduced the time required to transport a picture

across the Atlantic from more than a week to less than three hours

FIG

121 PREPROCESSING

In imaging science image processing is any form of signal

processing for which the input is an image such as a photograph or video

frame the output of image processing may be either an image or a set of

characteristics or parameters related to the image Most image-processing

techniques involve treating the image as a two-dimensional signal and

applying standard signal-processing techniques to it Image processing

usually refers to digital image processing but optical and analog image

processing also are possible This article is about general techniques that

apply to all of them The acquisition of images (producing the input image

in the first place) is referred to as imaging

Image processing refers to processing of a 2D picture by a

computer Basic definitions

An image defined in the "real world" is considered to be a function

of two real variables for example a(xy) with a as the amplitude (eg

brightness) of the image at the real coordinate position (xy) Modern digital

technology has made it possible to manipulate multi-dimensional signals

with systems that range from simple digital circuits to advanced parallel

computers The goal of this manipulation can be divided into three

categories

Image processing (image in -> image out)

Image analysis (image in -> measurements out)

Image understanding (image in -> high-level description out)

An image may be considered to contain sub-images sometimes referred

to as regions-of-interest ROIs or simply regions This concept reflects the

fact that images frequently contain collections of objects each of which can

be the basis for a region In a sophisticated image processing system it

should be possible to apply specific image processing operations to selected

regions Thus one part of an image (region) might be processed to suppress

motion blur while another part might be processed to improve colour

rendition

Most usually image processing systems require that the images be

available in digitized form that is arrays of finite length binary words For

digitization the given Image is sampled on a discrete grid and each sample

or pixel is quantized using a finite number of bits The digitized image is

processed by a computer To display a digital image it is first converted

into analog signal which is scanned onto a display Closely related to

image processing are computer graphics and computer vision In computer

graphics images are manually made from physical models of objects

environments and lighting instead of being acquired (via imaging devices

such as cameras) from natural scenes as in most animated movies

Computer vision on the other hand is often considered high-level image

processing, out of which a machine/computer/software intends to decipher

the physical contents of an image or a sequence of images (eg videos or

3D full-body magnetic resonance scans)

In modern sciences and technologies images also gain much

broader scopes due to the ever growing importance of scientific

visualization (of often large-scale complex scientificexperimental data)

Examples include microarray data in genetic research or real-time multi-

asset portfolio trading in finance Before going to processing an image it is

converted into a digital form Digitization includes sampling of image and

quantization of sampled values After converting the image into bit

information processing is performed This processing technique may be

Image enhancement Image restoration and Image compression

122 IMAGE ENHANCEMENT

It refers to accentuation or sharpening of image features such as

boundaries or contrast, to make a graphic display more useful for display and analysis. This process does not increase the inherent information content in the data. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo coloring, and so on.

123 IMAGE RESTORATION

It is concerned with filtering the observed image to minimize the

effect of degradations Effectiveness of image restoration depends on the

extent and accuracy of the knowledge of degradation process as well as on

filter design Image restoration differs from image enhancement in that the

latter is concerned with more extraction or accentuation of image features

124 IMAGE COMPRESSION

It is concerned with minimizing the number of bits required to represent

an image. Applications of compression are in broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression - CCITT GROUP3 & GROUP4

Still image compression - JPEG

Video image compression - MPEG

125 SEGMENTATION

In computer vision image segmentation is the process of

partitioning a digital image into multiple segments (sets of pixels also

known as super pixels). The goal of segmentation is to simplify and/or

change the representation of an image into something that is more

meaningful and easier to analyze Image segmentation is typically used to

locate objects and boundaries (lines curves etc) in images More precisely

image segmentation is the process of assigning a label to every pixel in an

image such that pixels with the same label share certain visual

characteristics

The result of image segmentation is a set of segments that

collectively cover the entire image or a set of contours extracted from the

image (see edge detection) Each of the pixels in a region are similar with

respect to some characteristic or computed property such as

colour intensity or texture Adjacent regions are significantly different

with respect to the same characteristic(s) When applied to a stack of

images typical in medical imaging the resulting contours after image

segmentation can be used to create 3D reconstructions with the help of

interpolation algorithms like marching cubes

126 IMAGE RESTORATION

Image restoration like enhancement improves the qualities of image

but all the operations are mainly based on known measured or

degradations of the original image Image restorations are used to restore

images with problems such as geometric distortion improper focus

repetitive noise and camera motion It is used to correct images for known

degradations

127 FUNDAMENTAL STEPS

Image acquisition to acquire a digital image

Image preprocessing to improve the image in ways that increases the

chances for success of the other processes

Image segmentation to partitions an input image into its constituent parts or

objects

Image representation to convert the input data to a form suitable for

computer processing

Image description to extract features that result in some quantitative

information of interest or features that are basic for differentiating one

class of objects from another

Image recognition to assign a label to an object based on the

information provided by its descriptors

Image interpretation to assign meaning to an ensemble of recognized

objects

Knowledge about a problem domain is coded into an image processing

system in the form of a Knowledge database

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing an image f(xy) must be digitalized

both spatially and in amplitude

Digitization of the spatial coordinates (xy) is called image sampling

Amplitude digitization is called gray-level quantization

The storage and processing requirements increase rapidly with the spatial

resolution and the number of gray levels

Example A 256 gray-level image of size 256x256 occupies 64K bytes of

memory

Images of very low spatial resolution produce a checkerboard effect

The use of insufficient number of gray levels in smooth areas of a digital

image results in false contouring

14 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:

GIF - Graphical Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG - Joint Photographic Experts Group. A very efficient (i.e., much information per byte), destructively compressed 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).

TIFF - Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS - Postscript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD - Adobe PhotoShop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP - Bitmap file format.

15 TYPE OF IMAGES

Images are 4 types

1 Binary image

2 Gray scale image

3 Color image

4 Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for

each pixel Typically the two colors used for a binary image are black and

white though any two colors can be used Binary images are also called bi-

level or two-level This means that each pixel is stored as a single bitmdashie

a 0 or 1 The names black-and-white BampW

152 GRAY SCALE IMAGE

In a (8-bit) grayscale image each picture element has an assigned intensity

that ranges from 0 to 255 A grey scale image is what people normally call

a black and white image but the name emphasizes that such an image will

also include many shades of grey

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive

colour with the r g and b receptors in our retinas RGB uses additive colour

mixing and is the basic colour model used in television or any other

medium that projects colour with light It is the basic colour model used in

computers and for web graphics but it cannot be used for print production

The secondary colours of RGB ndash cyan magenta and yellow ndash are formed

by mixing two of the primary colours (red green or blue) and excluding the

third colour Red and green combine to make yellow green and blue to

make cyan and blue and red form magenta The combination of red green

and blue in full intensity makes white

In Photoshop, using the "screen" mode for the different layers in an

image will make the intensities mix together according to the additive

colour mixing model This is analogous to stacking slide images on top of

each other and shining light through them

FIG

CMYK The 4-colour CMYK model used in printing lays down

overlapping layers of varying percentages of transparent cyan (C) magenta

(M) and yellow (Y) inks In addition a layer of black (K) ink can be added

The CMYK model uses the subtractive colour model

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix The

pixel values in the array are direct indices into a color map By convention

this documentation uses the variable name X to refer to the array and map

to refer to the color map In computing indexed color is a technique to

manage digital images colors in a limited fashion in order to save

computer memory and file storage while speeding up display refresh and

file transfers It is a form of vector quantization compression

When an image is encoded in this way color information is not

directly carried by the image pixel data but is stored in a separate piece of

data called a palette an array of color elements in which every element a

color is indexed by its position within the array The image pixels do not

contain the full specification of its color but only its index in the palette

This technique is sometimes referred as pseudocolor or indirect color as

colors are addressed indirectly

Perhaps the first device that supported palette colors was a random-

access frame buffer described in 1975 by Kajiya Sutherland and Cheadle

This supported a palette of 256 36-bit RGB colors

16 Applications of image processing

Interest in digital image processing methods stems from 2 principal

application areas

1) Improvement of pictorial information for human interpretation

2) Processing of scene data for autonomous machine perception

In the second application area interest focuses on procedures for

extracting from an image

Information in a form suitable for computer processing

Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Within these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with Intel Core 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size, will preoccupy the GPU memory, and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that have been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp, and full) face images are acquired. However, their

performance degrades significantly when the test images contain variations

that are not present in the training images In this paper we highlight some

of the key issues in remote face recognition We define the remote face

recognition as one where faces are several tens of meters (10-250m) from

the cameras We then describe a remote face database which has been

acquired in an unconstrained outdoor maritime environment Recognition

performance of a subset of existing still image-based face recognition

algorithms is evaluated on the remote face data set Further we define the

remote re-identification problem as matching a subject at one location with

candidate sets acquired at a different location and over time in remote

conditions We provide preliminary experimental results on remote re-

identification It is demonstrated that in addition to applying a good

classification algorithm finding features that are robust to variations

mentioned above and developing statistical models which can account for

these variations are very important for remote face recognition

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, it still requires careful design to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye. The vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are completely eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

1.8 PROPOSED METHOD

We propose a new parallel sclera vein recognition method that uses a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why a naïve parallel approach would not work. We then propose a new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

1.9.1 PROPOSED SYSTEM ADVANTAGES

1. To improve efficiency, in this research we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out non-matching pairs before refined matching.

2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

2.1 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches, a Speeded-Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed for central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (also referred to as general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies would be needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they occupy GPU memory and slow down data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model. In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA. We then develop the implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition. Several methods have been designed for sclera segmentation. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify the color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's-thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded-Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and a Cartesian-to-polar conversion using bilinear interpolation. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly useful for circular or quasi-circular object shapes. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in the existing studies. In the proposed method, two features of an image are extracted.

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure below shows the steps of segmentation.

FIG

Glare area detection: The glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images; if the image is in color, it first needs to be converted to grayscale, after which the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be located in the right and center positions, and the correct right sclera area should be located in the left and center. In this way, non-sclera areas are eliminated.

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; these are all unwanted portions for recognition. In order to eliminate their effects, refinement is done following the detection of the sclera area. The figure shows the result after Otsu's thresholding and the iris and eyelid refinement used to detect the right sclera area. In the same way, the left sclera area is detected using this method.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation. To make the vein patterns more visible, vein pattern enhancement is performed.

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG
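As an illustration of this per-pixel parallelism (not the report's actual filter bank), the CUDA sketch below assigns one thread to each output pixel and convolves it with a small directional kernel held in constant memory; the kernel size, taps, and launch configuration are assumptions.

#define KSIZE 5   /* assumed odd kernel width/height */

__constant__ float d_kernel[KSIZE * KSIZE];   /* directional (e.g., Gabor) filter taps */

__global__ void directionalFilter(const float* in, float* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    for (int ky = -KSIZE / 2; ky <= KSIZE / 2; ++ky)
        for (int kx = -KSIZE / 2; kx <= KSIZE / 2; ++kx) {
            int ix = min(max(x + kx, 0), width - 1);    /* clamp at the image border */
            int iy = min(max(y + ky, 0), height - 1);
            sum += in[iy * width + ix] *
                   d_kernel[(ky + KSIZE / 2) * KSIZE + (kx + KSIZE / 2)];
        }
    out[y * width + x] = sum;
}

/* Example launch: one thread per pixel, 16 x 16 threads per block.
   dim3 block(16, 16);
   dim3 grid((width + 15) / 16, (height + 15) / 16);
   directionalFilter<<<grid, block>>>(d_in, d_out, width, height);   */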

2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus, the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG

Here, f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value, based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
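For illustration, the hedged CUDA sketch below computes the three line-descriptor components, assuming the usual definitions relative to the iris center: θ as the angle of the segment center about the iris center, r as its Euclidean distance, and ɸ from the slope of the fitted line f_line at the segment center. The struct and function names are not from the report.

#include <math.h>

struct LineDescriptor { float theta, r, phi; };

__host__ __device__ inline LineDescriptor
makeLineDescriptor(float xl, float yl,     /* center of the line segment   */
                   float xi, float yi,     /* center of the detected iris  */
                   float slopeAtCenter)    /* d f_line/dx evaluated at xl  */
{
    LineDescriptor s;
    s.theta = atan2f(yl - yi, xl - xi);                              /* angle about the iris center  */
    s.r     = sqrtf((xl - xi) * (xl - xi) + (yl - yi) * (yl - yi));  /* distance to the iris center  */
    s.phi   = atanf(slopeAtCenter);                                  /* dominant segment orientation */
    return s;
}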

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we create the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows: S_i and S_j are two segment descriptors, m(S_i, S_j) is the matching score between segments S_i and S_j, d(S_i, S_j) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), D_match is the matching distance threshold, and ɸ_match is the matching angle threshold. The total matching score, M, is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained.
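A hedged sketch of the pairwise segment test implied above is given below: two descriptors count as a match when their center distance is below D_match and their orientation difference is below the angle threshold. The weighting rule (here, the product of the two mask weights) and all names are illustrative assumptions; the report's exact scoring formula is not reproduced.

#include <math.h>

struct WeightedSegment { float x, y, phi, w; };   /* center, orientation, mask weight */

__host__ __device__ inline float
segmentMatchScore(WeightedSegment si, WeightedSegment sj, float Dmatch, float PhiMatch)
{
    float d = sqrtf((si.x - sj.x) * (si.x - sj.x) +
                    (si.y - sj.y) * (si.y - sj.y));   /* center-point distance  */
    float dphi = fabsf(si.phi - sj.phi);              /* orientation difference */
    /* assumed weighting: product of the two descriptors' mask weights */
    return (d <= Dmatch && dphi <= PhiMatch) ? si.w * sj.w : 0.0f;
}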

FIG

FIG

FIG

FIG

Even with movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employ the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during the image acquisition step, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as

auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.
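The following hedged sketch shows how such a feature y(ϕ1, ϕ2, ϕ3, x, y) could be assembled: each branch angle is measured relative to the radial direction from the pupil center to the branch point, so no extra alignment is needed. Names and the angle-wrapping choice are illustrative assumptions.

#include <math.h>

struct YShapeDescriptor { float phi1, phi2, phi3, x, y; };

__host__ __device__ inline float wrapAngle(float a)   /* map an angle into [-pi, pi) */
{
    const float PI = 3.14159265358979f;
    while (a >= PI) a -= 2.0f * PI;
    while (a < -PI) a += 2.0f * PI;
    return a;
}

__host__ __device__ inline YShapeDescriptor
makeYShapeDescriptor(float bx, float by,               /* Y-branch center           */
                     float px, float py,               /* pupil (iris) center       */
                     const float branchAngle[3])       /* branch orientations (rad) */
{
    float radial = atan2f(by - py, bx - px);           /* radial direction at the branch */
    YShapeDescriptor y;
    y.phi1 = wrapAngle(branchAngle[0] - radial);
    y.phi2 = wrapAngle(branchAngle[1] - radial);
    y.phi3 = wrapAngle(branchAngle[2] - radial);
    y.x = bx;  y.y = by;                               /* auxiliary center position */
    return y;
}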

2.2.6 WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid,

and/or eyelashes. To be tolerant of such error, the mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not.

FIG: The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel patterns in the sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line segments of the vessel patterns.

However, in a GPU application, using the mask is challenging,

since the mask files are large in size and will occupy the GPU memory and

slow down the data transfer. For matching, a registration RANSAC-type algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weights of the transformed descriptors. This results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, the descriptors' weights are calculated on their own mask by the CPU, only once.

The calculated result is saved as a component of the descriptor. The descriptor of the sclera then becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the values 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. It would be faster if the two templates had a similar reference point. If we use the center of the iris as the reference point, then when two templates are compared, the correspondences will automatically be aligned to each other, since they have the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector then becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses. This meets the requirement for coalesced memory access on the GPU.

FIG

FIG
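A minimal sketch of this idea is shown below, assuming a structure-of-arrays layout in which the left-half and right-half descriptors occupy separate contiguous ranges; consecutive threads in a warp then read consecutive elements of each field array, so the accesses coalesce. Field names and the exact packing are assumptions.

struct WplTemplateSoA {
    /* one entry per descriptor; [0, nLeft) = left half, [nLeft, n) = right half */
    float *x, *y, *r, *theta, *phi, *w;
    int nLeft, n;
};

__global__ void readDescriptors(WplTemplateSoA t, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= t.n) return;
    /* consecutive threads touch consecutive addresses of each field array,
       so each warp issues a small number of coalesced transactions */
    out[i] = t.x[i] + t.y[i] + t.w[i];   /* placeholder use of the fields */
}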

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus, the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units. General-purpose computing on the GPU: mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology is different between graphics and general-purpose use. Harris provides an excellent description of this mapping process.

We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as a texture on future passes through the graphics pipeline.

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.
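As a minimal, generic illustration of this structure (not tied to the report's matching kernels), the CUDA sketch below defines a one-dimensional grid of threads over the computation domain; every thread runs the same SPMD program, gathers its element from global memory, and scatters the result back into the same buffer in place.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void scaleInPlace(float* data, int n, float alpha)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* this thread's domain point        */
    if (i < n)
        data[i] = alpha * data[i];                   /* gather, compute, scatter in place */
}

int main()
{
    const int n = 1 << 20;
    float* d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    scaleInPlace<<<(n + 255) / 256, 256>>>(d, n, 2.0f);   /* grid of threads covers the domain */
    cudaDeviceSynchronize();
    cudaFree(d);
    printf("done\n");
    return 0;
}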

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities. After this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here, y_te,i and y_ta,j are the Y-shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance of the two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and t_xy is the threshold that restricts the search area. We set tϕ to 30 and t_xy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕ_ij is the angle between the jth branch and the polar line from the pupil center in descriptor i.

The number of matched pairs n_i and the distance between the Y-shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here, α is a factor used to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. Scleras with high matching scores are passed to the next, more precise matching process.
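The host-side, single-threaded sketch below illustrates the coarse matching loop described above without reproducing Algorithm 1 or Eq. (2): pairs are restricted to the same sclera half, the center distance must fall below t_xy and the branch-angle distance below t_ϕ, and the matched count and accumulated distance are returned. The distance formulas and the final score fusion are assumptions left as comments.

#include <math.h>
#include <vector>

struct YBranch { float phi1, phi2, phi3, x, y; bool leftHalf; };

static float angleDist(const YBranch& a, const YBranch& b)    /* Eq. (3)-style branch-angle distance (assumed form) */
{
    return sqrtf((a.phi1 - b.phi1) * (a.phi1 - b.phi1) +
                 (a.phi2 - b.phi2) * (a.phi2 - b.phi2) +
                 (a.phi3 - b.phi3) * (a.phi3 - b.phi3));
}

static float centerDist(const YBranch& a, const YBranch& b)   /* Eq. (4)-style center distance */
{
    return sqrtf((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
}

/* Returns the matched-pair count n and the total center distance d for one template pair. */
void coarseMatch(const std::vector<YBranch>& test, const std::vector<YBranch>& target,
                 float tPhi, float tXy, int& n, float& d)
{
    n = 0; d = 0.0f;
    for (size_t i = 0; i < test.size(); ++i)
        for (size_t j = 0; j < target.size(); ++j) {
            if (test[i].leftHalf != target[j].leftHalf) continue;   /* search only the same half */
            float dc = centerDist(test[i], target[j]);
            if (dc > tXy) continue;                                 /* limit the search area     */
            if (angleDist(test[i], target[j]) <= tPhi) { ++n; d += dc; }
        }
    /* The final score would fuse n and d/n with the factor alpha and the template
       sizes Ni, Nj as in Eq. (2) of the report (not reproduced here).            */
}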

2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR

The line segment WPL descriptor reveals more of the vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te,i is the ith WPL descriptor of T_te; T_ta is the target template and s_ta,i is the ith WPL descriptor of T_ta; and d(s_te,k, s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j.

Δs_k is the shift value between two descriptors, defined as the offset between their center coordinates.

We first randomly select an equal number of segment descriptors s_te,k in the test template T_te from each quad and find their nearest neighbors s_ta,j in the target template T_ta. Their shift offset is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
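A hedged host-side sketch of this shift search is given below: sampled test descriptors are paired with their nearest neighbors in the target template, the candidate offsets Δs_k are collected, and the candidate closest to the mean offset is kept as Δs_optim (one reading of "smallest standard deviation"). The quad-based sampling of Algorithm 2 is omitted and the selection rule is an assumption.

#include <math.h>
#include <stdlib.h>
#include <vector>

struct Center { float x, y; };        /* only the descriptor fields needed here */
struct Offset { float dx, dy; };

Offset searchShift(const std::vector<Center>& test, const std::vector<Center>& target,
                   int nSamples)
{
    std::vector<Offset> cand;
    for (int s = 0; s < nSamples && !test.empty() && !target.empty(); ++s) {
        const Center& te = test[rand() % test.size()];     /* randomly sampled test descriptor */
        float best = 1e30f; Offset o = {0.0f, 0.0f};
        for (size_t j = 0; j < target.size(); ++j) {       /* nearest neighbour in the target  */
            float dx = target[j].x - te.x, dy = target[j].y - te.y;
            float d = dx * dx + dy * dy;
            if (d < best) { best = d; o.dx = dx; o.dy = dy; }
        }
        cand.push_back(o);                                 /* candidate shift, Delta s_k */
    }
    Offset mean = {0.0f, 0.0f};
    if (cand.empty()) return mean;
    for (size_t k = 0; k < cand.size(); ++k) { mean.dx += cand[k].dx; mean.dy += cand[k].dy; }
    mean.dx /= cand.size(); mean.dy /= cand.size();
    Offset bestO = cand[0]; float bestDev = 1e30f;
    for (size_t k = 0; k < cand.size(); ++k) {             /* Delta s_optim: closest to the mean */
        float dev = (cand[k].dx - mean.dx) * (cand[k].dx - mean.dx) +
                    (cand[k].dy - mean.dy) * (cand[k].dy - mean.dy);
        if (dev < bestDev) { bestDev = dev; bestO = cand[k]; }
    }
    return bestO;
}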

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum number of matches m^(it). Here, s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithm 2; tr_shift^(it), θ^(it), and tr_scale^(it) are the shift, rotation, and scale transform parameters generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment, we set the number of iterations to 512.
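For illustration, the sketch below composes a plausible version of the transform used in Algorithm 3, a 3 × 3 homogeneous 2-D matrix built from a rotation R(θ), a translation T(tr_shift), and an isotropic scale S(tr_scale), and applies it to a descriptor center. The composition order and matrix layout follow common convention and are assumptions; Eq. (7) itself is not reproduced.

#include <math.h>

struct Affine2D { float m[3][3]; };

__host__ __device__ inline Affine2D makeAffine(float theta, float tx, float ty, float scale)
{
    float c = cosf(theta), s = sinf(theta);
    /* A = T(tx, ty) * R(theta) * S(scale), acting on column vectors [x y 1]^T */
    Affine2D a = {{{ scale * c, -scale * s, tx   },
                   { scale * s,  scale * c, ty   },
                   { 0.0f,       0.0f,      1.0f }}};
    return a;
}

__host__ __device__ inline void applyAffine(const Affine2D& a, float x, float y,
                                            float& xo, float& yo)
{
    xo = a.m[0][0] * x + a.m[0][1] * y + a.m[0][2];
    yo = a.m[1][0] * x + a.m[1][1] * y + a.m[1][2];
}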

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here, s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithms 2 and 3; θ^(optm), tr_shift^(optm), tr_scale^(optm), and Δs_optim are the registration parameters attained from Algorithms 2 and 3; R(θ^(optm)) T(tr_shift^(optm)) S(tr_scale^(optm)) is the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radial direction; and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.

2.5 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and it takes relatively little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, only a limited amount of shared memory is available. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and it is very time-consuming to access these memories.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task. There are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access time. To completely hide the latency of the small instruction set, we should preferentially use on-chip memory rather than global memory. When global memory accesses occur, threads in the same warp should access consecutive words to achieve coalescing.

Shared memory is much faster than the local and global memory spaces. But shared memory is organized into banks that are equal in size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

2.5.1 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density; thus we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partitioning among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread; one thread performs the comparison of one pair of templates. In our work, we use an NVIDIA C2070 as our GPU. The thread and block numbers are set to 1024; that means we can match our test template with up to 1024 × 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which a section of descriptors is processed by one thread. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partitioning makes every block execute independently, and there is no data-exchange requirement between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, which is shown on the right of Figure 11. First, all odd-numbered threads compute the sums of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new result. The final result is saved at the first address, which has the same variable name as the first intermediate result.
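The kernel below is a standard CUDA shared-memory tree reduction, shown as an illustration of this combination step rather than the report's exact kernel: each thread loads one intermediate result, pairs are summed at doubling strides, and thread 0 writes the block total to "the first address". The block size is an assumption and must be a power of two.

#define BLOCK 256   /* assumed block size; must be a power of two */

__global__ void sumPartialScores(const float* partial, float* blockSum, int n)
{
    __shared__ float s[BLOCK];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    s[tid] = (i < n) ? partial[i] : 0.0f;      /* load this thread's intermediate result */
    __syncthreads();

    /* pairwise sums, then sums of pairs of pairs, ... until one value remains */
    for (int stride = 1; stride < BLOCK; stride *= 2) {
        if (tid % (2 * stride) == 0)
            s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        blockSum[blockIdx.x] = s[0];           /* the total ends up at "the first address" */
}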

2.5.2 MAPPING INSIDE BLOCK

In the shift argument search, there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all the threads compute independently and only the final result must be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we choose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to a thread, and every thread randomly generates a set of parameters and tries them independently. The generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we use the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to process different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory.

FIG

FIG

To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.

2.5.3 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the contiguous kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data. In our kernel, we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
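A hedged sketch of this shared-memory staging is shown below: the threads of a block cooperatively copy the (small) test template from global memory into shared memory before matching, so repeated reads hit fast on-chip memory. The sizes, field layout, and kernel signature are assumptions, and the texture binding of the target templates is omitted.

#define MAX_TEST_DESC 256    /* assumed upper bound on test-template descriptors */

struct WplDesc { float x, y, r, theta, phi, w; };

__global__ void matchAgainstTargets(const WplDesc* testTemplate, int nTest,
                                    const WplDesc* targets, const int* targetOffsets,
                                    float* scores)
{
    __shared__ WplDesc sTest[MAX_TEST_DESC];
    /* cooperative copy of the test template into fast on-chip shared memory */
    for (int i = threadIdx.x; i < nTest && i < MAX_TEST_DESC; i += blockDim.x)
        sTest[i] = testTemplate[i];
    __syncthreads();

    /* ... each block would now match sTest against its own target template,
       reading target descriptors starting at targetOffsets[blockIdx.x] and
       writing the block's score into scores[blockIdx.x] (matching omitted). */
    if (threadIdx.x == 0) scores[blockIdx.x] = 0.0f;   /* placeholder result */
}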

FIG

2.6 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients is a feature descriptor. It was primarily applied to the design of target detection; in this paper, it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the different histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG. This method is used to create the cell histograms. Each pixel within the cell gives a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of the gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG
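The hedged sketch below shows the HOG building blocks just described: central-difference gradients give m(x, y) and θ(x, y), and each pixel casts a magnitude-weighted vote into an orientation bin spread over 0 to 180 degrees. The bin count, cell handling, and absence of block normalization are simplifying assumptions.

#include <math.h>

#define NBINS 9    /* assumed: 9 orientation bins over 0..180 degrees */

__host__ __device__ inline void
hogPixelVote(const float* img, int width, int height, int x, int y, float cellHist[NBINS])
{
    /* central differences, clamped at the image border */
    int xm = (x > 0) ? x - 1 : x,        xp = (x < width  - 1) ? x + 1 : x;
    int ym = (y > 0) ? y - 1 : y,        yp = (y < height - 1) ? y + 1 : y;
    float dx = img[y * width + xp] - img[y * width + xm];
    float dy = img[yp * width + x] - img[ym * width + x];

    float m     = sqrtf(dx * dx + dy * dy);          /* gradient magnitude m(x, y)         */
    float theta = atan2f(dy, dx) * 57.29578f;        /* orientation theta(x, y) in degrees */
    if (theta < 0.0f) theta += 180.0f;               /* opposite directions count the same */

    int bin = (int)(theta / (180.0f / NBINS));
    if (bin >= NBINS) bin = NBINS - 1;
    cellHist[bin] += m;                              /* magnitude-weighted vote            */
}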

CHAPTER 3

SOFTWARE SPECIFICATION

3.1 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java™, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly construed) and knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
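As a brief illustration of the Parallel Computing Toolbox capabilities mentioned above, here is a minimal sketch; it assumes the toolbox is installed and, for the GPU lines, that a CUDA-capable device is available.

% Parallel for-loop: independent iterations run on worker processes
scores = zeros(1, 100);
parfor k = 1:100
    scores(k) = sum(rand(1, 1000));
end
% Offloading an array computation to the GPU
A = gpuArray(rand(1024));   % copy data to GPU memory
B = fft(A);                 % the FFT executes on the GPU
result = gather(B);         % copy the result back to host memory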

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
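For example, a small illustrative comparison of an explicit loop and its vectorized equivalent (the arrays here are made up for the example):

% Explicit loop version
x = 0:0.01:2*pi;
y = zeros(size(x));
for k = 1:numel(x)
    y(k) = sin(x(k))^2 + cos(x(k));
end
% Equivalent vectorized one-liner: element-wise operations on the whole array
y = sin(x).^2 + cos(x);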

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
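A minimal sketch of creating a simple GUI programmatically; the window title, control layout, and callback are invented purely for illustration.

f = figure('Name', 'Demo GUI', 'NumberTitle', 'off');
uicontrol(f, 'Style', 'text', 'Position', [20 60 120 20], 'String', 'Press the button:');
uicontrol(f, 'Style', 'pushbutton', 'Position', [20 20 100 30], ...
    'String', 'Plot', 'Callback', @(src, evt) plot(rand(1, 10)));  % plots random data on click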

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
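A short, hedged sketch touching a few of these operations on a synthetic signal:

% Synthetic noisy signal
t = linspace(0, 1, 200);
s = sin(2*pi*5*t) + 0.3*randn(size(t));
s_smooth = filter(ones(1, 5)/5, 1, s);        % 5-point moving-average smoothing
tf = linspace(0, 1, 1000);
s_fine = interp1(t, s_smooth, tf, 'linear');  % interpolation onto a finer grid
S = abs(fft(s));                              % Fourier analysis
mu = mean(s); sigma = std(s);                 % basic statistics
p = polyfit(t, s, 3);                         % least-squares cubic curve fit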

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
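A brief sketch of typical data-access calls; the file names are placeholders, not files used by this project.

img = imread('eye_image.jpg');          % read an image file
[num, txt] = xlsread('results.xls');    % read numeric and text data from Excel
fid = fopen('template.bin', 'r');       % low-level binary I/O
data = fread(fid, Inf, 'uint8');
fclose(fid);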

Visualizing Data

All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots

Animations
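A small illustrative 2-D plotting sketch using standard functions (the data is synthetic):

x = 0:0.1:10;
y = exp(-0.2*x) .* sin(2*x);
figure;
plot(x, y, 'b-', 'LineWidth', 1.5);   % line plot
hold on;
bar(x(1:10:end), y(1:10:end), 0.3);   % bar chart of a subsample
xlabel('x'); ylabel('y');
legend('damped sine', 'samples');
title('2-D plotting example');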

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
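A hedged sketch touching a few of these mathematical areas (linear algebra, interpolation, ODEs, Fourier analysis, and sparse matrices); the values are arbitrary examples.

A = rand(4);                                   % matrix manipulation and linear algebra
x = A \ ones(4, 1);                            % solve a 4x4 linear system
yi = interp1([1 2 3 4], [10 20 15 5], 2.5);    % interpolation
[t, y] = ode45(@(t, y) -2*y, [0 1], 1);        % ordinary differential equation dy/dt = -2y
S = speye(1000);                               % sparse matrix operations
f = fft(randn(1, 256));                        % Fourier analysis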

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
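The snapshots above correspond roughly to the following simplified MATLAB sketch of the processing chain; this is illustrative only, and the file names, Gabor parameters, and matching threshold are assumptions rather than the project's exact code.

rgb   = imread('sclera_input.jpg');        % original sclera image (hypothetical file)
gray  = rgb2gray(rgb);                     % grey scale conversion
level = graythresh(gray);                  % Otsu's threshold
bw    = im2bw(gray, level);                % binary image
edges = edge(gray, 'sobel');               % edge map
roi   = roipoly(gray);                     % interactively select the sclera region of interest
sclera = gray .* uint8(roi);               % keep only the selected ROI
enhanced = imadjust(sclera);               % enhancement (contrast stretching)
% One Gabor filter (single orientation); the actual system uses a bank of orientations
[xg, yg] = meshgrid(-7:7, -7:7);
lambda = 8; theta = 0; sigma = 3;          % assumed filter parameters
xr =  xg*cos(theta) + yg*sin(theta);
yr = -xg*sin(theta) + yg*cos(theta);
gab = exp(-(xr.^2 + yr.^2)/(2*sigma^2)) .* cos(2*pi*xr/lambda);
features = imfilter(double(enhanced), gab, 'symmetric');
% Matching against a stored template (simple correlation used as a stand-in here)
stored = load('template.mat');             % hypothetical stored feature template of equal size
score  = corr2(features, stored.features);
if score > 0.8                             % assumed decision threshold
    disp('MATCHED');
else
    disp('NOT MATCHED');
end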

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step to achieve parallel computation efficiency using a GPU. A work flow with high arithmetic intensity to hide the memory access latency was designed to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


251 MAPPING ALGORITHM TO BLOCKS
252 MAPPING INSIDE BLOCK
253 MEMORY MANAGEMENT
26 HISTOGRAM OF ORIENTED GRADIENTS
CHAPTER 3 SOFTWARE SPECIFICATION
31 GENERAL
32 SOFTWARE REQUIREMENTS
33 INTRODUCTION
34 FEATURES OF MATLAB
341 INTERFACING WITH OTHER LANGUAGES
35 THE MATLAB SYSTEM
351 DESKTOP TOOLS
352 ANALYZING AND ACCESSING DATA
353 PERFORMING NUMERIC COMPUTATION
CHAPTER 4 IMPLEMENTATION
41 GENERAL
42 CODING IMPLEMENTATION
43 SNAPSHOTS
CHAPTER 5
CHAPTER 6 CONCLUSION & FUTURE SCOPE
61 CONCLUSION
62 REFERENCES
APPLICATION

LIST OF FIGURES
FIG NO FIG NAME
11 Fundamental blocks of digital image processing
12 Gray scale image
13 The additive model of RGB
14 The colors created by the subtractive model of CMYK
21 The diagram of a typical sclera vein recognition approach
22 Steps of Segmentation
23 Glare area detection
24 Detection of the sclera area
25 Pattern of veins
26 Sclera region and its vein patterns
27 Filtering can take place simultaneously on different parts of the iris image
28 The sketch of parameters of segment descriptor
29 The weighting image
210 The module of sclera template matching
211 The Y shape vessel branch in sclera
212 The rotation and scale invariant character of Y shape vessel branch
213 The line descriptor of the sclera vessel pattern
214 The key elements of descriptor vector
215 Simplified sclera matching steps on GPU
216 Two-stage matching scheme
217 Example image from the UBIRIS database
218 Occupancy on various thread numbers per block
219 The task assignment inside and outside the GPU
220 HOG features
41 Original sclera image
42 Binarised sclera image
43 Edge map subtracted image
44 Cropping roi
45 Roi mask
46 Roi finger sclera image
47 Enhanced sclera image
48 Feature extracted sclera image
49 Matching with images in database
410 Result

ABSTRACT

Sclera vein recognition is shown to be a promising method for human

identification. However, its matching speed is slow, which could limit its use in real-time applications. To improve the matching efficiency,

we proposed a new parallel sclera vein recognition method using a two-

stage parallel approach for registration and matching First we designed a

rotation- and scale-invariant Y shape descriptor based feature extraction

method to efficiently eliminate most unlikely matches Second we

developed a weighted polar line sclera descriptor structure to incorporate

mask information to reduce GPU memory cost Third we designed a

coarse-to-fine two-stage matching method Finally we developed a

mapping scheme to map the subtasks to GPU processing units The

experimental results show that our proposed method can achieve dramatic

processing speed improvement without compromising the recognition

accuracy

CHAPTER 1

INTRODUCTION

11 GENERAL

Digital image processing is the use of computer algorithms to

perform image processing on digital images The 2D continuous image is

divided into N rows and M columns The intersection of a row and a

column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or an X-ray is first digitized and stored as a

matrix of binary digits in computer memory This digitized image can then

be processed andor displayed on a high-resolution television monitor For

display the image is stored in a rapid-access buffer memory which

refreshes the monitor at a rate of 25 frames per second to produce a visually

continuous display

12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING

The field of "Digital Image Processing" refers to processing the digital

images by means of a digital computer In a broader sense it can be

considered as a processing of any two dimensional data where any image

(optical information) is represented as an array of real or complex numbers

represented by a definite number of bits An image is represented as a two

dimensional function f(xy) where lsquoxrsquo and lsquoyrsquo are spatial (plane)

coordinates and the amplitude of f at any pair of coordinates (xy)

represents the intensity or gray level of the image at that point

A digital image is one for which both the co-ordinates and the

amplitude values of f are all finite discrete quantities Hence a digital

image is composed of a finite number of elements each of which has a

particular location and value. These elements are called "pixels". A digital

image is discrete in both spatial coordinates and brightness and it can be

considered as a matrix whose rows and column indices identify a point on

the image and the corresponding matrix element value identifies the gray

level at that point

One of the first applications of digital images was in the newspaper

industry when pictures were first sent by submarine cable between London

and New York Introduction of the Bartlane cable picture transmission

system in the early 1920s reduced the time required to transport a picture

across the Atlantic from more than a week to less than three hours

FIG

121 PREPROCESSING

In imaging science image processing is any form of signal

processing for which the input is an image such as a photograph or video

frame the output of image processing may be either an image or a set of

characteristics or parameters related to the image Most image-processing

techniques involve treating the image as a two-dimensional signal and

applying standard signal-processing techniques to it Image processing

usually refers to digital image processing but optical and analog image

processing also are possible This article is about general techniques that

apply to all of them The acquisition of images (producing the input image

in the first place) is referred to as imaging

Image processing refers to processing of a 2D picture by a computer. Basic definitions: an image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital

technology has made it possible to manipulate multi-dimensional signals

with systems that range from simple digital circuits to advanced parallel

computers The goal of this manipulation can be divided into three

categories

Image processing (image in -> image out)
Image Analysis (image in -> measurements out)
Image Understanding (image in -> high-level description out)

An image may be considered to contain sub-images sometimes referred

to as regions-of-interest ROIs or simply regions This concept reflects the

fact that images frequently contain collections of objects each of which can

be the basis for a region In a sophisticated image processing system it

should be possible to apply specific image processing operations to selected

regions Thus one part of an image (region) might be processed to suppress

motion blur while another part might be processed to improve colour

rendition

Most usually image processing systems require that the images be

available in digitized form that is arrays of finite length binary words For

digitization the given Image is sampled on a discrete grid and each sample

or pixel is quantized using a finite number of bits The digitized image is

processed by a computer To display a digital image it is first converted

into analog signal which is scanned onto a display Closely related to

image processing are computer graphics and computer vision In computer

graphics images are manually made from physical models of objects

environments and lighting instead of being acquired (via imaging devices

such as cameras) from natural scenes as in most animated movies

Computer vision on the other hand is often considered high-level image

processing out of which a machinecomputersoftware intends to decipher

the physical contents of an image or a sequence of images (eg videos or

3D full-body magnetic resonance scans)

In modern sciences and technologies images also gain much

broader scopes due to the ever growing importance of scientific

visualization (of often large-scale complex scientificexperimental data)

Examples include microarray data in genetic research or real-time multi-

asset portfolio trading in finance Before going to processing an image it is

converted into a digital form Digitization includes sampling of image and

quantization of sampled values After converting the image into bit

information processing is performed This processing technique may be

Image enhancement Image restoration and Image compression

122 IMAGE ENHANCEMENT

It refers to accentuation or sharpening of image features such as boundaries or contrast to make a graphic display more useful for display & analysis. This process does not increase the inherent information content in the data. It includes gray level & contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo coloring, and so on.

123 IMAGE RESTORATION

It is concerned with filtering the observed image to minimize the

effect of degradations Effectiveness of image restoration depends on the

extent and accuracy of the knowledge of degradation process as well as on

filter design Image restoration differs from image enhancement in that the

latter is concerned with more extraction or accentuation of image features

124 IMAGE COMPRESSION

It is concerned with minimizing the number of bits required to represent

an image. Applications of compression are in broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission for educational & business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression – CCITT GROUP3 & GROUP4
Still image compression – JPEG
Video image compression – MPEG

125 SEGMENTATION

In computer vision image segmentation is the process of

partitioning a digital image into multiple segments (sets of pixels also

known as super pixels) The goal of segmentation is to simplify andor

change the representation of an image into something that is more

meaningful and easier to analyze Image segmentation is typically used to

locate objects and boundaries (lines curves etc) in images More precisely

image segmentation is the process of assigning a label to every pixel in an

image such that pixels with the same label share certain visual

characteristics

The result of image segmentation is a set of segments that

collectively cover the entire image or a set of contours extracted from the

image (see edge detection) Each of the pixels in a region are similar with

respect to some characteristic or computed property such as

colour intensity or texture Adjacent regions are significantly different

with respect to the same characteristic(s) When applied to a stack of

images typical in medical imaging the resulting contours after image

segmentation can be used to create 3D reconstructions with the help of

interpolation algorithms like marching cubes

126 IMAGE RESTORATION

Image restoration like enhancement improves the qualities of image

but all the operations are mainly based on known measured or

degradations of the original image Image restorations are used to restore

images with problems such as geometric distortion improper focus

repetitive noise and camera motion It is used to correct images for known

degradations

127 FUNDAMENTAL STEPS

Image acquisition to acquire a digital image

Image preprocessing to improve the image in ways that increases the

chances for success of the other processes

Image segmentation to partition an input image into its constituent parts or

objects

Image representation to convert the input data to a form suitable for

computer processing

Image description to extract features that result in some quantitative

information of interest or features that are basic for differentiating one

class of objects from another

Image recognition to assign a label to an object based on the

information provided by its descriptors

Image interpretation to assign meaning to an ensemble of recognized

objects

Knowledge about a problem domain is coded into an image processing

system in the form of a Knowledge database

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x, y) must be digitized

both spatially and in amplitude

Digitization of the spatial coordinates (xy) is called image sampling

Amplitude digitization is called gray-level quantization

The storage and processing requirements increase rapidly with the spatial

resolution and the number of gray levels

Example A 256 gray-level image of size 256x256 occupies 64K bytes of

memory

Images of very low spatial resolution produce a checkerboard effect

The use of insufficient number of gray levels in smooth areas of a digital

image results in false contouring

14 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based or 'images'). Some of the most common file

formats are

GIF — Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for web. Has several sub-standards, one of which is the animated GIF.
JPEG — Joint Photographic Experts Group, a very efficient (i.e., much information per byte), destructively compressed 24-bit (16 million colours) bitmap format. Widely used, especially for web and Internet (bandwidth-limited).
TIFF — Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.
PS — Postscript, a standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.
PSD – Adobe PhotoShop Document, a dedicated Photoshop format that keeps all the information in an image, including all the layers.
BMP – bit map file format.

15 TYPE OF IMAGES

Images are 4 types

1 Binary image

2 Gray scale image

3 Color image

4 Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically, the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e., a 0 or 1. The names black-and-white and B&W are also used for such images.

152 GRAY SCALE IMAGE

In a (8-bit) grayscale image each picture element has an assigned intensity

that ranges from 0 to 255 A grey scale image is what people normally call

a black and white image but the name emphasizes that such an image will

also include many shades of grey

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive

colour with the r g and b receptors in our retinas RGB uses additive colour

mixing and is the basic colour model used in television or any other

medium that projects colour with light It is the basic colour model used in

computers and for web graphics but it cannot be used for print production

The secondary colours of RGB – cyan, magenta, and yellow – are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green, and blue in full intensity makes white.

In Photoshop, using the "screen" mode for the different layers in an

image will make the intensities mix together according to the additive

colour mixing model This is analogous to stacking slide images on top of

each other and shining light through them

FIG

CMYK The 4-colour CMYK model used in printing lays down

overlapping layers of varying percentages of transparent cyan (C) magenta

(M) and yellow (Y) inks In addition a layer of black (K) ink can be added

The CMYK model uses the subtractive colour model

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix The

pixel values in the array are direct indices into a color map By convention

this documentation uses the variable name X to refer to the array and map

to refer to the color map In computing indexed color is a technique to

manage the colors of digital images in a limited fashion in order to save

computer memory and file storage while speeding up display refresh and

file transfers It is a form of vector quantization compression

When an image is encoded in this way, color information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements, in which every element, a color, is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only their index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer, described in 1975 by Kajiya, Sutherland, and Cheadle.

This supported a palette of 256 36-bit RGB colors

16 Applications of image processing

Interest in digital image processing methods stems from 2 principal

application areas

1) Improvement of pictorial information for human interpretation

2) Processing of scene data for autonomous machine perception

In the second application area, interest focuses on procedures for extracting from an image information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches: a Speed Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Within these three methods, the SURF method achieves the best accuracy; it takes an average of 15 seconds1 using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, which costs about 12 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1 Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size and will preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2 The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3 When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the

most challenging problems that has been actively researched in recent

years. It is well known that many state-of-the-art still face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp, and full) face images are acquired. However, their

performance degrades significantly when the test images contain variations

that are not present in the training images In this paper we highlight some

of the key issues in remote face recognition We define the remote face

recognition as one where faces are several tens of meters (10-250m) from

the cameras We then describe a remote face database which has been

acquired in an unconstrained outdoor maritime environment Recognition

performance of a subset of existing still image-based face recognition

algorithms is evaluated on the remote face data set Further we define the

remote re-identification problem as matching a subject at one location with

candidate sets acquired at a different location and over time in remote

conditions We provide preliminary experimental results on remote re-

identification It is demonstrated that in addition to applying a good

classification algorithm finding features that are robust to variations

mentioned above and developing statistical models which can account for

these variations are very important for remote face recognition

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanded biometric data collected by various sectors

of government and industry for identification and verification purposes

how to manage and process such Big Data draws great concern Even

though modern processors are equipped with more cores and memory

capacity it still requires careful design in order to utilize the hardware

resource effectively and the power consumption efficiently This research

addresses this issue by investigating the workload characteristics of

biometric application. Taking Daugman's iris matching algorithm, which

has been proven to be the most reliable iris matching method as a case

study we conduct performance profiling and binary instrumentation on the

benchmark to capture its execution behavior The results show that data

loading and memory access incurs great performance overhead and

motivates us to move the biometrics computation to high-performance

architecture

Modern iris recognition algorithms can be computationally intensive

yet are designed for traditional sequential processing elements such as a

personal computer However a parallel processing alternative using field

programmable gate arrays (FPGAs) offers an opportunity to speed up iris

recognition Within the means of this project iris template generation with

directional filtering, which is a computationally expensive yet parallel

portion of a modern iris recognition algorithm is parallelized on an FPGA

system We will present a performance comparison of the parallelized

algorithm on the FPGA system to a traditional CPU-based version The

parallelized template generation outperforms an optimized C++ code

version determining the information content of an iris approximately 324

times faster

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival

vasculature is proposed Conjunctival vessels can be observed on the visible

part of the sclera that is exposed to the outside world These vessels

demonstrate rich and specific details in visible light and can be easily

photographed using a regular digital camera In this paper we discuss

methods for conjunctival imaging preprocessing and feature extraction in

order to derive a suitable conjunctival vascular template for biometric

authentication Commensurate classification methods along with the

observed accuracy are discussed Experimental results suggest the potential

of using conjunctival vasculature as a biometric measure Identification of

a person based on some unique set of features is an important task The

human identification is possible with several biometric systems and sclera

recognition is one of the promising biometrics The sclera is the white

portion of the human eye The vein pattern seen in the sclera region is

unique to each person Thus the sclera vein pattern is a well suited

biometric technology for human identification The existing methods used

for sclera recognition have some drawbacks like only frontal looking

images are preferred for matching and rotation variance is another problem

These problems are completely eliminated in the proposed system by using

two feature extraction techniques They are Histogram of Oriented

Gradients (HOG) and converting the image into polar form using the

bilinear interpolation technique These two features help the proposed

system to become illumination invariant and rotation invariant The

experimentation is done with the help of UBIRIS database The

experimental result shows that the proposed sclera recognition method can

achieve better accuracy than the previous methods

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of

today's mainstream computing systems. Over the past six years there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified

Mumford–Shah functional. Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification The verification and

identification performance of the proposed algorithms is validated and

compared with other algorithms using the CASIA Version 3 ICE 2005 and

UBIRIS iris databases

18 PROPOSED METHOD

Proposed a new parallel sclera vein recognition method using a two-

stage parallel approach for registration and matching A parallel sclera

matching solution for Sclera vein recognition using our sequential line-

descriptor method using the CUDA GPU architecture CUDA is a highly

parallel multithreaded many-core processor with tremendous

computational power

It supports not only a traditional graphics pipeline but also computation

on non-graphical data It is relatively straightforward to implement our C

program for CUDA on AMD-based GPU using Open CL Our CUDA

kernels can be directly converted to Open CL kernels by concerning

different syntax for various keywords and built-in functions The mapping

strategy is also effective in Open CL if we regard thread and block in

CUDA as work item and work-group in Open CL Most of our optimization

techniques such as coalesced memory access and prefix sum can work in

Open CL too Moreover since CUDA is a data parallel architecture the

implementation of our approach by Open CL should be programmed in

data-parallel model

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor – the Y shape sclera feature-based efficient registration method – to speed up the mapping scheme, introduce the "weighted polar line (WPL) descriptor" that would be better suited for parallel computing to mitigate the mask size issue, and develop

our coarse to fine two-stage matching process to dramatically improve the

matching speed These new approaches make the parallel processing

possible and efficient

191 PROPOSED SYSTEM ADVANTAGES

1 To improve the efficiency, in this research we propose a new descriptor — the Y shape descriptor — which can greatly help improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2 We propose the coarse-to-fine two-stage matching process In the first

stage we matched two images coarsely using the Y-shape descriptors

which is very fast to match because no registration was needed The

matching result in this stage can help filter out image pairs with low

similarities

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque and white outer layer of the eye The blood

vessel structure of sclera is formed randomly and is unique to each person

which can be used for human identification. Several researchers have

designed different Sclera vein recognition methods and have shown that it

is promising to use Sclera vein recognition for human identification In

Crihalmeanu and Ross proposed three approaches Speed Up Robust

Features (SURF)-based method minutiae detection and direct correlation

matching for feature registration and matching Within these three methods

the SURF method achieves the best accuracy It takes an average of 15

seconds1 using the SURF method to per- form a one-to-one matching Zhou

et al proposed line descriptor-based method for sclera vein recognition

The matching step (including registration) is the most time-consuming step

in this sclera vein recognition system which costs about 12 seconds to

perform a one-to-one matching Both speed was calculated using a PC with

Intelreg Coretrade 2 Duo 24GHz processors and 4 GB DRAM Currently

Sclera vein recognition algorithms are designed using central processing

unit (CPU)-based systems

As discussed CPU-based systems are designed as sequential

processing devices which may not be efficient in data processing where the

data can be parallelized Because of large time consumption in the matching

step Sclera vein recognition using sequential-based method would be very

challenging to be implemented in a real time biometric system especially

when there is large number of templates in the database for matching GPUs

(as abbreviation of General purpose Graphics Processing Units GPGPUs)

are now popularly used for parallel computing to improve the

computational processing speed and efficiency The highly parallel

structure of GPUs makes them more effective than CPUs for data

processing where processing can be performed in parallel GPUs have been

widely used in biometrics recognition such as speech recognition text

detection handwriting recognition and face recognition In iris

recognition GPU was used to extract the features construct descriptors

and match templates

GPUs are also used for object retrieval and image search Park et al

designed the performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather the goal is to develop a parallel sclera matching solution for

Sclera vein recognition using our sequential line-descriptor method using

the CUDA GPU architecture However the parallelization strategies

developed in this research can be applied to design parallel approaches for

other Sclera vein recognition methods and help parallelize general pattern

recognition methods Based on the matching approach in there are three

challenges to map the task of sclera feature matching to GPU

1) Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer. Also, some of the processing on the mask files will involve convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance New designs are

necessary to help narrow down the search range. In summary, naïve

implementation of the algorithms in parallel would not work efficiently

Note it is relatively straightforward to implement our C program for

CUDA on AMD-based GPU using Open CL Our CUDA kernels can be

directly converted to Open CL kernels by concerning different syntax for

various keywords and built-in functions The mapping strategy is also

effective in Open CL if we regard thread and block in CUDA as work item

and work-group in Open CL Most of our optimization techniques such as

coalesced memory access and prefix sum can work in Open CL too

Moreover since CUDA is a data parallel architecture the implementation

of our approach by Open CL should be programmed in data-parallel model

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor – the Y shape sclera feature-based efficient registration method – to speed up the mapping scheme (Section 4), introduce the "weighted polar line (WPL) descriptor"

that would be better suited for parallel computing to mitigate the mask size

issue (Section 5) and develop our coarse to fine two-stage matching

process to dramatically improve the matching speed (Section 6) These new

approaches make the parallel processing possible and efficient However it

is non-trivial to implement these algorithms in CUDA We then developed

the implementation schemes to map our algorithms into CUDA (Section 7)

In the Section 2 we give brief introduction of Sclera vein recognition In

the Section 8 we performed some experiments using the proposed system

In the Section 9 we draw some conclusions

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition

Several methods have been designed for sclera segmentation Crihalmeanu

et al. presented a semi-automated system for sclera segmentation. They

used a clustering algorithm to classify the color eye images into three

clusters - sclera iris and background Later on Crihalmeanu and Ross

designed a segmentation approach based on a normalized sclera index

measure which includes coarse sclera segmentation pupil region

segmentation and fine sclera segmentation Zhou et al developed a skin

tone plus "white color"-based voting method for sclera segmentation in color images and Otsu's thresholding-based method for grayscale images.

After sclera segmentation it is necessary to enhance and extract the sclera

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches, including Speed Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps, which include sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly suited to circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This comparison is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, the human identification is more accurate than in the existing studies. In the proposed method, two features of an image are drawn out, as sketched below.
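A minimal, hedged sketch of the Cartesian-to-polar conversion with bilinear interpolation; the image center, sampling resolution, and file name are assumptions made only for illustration.

img = im2double(imread('sclera_roi.png'));     % hypothetical grayscale ROI image
[h, w] = size(img);
cx = w/2; cy = h/2;                            % assumed center of the eye
rmax = min(cx, cy) - 1;
theta = linspace(0, 2*pi, 360);                % angular samples
r = linspace(1, rmax, 100);                    % radial samples
[T, R] = meshgrid(theta, r);
X = cx + R.*cos(T);                            % Cartesian sample positions
Y = cy + R.*sin(T);
polarImg = interp2(img, X, Y, 'linear', 0);    % bilinear interpolation into polar form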

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in the sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection: A glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images; if the image is in color, it first needs a conversion to grayscale, after which the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG
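As a rough illustration of this step (not the project's exact code), a minimal MATLAB sketch using the Image Processing Toolbox might look as follows; the file name eye.jpg and the brightness threshold are assumptions:

    % Glare detection sketch: Sobel edges plus a brightness test
    I = imread('eye.jpg');              % hypothetical input image
    if size(I, 3) == 3
        G = rgb2gray(I);                % Sobel needs a grayscale image
    else
        G = I;
    end
    E = edge(G, 'sobel');               % edge map around the bright glare spot
    glare = (G > 240) & imdilate(E, strel('disk', 2));   % crude glare mask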

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should lie in the right and center positions, and the correct right sclera area should lie in the left and center. In this way non-sclera areas are wiped out
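A hedged MATLAB sketch of this estimation step could be as follows, where roiLeft is an assumed grayscale sub-image taken to the left of the detected iris boundary and the largest-blob test is a simplified stand-in for the positional check described above:

    T = graythresh(roiLeft);              % Otsu's threshold for the ROI
    candidate = imbinarize(roiLeft, T);   % potential sclera pixels
    leftSclera = bwareafilt(candidate, 1);% keep the largest connected component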

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid and iris boundaries are then refined, since these are unwanted portions for recognition. To eliminate their effects, refinement is performed after the detection of the sclera area. Fig shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way

FIG

Not all images are perfectly segmented in the segmentation process; hence feature extraction and matching are designed to tolerate segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible
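One plausible MATLAB sketch of such an enhancement step, combining CLAHE on the green channel with a small Gabor filter bank as discussed earlier (the filter wavelength and orientations are illustrative, not the project's exact settings):

    g   = I(:, :, 2);                    % green channel of the RGB eye image
    ge  = adapthisteq(g);                % CLAHE contrast enhancement
    gb  = gabor(4, [0 45 90 135]);       % small bank of oriented Gabor filters
    mag = imgaborfilt(ge, gb);           % one response slice per filter
    veins = max(mag, [], 3);             % strongest response over orientations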

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999) In the case of retinal biometrics a special optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions making this

modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi-surface structure of the ocular

veins makes them hard to reproduce as a physical artifact Besides being a

stand-alone biometric modality we anticipate that the addition of

conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to identify the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization

of each element of Filtering can also be performed A detailed discussion of

our proposed parallelization is outside the scope of this paper

FIG

FIG

2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching stage of the line-descriptor-based method is a

bottleneck with regard to matching speed In this section we briefly

describe the Line Descriptor-based sclera vein recognition method After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points The line descriptor is used to describe the segments in the vein

structure Figure 2 shows a visual description of the line descriptor Each

segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor

are calculated as

FIG
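The defining equations themselves are not reproduced in this transcript (the figure above is missing); based on the definitions in the surrounding text they take approximately the following form, written in LaTeX as a hedged reconstruction rather than the report's original equation:

\[
\theta = \arctan\!\left(\frac{y_l - y_i}{x_l - x_i}\right), \qquad
r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
\phi = \arctan\!\left(\left.\frac{d f_{line}(x)}{dx}\right|_{x = x_l}\right)
\]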

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the

vascular patterns a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns For

the registration algorithm, it randomly chooses two points: one from the

test template and one from the target template It also randomly chooses a

scaling factor and a rotation value based on a priori knowledge of the

database Using these values it calculates a fitness value for the registration

using these parameters

After sclera template registration each line segment in the test

template is compared to the line segments in the target template for

matches In order to reduce the effect of segmentation errors we created the

weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0

The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq 6-8), Dmatch is the matching distance threshold, and θmatch is the matching angle threshold

The total matching score M is the sum of the individual matching scores

divided by the maximum matching score for the minimal set between the

test and target template That is one of the test or target templates has fewer

points, and thus the sum of its descriptors' weights sets the maximum score

that can be attained
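The exact equation is not reproduced in this transcript; a plausible form consistent with the description above (with weights w taken from the weighting image) is given here only as a hedged sketch:

\[
m(S_i, S_j) =
\begin{cases}
w_i\, w_j, & d(S_i, S_j) \le D_{match}\ \text{and}\ |\phi_i - \phi_j| \le \theta_{match} \\
0, & \text{otherwise}
\end{cases}
\qquad
M = \frac{\sum_{i,j} m(S_i, S_j)}{\min\!\left(\sum_i w_i,\; \sum_j w_j\right)}
\]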

FIG

FIG

FIG

FIG

Y shape branches are observed to be a stable feature under movement of the eye and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera

There are two ways to measure both orientation and relationship of

every branch of Y shape vessels: one is to use the angles of every branch to the x-axis, the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the

template In our approach we employed the second method As Figure 6

shows ϕ1 ϕ2 and ϕ3 denote the angle between each branch and the radius

from the pupil center. Even when head tilt, eye movement, or camera zoom occurs at the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable

To tolerate errors from the pupil center calculation in the segmentation step

we also recorded the center position (x y) of the Y shape branches as

auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated

with reference to the iris center Therefore it is automatically aligned to the

iris centers. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line

descriptor is extracted from the skeleton of vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment a line descriptor is created to record the center and orientation of

the segment This descriptor is expressed as s(x yɸ) where (x y) is the

position of the center and ɸ is its orientation Because of the limitation of

segmentation accuracy the descriptor in the boundary of sclera area might

not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To tolerate such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging since the mask files are large in size, occupy GPU memory, and slow down the data transfer. When matching, a registration RANSAC-type algorithm was used to randomly select corresponding descriptors, and the transform parameters between them were used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many computations in the processing unit

To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position. The weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU only once
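A minimal MATLAB sketch of this one-time CPU weighting step, assuming a logical sclera mask scleraMask and an assumed boundary band width of 5 pixels:

    band   = 5;                             % assumed width of the 0.5 band
    distIn = bwdist(~scleraMask);           % distance to the nearest non-sclera pixel
    w = zeros(size(scleraMask));
    w(scleraMask) = 1;                      % interior descriptors
    w(scleraMask & distIn <= band) = 0.5;   % descriptors near the boundary
    wk = w(round(yc), round(xc));           % weight of a descriptor centred at (xc, yc)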

The calculated result was saved as a component of the descriptor. The sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5 or 1. To align two templates, when a

template is shifted to another location along the line connecting their

centers all the descriptors of that template will be transformed It would be

faster if two templates have similar reference points If we use the center of

the iris as the reference point when two templates are compared the

correspondence will automatically be aligned to each other since they have

the similar reference point Every feature vector of the template is a set of

line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched

In parallel matching these two parts are assigned to threads in

different warps to allow different deformation The multiprocessor in

CUDA manages threads in groups of 32 parallel threads called warps We

reorganized the descriptors from the same side and saved

FIG

FIG

them in continuous addresses. This meets the requirement of coalesced memory access on the GPU

After reorganizing the structure of descriptors and adding mask information

into the new descriptor the computation on the mask file is not needed on

the GPU It was very fast to match with this feature because it does not

need to reregister the templates every time after shifting Thus the cost of

data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as in Figure 9

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 40 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline todayrsquos GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units.

General-Purpose Computing on the GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that todayrsquos GPU computing applications are written

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global "texture" memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities After this step it is still possible for

some false positive matches In the second stage we used WPL descriptor

to register the two images for more detailed descriptor matching including

scale- and translation invariance This stage includes shift transform affine

matrix generation and final WPL descriptor matching Overall we

partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score

2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1

FIG

Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4); ni and di are the number of matched descriptor pairs and their center distance, respectively; tϕ is a distance threshold and txy is the threshold to restrict the searching area. We set tϕ to 30 and txy to 675 in our experiment

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3) where ϕi j is the angle between

the jth branch and the polar ray from the pupil center in descriptor i

The number of matched pairs ni and the distance between the Y shape branches' centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branches' centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score is passed to the next, more precise matching process
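A hedged MATLAB sketch of this coarse stage follows; yTest and yTarget are assumed N-by-5 matrices of rows [ϕ1 ϕ2 ϕ3 x y], and the exact fusion formula (2) is not reproduced, so the final score line is only a simplified stand-in:

    tphi = 30; txy = 675;                          % thresholds quoted in the text
    nMatch = 0; dSum = 0;
    for i = 1:size(yTest, 1)
        dxy = hypot(yTarget(:,4) - yTest(i,4), yTarget(:,5) - yTest(i,5));
        dph = sqrt(sum((yTarget(:,1:3) - yTest(i,1:3)).^2, 2));
        j = find(dxy < txy & dph < tphi, 1);       % first candidate within both thresholds
        if ~isempty(j)
            nMatch = nMatch + 1;
            dSum = dSum + dxy(j);
        end
    end
    score    = nMatch / min(size(yTest,1), size(yTarget,1));  % simplified stand-in for Eq. (2)
    meanDist = dSum / max(nMatch, 1);              % average distance fed into the fusion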

2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because: (1) when acquiring an eye image at different gaze angles, the vessel structure will appear to shrink or extend nonlinearly because the eyeball is spherical in shape; (2) the sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation and scale together

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate. As a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte, Tta is the target template and stai is the ith WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.

Δsk is the shift value of the two descriptors, defined as

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. Their shift offsets are recorded as the possible registration shift factors Δsk. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets
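A hedged MATLAB sketch of this shift search follows; sTest and sTarget hold descriptor centres as N-by-2 [x y] matrices, and the sample size of 32 and the use of a median-based consistency test (instead of the standard-deviation test described above) are assumptions:

    k = randperm(size(sTest, 1), min(32, size(sTest, 1)));   % sampled test descriptors
    offsets = zeros(numel(k), 2);
    for n = 1:numel(k)
        d = hypot(sTarget(:,1) - sTest(k(n),1), sTarget(:,2) - sTest(k(n),2));
        [~, j] = min(d);                                      % nearest neighbour in Tta
        offsets(n, :) = sTarget(j, :) - sTest(k(n), :);
    end
    dev = vecnorm(offsets - median(offsets, 1), 2, 2);        % consistency of each offset
    [~, best] = min(dev);
    shiftOpt = offsets(best, :);                              % optimized shift parameter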

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)_shift, θ(it) and tr(it)_scale are the shift, rotation and scale transform parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)_shift) and S(tr(it)_scale) are the transform matrices defined in (7). To search for the optimized transform parameters we iterated N times to generate these parameters; in our experiment we set the iteration count to 512
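Equation (7) itself is not reproduced in this transcript; the rotation, translation and scale matrices have the standard homogeneous 2-D form, given here as an assumption about the intended definition:

\[
R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
T(t_x, t_y) = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix}, \quad
S(s) = \begin{pmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]

and a candidate registration applies their product to each descriptor centre in homogeneous coordinates.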

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)) T(tr(optm)_shift) S(tr(optm)_scale) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template

2.5 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto those threads. There are multiple memory spaces in the CUDA memory hierarchy: register, local memory, shared memory, global memory, constant memory and texture memory. Registers and shared memory are on-chip and take very little time to access. Only shared memory can be accessed by other threads within the same block; however, the amount of shared memory available is limited. Global memory, constant memory and texture memory are off-chip memories accessible by all threads, and accessing them is much more time consuming

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control paths, all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in terms of access time. To hide this latency as much as possible, we should use on-chip memory preferentially rather than global memory. When global memory access occurs, threads in the same warp should access the words in sequence to achieve coalescing

Shared memory is much faster than the local and global memory spaces. But shared memory is organized into banks which are equal in size. If two memory requests from different threads within a warp fall in the same memory bank, the access will be serialized. To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

2.5.1 MAPPING ALGORITHMS TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread. One thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time

Algorithms 2-4 will be partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column in Figure 11 shows, we assigned a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there are no data exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, which is shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new result. The final result will be saved in

the first address which has the same variable name as the first intermediate

result
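The reduction order can be illustrated serially in MATLAB, purely to show the pairing pattern (the actual implementation runs this as concurrent CUDA threads; the input vector and its power-of-two length are assumptions):

    partial = [3 1 4 1 5 9 2 6];          % per-thread partial results
    stride = 1;
    while stride < numel(partial)
        for i = 1:2*stride:numel(partial)
            partial(i) = partial(i) + partial(i + stride);  % add the neighbour 'stride' away
        end
        stride = 2 * stride;
    end
    total = partial(1);                   % the final sum ends up in the first address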

2.5.2 MAPPING INSIDE A BLOCK

In the shift argument search there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all the threads compute independently until the final result is compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread, and every thread randomly generated a set of parameters and tried them independently. The generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period

The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single

thread in a block to process the matching

2.5.3 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and the data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed. Therefore there was no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) were stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data. In our kernel we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further

FIG

2.6 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It is primarily applied to object detection; in this paper it is applied as the feature for human recognition. In the sclera region the vein patterns form the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image

To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y)
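These are the standard definitions, written here in LaTeX:

\[
m(x, y) = \sqrt{d_x(x, y)^2 + d_y(x, y)^2}, \qquad
\theta(x, y) = \arctan\!\left(\frac{d_y(x, y)}{d_x(x, y)}\right)
\]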

Orientation binning is the second step of HOG. This step creates the cell histograms: each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig 8 depicts the edge orientation of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block

FIG
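A minimal MATLAB sketch of HOG extraction over the segmented sclera region, assuming the Computer Vision Toolbox; the cell, block and bin settings are illustrative rather than the project's exact values:

    scleraGray = rgb2gray(scleraROI);                        % segmented sclera image
    [hogFeat, visu] = extractHOGFeatures(scleraGray, ...
        'CellSize', [8 8], 'BlockSize', [2 2], 'NumBins', 9);
    % hogFeat is the block-normalised histogram vector used for matching;
    % 'visu' can be plotted to inspect the dominant edge orientations
    figure; imshow(scleraGray); hold on; plot(visu);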

CHAPTER 3

SOFTWARE SPECIFICATION

3.1 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

3.2 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C, C++, FORTRAN, Java, COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

3.2.1 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code
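For example, a row-wise Euclidean norm can be written in one vectorised statement:

    y = sqrt(sum(A.^2, 2));   % one line, instead of an explicit loop over the rows of A
    % for k = 1:size(A,1), y(k) = sqrt(sum(A(k,:).^2)); end   % equivalent loop form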

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveX controls. Alternatively you can create GUIs programmatically

using MATLAB functions

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

4.2 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
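A hedged MATLAB sketch of the overall sequence shown in these snapshots (the file name, thresholds and filter settings are illustrative only, not the project's exact code):

    I   = imread('eye.jpg');                     % input eye image
    G   = rgb2gray(I);                           % grey-scale conversion
    BW  = imbinarize(G, graythresh(G));          % Otsu-based binarisation
    roi = roipoly(I);                            % interactive selection of the sclera ROI
    S   = adapthisteq(G .* uint8(roi));          % masked and enhanced sclera image
    gb  = gabor(4, [0 45 90 135]);
    F   = max(imgaborfilt(S, gb), [], 3);        % Gabor-based feature image
    % feature matching against the database and the matched / not matched
    % decision follow, as described in Chapter 2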

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g. PINs and passwords), government applications have used token-based systems (e.g. ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transfer and computation. We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 42, no. 3, pp. 571–583, May 2012.


LIST OF FIGURES

FIG NO    FIG NAME

1.1    Fundamental blocks of digital image processing
1.2    Gray scale image
1.3    The additive model of RGB
1.4    The colors created by the subtractive model of CMYK
2.1    The diagram of a typical sclera vein recognition approach
2.2    Steps of segmentation
2.3    Glare area detection
2.4    Detection of the sclera area
2.5    Pattern of veins
2.6    Sclera region and its vein patterns
2.7    Filtering can take place simultaneously on different parts of the iris image
2.8    The sketch of parameters of segment descriptor
2.9    The weighting image
2.10   The module of sclera template matching
2.11   The Y shape vessel branch in sclera
2.12   The rotation and scale invariant character of Y shape vessel branch
2.13   The line descriptor of the sclera vessel pattern
2.14   The key elements of descriptor vector
2.15   Simplified sclera matching steps on GPU
2.16   Two-stage matching scheme
2.17   Example image from the UBIRIS database
2.18   Occupancy on various thread numbers per block
2.19   The task assignment inside and outside the GPU
2.20   HOG features
4.1    Original sclera image
4.2    Binarised sclera image
4.3    Edge map subtracted image
4.4    Cropping ROI
4.5    ROI mask
4.6    ROI finger sclera image
4.7    Enhanced sclera image
4.8    Feature extracted sclera image
4.9    Matching with images in database
4.10   Result

ABSTRACT

Sclera vein recognition is shown to be a promising method for human

identification. However, its matching speed is slow, which could limit its use in real-time applications. To improve the matching efficiency,

we proposed a new parallel sclera vein recognition method using a two-

stage parallel approach for registration and matching First we designed a

rotation- and scale-invariant Y shape descriptor based feature extraction

method to efficiently eliminate most unlikely matches Second we

developed a weighted polar line sclera descriptor structure to incorporate

mask information to reduce GPU memory cost Third we designed a

coarse-to-fine two-stage matching method Finally we developed a

mapping scheme to map the subtasks to GPU processing units The

experimental results show that our proposed method can achieve dramatic

processing speed improvement without compromising the recognition

accuracy

CHAPTER 1

INTRODUCTION

1.1 GENERAL

Digital image processing is the use of computer algorithms to

perform image processing on digital images The 2D continuous image is

divided into N rows and M columns The intersection of a row and a

column is called a pixel. The image can also be a function of other variables

including depth color and time An image given in the form of a

transparency slide photograph or an X-ray is first digitized and stored as a

matrix of binary digits in computer memory This digitized image can then

be processed andor displayed on a high-resolution television monitor For

display the image is stored in a rapid-access buffer memory which

refreshes the monitor at a rate of 25 frames per second to produce a visually

continuous display

1.2 OVERVIEW OF DIGITAL IMAGE PROCESSING

The field of "Digital Image Processing" refers to processing the digital

images by means of a digital computer In a broader sense it can be

considered as a processing of any two dimensional data where any image

(optical information) is represented as an array of real or complex numbers

represented by a definite number of bits An image is represented as a two

dimensional function f(xy) where lsquoxrsquo and lsquoyrsquo are spatial (plane)

coordinates and the amplitude of f at any pair of coordinates (xy)

represents the intensity or gray level of the image at that point

A digital image is one for which both the co-ordinates and the

amplitude values of f are all finite discrete quantities Hence a digital

image is composed of a finite number of elements each of which has a

particular location value These elements are called ldquopixelsrdquo A digital

image is discrete in both spatial coordinates and brightness and it can be

considered as a matrix whose rows and column indices identify a point on

the image and the corresponding matrix element value identifies the gray

level at that point

One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. The introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours.

FIG

1.2.1 PREPROCESSING

In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible, and the general techniques described here apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.

Image processing refers to the processing of a 2D picture by a computer. Basic definitions:

An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g. brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:

Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)

An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur while another part might be processed to improve colour rendition.

Most image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample, or pixel, is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, in which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g. videos or 3D full-body magnetic resonance scans).

In modern sciences and technologies, images also gain much broader scope due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research and real-time multi-asset portfolio trading in finance. Before processing, an image is converted into digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing may be image enhancement, image restoration or image compression.

1.2.2 IMAGE ENHANCEMENT

Image enhancement refers to the accentuation or sharpening of image features, such as boundaries or contrast, to make a graphic display more useful for display and analysis. This process does not increase the inherent information content of the data. It includes gray-level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.

1.2.3 IMAGE RESTORATION

Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with the extraction or accentuation of image features.

1.2.4 IMAGE COMPRESSION

Image compression is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression – CCITT GROUP3 & GROUP4
Still image compression – JPEG
Video compression – MPEG

1.2.5 SEGMENTATION

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms such as marching cubes.

1.2.6 IMAGE RESTORATION

Image restoration, like enhancement, improves the quality of an image, but all of its operations are based on known or measured degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise and camera motion, and to correct images for known degradations.

1.2.7 FUNDAMENTAL STEPS

Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the chances of success of the other processes.
Image segmentation: to partition an input image into its constituent parts or objects.
Image representation: to convert the input data to a form suitable for computer processing.
Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.
Image recognition: to assign a label to an object based on the information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized objects.

Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.

1.3 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling; amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.

Example: a 256-gray-level image of size 256x256 occupies 64K bytes of memory.
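As a quick check of this figure (a worked calculation, not from the report): with 256 gray levels, each pixel needs 8 bits, i.e. one byte, so

\text{storage} = N \times M \times \frac{b}{8} = 256 \times 256 \times \frac{8}{8}\ \text{bytes} = 65{,}536\ \text{bytes} = 64\ \text{KB}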

Images of very low spatial resolution produce a checkerboard effect. The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.

1.4 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based images). Some of the most common file formats are:

GIF – Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.
JPEG – Joint Photographic Experts Group. A very efficient (i.e. much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and the Internet (bandwidth-limited).
TIFF – Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.
PS – PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.
PSD – Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.
BMP – Bitmap file format.

1.5 TYPES OF IMAGES

Images are of four types:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image

1.5.1 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level images. Each pixel is stored as a single bit, i.e. a 0 or 1, hence the names black-and-white and B&W.

1.5.2 GRAY SCALE IMAGE

In an 8-bit grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grayscale image is what people normally call a black-and-white image, but the name emphasizes that such an image also includes many shades of grey.

FIG

1.5.3 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive colour, with the R, G and B receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB (cyan, magenta and yellow) are formed by mixing two of the primary colours (red, green or blue) and excluding the third colour. Red and green combine to make yellow, green and blue make cyan, and blue and red form magenta. The combination of red, green and blue at full intensity makes white.

In Photoshop, using the "screen" mode for the different layers in an image makes the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK: the 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M) and yellow (Y) inks; in addition, a layer of black (K) ink can be added. The CMYK model uses subtractive colour mixing.

1.5.4 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, the variable name X refers to the array and map refers to the color map. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements in which every element, a color, is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland and Cheadle, which supported a palette of 256 36-bit RGB colors.

1.6 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from two principal application areas:
1) improvement of pictorial information for human interpretation;
2) processing of scene data for autonomous machine perception.

In the second application area, interest focuses on procedures for extracting information from an image in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

1.7 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches for feature registration and matching: Speeded Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

1.7.1 DISADVANTAGES OF THE EXISTING SYSTEM

1. Mask files are used to calculate the valid overlapping area of two sclera templates and to align the templates to the same coordinate system. However, the mask files are large, pre-occupy GPU memory, and slow down data transfer. In addition, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units in CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database that has been acquired in an unconstrained outdoor maritime environment. The recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models that can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such big data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can easily be photographed using a regular digital camera. In this paper we discuss methods for conjunctival imaging, preprocessing and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person. Thus the sclera vein pattern is a well-suited biometric for human identification. The existing methods for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using bilinear interpolation. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a non-ideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

1.8 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method, built on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why a naïve parallel approach would not work. We then propose the new sclera descriptor – the Y shape sclera feature-based efficient registration method – to speed up the mapping scheme, introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue, and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make parallel processing possible and efficient.

1.9.1 PROPOSED SYSTEM ADVANTAGES

1. To improve efficiency, we propose a new descriptor, the Y shape descriptor, which greatly improves the efficiency of the coarse registration of two images and can be used to filter out non-matching pairs before refined matching.

2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity.

CHAPTER 2

PROJECT DESCRIPTION

2.1 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that sclera vein recognition is a promising approach to human identification. Crihalmeanu and Ross proposed three approaches for feature registration and matching: Speeded Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition and face recognition. In iris recognition, GPUs have been used to extract features, construct descriptors and match templates.

GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods, so they may not be efficient for sclera vein recognition. The Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme needs different strategies for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for our sequential line-descriptor method using the CUDA GPU architecture. The parallelization strategies developed in this research can, however, be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping area of two sclera templates and to align the templates to the same coordinate system. However, the mask files are large, pre-occupy GPU memory, and slow down data transfer. In addition, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units in CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor – the Y shape sclera feature-based efficient registration method – to speed up the mapping scheme (Section 4), introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue (Section 5), and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we also developed implementation schemes to map our algorithms onto CUDA (Section 7). Section 2 gives a brief introduction to sclera vein recognition, Section 8 presents experiments using the proposed system, and Section 9 draws conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation: they used a clustering algorithm to classify color eye images into three clusters (sclera, iris and background). Later, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation and fine sclera segmentation. Zhou et al. developed a skin-tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's-thresholding-based method for grayscale images.

After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region-growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly used for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified. This is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.
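To make the Cartesian-to-polar conversion with bilinear interpolation concrete, the following CUDA kernel is a minimal sketch; the image layout, kernel name, parameters and sampling grid are illustrative assumptions, not the report's implementation.

// Resample the region around the pupil center (cx, cy) into a polar grid
// (nR radii x nTheta angles) using bilinear interpolation. Assumes nR > 1.
__global__ void toPolarKernel(const unsigned char *img, int w, int h,
                              float cx, float cy, float rMax,
                              unsigned char *polar, int nR, int nTheta)
{
    int ir = blockIdx.y * blockDim.y + threadIdx.y;   // radius index
    int it = blockIdx.x * blockDim.x + threadIdx.x;   // angle index
    if (ir >= nR || it >= nTheta) return;

    float r = rMax * ir / (nR - 1);
    float t = 2.0f * 3.14159265f * it / nTheta;
    float x = cx + r * cosf(t);                       // Cartesian sample position
    float y = cy + r * sinf(t);

    int x0 = (int)floorf(x), y0 = (int)floorf(y);
    if (x0 < 0 || y0 < 0 || x0 + 1 >= w || y0 + 1 >= h) {
        polar[ir * nTheta + it] = 0;                  // outside the image
        return;
    }
    float fx = x - x0, fy = y - y0;
    // Bilinear interpolation over the four surrounding pixels.
    float v = (1 - fx) * (1 - fy) * img[y0 * w + x0]
            + fx * (1 - fy)       * img[y0 * w + x0 + 1]
            + (1 - fx) * fy       * img[(y0 + 1) * w + x0]
            + fx * fy             * img[(y0 + 1) * w + x0 + 1];
    polar[ir * nTheta + it] = (unsigned char)(v + 0.5f);
}

One thread per output sample keeps the unwrapping embarrassingly parallel; a 2D launch such as toPolarKernel<<<dim3((nTheta+15)/16, (nR+15)/16), dim3(16,16)>>>(...) would cover the whole polar grid.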

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.

FIG

Glare area detection: the glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images; if the image is in color, it must first be converted to grayscale, after which the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG
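To illustrate the role of the Sobel filter in this step, here is a minimal CUDA sketch of glare-candidate detection; the kernel name, thresholds and the extra brightness test are illustrative assumptions rather than the report's actual code.

// Mark pixels that are both very bright and on a strong Sobel edge as glare.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sobelGlareKernel(const unsigned char *gray, unsigned char *glare,
                                 int w, int h, int brightThresh, int gradThresh)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= w - 1 || y >= h - 1) return;

    // 3x3 Sobel responses at (x, y)
    int gx = -gray[(y-1)*w + x-1] + gray[(y-1)*w + x+1]
             - 2*gray[y*w + x-1]  + 2*gray[y*w + x+1]
             - gray[(y+1)*w + x-1] + gray[(y+1)*w + x+1];
    int gy = -gray[(y-1)*w + x-1] - 2*gray[(y-1)*w + x] - gray[(y-1)*w + x+1]
             + gray[(y+1)*w + x-1] + 2*gray[(y+1)*w + x] + gray[(y+1)*w + x+1];
    int mag = abs(gx) + abs(gy);

    glare[y*w + x] = (gray[y*w + x] > brightThresh && mag > gradThresh) ? 255 : 0;
}

int main()
{
    const int w = 640, h = 480;
    unsigned char *dGray, *dGlare;
    cudaMalloc(&dGray, w * h);
    cudaMalloc(&dGlare, w * h);
    cudaMemset(dGray, 0, w * h);          // stand-in for a real grayscale eye image

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    sobelGlareKernel<<<grid, block>>>(dGray, dGlare, w, h, 200, 100);
    cudaDeviceSynchronize();

    cudaFree(dGray);
    cudaFree(dGlare);
    return 0;
}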

Sclera area estimation: to estimate the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are eliminated.
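For reference, Otsu's method picks the threshold that maximizes the between-class variance of the gray-level histogram. The following host-side (CPU) function is a minimal sketch, independent of the report's code.

#include <vector>

// Otsu's method: return the gray level that maximizes between-class variance.
int otsuThreshold(const std::vector<unsigned char> &pixels)
{
    double hist[256] = {0};
    for (unsigned char p : pixels) hist[p] += 1.0;
    double total = static_cast<double>(pixels.size());

    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double sumB = 0.0, wB = 0.0, bestVar = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];                      // weight of the "background" class
        if (wB == 0) continue;
        double wF = total - wB;             // weight of the "foreground" class
        if (wF == 0) break;
        sumB += t * hist[t];
        double mB = sumB / wB;              // background mean
        double mF = (sumAll - sumB) / wF;   // foreground mean
        double between = wB * wF * (mB - mF) * (mB - mF);
        if (between > bestVar) { bestVar = between; bestT = t; }
    }
    return bestT;
}

Pixels of the ROI above the returned threshold would then be kept as potential sclera, matching the binarisation step described in the text.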

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Next, the upper eyelid, lower eyelid and iris boundaries are refined; all of these are unwanted portions for recognition. To eliminate their effects, refinement is performed after the detection of the sclera area. The figure shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area. The left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004) and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.

Besides its use as a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g. due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by "Elements" below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this report.

FIG

FIG

2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor-based method is the bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as shown below.

FIG
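The component equations appear only as a figure in this copy of the report. A plausible reconstruction from the definitions given just below (an assumption, not a quotation of the report's equations) is:

\theta = \tan^{-1}\!\left(\frac{y_l - y_i}{x_l - x_i}\right), \qquad
r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
\phi = \tan^{-1}\!\left(\left.\frac{d f_{\mathrm{line}}(x)}{dx}\right|_{x = x_l}\right)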

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and θmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates; that is, whichever of the test or target templates has fewer points, the sum of its descriptors' weights sets the maximum score that can be attained.
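The scoring equations themselves are again left to a figure; a hedged reconstruction consistent with the description above (an assumption, not the report's exact equations) is:

m(S_i, S_j) =
\begin{cases}
w_i\, w_j, & d(S_i, S_j) \le D_{match}\ \text{and}\ |\phi_i - \phi_j| \le \theta_{match}\\
0, & \text{otherwise}
\end{cases}
\qquad
M = \frac{\sum_{i,j} m(S_i, S_j)}{\min\!\left(\sum_i w_i,\ \sum_j w_j\right)}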

FIG

FIG

FIG

FIG

Even under movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y shape vessel: one is to use the angle of every branch to the x axis; the other is to use the angle between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employ the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors in the pupil center calculation from the segmentation step, we also record the center position (x, y) of the Y shape branch as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

2.2.6 WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is broken into smaller segments, and for each segment a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes.

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of vessel patterns.

To be tolerant of such errors, a mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large, occupy GPU memory and slow down data transfer. For matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extract the geometric relationships of the descriptors and store them as a new descriptor. We use a weighting image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.

The result is saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates share a similar reference point. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they have the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as the eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses, which meets the requirement of coalesced memory access on the GPU.

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.
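A minimal sketch of what such a layout could look like is given below; the field and kernel names are illustrative assumptions, not the report's actual data structures. Storing the WPL descriptors as a structure of arrays lets consecutive threads in a warp read consecutive addresses.

#include <cuda_runtime.h>

// Illustrative structure-of-arrays layout for WPL descriptors s(x, y, r, theta, phi, w).
struct WPLTemplateSoA {
    float *x, *y;       // Cartesian center, precomputed on the CPU from (r, theta)
    float *r, *theta;   // polar center relative to the iris center
    float *phi;         // dominant orientation of the line segment
    float *w;           // weight: 0 outside the sclera, 0.5 near the boundary, 1 inside
    int count;          // number of descriptors; left/right halves stored contiguously
};

__global__ void shiftDescriptors(const WPLTemplateSoA in, WPLTemplateSoA out,
                                 float dx, float dy)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= in.count) return;
    // Thread i touches element i of each field array, so a warp of 32 threads
    // reads/writes 32 consecutive floats: a coalesced global-memory access.
    out.x[i]     = in.x[i] + dx;
    out.y[i]     = in.y[i] + dy;
    out.r[i]     = in.r[i];
    out.theta[i] = in.theta[i];
    out.phi[i]   = in.phi[i];
    out.w[i]     = in.w[i];
}

A launch such as shiftDescriptors<<<(n + 255)/256, 256>>>(devIn, devOut, dx, dy) would then shift all descriptors of one half in parallel; because the weights travel with the descriptors, no mask image is needed on the GPU.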

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and as GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-purpose computing on the GPU maps general-purpose computation onto the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way in which today's GPU computing applications are written.

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the GPU pipeline described earlier, concentrating on its programmable aspects:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step when computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. Newer programming environments solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, to the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
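In CUDA terms, the same structure looks like the following generic example (not tied to the sclera matcher): the grid of threads is the computation domain, and each thread gathers from and scatters to global memory.

#include <cstdio>
#include <cuda_runtime.h>

// Each thread updates one grid point from its own value and its neighbors'
// values (a gather), then writes the result back to global memory (a scatter).
__global__ void relaxKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= 0 || i >= n - 1) return;
    out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}

int main()
{
    const int n = 1 << 20;
    float *dIn, *dOut;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemset(dIn, 0, n * sizeof(float));

    int block = 256;
    int grid = (n + block - 1) / block;   // the structured grid of threads
    relaxKernel<<<grid, block>>>(dIn, dOut, n);
    cudaDeviceSynchronize();

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}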

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity. After this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and the final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here y_te^i and y_ta^j are the Y shape descriptors of the test template T_te and the target template T_ta, respectively; d_ϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance of two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and the distance between their centers, respectively; t_ϕ is a distance threshold and t_xy is the threshold that restricts the searching area. We set t_ϕ to 30 and t_xy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the searching range and time. The distance between two branches is defined in (3), where ϕ_ij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs n_i and the distance between the Y shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching stage.
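As a rough illustration of this coarse stage, the sketch below counts Y-shape descriptor pairs that fall within the two thresholds and fuses the count with the average center distance. It is only a sketch under stated assumptions: the YDesc layout, the helper functions, and the final fusion line are illustrative stand-ins for the report's equations (2)-(4), and the left/right-half search restriction is omitted.

#include <math.h>

// Hypothetical CUDA/C++ sketch of Stage I coarse matching.
// YDesc holds three branch angles and the branch-center location of one
// Y-shape descriptor; the real template layout may differ.
struct YDesc { float phi[3]; float x; float y; };

__host__ __device__ float angleDistance(const YDesc& a, const YDesc& b) {
    // Euclidean distance of the angle elements (stands in for equation (3)).
    float s = 0.f;
    for (int k = 0; k < 3; ++k) { float d = a.phi[k] - b.phi[k]; s += d * d; }
    return sqrtf(s);
}

__host__ __device__ float centerDistance(const YDesc& a, const YDesc& b) {
    // Euclidean distance of the descriptor centers (stands in for equation (4)).
    float dx = a.x - b.x, dy = a.y - b.y;
    return sqrtf(dx * dx + dy * dy);
}

// Coarse score of one template pair; tPhi and tXY are the thresholds
// (30 and 675 in the report) and alpha is the fusion factor.
__host__ __device__ float coarseScore(const YDesc* te, int Ni,
                                      const YDesc* ta, int Nj,
                                      float tPhi, float tXY, float alpha) {
    int   matched = 0;
    float sumDist = 0.f;
    for (int i = 0; i < Ni; ++i) {
        for (int j = 0; j < Nj; ++j) {
            if (centerDistance(te[i], ta[j]) < tXY &&
                angleDistance(te[i], ta[j]) < tPhi) {
                ++matched;
                sumDist += centerDistance(te[i], ta[j]);
            }
        }
    }
    if (matched == 0) return 0.f;
    float avgDist = sumDist / matched;
    // Illustrative fusion only; the report's equation (2) defines the real score.
    return matched / (alpha * fminf((float)Ni, (float)Nj) + avgDist);
}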

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter. Here T_te is the test template and s_te^i is the i-th WPL descriptor of T_te; T_ta is the target template and s_ta^i is the i-th WPL descriptor of T_ta; d(s_te^k, s_ta^j) is the Euclidean distance of descriptors s_te^k and s_ta^j; and Δs_k is the shift offset between the two descriptors.

We first randomly select an equal number of segment descriptors s_te^k in the test template T_te from each quad and find each one's nearest neighbor s_ta^j in the target template T_ta. Their shift offset is recorded as a candidate registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
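A minimal sketch of this search, assuming a hypothetical WPL segment-descriptor layout SDesc and simple helper routines, is shown below; the quad-wise sampling and the exact standard-deviation criterion of Algorithm 2 are only approximated here.

// Hypothetical sketch of the shift-parameter search (Algorithm 2).
// SDesc is an assumed WPL segment-descriptor layout with a center (x, y).
struct SDesc { float x; float y; float r; float theta; float phi; float w; };

struct Shift { float dx; float dy; };

// For one sampled test descriptor, record the offset to its nearest target descriptor.
Shift candidateShift(const SDesc& te, const SDesc* ta, int Nta) {
    int best = 0;
    float bestDist = 1e30f;
    for (int j = 0; j < Nta; ++j) {
        float dx = ta[j].x - te.x, dy = ta[j].y - te.y;
        float d = dx * dx + dy * dy;
        if (d < bestDist) { bestDist = d; best = j; }
    }
    return Shift{ ta[best].x - te.x, ta[best].y - te.y };
}

// Selecting Δs_optim: the report keeps the candidate with the smallest standard
// deviation among the offsets; as a stand-in, this picks the candidate closest
// to the mean offset.
Shift selectOptimalShift(const Shift* cand, int n) {
    float mx = 0.f, my = 0.f;
    for (int i = 0; i < n; ++i) { mx += cand[i].dx; my += cand[i].dy; }
    mx /= n; my /= n;
    int best = 0; float bestDev = 1e30f;
    for (int i = 0; i < n; ++i) {
        float dx = cand[i].dx - mx, dy = cand[i].dy - my;
        float dev = dx * dx + dy * dy;
        if (dev < bestDev) { bestDev = dev; best = i; }
    }
    return cand[best];
}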

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta^j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te^i, T_te, s_ta^j, and T_ta are defined as in Algorithm 2; tr^(it)_shift, θ^(it), and tr^(it)_scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ^(it)), T(tr^(it)_shift), and S(tr^(it)_scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters; in our experiment we set the number of iterations to 512.
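Equation (7) is not reproduced in this excerpt; as a generic illustration only, a 2-D transform built from the shift, rotation, and scale parameters can be applied to a descriptor center as follows (the composition order and matrix form assumed here may differ from (7)).

#include <math.h>

// Illustrative 2-D similarity transform (shift, rotation, scale) applied to a
// descriptor center; the report's equation (7) defines the actual matrices.
struct Point2 { float x; float y; };

Point2 applySimilarity(Point2 p, float theta, float shiftX, float shiftY, float scale) {
    // S(scale), then R(theta), then T(shift): p' = T * R * S * p
    float xs = scale * p.x;
    float ys = scale * p.y;
    float xr = cosf(theta) * xs - sinf(theta) * ys;
    float yr = sinf(theta) * xs + cosf(theta) * ys;
    return Point2{ xr + shiftX, yr + shiftY };
}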

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te^i, T_te, s_ta^j, and T_ta are defined as in Algorithms 2 and 3; θ^(optm), tr^(optm)_shift, tr^(optm)_scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; R(θ^(optm)), T(tr^(optm)_shift), and S(tr^(optm)_scale) form the descriptor transform matrix defined in Algorithm 3; ϕ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA-capable GPU consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those processors.

There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and accessing them costs comparatively little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:

If threads in a warp take different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency as much as possible, on-chip memory should be used in preference to global memory; when global memory accesses do occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall into the same bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computational density; thus we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU; the numbers of threads and blocks are both set to 1024, which means we can match our test template with up to 1024 x 1024 target templates at the same time.
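A sketch of this coarse-grained mapping is given below; it reuses the YDesc type and coarseScore helper from the Stage I sketch earlier, and all kernel and buffer names are hypothetical.

// Hypothetical coarse-grained kernel: one thread = one target template.
// coarseScore is the Stage I scoring sketch shown earlier; names are illustrative.
__global__ void matchYShapeKernel(const YDesc* test, int Nte,
                                  const YDesc* targets, const int* targetOffsets,
                                  const int* targetCounts, int numTargets,
                                  float tPhi, float tXY, float alpha,
                                  float* scores) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numTargets) return;
    const YDesc* ta = targets + targetOffsets[t];   // this thread's target template
    scores[t] = coarseScore(test, Nte, ta, targetCounts[t], tPhi, tXY, alpha);
}

// Launch: 1024 blocks x 1024 threads cover up to 1024*1024 target templates.
// matchYShapeKernel<<<1024, 1024>>>(d_test, Nte, d_targets, d_offsets, d_counts,
//                                   numTargets, 30.f, 675.f, 30.f, d_scores);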

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown at the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
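The following minimal CUDA kernel shows a block-wide sum of per-thread intermediate results; it uses the common tree-style shared-memory reduction rather than the exact odd-thread pairing scheme described above, and the names are illustrative.

// Illustrative block-wide sum of per-thread intermediate results.
// Each thread writes its partial value into shared memory; the block then
// reduces pairwise until the total ends up in element 0.
// Assumes blockDim.x is a power of two.
__global__ void blockSumKernel(const float* partial, float* blockTotals, int n) {
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;
    s[tid] = (idx < n) ? partial[idx] : 0.f;
    __syncthreads();

    // Tree reduction: the stride halves each step; the sum accumulates in s[0].
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockTotals[blockIdx.x] = s[0];
}

// Launch example:
// blockSumKernel<<<numBlocks, 256, 256 * sizeof(float)>>>(d_partial, d_totals, n);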

252 MAPPING INSIDE BLOCK

In the shift-parameter search, there are two schemes we could choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single candidate shift offset to each thread, so that all the threads compute independently, except that the final results must be compared across the candidate offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the iterations are distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
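The report's generator is based on dynamically created Mersenne Twister parameter sets; as a simpler stand-in that addresses the same per-thread correlation concern, the sketch below uses CUDA's cuRAND library, where each thread owns a state initialized on a distinct subsequence. The parameter ranges sampled here are illustrative only.

#include <curand_kernel.h>

// One RNG state per thread; the subsequence argument keeps the streams of
// different threads uncorrelated even with a shared seed.
// (Assumes the launch covers exactly the number of allocated states.)
__global__ void initRng(curandState* states, unsigned long long seed) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    curand_init(seed, /*subsequence=*/id, /*offset=*/0, &states[id]);
}

__global__ void sampleAffineParams(curandState* states,
                                   float* theta, float* scale, int n) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;
    curandState local = states[id];
    // Illustrative ranges only; the report does not specify them here.
    theta[id] = (curand_uniform(&local) - 0.5f) * 0.2f;   // small rotation
    scale[id] = 0.9f + 0.2f * curand_uniform(&local);     // scale near 1
    states[id] = local;
}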

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait for a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.

FIG

FIG

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and the data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.

FIG
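The sketch below illustrates this structure-of-arrays layout and the staging of the (small) test template in shared memory; all type, kernel, and parameter names are hypothetical, and the texture-memory binding step is omitted.

// Hypothetical fine-grained kernel prologue: the target descriptors are stored
// as a structure of arrays in global memory so that consecutive threads read
// consecutive addresses, and the test template is copied once into shared
// memory for fast repeated access.
struct DeviceTemplates {
    const float* x;      // descriptor centers, stored component-wise
    const float* y;
    const float* r;
    const float* theta;
    const float* phi;
    const float* w;
};

__global__ void fineMatchKernel(DeviceTemplates targets, int descPerTemplate,
                                const float* testX, const float* testY, int Nte,
                                float* blockScores) {
    extern __shared__ float sh[];          // [0..Nte) = x, [Nte..2*Nte) = y
    for (int i = threadIdx.x; i < Nte; i += blockDim.x) {
        sh[i]       = testX[i];
        sh[Nte + i] = testY[i];
    }
    __syncthreads();

    int base = blockIdx.x * descPerTemplate;   // this block's target template
    // Each thread now processes its share of descriptors, reading
    // targets.x[base + k], targets.y[base + k], ... with coalesced accesses.
    if (threadIdx.x == 0) blockScores[blockIdx.x] = 0.f;   // placeholder result
}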

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in object detection. In this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger region (block) and then using this value to normalize all cells within the block. This normalization makes the descriptor largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
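For reference, the standard HOG definitions are m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)). A small CUDA sketch of this per-pixel step (illustrative names, central-difference gradients assumed) follows.

// Per-pixel gradient magnitude and orientation for HOG (illustrative).
// Central differences approximate dx and dy; the orientation is folded into
// [0, 180) degrees, since HOG treats opposite directions as the same.
__global__ void hogGradientKernel(const float* img, int width, int height,
                                  float* mag, float* ang) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;
    int i = y * width + x;
    float dx = img[i + 1] - img[i - 1];
    float dy = img[i + width] - img[i - width];
    mag[i] = sqrtf(dx * dx + dy * dy);
    float a = atan2f(dy, dx) * 57.29578f;   // radians -> degrees
    if (a < 0.f) a += 180.f;                // unsigned orientation in [0, 180)
    ang[i] = a;
}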

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The gradient-orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
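A sketch of the orientation-binning step is shown below, assuming 9 bins over 0-180 degrees and square cells; these are typical HOG settings and are not specified in the report.

// Illustrative cell histograms: each pixel votes for an orientation bin,
// weighted by its gradient magnitude. The hist array (cellsX*cellsY*numBins
// entries) must be zero-initialized by the caller.
void cellHistograms(const float* mag, const float* ang, int width, int height,
                    int cellSize, int numBins, float* hist) {
    int cellsX = width / cellSize;
    int cellsY = height / cellSize;
    float binWidth = 180.0f / numBins;
    for (int cy = 0; cy < cellsY; ++cy)
        for (int cx = 0; cx < cellsX; ++cx)
            for (int py = 0; py < cellSize; ++py)
                for (int px = 0; px < cellSize; ++px) {
                    int x = cx * cellSize + px, y = cy * cellSize + py;
                    int i = y * width + x;
                    int bin = (int)(ang[i] / binWidth) % numBins;
                    hist[(cy * cellsX + cx) * numBins + bin] += mag[i];
                }
}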

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages, such as C, C++, FORTRAN, Java, COM,

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveX controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatterbubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


LIST OF FIGURES (continued)

... parts of the iris image
28 The sketch of parameters of segment descriptor 26
29 The weighting image 28
210 The module of sclera template matching 28
211 The Y shape vessel branch in sclera 28
212 The rotation and scale invariant character of Y shape vessel branch 29
213 The line descriptor of the sclera vessel pattern 30
214 The key elements of descriptor vector 31
215 Simplified sclera matching steps on GPU 32
216 Two-stage matching scheme 35
217 Example image from the UBIRIS database 42
218 Occupancy on various thread numbers per block 43
219 The task assignment inside and outside the GPU 44
220 HOG features 46
41 Original sclera image 65
42 Binarised sclera image 65
43 Edge map subtracted image 66
44 Cropping ROI 66
45 ROI mask 67
46 ROI finger sclera image 67
47 Enhanced sclera image 68
48 Feature extracted sclera image 68
49 Matching with images in database 69
410 Result 69

ABSTRACT

Sclera vein recognition is shown to be a promising method for human identification. However, its matching speed is slow, which could limit its use in real-time applications. To improve the matching efficiency, we proposed a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching. First, we designed a rotation- and scale-invariant Y shape descriptor based feature extraction method to efficiently eliminate most unlikely matches. Second, we developed a weighted polar line (WPL) sclera descriptor structure to incorporate mask information and reduce GPU memory cost. Third, we designed a coarse-to-fine two-stage matching method. Finally, we developed a mapping scheme to map the subtasks to GPU processing units. The experimental results show that our proposed method can achieve a dramatic processing speed improvement without compromising the recognition accuracy.

CHAPTER 1

INTRODUCTION

11GENERAL

Digital image processing is the use of computer algorithms to

perform image processing on digital images The 2D continuous image is

divided into N rows and M columns The intersection of a row and a

column is called a pixel The image can also be a function other variables

including depth color and time An image given in the form of a

transparency slide photograph or an X-ray is first digitized and stored as a

matrix of binary digits in computer memory This digitized image can then

be processed and/or displayed on a high-resolution television monitor. For

display the image is stored in a rapid-access buffer memory which

refreshes the monitor at a rate of 25 frames per second to produce a visually

continuous display

12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING

The field of "Digital Image Processing" refers to processing the digital

images by means of a digital computer In a broader sense it can be

considered as a processing of any two dimensional data where any image

(optical information) is represented as an array of real or complex numbers

represented by a definite number of bits An image is represented as a two

dimensional function f(x, y), where 'x' and 'y' are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y)

represents the intensity or gray level of the image at that point

A digital image is one for which both the co-ordinates and the

amplitude values of f are all finite discrete quantities Hence a digital

image is composed of a finite number of elements each of which has a

particular location and value. These elements are called "pixels". A digital

image is discrete in both spatial coordinates and brightness and it can be

considered as a matrix whose rows and column indices identify a point on

the image and the corresponding matrix element value identifies the gray

level at that point

One of the first applications of digital images was in the newspaper

industry when pictures were first sent by submarine cable between London

and New York Introduction of the Bartlane cable picture transmission

system in the early 1920s reduced the time required to transport a picture

across the Atlantic from more than a week to less than three hours

FIG

121 PREPROCESSING

In imaging science image processing is any form of signal

processing for which the input is an image such as a photograph or video

frame the output of image processing may be either an image or a set of

characteristics or parameters related to the image Most image-processing

techniques involve treating the image as a two-dimensional signal and

applying standard signal-processing techniques to it Image processing

usually refers to digital image processing but optical and analog image

processing also are possible This article is about general techniques that

apply to all of them The acquisition of images (producing the input image

in the first place) is referred to as imaging

Image processing refers to processing of a 2D picture by a

computer Basic definitions

An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital

technology has made it possible to manipulate multi-dimensional signals

with systems that range from simple digital circuits to advanced parallel

computers The goal of this manipulation can be divided into three

categories

Image processing (image in -> image out)

Image analysis (image in -> measurements out)

Image understanding (image in -> high-level description out)

An image may be considered to contain sub-images sometimes referred

to as regions-of-interest ROIs or simply regions This concept reflects the

fact that images frequently contain collections of objects each of which can

be the basis for a region In a sophisticated image processing system it

should be possible to apply specific image processing operations to selected

regions Thus one part of an image (region) might be processed to suppress

motion blur while another part might be processed to improve colour

rendition

Most usually image processing systems require that the images be

available in digitized form that is arrays of finite length binary words For

digitization the given Image is sampled on a discrete grid and each sample

or pixel is quantized using a finite number of bits The digitized image is

processed by a computer To display a digital image it is first converted

into analog signal which is scanned onto a display Closely related to

image processing are computer graphics and computer vision In computer

graphics images are manually made from physical models of objects

environments and lighting instead of being acquired (via imaging devices

such as cameras) from natural scenes as in most animated movies

Computer vision on the other hand is often considered high-level image

processing out of which a machinecomputersoftware intends to decipher

the physical contents of an image or a sequence of images (eg videos or

3D full-body magnetic resonance scans)

In modern sciences and technologies images also gain much

broader scopes due to the ever growing importance of scientific

visualization (of often large-scale complex scientificexperimental data)

Examples include microarray data in genetic research or real-time multi-

asset portfolio trading in finance Before going to processing an image it is

converted into a digital form Digitization includes sampling of image and

quantization of sampled values After converting the image into bit

information processing is performed This processing technique may be

Image enhancement Image restoration and Image compression

122 IMAGE ENHANCEMENT

It refers to accentuation or sharpening of image features such as boundaries or contrast to make a graphic display more useful for display and analysis. This process does not increase the inherent information content in the data. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.

123 IMAGE RESTORATION

It is concerned with filtering the observed image to minimize the

effect of degradations Effectiveness of image restoration depends on the

extent and accuracy of the knowledge of degradation process as well as on

filter design. Image restoration differs from image enhancement in that the latter is concerned more with extraction or accentuation of image features.

124 IMAGE COMPRESSION

It is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression - CCITT GROUP3 & GROUP4

Still image compression - JPEG

Video image compression - MPEG

125 SEGMENTATION

In computer vision image segmentation is the process of

partitioning a digital image into multiple segments (sets of pixels also

known as super pixels) The goal of segmentation is to simplify and/or

change the representation of an image into something that is more

meaningful and easier to analyze Image segmentation is typically used to

locate objects and boundaries (lines curves etc) in images More precisely

image segmentation is the process of assigning a label to every pixel in an

image such that pixels with the same label share certain visual

characteristics

The result of image segmentation is a set of segments that

collectively cover the entire image or a set of contours extracted from the

image (see edge detection) Each of the pixels in a region are similar with

respect to some characteristic or computed property such as

colour intensity or texture Adjacent regions are significantly different

with respect to the same characteristic(s) When applied to a stack of

images typical in medical imaging the resulting contours after image

segmentation can be used to create 3D reconstructions with the help of

interpolation algorithms like marching cubes

126 IMAGE RESTORATION

Image restoration like enhancement improves the qualities of image

but all the operations are mainly based on known measured or

degradations of the original image Image restorations are used to restore

images with problems such as geometric distortion improper focus

repetitive noise and camera motion It is used to correct images for known

degradations

127 FUNDAMENTAL STEPS

Image acquisition to acquire a digital image

Image preprocessing to improve the image in ways that increases the

chances for success of the other processes

Image segmentation to partitions an input image into its constituent parts or

objects

Image representation to convert the input data to a form suitable for

computer processing

Image description to extract features that result in some quantitative

information of interest or features that are basic for differentiating one

class of objects from another

Image recognition to assign a label to an object based on the

information provided by its descriptors

Image interpretation to assign meaning to an ensemble of recognized

objects

Knowledge about a problem domain is coded into an image processing

system in the form of a Knowledge database

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing an image f(xy) must be digitalized

both spatially and in amplitude

Digitization of the spatial coordinates (xy) is called image sampling

Amplitude digitization is called gray-level quantization

The storage and processing requirements increase rapidly with the spatial

resolution and the number of gray levels

Example: A 256 gray-level image of size 256x256 occupies 64K bytes of memory (256 x 256 pixels x 1 byte per pixel = 65,536 bytes).

Images of very low spatial resolution produce a checkerboard effect

The use of insufficient number of gray levels in smooth areas of a digital

image results in false contouring

14 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based or 'images'). Some of the most common file formats are:

GIF - Graphics Interchange Format: an 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG - Joint Photographic Experts Group: a very efficient (i.e., much information per byte), destructively compressed, 24 bit (16 million colours) bitmap format. Widely used, especially for web and Internet (bandwidth-limited) use.

TIFF - Tagged Image File Format: the standard 24 bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS - PostScript: a standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD - Adobe Photoshop Document: a dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP - bitmap file format.

15 TYPE OF IMAGES

Images are 4 types

1 Binary image

2 Gray scale image

3 Color image

4 Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e., a 0 or 1. Such images are also known as black-and-white (B&W) images.

152 GRAY SCALE IMAGE

In a (8-bit) grayscale image each picture element has an assigned intensity

that ranges from 0 to 255 A grey scale image is what people normally call

a black and white image but the name emphasizes that such an image will

also include many shades of grey

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive

colour with the r g and b receptors in our retinas RGB uses additive colour

mixing and is the basic colour model used in television or any other

medium that projects colour with light It is the basic colour model used in

computers and for web graphics but it cannot be used for print production

The secondary colours of RGB (cyan, magenta, and yellow) are formed

by mixing two of the primary colours (red green or blue) and excluding the

third colour Red and green combine to make yellow green and blue to

make cyan and blue and red form magenta The combination of red green

and blue in full intensity makes white

In Photoshop, using the "screen" mode for the different layers in an

image will make the intensities mix together according to the additive

colour mixing model This is analogous to stacking slide images on top of

each other and shining light through them

FIG

CMYK The 4-colour CMYK model used in printing lays down

overlapping layers of varying percentages of transparent cyan (C) magenta

(M) and yellow (Y) inks In addition a layer of black (K) ink can be added

The CMYK model uses the subtractive colour model

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix The

pixel values in the array are direct indices into a color map By convention

this documentation uses the variable name X to refer to the array and map

to refer to the color map In computing indexed color is a technique to

manage digital images colors in a limited fashion in order to save

computer memory and file storage while speeding up display refresh and

file transfers It is a form of vector quantization compression

When an image is encoded in this way color information is not

directly carried by the image pixel data but is stored in a separate piece of

data called a palette an array of color elements in which every element a

color is indexed by its position within the array The image pixels do not

contain the full specification of its color but only its index in the palette

This technique is sometimes referred as pseudocolor or indirect color as

colors are addressed indirectly

Perhaps the first device that supported palette colors was a random-

access frame buffer described in 1975 by Kajiya Sutherland and Cheadle

This supported a palette of 256 36-bit RGB colors

16 Applications of image processing

Interest in digital image processing methods stems from 2 principal

application areas

1) Improvement of pictorial information for human interpretation

2) Processing of scene data for autonomous machine perception

In the second application area interest focuses on procedures for

extracting from an image

Information in a form suitable for computer processing

Examples include automatic character recognition industrial machine

vision for product assembly and inspection, military reconnaissance,

automatic processing of fingerprints etc

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches: Speeded-Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with Intel Core 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, will preoccupy the GPU memory, and slow down data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the

most challenging problems that has been actively researched in recent

years It is well known that many state-of-the-arts still face recognition

algorithms perform well when constrained (frontal well illuminated high-

resolution sharp and full) face images are acquired However their

performance degrades significantly when the test images contain variations

that are not present in the training images In this paper we highlight some

of the key issues in remote face recognition We define the remote face

recognition as one where faces are several tens of meters (10-250m) from

the cameras We then describe a remote face database which has been

acquired in an unconstrained outdoor maritime environment Recognition

performance of a subset of existing still image-based face recognition

algorithms is evaluated on the remote face data set Further we define the

remote re-identification problem as matching a subject at one location with

candidate sets acquired at a different location and over time in remote

conditions We provide preliminary experimental results on remote re-

identification It is demonstrated that in addition to applying a good

classification algorithm finding features that are robust to variations

mentioned above and developing statistical models which can account for

these variations are very important for remote face recognition

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanded biometric data collected by various sectors

of government and industry for identification and verification purposes

how to manage and process such Big Data draws great concern Even

though modern processors are equipped with more cores and memory

capacity it still requires careful design in order to utilize the hardware

resource effectively and the power consumption efficiently This research

addresses this issue by investigating the workload characteristics of

biometric application. Taking Daugman's iris matching algorithm, which

has been proven to be the most reliable iris matching method as a case

study we conduct performance profiling and binary instrumentation on the

benchmark to capture its execution behavior The results show that data

loading and memory access incurs great performance overhead and

motivates us to move the biometrics computation to high-performance

architecture

Modern iris recognition algorithms can be computationally intensive

yet are designed for traditional sequential processing elements such as a

personal computer However a parallel processing alternative using field

programmable gate arrays (FPGAs) offers an opportunity to speed up iris

recognition Within the means of this project iris template generation with

directional filtering, which is a computationally expensive yet parallel

portion of a modern iris recognition algorithm is parallelized on an FPGA

system We will present a performance comparison of the parallelized

algorithm on the FPGA system to a traditional CPU-based version The

parallelized template generation outperforms an optimized C++ code

version determining the information content of an iris approximately 324

times faster

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.

A new biometric indicator based on the patterns of conjunctival

vasculature is proposed Conjunctival vessels can be observed on the visible

part of the sclera that is exposed to the outside world These vessels

demonstrate rich and specific details in visible light and can be easily

photographed using a regular digital camera In this paper we discuss

methods for conjunctival imaging preprocessing and feature extraction in

order to derive a suitable conjunctival vascular template for biometric

authentication Commensurate classification methods along with the

observed accuracy are discussed Experimental results suggest the potential

of using conjunctival vasculature as a biometric measure Identification of

a person based on some unique set of features is an important task The

human identification is possible with several biometric systems and sclera

recognition is one of the promising biometrics The sclera is the white

portion of the human eye The vein pattern seen in the sclera region is

unique to each person Thus the sclera vein pattern is a well suited

biometric technology for human identification The existing methods used

for sclera recognition have some drawbacks like only frontal looking

images are preferred for matching and rotation variance is another problem

These problems are completely eliminated in the proposed system by using

two feature extraction techniques They are Histogram of Oriented

Gradients (HOG) and converting the image into polar form using the

bilinear interpolation technique These two features help the proposed

system to become illumination invariant and rotation invariant The

experimentation is done with the help of UBIRIS database The

experimental result shows that the proposed sclera recognition method can

achieve better accuracy than the previous methods

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of

today's mainstream computing systems. Over the past six years there has

been a marked increase in the performance and capabilities of GPUs The

modern GPU is not only a powerful graphics engine but also a highly

parallel programmable processor featuring peak arithmetic and memory

bandwidth that substantially outpaces its CPU counterpart. The GPU's

rapid increase in both programmability and capability has spawned a

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified

Mumford–Shah functional. Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification The verification and

identification performance of the proposed algorithms is validated and

compared with other algorithms using the CASIA Version 3 ICE 2005 and

UBIRIS iris databases

1.8 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, implemented on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation

on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, also work in OpenCL. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in a data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme, introduce the "weighted polar line (WPL) descriptor" that would be better

suited for parallel computing to mitigate the mask size issue and develop

our coarse to fine two-stage matching process to dramatically improve the

matching speed These new approaches make the parallel processing

possible and efficient

1.9.1 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, in this research we propose a new descriptor, the Y shape descriptor, which can greatly help improve the efficiency of

the coarse registration of two images and can be used to filter out some

non-matching pairs before refined matching

2 We propose the coarse-to-fine two-stage matching process In the first

stage we matched two images coarsely using the Y-shape descriptors

which is very fast to match because no registration was needed The

matching result in this stage can help filter out image pairs with low

similarities

CHAPTER 2

PROJECT DESCRIPTION

2.1 INTRODUCTION

The sclera is the opaque and white outer layer of the eye The blood

vessel structure of sclera is formed randomly and is unique to each person

which can be used for human identification. Several researchers have

designed different Sclera vein recognition methods and have shown that it

is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition.

et al proposed line descriptor-based method for sclera vein recognition

The matching step (including registration) is the most time-consuming step

in this sclera vein recognition system which costs about 12 seconds to

perform a one-to-one matching. Both speeds were measured using a PC with Intel Core 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing

unit (CPU)-based systems

As discussed CPU-based systems are designed as sequential

processing devices which may not be efficient in data processing where the

data can be parallelized Because of large time consumption in the matching

step Sclera vein recognition using sequential-based method would be very

challenging to be implemented in a real time biometric system especially

when there is large number of templates in the database for matching GPUs

(General-Purpose Graphics Processing Units, GPGPUs)

are now popularly used for parallel computing to improve the

computational processing speed and efficiency The highly parallel

structure of GPUs makes them more effective than CPUs for data

processing where processing can be performed in parallel GPUs have been

widely used in biometrics recognition such as speech recognition text

detection handwriting recognition and face recognition In iris

recognition GPU was used to extract the features construct descriptors

and match templates

GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather the goal is to develop a parallel sclera matching solution for

Sclera vein recognition using our sequential line-descriptor method using

the CUDA GPU architecture However the parallelization strategies

developed in this research can be applied to design parallel approaches for

other Sclera vein recognition methods and help parallelize general pattern

recognition methods Based on the matching approach in there are three

challenges to map the task of sclera feature matching to GPU

1) Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer Also some of processing on the mask files will

involve convolution which is difficult to improve its performance on the

scalar process unit on CUDA

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance New designs are

necessary to help narrow down the search range. In summary, naïve

implementation of the algorithms in parallel would not work efficiently

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, also work in OpenCL. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in a data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4), introduce the "weighted polar line (WPL) descriptor"

that would be better suited for parallel computing to mitigate the mask size

issue (Section 5) and develop our coarse to fine two-stage matching

process to dramatically improve the matching speed (Section 6) These new

approaches make the parallel processing possible and efficient However it

is non-trivial to implement these algorithms in CUDA We then developed

the implementation schemes to map our algorithms into CUDA (Section 7)

In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw some conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition

Several methods have been designed for sclera segmentation Crihalmeanu

et al. presented a semi-automated system for sclera segmentation. They

used a clustering algorithm to classify the color eye images into three

clusters - sclera iris and background Later on Crihalmeanu and Ross

designed a segmentation approach based on a normalized sclera index

measure which includes coarse sclera segmentation pupil region

segmentation and fine sclera segmentation Zhou et al developed a skin

tone plus "white color"-based voting method for sclera segmentation in color images and Otsu's thresholding-based method for grayscale images.

After sclera segmentation it is necessary to enhance and extract the sclera

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches including Speed Up Robust

Features (SURF) which is based on interest-point detection minutiae

detection which is based on minutiae points on the vasculature structure

and direct correlation matching which relies on image registration Zhou et

al designed a line descriptor based feature registration and matching

method

The proposed sclera recognition consists of five steps which include

sclera segmentation vein pattern enhancement feature extraction feature

matching and matching decision Fig 2 shows the block diagram of sclera

recognition Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of vein pattern in the sclera region of an eye image To

become more computationally efficient the data of the image are converted

to the polar form It is mainly used for circular or quasi circular shape of

object. These two characteristics are extracted from all the images in the database and compared with the features of the query image to determine whether the person is correctly identified or not. This procedure is done in the feature

matching step and ultimately makes the matching decision By using the

proposed feature extraction methods and matching techniques the human

identification is more accurate than the existing studies In the proposed

method two features of an image are drawn out
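To make the Cartesian-to-polar conversion step concrete, the following is a minimal sketch (not the report's actual code; the kernel name, sample counts and center arguments are assumptions), written in C for CUDA, in which each thread produces one polar sample of the sclera region using bilinear interpolation:

// Minimal sketch: resample an image from Cartesian to polar coordinates with
// bilinear interpolation; (cx, cy) is the detected iris/pupil center.
__global__ void cartesianToPolar(const unsigned char *img, int width, int height,
                                 float cx, float cy, float rMax,
                                 unsigned char *polar, int nRadii, int nAngles)
{
    int ir = blockIdx.y * blockDim.y + threadIdx.y;   // radial index
    int ia = blockIdx.x * blockDim.x + threadIdx.x;   // angular index
    if (ir >= nRadii || ia >= nAngles) return;

    float r     = rMax * (float)ir / (float)(nRadii - 1);
    float theta = 2.0f * 3.14159265f * (float)ia / (float)nAngles;

    // Cartesian location to sample (may fall between pixels)
    float x = cx + r * cosf(theta);
    float y = cy + r * sinf(theta);

    int x0 = (int)floorf(x), y0 = (int)floorf(y);
    if (x0 < 0 || y0 < 0 || x0 + 1 >= width || y0 + 1 >= height) {
        polar[ir * nAngles + ia] = 0;                 // outside the image
        return;
    }
    float fx = x - x0, fy = y - y0;

    // Bilinear interpolation of the four neighbouring pixels
    float v = (1 - fx) * (1 - fy) * img[y0 * width + x0]
            + fx       * (1 - fy) * img[y0 * width + x0 + 1]
            + (1 - fx) * fy       * img[(y0 + 1) * width + x0]
            + fx       * fy       * img[(y0 + 1) * width + x0 + 1];

    polar[ir * nAngles + ia] = (unsigned char)(v + 0.5f);
}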

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images; if the image is in color, it is first converted to grayscale and then passed through the Sobel filter to detect the glare area, as sketched below. Fig. 4 shows the result of the glare area detection.
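As a rough illustration (a sketch only, assuming a grayscale input and a hypothetical gradient-magnitude threshold, not the report's code), a Sobel-based glare detector can be written as a CUDA kernel in which each thread filters one pixel:

// Sobel filter over a grayscale image; pixels whose gradient magnitude exceeds
// an assumed threshold are flagged as candidate glare pixels.
__global__ void sobelGlare(const unsigned char *gray, unsigned char *glareMask,
                           int width, int height, float threshold)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    // 3x3 neighbourhood
    int p00 = gray[(y - 1) * width + (x - 1)], p01 = gray[(y - 1) * width + x], p02 = gray[(y - 1) * width + (x + 1)];
    int p10 = gray[y * width + (x - 1)],                                        p12 = gray[y * width + (x + 1)];
    int p20 = gray[(y + 1) * width + (x - 1)], p21 = gray[(y + 1) * width + x], p22 = gray[(y + 1) * width + (x + 1)];

    // Sobel responses in x and y
    int gx = (p02 + 2 * p12 + p22) - (p00 + 2 * p10 + p20);
    int gy = (p20 + 2 * p21 + p22) - (p00 + 2 * p01 + p02);

    float mag = sqrtf((float)(gx * gx + gy * gy));
    glareMask[y * width + x] = (mag > threshold) ? 255 : 0;
}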

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas (a sketch of the threshold computation is given after this paragraph). The correct left sclera area should be located in the right and center positions, and the correct right sclera area should be located in the left and center. In this way, non-sclera areas are eliminated.

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined. These are all unwanted portions for recognition. In order to eliminate their effects, refinement is done following the detection of the sclera area. Fig. shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area. The left sclera area is detected in the same way.

FIG

In the segmentation process all images are not perfectly segmented

Hence feature extraction and matching are needed to reduce the

segmentation fault The vein patterns in the sclera area are not visible in the

segmentation process To get vein patterns more visible vein pattern

enhancement is to be performed

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999) In the case of retinal biometrics an

especial optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions making this

modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi surface structure of the ocular

veins makes them hard to reproduce as a physical artifact Besides being a

stand-alone biometric modality we anticipate that the addition of

conjunctival biometrics will enhance the performance of current iris-based

biometric system in the following ways

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the

availability for simultaneous computation Below Figure demonstrates the

possibility for parallel directional filtering Since the filter is computed over

different portions of the input image, the computation can be performed in

parallel (denoted by Elements below) In addition individual parallelization

of each element of Filtering can also be performed A detailed discussion of

our proposed parallelization is outside the scope of this paper

FIG

FIG

2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor based method is a

bottleneck with regard to matching speed In this section we briefly

describe the Line Descriptor-based sclera vein recognition method After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points The line descriptor is used to describe the segments in the vein

structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG
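The equation figure is not reproduced here; a plausible reconstruction of the three components, consistent with the quantities defined in the next paragraph, is:

\theta = \arctan\left(\frac{y_l - y_i}{x_l - x_i}\right), \qquad
r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
\phi = \arctan\left(\left.\frac{d f_{line}(x)}{dx}\right|_{x = x_l}\right)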

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the

vascular patterns a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns For

the registration algorithm it randomly chooses two points ndash one from the

test template and one from the target template It also randomly chooses a

scaling factor and a rotation value based on a priori knowledge of the

database Using these values it calculates a fitness value for the registration

using these parameters

After sclera template registration each line segment in the test

template is compared to the line segments in the target template for

matches In order to reduce the effect of segmentation errors we created the

weighting image (Figure 3) from the sclera mask by setting interior pixels

in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as shown below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold.

The total matching score M is the sum of the individual matching scores

divided by the maximum matching score for the minimal set between the

test and target template. That is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained.
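The equations themselves appear only as figures below; one plausible form of the segment score and the total score, consistent with the description above (our reconstruction, not the report's exact equations), is:

m(S_i, S_j) =
\begin{cases}
 w_i \, w_j, & d(S_i, S_j) \le D_{match} \ \text{and} \ |\phi_i - \phi_j| \le \phi_{match} \\
 0, & \text{otherwise}
\end{cases}
\qquad
M = \frac{\sum m(S_i, S_j)}{\min\left(\sum_i w_i^{test}, \; \sum_j w_j^{target}\right)}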

FIG

FIG

FIG

FIG

Even with movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of Y shape vessels: one is to use the angles of every branch to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach we employed the second method. As Figure 6

shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms at the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable.

To tolerate errors from the pupil center calculation in the segmentation step

we also recorded the center position (x y) of the Y shape branches as

auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris centers. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR: As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images

descriptor is extracted from the skeleton of vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptor at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To be tolerant of such error, the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging,

since the mask files are large in size and will occupy the GPU memory and

slow down the data transfer. When matching, a RANSAC-type registration algorithm was used to randomly select the corresponding descriptors, and the transform parameter between them was used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processor unit.

To reduce heavy data transfer and computation we designed the

weighted polar line (WPL) descriptor structure which includes the

information of mask and can be automatically aligned We extracted the

relationship of geometric feature of descriptors and store them as a new

descriptor We use a weighted image created via setting various weight

values according to their positions. The weights of those descriptors that are beyond the sclera are set to 0, those that are near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, descriptor weights were calculated on their own mask by the CPU only once.

The calculated result was saved as a component of the descriptor. The descriptor of the sclera then becomes s(x, y, ɸ, w), where w denotes the weight of the point and its value may be 0, 0.5, or 1. To align two templates, when a

template is shifted to another location along the line connecting their

centers all the descriptors of that template will be transformed It would be

faster if two templates have similar reference points If we use the center of

the iris as the reference point when two templates are compared the

correspondence will automatically be aligned to each other since they have

the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computing, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess. The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right

parts of sclera in an eye may have different registration parameters For

example, as an eyeball moves left, the left part sclera patterns of the eye may be

compressed while the right part sclera patterns are stretched
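As a concrete illustration (a sketch only; the structure and field names are our assumptions, not the report's code), the two descriptors could be represented in C for CUDA as:

// Y shape descriptor y(phi1, phi2, phi3, x, y): three branch angles measured
// against the iris radial direction, plus the branch-center position.
struct YShapeDescriptor {
    float phi1, phi2, phi3;   // branch angles relative to the pupil-center radius
    float x, y;               // center position of the Y shape branch
};

// WPL descriptor s(x, y, r, theta, phi, w): precomputed rectangular and polar
// coordinates plus the boundary weight folded in from the mask.
struct WPLDescriptor {
    float x, y;               // segment center in rectangular coordinates
    float r, theta;           // segment center in polar coordinates (about the iris center)
    float phi;                // dominant orientation of the segment
    float w;                  // weight: 0 outside sclera, 0.5 near boundary, 1 interior
};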

In parallel matching these two parts are assigned to threads in

different warps to allow different deformation The multiprocessor in

CUDA manages threads in groups of 32 parallel threads called warps We

reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU (see the layout sketch below).
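A minimal sketch of such a structure-of-arrays layout (array and field names are illustrative assumptions, not the report's code) is:

// Descriptors belonging to the same side (left/right) of the sclera are stored
// contiguously, component by component, so that the 32 threads of a warp read
// consecutive words (coalesced global-memory access).
struct WPLTemplateSoA {
    int   nLeft, nRight;   // descriptor counts per side
    // left-side descriptors occupy indices [0, nLeft), right-side [nLeft, nLeft + nRight)
    float *x, *y;          // one array per descriptor component
    float *r, *theta;
    float *phi;
    float *w;
};

// Thread t of a warp reads x[base + t], y[base + t], ... so successive threads
// touch successive addresses within each component array.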

After reorganizing the structure of descriptors and adding mask information

into the new descriptor the computation on the mask file is not needed on

the GPU It was very fast to match with this feature because it does not

need to reregister the templates every time after shifting Thus the cost of

data transfer and computation on GPU will be reduced Matching on the

new descriptor the shift parameters generator in Figure 4 is then simplified

as Figure 9

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 4.0 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline, today's GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that today's GPU computing applications are written

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global

"texture" memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid: at each time step, we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasks' having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in future computation (a minimal kernel illustrating these steps is sketched below).
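A minimal, generic C for CUDA example of this structure (not taken from the sclera matcher; all names are illustrative) is:

#include <cuda_runtime.h>

// Each thread gathers its own element and a neighbour from global memory,
// combines them, and writes the result back to global memory.
__global__ void gatherScatter(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // thread's place in the grid
    if (i >= n) return;
    float left  = in[i];
    float right = in[(i + 1) % n];                   // "gather" from another location
    out[i] = 0.5f * (left + right);                  // "scatter" (write) the result
}

int main(void)
{
    const int n = 1 << 20;
    float *dIn, *dOut;
    cudaMalloc(&dIn,  n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    // ... fill dIn with cudaMemcpy from the host ...
    int block = 256;
    int grid  = (n + block - 1) / block;             // structured grid of threads
    gatherScatter<<<grid, block>>>(dIn, dOut, n);    // SPMD program over the domain
    cudaDeviceSynchronize();
    // dOut can now be used as the input of a later kernel.
    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}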

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities After this step it is still possible for

some false positive matches In the second stage we used WPL descriptor

to register the two images for more detailed descriptor matching including

scale- and translation invariance This stage includes shift transform affine

matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster

and

achieve more accurate score

2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta respectively, dϕ is the Euclidean distance of the angle elements of the descriptor vectors defined as (3), dxy is the Euclidean distance of the two descriptor centers defined as (4), and ni and di are the matched descriptor pairs' number and their centers' distance respectively. tϕ is a distance threshold and txy is the threshold to restrict the searching area. We set tϕ to 30 and txy to 675 in our experiment.
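The referenced equations (3) and (4) are not reproduced in this report; a plausible reconstruction of the two distances, given that both are described as Euclidean distances, is shown below (the exact fusion rule (2) combining ni, the average distance di, α, and min(Ni, Nj) is not reconstructed here):

d_\phi(y^{te}_i, y^{ta}_j) = \sqrt{\sum_{k=1}^{3}\left(\phi^{te}_{ik} - \phi^{ta}_{jk}\right)^2},
\qquad
d_{xy}(y^{te}_i, y^{ta}_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}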

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs ni and the distance between Y shape

branches centers di are stored as the matching result We fuse the number of

matched branches and the average distance between matched branches

centers as (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j separately. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. The sclera with a high matching score will be passed on to the next, more precise matching process.

2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.

Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte, Tta is the target template and staj is the jth WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.

Δsk is the shift value of the two descriptors, defined as shown below.
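A plausible componentwise form of this shift value (our assumption, since the original equation is not reproduced in the report) is:

\Delta s_k = \left( x^{ta}_{j} - x^{te}_{k}, \; y^{ta}_{j} - y^{te}_{k} \right)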

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. Their shift offset is recorded as the possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration we count the number of matched descriptor pairs from the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; θ(it), tr(it)shift and tr(it)scale are the parameters of the rotation, shift and scale transforms generated in the itth iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterated N times to generate these parameters. In our experiment we set the iteration count to 512.
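Equation (7) is not reproduced here; a standard homogeneous-coordinate form of such shift, rotation and scale matrices, given only as a sketch of one conventional choice, is:

R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
T(t_x, t_y) = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}, \quad
S(s) = \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = R(\theta)\, T(t_x, t_y)\, S(s) \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}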

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3, and θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters attained from Algorithms 2 and 3; R(θ(optm)) T(tr(optm)shift) S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ɸ values. In our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.

2.5 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU A CUDA consists of many streaming

multiprocessors (SM) where the parallel part of the program should be

partitioned into threads by the programmer and mapped into those threads

There are multiple memory spaces in the CUDA memory hierarchy

register local memory shared memory global memory constant memory

and texture memory. Registers, local memory, and shared memory are on-chip, and it takes only a little time to access these memories. Only

shared memory can be accessed by other threads within the same block

However there is only limited availability of shared memory Global

memory constant memory and texture memory are off-chip memory and

accessible by all threads which would be very time consuming to access

these memories

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control paths, all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in terms of access time. To

completely hide the latency of the small instructions set we should use on-

chip memory preferentially rather than global memory When global

memory access occurs threads in same warp should access the words in

sequence to achieve coalescence

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory requests from different threads within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

2.5.1 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread; one thread performs the comparison of a pair of templates. In our work we use an NVIDIA C2070 as our GPU. The numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time (see the launch sketch below).
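A launch sketch of this coarse-grained mapping (kernel and argument names are illustrative assumptions, not the report's code) in C for CUDA:

// One thread per target template, with up to 1024 threads per block and 1024
// blocks, i.e. up to 1024 x 1024 one-to-one comparisons in a single launch.
__global__ void coarseMatchKernel(const float *testTemplate, int testLen,
                                  const float *targetTemplates, int targetLen,
                                  float *scores, int nTargets)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // index of the target template
    if (t >= nTargets) return;
    const float *target = targetTemplates + (size_t)t * targetLen;
    float score = 0.0f;
    // ... compare testTemplate against target here (Algorithm 1) ...
    scores[t] = score;
}

// Host side:
//   dim3 block(1024), grid(1024);   // 1024 x 1024 pairs at once
//   coarseMatchKernel<<<grid, block>>>(dTest, testLen, dTargets, targetLen, dScores, nTargets);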

Algorithms 2-4 will be partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the

middle column shows in Figure 11 we assigned a target template to one

block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there are no data exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix

sum algorithm is used to calculate the sum of the intermediate results, which is shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, every first thread of each group of i (i = 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
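A minimal shared-memory sketch of this kind of intra-block combination (a standard tree reduction; the block size and variable names are assumptions, not the report's code) is:

#define BLOCK_SIZE 256   // kernel must be launched with blockDim.x == BLOCK_SIZE

// Each thread contributes one partial matching result; the block combines them
// in log2(BLOCK_SIZE) steps, and the block's total ends up in partial[0].
__global__ void blockSumKernel(const float *perThreadResults, float *perBlockSums, int n)
{
    __shared__ float partial[BLOCK_SIZE];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    partial[tid] = (i < n) ? perThreadResults[i] : 0.0f;
    __syncthreads();

    // Pairwise tree combination: stride 1, 2, 4, ... as in the text above.
    for (int stride = 1; stride < blockDim.x; stride *= 2) {
        if (tid % (2 * stride) == 0)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        perBlockSums[blockIdx.x] = partial[0];   // saved at the first address
}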

2.5.2 MAPPING INSIDE BLOCK

In the shift argument search there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to each thread, so that all the threads compute independently and only the final results need to be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait for a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.

2.5.3 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) were stored separately. This guarantees that the contiguous kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data. In our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
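A short sketch of this memory strategy (illustrative names and sizes, not the report's code): the target descriptor set is copied to the device once before matching, and each block cooperatively stages the test template into shared memory:

#define MAX_TEST_DESC 512   // assumes the test template has at most this many descriptors

__global__ void fineMatchKernel(const float4 *testDesc, int nTest,
                                const float4 *targetDesc, const int *targetOffset,
                                float *scores)
{
    __shared__ float4 sTest[MAX_TEST_DESC];

    // Cooperative load: the threads of the block copy the test template once.
    for (int i = threadIdx.x; i < nTest; i += blockDim.x)
        sTest[i] = testDesc[i];
    __syncthreads();

    // ... each thread now matches its share of the descriptors of target
    //     template blockIdx.x (starting at targetOffset[blockIdx.x]) against sTest
    //     and contributes to scores[blockIdx.x] ...
}

// Host side, done once before the matching loop:
//   cudaMemcpy(dTargetDesc, hTargetDesc, dbBytes,   cudaMemcpyHostToDevice);
//   cudaMemcpy(dTestDesc,   hTestDesc,   testBytes, cudaMemcpyHostToDevice);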

FIG

2.6 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It is primarily applied in the design of target detection. In this paper it is applied as the

feature for human recognition In the sclera region the vein patterns are the

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To follow out this technique first of all divide the image into small

connected regions called cells For each cell compute the histogram of

gradient directions or edge orientations of the pixels Then the combination

of different histogram of different cell represents the descriptor To improve

accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y), as given below.
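In the standard HOG formulation (a sketch consistent with the description above; the report's exact equations are not reproduced), the gradients, magnitude and orientation are:

d_x(x, y) = I(x+1, y) - I(x-1, y), \qquad d_y(x, y) = I(x, y+1) - I(x, y-1)

m(x, y) = \sqrt{d_x(x, y)^2 + d_y(x, y)^2}, \qquad
\theta(x, y) = \tan^{-1}\left(\frac{d_y(x, y)}{d_x(x, y)}\right)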

Orientation binning is the second step of HOG. This method is utilized to create the cell histograms. Each pixel within a cell gives a weighted vote to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of gradient orientation is spread over 0 to 180 degrees, and opposite directions count as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block (a sketch of the per-cell binning is given after this paragraph).
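A compact C for CUDA sketch of the per-cell orientation binning described above (the cell size, bin count and names are assumptions, not the report's code), with one thread per pixel and magnitude-weighted votes accumulated atomically:

#define CELL   8     // cell size in pixels (assumption)
#define NBINS  9     // unsigned orientation bins over 0..180 degrees

// One thread per pixel: compute the gradient, find its orientation bin, and add
// a magnitude-weighted vote to the histogram of the cell containing the pixel.
__global__ void hogCellHistograms(const unsigned char *gray, int width, int height,
                                  float *cellHist /* cellsX * cellsY * NBINS */)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    float dx = (float)gray[y * width + (x + 1)] - (float)gray[y * width + (x - 1)];
    float dy = (float)gray[(y + 1) * width + x] - (float)gray[(y - 1) * width + x];

    float mag   = sqrtf(dx * dx + dy * dy);
    float angle = atan2f(dy, dx) * 180.0f / 3.14159265f;   // range -180..180
    if (angle < 0.0f) angle += 180.0f;                      // opposite directions count as the same

    int bin = (int)(angle / (180.0f / NBINS));
    if (bin >= NBINS) bin = NBINS - 1;

    int cellsX = width / CELL, cellsY = height / CELL;
    int cx = x / CELL, cy = y / CELL;
    if (cx >= cellsX || cy >= cellsY) return;               // skip partial border cells

    atomicAdd(&cellHist[(cy * cellsX + cx) * NBINS + bin], mag);  // magnitude-weighted vote
}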

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

3.1 GENERAL

MATLAB (matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

3.2 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C, C++, FORTRAN, Java, COM,

and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

3.2.1 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java, ActiveX or .NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD based Symbolic Math Toolbox

available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks, such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
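A small illustrative example of this (not taken from the report), replacing an explicit loop with a vectorized expression:

% Loop version (C-style):
y = zeros(1, 1000);
for k = 1:1000
    y(k) = sin(2*pi*k/1000) * exp(-k/500);
end

% Equivalent vectorized version, written in one line:
k = 1:1000;
y = sin(2*pi*k/1000) .* exp(-k/500);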

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.

Development Tools

MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:

MATLAB Editor: provides standard editing and debugging features, such as setting breakpoints and single stepping.

Code Analyzer: checks your code for problems and recommends modifications to maximize performance and maintainability.

MATLAB Profiler: records the time spent executing each line of code.

Directory Reports: scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.

Designing Graphical User Interfaces

GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing, and editing user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
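A minimal sketch of the programmatic approach (the control layout and callback below are illustrative only, not part of this project's interface):

% Create a figure with one push button whose callback prints a message.
fig = figure('Name', 'Demo GUI', 'NumberTitle', 'off');
uicontrol(fig, 'Style', 'pushbutton', ...
          'String', 'Run matching', ...
          'Position', [20 20 120 30], ...
          'Callback', @(src, evt) disp('Button pressed'));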

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including:

Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
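For illustration (the file names below are placeholders, not files from this project), reading spreadsheet, image, and raw binary data:

% Read a mixed-type spreadsheet into a table (file name is a placeholder).
T = readtable('measurements.xlsx');

% Read an image file into a matrix of pixel values.
I = imread('eye_image.jpg');

% Low-level binary I/O: read 100 double-precision values from a raw file.
fid = fopen('data.bin', 'r');
v = fread(fid, 100, 'double');
fclose(fid);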

Visualizing Data

All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create:

Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
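A short illustrative 2-D plot (the data are made up for the example):

x = linspace(0, 2*pi, 200);
plot(x, sin(x), 'b-', x, cos(x), 'r--');
xlabel('x'); ylabel('amplitude');
legend('sin(x)', 'cos(x)');
title('2-D line plot example');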

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data. Plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency can all be specified.

3-D plotting functions include:

Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
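A brief illustrative 3-D surface plot (the function plotted is arbitrary):

[X, Y] = meshgrid(-2:0.1:2, -2:0.1:2);
Z = X .* exp(-X.^2 - Y.^2);
surf(X, Y, Z);
shading interp;
camlight; lighting gouraud;
xlabel('x'); ylabel('y'); zlabel('z');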

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.

MATLAB provides the following types of functions for performing mathematical operations and analyzing data:

Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
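A few illustrative lines of numeric computation (the data are random and only for demonstration):

% Linear algebra, backed by LAPACK/BLAS routines.
A = rand(100);                 % random 100-by-100 matrix
b = rand(100, 1);
x = A \ b;                     % solve A*x = b
e = eig(A);                    % eigenvalues

% Fourier analysis, backed by the FFTW library.
t = 0:0.001:1;                 % 1 kHz sampling for 1 s
s = sin(2*pi*50*t) + 0.5*randn(size(t));
S = fft(s);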

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

4.2 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG

SELECTED ROI PART
FIG
FIG

ENHANCEMENT OF SCLERA IMAGE
FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR FILTERS
FIG

MATCHING WITH IMAGES IN DATABASE
FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
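The snapshots above correspond to the main preprocessing and feature-extraction steps. A minimal MATLAB sketch of such a pipeline is given below; the file name, crop coordinates, Gabor filter parameters, and function choices are illustrative assumptions, not the exact code used in this project.

% Read a color eye image (placeholder file name) and convert to grayscale.
rgb  = imread('eye_image.jpg');
gray = rgb2gray(rgb);

% Otsu's thresholding to obtain a binary image of candidate sclera regions.
level = graythresh(gray);              % Otsu's method
bw    = im2bw(gray, level);

% Edge map used to help locate region boundaries.
edges = edge(gray, 'sobel');

% Select a region of interest containing the sclera (coordinates assumed).
roi = imcrop(gray, [30 40 200 120]);   % [x y width height]

% Enhance the low-contrast vein patterns.
enh = adapthisteq(roi);

% Extract vein features with a bank of Gabor filters (parameters assumed).
gb  = gabor([4 8], [0 45 90 135]);     % wavelengths and orientations
mag = imgaborfilt(enh, gb);            % one response image per filter

% Matching against database templates (e.g., by feature distance) follows.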

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this work, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, was used to partition the computation task across the heterogeneous CPU and GPU system, down to the threads on the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


ABSTRACT

Sclera vein recognition is shown to be a promising method for human identification. However, its matching speed is slow, which could limit its use in real-time applications. To improve the matching efficiency, we proposed a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching. First, we designed a rotation- and scale-invariant Y-shape descriptor-based feature extraction method to efficiently eliminate most unlikely matches. Second, we developed a weighted polar line sclera descriptor structure to incorporate mask information and reduce GPU memory cost. Third, we designed a coarse-to-fine two-stage matching method. Finally, we developed a mapping scheme to map the subtasks to GPU processing units. The experimental results show that our proposed method can achieve a dramatic processing speed improvement without compromising recognition accuracy.

CHAPTER 1

INTRODUCTION

1.1 GENERAL

Digital image processing is the use of computer algorithms to perform image processing on digital images. The 2-D continuous image is divided into N rows and M columns; the intersection of a row and a column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.

1.2 OVERVIEW OF DIGITAL IMAGE PROCESSING

The field of "digital image processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be considered the processing of any two-dimensional data, where an image (optical information) is represented as an array of real or complex numbers represented by a definite number of bits. An image is represented as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates and the amplitude of f at any pair of coordinates (x, y) represents the intensity, or gray level, of the image at that point.

A digital image is one for which both the coordinates and the amplitude values of f are finite, discrete quantities. Hence, a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, and it can be considered as a matrix whose row and column indices identify a point in the image and whose corresponding element value identifies the gray level at that point.
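As a small illustration of this matrix view (using a sample image that ships with the Image Processing Toolbox):

I = imread('cameraman.tif');   % 256-by-256 grayscale demo image
[rows, cols] = size(I);        % spatial resolution
g = I(120, 200);               % gray level of the pixel at row 120, column 200
imshow(I);                     % display the image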

One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. Introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours.

FIG

1.2.1 PREPROCESSING

In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible. This section is about general techniques that apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.

Image processing refers to the processing of a 2-D picture by a computer. Basic definitions:

An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:

Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)

An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system, it should be possible to apply specific image processing operations to selected regions. Thus, one part of an image (region) might be processed to suppress motion blur while another part might be processed to improve colour rendition.

Most usually, image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample, or pixel, is quantized using a finite number of bits. The digitized image is processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, out of which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3-D full-body magnetic resonance scans).

In modern sciences and technologies, images also gain much broader scope, due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific or experimental data). Examples include microarray data in genetic research and real-time multi-asset portfolio trading in finance. Before processing, an image is converted into a digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing technique may be image enhancement, image restoration, or image compression.

1.2.2 IMAGE ENHANCEMENT

Image enhancement refers to the accentuation, or sharpening, of image features such as boundaries or contrast to make a graphic display more useful for display and analysis. This process does not increase the inherent information content in the data. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.
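A brief sketch of two common enhancement operations, contrast adjustment and noise reduction (the filter size is an illustrative assumption; pout.tif is a low-contrast demo image shipped with the Image Processing Toolbox):

I = imread('pout.tif');
J = imadjust(I);               % stretch the gray-level range
K = histeq(I);                 % histogram equalization for contrast
L = medfilt2(I, [3 3]);        % 3-by-3 median filter for noise reduction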

1.2.3 IMAGE RESTORATION

Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process, as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned more with the extraction or accentuation of image features.

1.2.4 IMAGE COMPRESSION

Image compression is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression: CCITT Group 3 and Group 4
Still image compression: JPEG
Video compression: MPEG

1.2.5 SEGMENTATION

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3-D reconstructions with the help of interpolation algorithms like marching cubes.

1.2.6 IMAGE RESTORATION

Image restoration, like enhancement, improves the qualities of an image, but all the operations are mainly based on known or measured degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion, and to correct images for known degradations.

1.2.7 FUNDAMENTAL STEPS

Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the chances for success of the other processes.
Image segmentation: to partition an input image into its constituent parts or objects.
Image representation: to convert the input data to a form suitable for computer processing.
Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.
Image recognition: to assign a label to an object based on the information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized objects.

Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.

1.3 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling, and amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.

Example: a 256 gray-level image of size 256 x 256 occupies 64 KB of memory (256 x 256 pixels x 8 bits per pixel = 65,536 bytes).

Images of very low spatial resolution produce a checkerboard effect. The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.

1.4 IMAGE FILE FORMATS

There are two general groups of images: vector graphics (or line art) and bitmaps (pixel-based images). Some of the most common file formats are:

GIF: Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG: Joint Photographic Experts Group. A very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and the Internet (bandwidth-limited).

TIFF: Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS: PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD: Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP: Bitmap file format.

1.5 TYPES OF IMAGES

Images are of four types:

1. Binary image
2. Gray scale image
3. Color image
4. Indexed image

1.5.1 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically, the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e., a 0 or 1. Such images are also known as black-and-white (B&W) images.

1.5.2 GRAY SCALE IMAGE

In an (8-bit) grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.

FIG

1.5.3 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive colour, with the r, g, and b receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB (cyan, magenta, and yellow) are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue make cyan, and blue and red form magenta. The combination of red, green, and blue in full intensity makes white.

In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK: The 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.

1.5.4 INDEXED IMAGE

FIG

An indexed image consists of an array and a colormap matrix. The pixel values in the array are direct indices into a colormap. By convention, this documentation uses the variable name X to refer to the array and map to refer to the colormap. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only their index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer, described in 1975 by Kajiya, Sutherland, and Cheadle. It supported a palette of 256 36-bit RGB colors.
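In MATLAB, for example, an indexed image and its colormap can be read and displayed as follows (trees.tif is a sample indexed image shipped with the Image Processing Toolbox):

[X, map] = imread('trees.tif');   % X holds indices, map holds the palette
imshow(X, map);                   % display using the stored colormap
rgb = ind2rgb(X, map);            % convert to a truecolor image if needed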

1.6 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception

In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

1.7 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches for feature registration and matching: Speeded Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured using a PC with Intel Core 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

1.7.1 DISADVANTAGES OF THE EXISTING SYSTEM

1. Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size, and will preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, which is difficult to accelerate on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time, in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements, such as a personal computer. However, a parallel processing alternative using field programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure.

Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using bilinear interpolation. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

1.8 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

1.9.1 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

2.1 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches for feature registration and matching: Speeded Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured using a PC with Intel Core 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform, which outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies are needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition using our sequential line-descriptor method and the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size, and will preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, which is difficult to accelerate on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we develop implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation; they used a clustering algorithm to classify the color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin-tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's thresholding-based method for grayscale images.

After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region-growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly suited to circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.
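A brief sketch of how these two kinds of features could be computed in MATLAB (extractHOGFeatures requires the Computer Vision Toolbox; the file name, cell size, assumed center, and sampling grid are illustrative assumptions, not this project's parameters):

% HOG features over the segmented sclera region (assumed grayscale ROI).
roi = imread('sclera_roi.png');            % placeholder file name
hog = extractHOGFeatures(roi, 'CellSize', [8 8]);

% Cartesian-to-polar conversion by bilinear interpolation about a center.
[h, w] = size(roi);
cx = w/2; cy = h/2;                        % assumed center (e.g., iris center)
r     = linspace(1, min(h, w)/2 - 1, 64);  % radial samples
theta = linspace(0, 2*pi, 180);            % angular samples
[R, T] = meshgrid(r, theta);
Xq = cx + R .* cos(T);
Yq = cy + R .* sin(T);
polarImg = interp2(double(roi), Xq, Yq, 'linear', 0);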

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure below shows the steps of segmentation.

FIG

Glare area detection: The glare area is a small bright area near the pupil or iris, and it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images: if the image is in color, it is first converted to grayscale, and the Sobel filter is then applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest has been selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are eliminated.

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid, and iris boundaries are then refined, since these are all unwanted portions for recognition. In order to eliminate their effects, refinement is done after the detection of the sclera area. The figure shows the result after the Otsu's thresholding process and the iris and eyelid refinement used to detect the right sclera area; the left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of the fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.

Besides treating it as a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by "Elements" below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this report.

FIG

FIG

2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, φ. Thus, the descriptor is S = (θ, r, φ)^T. The individual components of the line descriptor are calculated as

FIG

Here fline (x) is the polynomial approximation of the line segment (xl yl )

is the center point of the line segment (xi yi ) is the center of the detected

iris and S is the line descriptor In order to register the segments of the

vascular patterns a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns For

the registration algorithm it randomly chooses two points ndash one from the

test template and one from the target template It also randomly chooses a

scaling factor and a rotation value based on a priori knowledge of the

database Using these values it calculates a fitness value for the registration

using these parameters

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated by the equation below, where S_i and S_j are two segment descriptors, m(S_i, S_j) is the matching score between segments S_i and S_j, d(S_i, S_j) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), D_match is the matching distance threshold, and ɸ_match is the matching angle threshold. The total matching score, M, is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
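As a concrete illustration of this weighted scoring, the sketch below accumulates a pair score only when both the center distance and the orientation difference fall under their thresholds, and normalizes by the smaller template's total weight. The exact per-pair score is given in the report only as an equation figure, so the product of the two segment weights used here is an assumption for illustration.

#include <algorithm>
#include <cmath>
#include <vector>

struct SegmentDescriptor {
    float x, y;   // center of the line segment
    float phi;    // dominant orientation of the segment
    float w;      // weight from the mask (0, 0.5 or 1)
};

// Hedged sketch of the template-level matching score.
float matchTemplates(const std::vector<SegmentDescriptor>& test,
                     const std::vector<SegmentDescriptor>& target,
                     float dMatch, float phiMatch)
{
    float score = 0.0f;
    for (const auto& si : test) {
        for (const auto& sj : target) {
            float d = std::hypot(si.x - sj.x, si.y - sj.y);
            if (d < dMatch && std::fabs(si.phi - sj.phi) < phiMatch) {
                score += si.w * sj.w;                       // assumed per-pair score
            }
        }
    }
    float wTest = 0.0f, wTarget = 0.0f;
    for (const auto& s : test)   wTest   += s.w;
    for (const auto& s : target) wTarget += s.w;
    float maxScore = std::min(wTest, wTarget);              // the "minimal set" normalizer
    return maxScore > 0.0f ? score / maxScore : 0.0f;
}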

FIG

FIG

FIG

FIG

Y shape branches are observed to be a stable feature under movement of the eye and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y shape vessels: one is to use the angle of every branch to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms at the image acquisition step, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center and is a rotation- and scale-invariant descriptor.
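To make the feature concrete, the sketch below shows one possible representation of the Y-shape descriptor and one way a branch angle could be measured relative to the iris radial direction; the struct layout and helper names are illustrative assumptions, not the report's code.

#include <cmath>

// Illustrative layout of the Y-shape descriptor y(phi1, phi2, phi3, x, y).
struct YShapeDescriptor {
    float phi[3];   // angles of the three branches relative to the iris radial direction
    float x, y;     // center of the branch point (auxiliary, tolerates pupil-center error)
};

// Angle between one branch direction and the radial direction from the iris
// center through the branch point (assumed formulation).
float branchAngleToRadial(float branchDirX, float branchDirY,
                          float cx, float cy,           // branch-point center
                          float irisX, float irisY)     // detected iris/pupil center
{
    const float PI = 3.14159265f;
    float radialAngle = std::atan2(cy - irisY, cx - irisX);
    float branchAngle = std::atan2(branchDirY, branchDirX);
    float diff = branchAngle - radialAngle;
    while (diff >  PI) diff -= 2.0f * PI;                // wrap into (-pi, pi]
    while (diff < -PI) diff += 2.0f * PI;
    return diff;
}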

WPL SCLERA DESCRIPTOR

As we discussed in Section 22, the line

descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy GPU memory and slow down the data transfer. In the matching step, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too much computation in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationship of the descriptors and stored it as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, only once.

The calculated result is saved as a component of the descriptor. The sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. It is faster if the two templates have a similar reference point: if we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
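A possible host-side representation of the WPL descriptor and its one-time CPU preprocessing (mask-weight lookup and polar-to-rectangular conversion) is sketched below; the field names and the per-pixel weight image are assumptions made for illustration.

#include <algorithm>
#include <cmath>
#include <vector>

// Weighted polar line (WPL) descriptor s(x, y, r, theta, phi, w).
struct WplDescriptor {
    float x, y;       // segment center in rectangular coordinates
    float r, theta;   // polar coordinates relative to the iris/pupil center
    float phi;        // dominant orientation of the segment
    float w;          // mask weight: 0 outside, 0.5 near the boundary, 1 inside
};

// One-time CPU preprocessing: convert (r, theta) to (x, y) and attach the
// mask weight, so the GPU kernels never need to touch the mask file.
WplDescriptor preprocess(float r, float theta, float phi,
                         float irisX, float irisY,
                         const std::vector<float>& maskWeight,  // per-pixel weight image
                         int width, int height)
{
    WplDescriptor d;
    d.r = r; d.theta = theta; d.phi = phi;
    d.x = irisX + r * std::cos(theta);
    d.y = irisY + r * std::sin(theta);
    int px = std::min(std::max(static_cast<int>(d.x), 0), width  - 1);
    int py = std::min(std::max(static_cast<int>(d.y), 0), height - 1);
    d.w = maskWeight[py * width + px];
    return d;
}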

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses. This meets the requirement for coalesced memory access on the GPU.
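A common way to obtain such coalesced access is to store the descriptors as a structure of arrays rather than an array of structures, so that consecutive threads read consecutive words; the sketch below illustrates this layout (the exact field grouping is an assumption, not the report's data structure).

// Structure-of-arrays layout for the WPL descriptors of one side of the sclera.
struct WplTemplateSoA {
    float *x, *y;        // segment centers
    float *r, *theta;    // polar coordinates
    float *phi;          // orientations
    float *w;            // mask weights
    int    count;        // number of descriptors on this side
};

__global__ void loadDescriptors(const WplTemplateSoA t, float* phiOut)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < t.count) {
        // Coalesced read: thread i touches element i of each array, so a warp's
        // 32 accesses fall in a small number of contiguous memory transactions.
        phiOut[i] = t.phi[i];
    }
}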

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, no computation on the mask file is needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 4.0 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-Purpose Computing on the GPU: Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology is different between graphics and general-purpose use. Harris provides an excellent description of this mapping process.

We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasks' having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in

future computation
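As a small illustration of this structure, the CUDA sketch below updates every point of a 2-D grid from its neighbors' values in the previous time step, gathering (reading) from one buffer and scattering (writing) to another; the averaging update rule is only a placeholder standing in for a real simulation step.

// Each thread owns one grid point; the grid of threads is the computation domain.
__global__ void stepGrid(const float* curr, float* next, int nx, int ny)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= nx - 1 || y >= ny - 1) return;   // skip the border

    // Gather the neighbors from the previous time step...
    float up    = curr[(y - 1) * nx + x];
    float down  = curr[(y + 1) * nx + x];
    float left  = curr[y * nx + (x - 1)];
    float right = curr[y * nx + (x + 1)];

    // ...and write the new value; the rule itself is only a placeholder.
    next[y * nx + x] = 0.25f * (up + down + left + right);
}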

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity; after this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and the final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here y_te^i and y_ta^j are the Y shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of the two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all of the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕ_ij is the angle between the j-th branch and the polar line from the pupil center in descriptor i.

The number of matched pairs ni and the distance between the Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study; Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. A sclera with a high matching score is passed to the next, more precise matching process.
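A schematic version of this coarse stage is sketched below, reusing the YShapeDescriptor struct from the earlier sketch: for each Y-shape descriptor in the test template it looks at target descriptors whose centers lie within t_xy, counts a match when the branch-angle distance is below t_phi, and fuses the match count with the average center distance. The exact fusion formula is the report's equation (2), so the simple form used here is an assumption.

#include <algorithm>
#include <cmath>
#include <vector>

// Coarse matching of two Y-shape descriptor sets (one sclera side).
float coarseMatch(const std::vector<YShapeDescriptor>& test,
                  const std::vector<YShapeDescriptor>& target,
                  float tPhi, float tXy, float alpha)
{
    int   matched = 0;
    float sumDist = 0.0f;
    for (const auto& yi : test) {
        for (const auto& yj : target) {
            float dxy = std::hypot(yi.x - yj.x, yi.y - yj.y);
            if (dxy > tXy) continue;                        // outside the search area
            float dphi = 0.0f;                              // Euclidean distance of branch angles
            for (int k = 0; k < 3; ++k) {
                float d = yi.phi[k] - yj.phi[k];
                dphi += d * d;
            }
            if (std::sqrt(dphi) < tPhi) {                   // branches agree: count a match
                ++matched;
                sumDist += dxy;
            }
        }
    }
    if (matched == 0) return 0.0f;
    float avgDist = sumDist / matched;
    float denom   = static_cast<float>(std::min(test.size(), target.size()));
    return (matched / denom) * (alpha / (alpha + avgDist)); // assumed fusion of count and distance
}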

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more of the vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template, s_te^i is the i-th WPL descriptor of T_te, T_ta is the target template, s_ta^i is the i-th WPL descriptor of T_ta, and d(s_te^k, s_ta^j) is the Euclidean distance of descriptors s_te^k and s_ta^j.

Δs_k is the shift value of the two descriptors, defined as the offset between their center coordinates.

We first randomly select an equal number of segment descriptors s_te^k in the test template T_te from each quad and find their nearest neighbors s_ta^j in the target template T_ta. The shift offset between them is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
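The sketch below mirrors Algorithm 2 at a high level, reusing the WplDescriptor struct sketched earlier: sample descriptors from the test template, pair each with its nearest neighbor in the target, collect the candidate offsets, and keep the candidate closest to the mean offset. The per-quad sampling and the exact selection rule of the report are simplified assumptions here.

#include <cmath>
#include <cstdlib>
#include <vector>

struct Offset { float dx, dy; };

// Simplified shift-parameter search in the spirit of Algorithm 2.
Offset searchShift(const std::vector<WplDescriptor>& test,
                   const std::vector<WplDescriptor>& target,
                   int samples)
{
    std::vector<Offset> candidates;
    for (int s = 0; s < samples; ++s) {
        const WplDescriptor& t = test[std::rand() % test.size()];
        float bestD = 1e30f; Offset best{0.0f, 0.0f};
        for (const auto& g : target) {                      // nearest neighbor in the target
            float d = std::hypot(t.x - g.x, t.y - g.y);
            if (d < bestD) { bestD = d; best = {g.x - t.x, g.y - t.y}; }
        }
        candidates.push_back(best);
    }
    Offset mean{0.0f, 0.0f};
    for (const auto& c : candidates) { mean.dx += c.dx; mean.dy += c.dy; }
    mean.dx /= candidates.size(); mean.dy /= candidates.size();

    Offset bestOff = candidates.front(); float bestDev = 1e30f;
    for (const auto& c : candidates) {                      // keep the most consistent candidate
        float dev = std::hypot(c.dx - mean.dx, c.dy - mean.dy);
        if (dev < bestDev) { bestDev = dev; bestOff = c; }
    }
    return bestOff;
}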

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta^j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum number of matches m(it). Here s_te^i, T_te, s_ta^j, and T_ta are defined as in Algorithm 2; tr_shift^(it), θ^(it), and tr_scale^(it) are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment, we set the number of iterations to 512.
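In code, the search can be pictured as below (again reusing the WplDescriptor struct): each trial draws a random rotation, scale, and shift, applies the resulting similarity transform to the test descriptors, and counts how many transformed centers land within β pixels of a target descriptor; the trial with the highest count wins. The parameter ranges are assumptions, and the report additionally derives the shift from a random nearest-neighbor pair rather than drawing it blindly.

#include <cmath>
#include <cstdlib>
#include <vector>

struct AffineParams { float theta, scale, dx, dy; int matches; };

// Randomized affine (rotation + scale + shift) parameter search, in the spirit
// of Algorithm 3. beta is the match tolerance in pixels (20 in the report).
AffineParams searchAffine(const std::vector<WplDescriptor>& test,
                          const std::vector<WplDescriptor>& target,
                          int iterations, float beta)
{
    AffineParams best{0, 1, 0, 0, -1};
    for (int it = 0; it < iterations; ++it) {
        AffineParams p;
        p.theta = ((std::rand() / (float)RAND_MAX) - 0.5f) * 0.2f;   // ~±0.1 rad (assumed range)
        p.scale = 0.9f + 0.2f * (std::rand() / (float)RAND_MAX);     // 0.9–1.1 (assumed range)
        p.dx    = ((std::rand() / (float)RAND_MAX) - 0.5f) * 20.0f;  // ±10 px (assumed range)
        p.dy    = ((std::rand() / (float)RAND_MAX) - 0.5f) * 20.0f;
        p.matches = 0;
        for (const auto& t : test) {
            // Apply S(scale) * R(theta) followed by the shift T(dx, dy).
            float xt = p.scale * (t.x * std::cos(p.theta) - t.y * std::sin(p.theta)) + p.dx;
            float yt = p.scale * (t.x * std::sin(p.theta) + t.y * std::cos(p.theta)) + p.dy;
            for (const auto& g : target) {
                if (std::hypot(xt - g.x, yt - g.y) < beta) { ++p.matches; break; }
            }
        }
        if (p.matches > best.matches) best = p;
    }
    return best;
}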

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te^i, T_te, s_ta^j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm), and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr_shift(optm)), and S(tr_scale(optm)) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those threads. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take little time to access, while local memory, despite its name, resides in off-chip device memory. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories, accessible by all threads, and accessing them is very time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, we should preferentially use on-chip memory rather than global memory. When global memory accesses do occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks that are equal in size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density; thus we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU. The numbers of threads and blocks are both set to 1024, which means we can match our test template with up to 1024 × 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, and there is no data-exchange requirement between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
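A minimal shared-memory reduction kernel of this flavor is sketched below: at each step, half as many threads add a partner's partial result into their own slot, and the block's total ends up in element 0. This is the standard tree-style reduction, shown as an illustration rather than the report's exact kernel (which accumulates matching scores rather than plain floats).

// Tree-style sum of per-thread partial results within one block.
// Assumes blockDim.x is a power of two and partial[] holds one value per thread.
__global__ void blockSum(const float* partial, float* blockTotals)
{
    extern __shared__ float buf[];                   // one float per thread
    int tid = threadIdx.x;
    buf[tid] = partial[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();                             // every step needs a barrier
    }
    if (tid == 0) blockTotals[blockIdx.x] = buf[0];  // result lands in the first slot
}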

252 MAPPING INSIDE BLOCK

In the shift-argument search there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all threads compute independently except that the final results must be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently. The generated iterations were distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step that generates the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.
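The per-thread random-number generation discussed above can be illustrated with cuRAND, which gives each thread its own statistically independent subsequence; this is only a stand-in for the dynamically created Mersenne Twister parameters the report actually uses, and the parameter ranges drawn below are arbitrary assumptions.

#include <curand_kernel.h>

// One RNG state per thread; distinct subsequences keep the streams uncorrelated.
__global__ void initRng(curandState* states, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curand_init(seed, tid, 0, &states[tid]);
}

// Each thread draws its own candidate rotation and scale (ranges are assumptions).
__global__ void drawAffineCandidates(curandState* states, float* theta, float* scale, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    curandState local = states[tid];
    theta[tid] = (curand_uniform(&local) - 0.5f) * 0.2f;   // ~±0.1 rad
    scale[tid] = 0.9f + 0.2f * curand_uniform(&local);     // 0.9–1.1
    states[tid] = local;
}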

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
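The "load everything once" strategy amounts to a single allocation and copy before the matching loop starts, as in the host-side sketch below (the flat float layout and function name are assumptions).

#include <cuda_runtime.h>
#include <vector>

// Upload the whole target-descriptor database to the GPU once, before matching,
// so no host-to-device transfers happen inside the matching loop.
float* uploadTemplates(const std::vector<float>& flatDescriptors)
{
    float* dTemplates = nullptr;
    size_t bytes = flatDescriptors.size() * sizeof(float);
    cudaMalloc(&dTemplates, bytes);
    cudaMemcpy(dTemplates, flatDescriptors.data(), bytes, cudaMemcpyHostToDevice);
    return dTemplates;   // reused by every matching kernel launch; freed later with cudaFree
}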

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It was primarily applied to the design of target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
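The report does not spell these formulas out; the standard definitions used in HOG are, in LaTeX notation:

m(x, y) = \sqrt{d_x(x, y)^2 + d_y(x, y)^2}, \qquad
\theta(x, y) = \arctan\!\left(\frac{d_y(x, y)}{d_x(x, y)}\right)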

Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient-orientation bins are spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has any illumination or contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel.

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(e.g., addition, multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate the mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


CHAPTER 1

INTRODUCTION

11GENERAL

Digital image processing is the use of computer algorithms to

perform image processing on digital images The 2D continuous image is

divided into N rows and M columns The intersection of a row and a

column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a

transparency slide photograph or an X-ray is first digitized and stored as a

matrix of binary digits in computer memory This digitized image can then

be processed andor displayed on a high-resolution television monitor For

display the image is stored in a rapid-access buffer memory which

refreshes the monitor at a rate of 25 frames per second to produce a visually

continuous display

12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING

The field of "Digital Image Processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be

considered as a processing of any two dimensional data where any image

(optical information) is represented as an array of real or complex numbers

represented by a definite number of bits An image is represented as a two

dimensional function f(xy) where lsquoxrsquo and lsquoyrsquo are spatial (plane)

coordinates and the amplitude of f at any pair of coordinates (xy)

represents the intensity or gray level of the image at that point

A digital image is one for which both the co-ordinates and the

amplitude values of f are all finite discrete quantities Hence a digital

image is composed of a finite number of elements, each of which has a particular location and value. These elements are called "pixels". A digital

image is discrete in both spatial coordinates and brightness and it can be

considered as a matrix whose rows and column indices identify a point on

the image and the corresponding matrix element value identifies the gray

level at that point

One of the first applications of digital images was in the newspaper

industry when pictures were first sent by submarine cable between London

and New York Introduction of the Bartlane cable picture transmission

system in the early 1920s reduced the time required to transport a picture

across the Atlantic from more than a week to less than three hours

FIG

121 PREPROCESSING

In imaging science image processing is any form of signal

processing for which the input is an image such as a photograph or video

frame the output of image processing may be either an image or a set of

characteristics or parameters related to the image Most image-processing

techniques involve treating the image as a two-dimensional signal and

applying standard signal-processing techniques to it Image processing

usually refers to digital image processing but optical and analog image

processing also are possible This article is about general techniques that

apply to all of them The acquisition of images (producing the input image

in the first place) is referred to as imaging

Image processing refers to processing of a 2D picture by a

computer Basic definitions

An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital

technology has made it possible to manipulate multi-dimensional signals

with systems that range from simple digital circuits to advanced parallel

computers The goal of this manipulation can be divided into three

categories

Image processing (image in -> image out)

Image analysis (image in -> measurements out)

Image understanding (image in -> high-level description out)

An image may be considered to contain sub-images sometimes referred

to as regions-of-interest ROIs or simply regions This concept reflects the

fact that images frequently contain collections of objects each of which can

be the basis for a region In a sophisticated image processing system it

should be possible to apply specific image processing operations to selected

regions Thus one part of an image (region) might be processed to suppress

motion blur while another part might be processed to improve colour

rendition

Most usually image processing systems require that the images be

available in digitized form that is arrays of finite length binary words For

digitization the given Image is sampled on a discrete grid and each sample

or pixel is quantized using a finite number of bits The digitized image is

processed by a computer To display a digital image it is first converted

into analog signal which is scanned onto a display Closely related to

image processing are computer graphics and computer vision In computer

graphics images are manually made from physical models of objects

environments and lighting instead of being acquired (via imaging devices

such as cameras) from natural scenes as in most animated movies

Computer vision on the other hand is often considered high-level image

processing out of which a machinecomputersoftware intends to decipher

the physical contents of an image or a sequence of images (eg videos or

3D full-body magnetic resonance scans)

In modern sciences and technologies images also gain much

broader scopes due to the ever growing importance of scientific

visualization (of often large-scale complex scientificexperimental data)

Examples include microarray data in genetic research or real-time multi-

asset portfolio trading in finance. Before processing, an image is converted into digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing may be image enhancement, image restoration, or image compression.

122 IMAGE ENHANCEMENT

It refers to accentuation or sharpening of image features such as

boundaries or contrast to make a graphic display more useful for display amp

analysis This process does not increase the inherent information content in

data It includes gray level amp contrast manipulation noise reduction edge

crispening and sharpening filtering interpolation and magnification

pseudo coloring and so on

123 IMAGE RESTORATION

It is concerned with filtering the observed image to minimize the

effect of degradations Effectiveness of image restoration depends on the

extent and accuracy of the knowledge of degradation process as well as on

filter design Image restoration differs from image enhancement in that the

latter is concerned with more extraction or accentuation of image features

124 IMAGE COMPRESSION

It is concerned with minimizing the number of bits required to represent an image. Applications of compression are in broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression – CCITT Group 3 & Group 4

Still image compression – JPEG

Video image compression – MPEG

125 SEGMENTATION

In computer vision image segmentation is the process of

partitioning a digital image into multiple segments (sets of pixels also

known as super pixels) The goal of segmentation is to simplify andor

change the representation of an image into something that is more

meaningful and easier to analyze Image segmentation is typically used to

locate objects and boundaries (lines curves etc) in images More precisely

image segmentation is the process of assigning a label to every pixel in an

image such that pixels with the same label share certain visual

characteristics

The result of image segmentation is a set of segments that

collectively cover the entire image or a set of contours extracted from the

image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different

with respect to the same characteristic(s) When applied to a stack of

images typical in medical imaging the resulting contours after image

segmentation can be used to create 3D reconstructions with the help of

interpolation algorithms like marching cubes

126 IMAGE RESTORATION

Image restoration like enhancement improves the qualities of image

but all the operations are mainly based on known measured or

degradations of the original image Image restorations are used to restore

images with problems such as geometric distortion improper focus

repetitive noise and camera motion It is used to correct images for known

degradations

127 FUNDAMENTAL STEPS

Image acquisition to acquire a digital image

Image preprocessing to improve the image in ways that increases the

chances for success of the other processes

Image segmentation to partitions an input image into its constituent parts or

objects

Image representation to convert the input data to a form suitable for

computer processing

Image description to extract features that result in some quantitative

information of interest or features that are basic for differentiating one

class of objects from another

Image recognition to assign a label to an object based on the

information provided by its descriptors

Image interpretation to assign meaning to an ensemble of recognized

objects

Knowledge about a problem domain is coded into an image processing

system in the form of a Knowledge database

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing an image f(xy) must be digitalized

both spatially and in amplitude

Digitization of the spatial coordinates (xy) is called image sampling

Amplitude digitization is called gray-level quantization

The storage and processing requirements increase rapidly with the spatial

resolution and the number of gray levels

Example A 256 gray-level image of size 256x256 occupies 64K bytes of

memory

Images of very low spatial resolution produce a checkerboard effect

The use of insufficient number of gray levels in smooth areas of a digital

image results in false contouring

1.4 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:

GIF – Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.
JPEG – Joint Photographic Experts Group. A very efficient (i.e. much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and the Internet (bandwidth-limited).
TIFF – Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.
PS – PostScript, a standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.
PSD – Adobe Photoshop Document, a dedicated Photoshop format that keeps all the information in an image, including all the layers.
BMP – bitmap file format.

1.5 TYPES OF IMAGES

Images are of four types:

1. Binary image
2. Gray scale image
3. Color image
4. Indexed image

1.5.1 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically, the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level; each pixel is stored as a single bit, i.e. a 0 or 1, hence the names black-and-white and B&W.

1.5.2 GRAY SCALE IMAGE

In an 8-bit grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black and white image, but the name emphasizes that such an image also includes many shades of grey.

FIG

1.5.3 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive colour, with the r, g and b receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production.

The secondary colours of RGB – cyan, magenta and yellow – are formed by mixing two of the primary colours (red, green or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green and blue at full intensity makes white.

In Photoshop, using the "screen" mode for the different layers in an image makes the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK: the four-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M) and yellow (Y) inks, to which a layer of black (K) ink can be added. The CMYK model uses subtractive colour mixing.

1.5.4 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique for managing the colors of digital images in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not carried directly by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements in which every element, a color, is indexed by its position within the array. The image pixels do not contain the full specification of their color, only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland and Cheadle, which supported a palette of 256 36-bit RGB colors.

1.6 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation;
2) Processing of scene data for autonomous machine perception.

In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, and so on.

1.7 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches, a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy, and it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system and costs about 1.2 seconds per one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed for central processing unit (CPU)-based systems.

1.7.1 DISADVANTAGES OF THE EXISTING SYSTEM

1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. However, the mask files are large, preoccupy GPU memory and slow down data transfer. In addition, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all of these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. The recognition performance of a subset of existing still image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses the issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the means of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure.

Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques, Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using bilinear interpolation. These two features help the proposed system to become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a non-ideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are concurrently applied to the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

1.8 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition, based on our sequential line-descriptor method, using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data.

It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose a new sclera descriptor, the Y-shape sclera feature, for an efficient registration method that speeds up the mapping scheme; introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue; and develop a coarse-to-fine two-stage matching process that dramatically improves the matching speed. These new approaches make parallel processing possible and efficient.

1.9.1 PROPOSED SYSTEM ADVANTAGES

1. To improve efficiency, we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

2.1 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that sclera vein recognition is a promising approach to human identification. Crihalmeanu and Ross proposed three approaches, a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy, and it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system and costs about 1.2 seconds per one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed for central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing when the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (and, when used for general-purpose computation, GPGPUs, general-purpose graphics processing units) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform which outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies would be needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition, based on our sequential line-descriptor method, using the CUDA GPU architecture. Nevertheless, the parallelization strategies developed in this research can be applied to the design of parallel approaches for other sclera vein recognition methods and can help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. However, the mask files are large, preoccupy GPU memory and slow down data transfer. In addition, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all of these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we developed implementation schemes to map our algorithms into CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition, in Section 8 we present experiments using the proposed system, and in Section 9 we draw some conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation, using a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's thresholding-based method for grayscale images.

After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly suited to circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out, as sketched below.
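The Cartesian-to-polar conversion mentioned above can be sketched as follows. This is only a minimal illustration in C++ (compilable as CUDA host code), assuming a single-channel 8-bit image stored row-major; the function names, the sampling grid (nRadii x nAngles) and the treatment of out-of-image samples are illustrative choices, not the report's exact implementation.

#include <cmath>
#include <cstdint>
#include <vector>

// Bilinear sample of a row-major grayscale image at a fractional position.
static float bilinear(const std::vector<uint8_t>& img, int w, int h, float x, float y) {
    int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
    if (x0 < 0 || y0 < 0 || x0 + 1 >= w || y0 + 1 >= h) return 0.0f;   // outside: background
    float fx = x - x0, fy = y - y0;
    float p00 = img[y0 * w + x0],       p10 = img[y0 * w + x0 + 1];
    float p01 = img[(y0 + 1) * w + x0], p11 = img[(y0 + 1) * w + x0 + 1];
    return (1 - fy) * ((1 - fx) * p00 + fx * p10) + fy * ((1 - fx) * p01 + fx * p11);
}

// Resample the sclera region into a polar (radius x angle) grid centered at
// the detected pupil/iris center (cx, cy).
std::vector<float> toPolar(const std::vector<uint8_t>& img, int w, int h,
                           float cx, float cy, int nRadii, int nAngles, float maxR) {
    std::vector<float> polar(nRadii * nAngles);
    for (int ri = 0; ri < nRadii; ++ri) {
        float r = maxR * (ri + 0.5f) / nRadii;
        for (int ai = 0; ai < nAngles; ++ai) {
            float theta = 2.0f * 3.14159265f * ai / nAngles;
            polar[ri * nAngles + ai] =
                bilinear(img, w, h, cx + r * std::cos(theta), cy + r * std::sin(theta));
        }
    }
    return polar;
}

Because the polar grid is indexed by angle, an in-plane rotation of the eye becomes a simple circular shift along the angle axis, which is what makes the feature rotation tolerant.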

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare area detection: the glare area is a small bright area near the pupil or iris and is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images; if the image is in color, it must first be converted to grayscale before the Sobel filter is applied. Fig. 4 shows the result of the glare area detection.

FIG

Sclera area estimation: for the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out; a sketch of the thresholding step follows.
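A minimal sketch of Otsu's global thresholding as used for sclera area estimation, assuming the ROI has already been converted to a vector of 8-bit grayscale pixels; the function name is illustrative, and which side of the threshold is treated as sclera is left to the caller.

#include <cstdint>
#include <vector>

// Otsu's method: choose the threshold that maximizes the between-class
// variance of the grayscale histogram.
int otsuThreshold(const std::vector<uint8_t>& roi) {
    long hist[256] = {0};
    for (uint8_t v : roi) hist[v]++;
    const double total = (double)roi.size();
    double sumAll = 0;
    for (int t = 0; t < 256; ++t) sumAll += t * (double)hist[t];

    double sumB = 0, wB = 0, bestVar = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];                   // weight of the "background" class
        if (wB == 0) continue;
        double wF = total - wB;          // weight of the "foreground" class
        if (wF == 0) break;
        sumB += t * (double)hist[t];
        double mB = sumB / wB, mF = (sumAll - sumB) / wF;
        double varBetween = wB * wF * (mB - mF) * (mB - mF);
        if (varBetween > bestVar) { bestVar = varBetween; bestT = t; }
    }
    return bestT;   // pixels on one side of this value form the candidate sclera mask
}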

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid, and iris boundaries are then refined; these are all unwanted portions for recognition. In order to eliminate their effects, refinement is done after the detection of the sclera area. Fig. shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of the fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.

Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below illustrates the possibility of parallel directional filtering: since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, each element of the filtering can itself be parallelized. A detailed discussion of the proposed parallelization is outside the scope of this paper.

FIG

FIG

2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as follows.

FIG

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters. An illustrative computation of the descriptor components is sketched below.
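Since the report does not reproduce Eqs. (6)-(8), the following is only a plausible sketch of how the components of S = (θ, r, ɸ)T could be derived from the segment center (xl, yl), the iris center (xi, yi), and the slope of the fitted line; the exact formulas of the line-descriptor method may differ.

#include <cmath>

struct LineDescriptor { float theta, r, phi; };   // S = (theta, r, phi)^T

// Illustrative computation: theta and r are the polar coordinates of the
// segment center (xl, yl) with respect to the iris center (xi, yi); phi is the
// dominant orientation of the segment, taken here from the slope of its
// polynomial (line) approximation at the center point.
LineDescriptor makeDescriptor(float xl, float yl, float xi, float yi, float slopeAtCenter) {
    LineDescriptor s;
    s.theta = std::atan2(yl - yi, xl - xi);   // angle to the iris-centered reference
    s.r     = std::hypot(xl - xi, yl - yi);   // distance to the iris center
    s.phi   = std::atan(slopeAtCenter);       // dominant angular orientation
    return s;
}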

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and θmatch is the matching angle threshold. The total matching score, M, is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained. A sketch of this scoring rule is given below.
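A hedged sketch of the segment matching score and the total score M described above. The threshold values are placeholders, and the weights w are assumed to come from the weighting image (0, 0.5, or 1); this is an illustration of the scoring rule, not the report's exact code.

#include <algorithm>
#include <cmath>
#include <vector>

struct Seg { float x, y, phi, w; };   // segment center, orientation, and mask weight

// Per-pair score: the pair matches only if the centers are closer than Dmatch
// and the orientations differ by less than PhiMatch; the contribution is
// weighted by the boundary weights of both segments.
float segmentScore(const Seg& a, const Seg& b, float Dmatch, float PhiMatch) {
    float d = std::hypot(a.x - b.x, a.y - b.y);
    if (d <= Dmatch && std::fabs(a.phi - b.phi) <= PhiMatch)
        return a.w * b.w;
    return 0.0f;
}

// Total score M: sum of the best per-segment scores of the test template,
// divided by the maximum attainable score of the smaller (minimal) template.
float totalScore(const std::vector<Seg>& test, const std::vector<Seg>& target,
                 float Dmatch = 5.0f, float PhiMatch = 0.1f) {
    float sum = 0.0f;
    for (const Seg& s : test) {
        float best = 0.0f;
        for (const Seg& t : target) best = std::max(best, segmentScore(s, t, Dmatch, PhiMatch));
        sum += best;
    }
    float wTest = 0, wTarget = 0;
    for (const Seg& s : test) wTest += s.w;
    for (const Seg& t : target) wTarget += t.w;
    return sum / std::min(wTest, wTarget);
}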

FIG

FIG

FIG

FIG

Y-shape branches are observed to be a stable feature under eye movement and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of Y-shape vessels: one is to use the angle of every branch to the x axis, the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. Our rotation-, shift- and scale-invariant feature vector is therefore defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center, so it is automatically aligned to the iris center; it is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy GPU memory and slow down the data transfer. During matching, a RANSAC-type registration algorithm randomly selects corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor, which results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting weight values according to position: the weights of descriptors that lie beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, only once.

The calculated result is saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points: if we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.

The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched. A sketch of such a descriptor layout is given below.
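A sketch of how the WPL descriptor s(x, y, r, θ, ɸ, w) and the left/right split might be laid out in memory for the GPU; the struct and field names are illustrative assumptions, not the report's actual data layout.

// Illustrative WPL descriptor: both rectangular (x, y) and polar (r, theta)
// coordinates relative to the iris center are precomputed on the CPU, together
// with the segment orientation phi and the mask-derived weight w (0, 0.5 or 1).
struct WPLDescriptor {
    float x, y;      // center in rectangular coordinates (iris-centered)
    float r, theta;  // the same center in polar coordinates
    float phi;       // dominant orientation of the segment
    float w;         // weight: 0 outside the sclera, 0.5 near the boundary, 1 interior
};

// Descriptors of the left and right halves of the sclera kept in separate,
// contiguous arrays so that threads of different warps can apply different
// deformations while still reading memory in a coalesced pattern.
struct ScleraTemplate {
    WPLDescriptor* left;   int nLeft;
    WPLDescriptor* right;  int nRight;
};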

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses, which meets the requirement of coalesced memory access on the GPU.

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus, the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65 k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-purpose computing on the GPU: mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline described in Section II, concentrating on its programmable aspects.

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, to the programmable units. Today, GPU computing applications are structured in the following way (a minimal kernel sketch follows the list):

The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
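A minimal CUDA sketch of this programming model: a structured grid of threads, an SPMD kernel, and gather/scatter accesses to global memory. The three-point stencil below is a generic example of the model, not one of the project's kernels.

#include <cuda_runtime.h>

// Each thread updates one grid point from its own value and its neighbors
// (a gather from global memory) and writes the result back (a scatter).
__global__ void stepKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= 0 || i >= n - 1) return;                   // skip boundary points
    out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}

void step(const float* dIn, float* dOut, int n) {
    int threads = 256;
    int blocks = (n + threads - 1) / threads;           // structured grid of threads
    stepKernel<<<blocks, threads>>>(dIn, dOut, n);      // SPMD program over the whole domain
}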

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities; after this step, some false positive matches may remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and the final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching procedure is listed as Algorithm 1.

FIG

Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of the two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold, and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 67.5 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar ray from the pupil center in descriptor i.

The number of matched pairs ni and the distance between the Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2), where α is a factor used to fuse the matching score, set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process. A sketch of this coarse matching stage is given below.
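A sketch of the Stage I coarse matching. The thresholds follow the values quoted above (tϕ = 30, txy read as 67.5, α = 30), but the fusion of the match count and the average center distance is only an illustrative stand-in for the report's Eq. (2), which is not reproduced here.

#include <algorithm>
#include <cmath>
#include <vector>

struct YShape { float phi1, phi2, phi3; float x, y; };   // y(phi1, phi2, phi3, x, y)

// Distance between the branch angles of two Y-shape descriptors (cf. Eq. (3)).
float dPhi(const YShape& a, const YShape& b) {
    return std::sqrt((a.phi1 - b.phi1) * (a.phi1 - b.phi1) +
                     (a.phi2 - b.phi2) * (a.phi2 - b.phi2) +
                     (a.phi3 - b.phi3) * (a.phi3 - b.phi3));
}

// Coarse matching: count descriptor pairs whose angle distance is below t_phi
// and whose centers lie within the search radius t_xy, accumulating the center
// distances of matched pairs, then fuse count and average distance into one score.
float coarseScore(const std::vector<YShape>& test, const std::vector<YShape>& target,
                  float t_phi = 30.0f, float t_xy = 67.5f, float alpha = 30.0f) {
    int n = 0; float dSum = 0.0f;
    for (const YShape& yi : test)
        for (const YShape& yj : target) {
            float dxy = std::hypot(yi.x - yj.x, yi.y - yj.y);
            if (dxy < t_xy && dPhi(yi, yj) < t_phi) { ++n; dSum += dxy; }
        }
    if (n == 0) return 0.0f;
    float avgD = dSum / n;
    // Illustrative fusion: more matched branches and a smaller average distance
    // give a higher score, normalized by the smaller template size.
    return (n * alpha / (alpha + avgD)) / std::min(test.size(), target.size());
}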

2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR

The line-segment WPL descriptor reveals more of the vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because, when an eye image is acquired at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, since the eyeball is spherical in shape, and because the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium) whose movements differ slightly. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale.

1) SHIFT PARAMETER SEARCH: As discussed before, segmentation may not be accurate, and as a result the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte, Tta is the target template and staj is the jth WPL descriptor of Tta, d(stek, staj) is the Euclidean distance of descriptors stek and staj, and Δsk is the shift value (positional offset) between the two descriptors.

We first randomly select an equal number of segment descriptors stek of the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. Their shift offsets are recorded as possible registration shift factors Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets; a sketch of this search is given below.
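A sketch of the shift search of Algorithm 2 on descriptor centers. The selection rule here (the candidate offset closest to the mean offset) stands in for the "smallest standard deviation" criterion, and the quad-wise random sampling is omitted; it is an illustration, not the report's exact algorithm.

#include <cmath>
#include <vector>

struct Pt { float x, y; };

// For each sampled test descriptor, take the offset to its nearest target
// descriptor as a candidate shift, then keep the most consistent candidate.
Pt findShift(const std::vector<Pt>& test, const std::vector<Pt>& target) {
    std::vector<Pt> candidates;
    for (const Pt& s : test) {
        float bestD = 1e30f; Pt best{0, 0};
        for (const Pt& t : target) {
            float d = std::hypot(t.x - s.x, t.y - s.y);
            if (d < bestD) { bestD = d; best = {t.x - s.x, t.y - s.y}; }
        }
        candidates.push_back(best);
    }
    if (candidates.empty()) return {0, 0};
    Pt mean{0, 0};
    for (const Pt& c : candidates) { mean.x += c.x; mean.y += c.y; }
    mean.x /= candidates.size(); mean.y /= candidates.size();
    Pt opt{0, 0}; float bestDev = 1e30f;
    for (const Pt& c : candidates) {
        float dev = std::hypot(c.x - mean.x, c.y - mean.y);   // deviation from the mean offset
        if (dev < bestDev) { bestDev = dev; opt = c; }
    }
    return opt;   // the most consistent candidate shift
}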

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum number of matches m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment, we set the number of iterations to 512. A sketch of the composed transform is given below.
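A sketch of the candidate affine matrix R(θ) T(shift) S(scale) used in Algorithm 3, written as 3x3 homogeneous transforms; the matrix layout and composition order follow the notation above and are otherwise illustrative.

#include <cmath>

// 3x3 homogeneous 2-D transforms, composed as in the registration step:
// a rotation R(theta), a translation T(tx, ty) and a uniform scale S(s).
struct Mat3 { float m[3][3]; };

Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 c{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) c.m[i][j] += a.m[i][k] * b.m[k][j];
    return c;
}

Mat3 R(float theta) {
    return {{{std::cos(theta), -std::sin(theta), 0},
             {std::sin(theta),  std::cos(theta), 0},
             {0, 0, 1}}};
}
Mat3 T(float tx, float ty) { return {{{1, 0, tx}, {0, 1, ty}, {0, 0, 1}}}; }
Mat3 S(float s)            { return {{{s, 0, 0}, {0, s, 0}, {0, 0, 1}}}; }

// One randomly generated parameter set (theta, shift, scale) yields the
// candidate matrix applied to every descriptor of the test template.
Mat3 candidateTransform(float theta, float tx, float ty, float scale) {
    return mul(R(theta), mul(T(tx, ty), S(scale)));
}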

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching procedure is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)) T(tr(optm)shift) S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.

2.5 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those threads. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory and shared memory are on-chip, and accessing them takes little time. Only shared memory can be accessed by other threads within the same block, but there is only a limited amount of it. Global memory, constant memory and texture memory are off-chip memories, accessible by all threads, and accessing them is very time consuming.

Constant memory and texture memory are read-only and cacheable. Mapping algorithms onto CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To completely hide the latency of the small instruction set, on-chip memory should be used preferentially rather than global memory. When a global memory access occurs, threads in the same warp should access consecutive words to achieve coalescing, as illustrated below.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
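A small illustration of the coalescing rule: consecutive threads reading consecutive words versus a strided pattern. These kernels are generic examples, not part of the matching pipeline.

#include <cuda_runtime.h>

// Consecutive threads of a warp touch consecutive words, so the accesses are
// combined into a small number of memory transactions (coalesced).
__global__ void coalescedCopy(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];                     // thread k reads word k
}

// A stride between threads scatters the addresses across the memory space,
// breaking coalescing and multiplying the number of transactions.
__global__ void stridedCopy(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[(i * stride) % n];      // threads of a warp hit scattered words
}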

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partitioning among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks. We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and each thread compares one pair of templates. In our work we use an NVIDIA C2070 as the GPU, and the number of threads per block and the number of blocks are both set to 1024. That means we can match the test template with up to 1024 × 1024 target templates at the same time, as sketched below.
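A minimal launch sketch for this one-thread-per-target-template mapping; the kernel name, buffer names, and the stubbed-out comparison are hypothetical.

// Hypothetical coarse-matching launch: one thread handles one target template.
__global__ void coarseMatchKernel(const float *testY, const float *targetY,
                                  float *scores, int numTemplates)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // target template index
    if (t >= numTemplates) return;
    // ... compare the test Y-shape descriptors against target template t ...
    scores[t] = 0.0f;                                // placeholder score
}

void launchCoarseMatch(const float *d_testY, const float *d_targetY,
                       float *d_scores, int numTemplates)
{
    int threadsPerBlock = 1024;                      // matches the setting above
    int blocks = (numTemplates + threadsPerBlock - 1) / threadsPerBlock;
    coarseMatchKernel<<<blocks, threadsPerBlock>>>(d_testY, d_targetY,
                                                   d_scores, numTemplates);
}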

Algorithms 2-4 are partitioned into fine-grained subtasks in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sums of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved in the first address, which has the same variable name as the first intermediate result.
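A minimal sketch of this intra-block summation, assuming each thread contributes one partial matching score and the block size is a power of two; buffer names are illustrative.

// Sketch of the intra-block summation of intermediate results.
__global__ void sumIntermediate(const float *perThreadScore, float *blockScore)
{
    extern __shared__ float partial[];               // one slot per thread

    int tid = threadIdx.x;
    partial[tid] = perThreadScore[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    // Pairwise accumulation: consecutive pairs first, then every 4th, 8th, ...
    // thread, mirroring the scheme described above.
    for (int stride = 1; stride < blockDim.x; stride *= 2) {
        if (tid % (2 * stride) == 0)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        blockScore[blockIdx.x] = partial[0];         // result lands in the first slot
}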

252 MAPPING INSIDE BLOCK

In the shift argument search there are two schemes we can choose for mapping the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all threads compute independently except that the final results must be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we choose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we use the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators that share identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
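For illustration only, the sketch below gives each thread its own random stream using CURAND (XORWOW) states with distinct subsequences; this is a stand-in for the dynamically created Mersenne Twister parameter sets actually used here, and the parameter ranges are hypothetical.

#include <curand_kernel.h>

// Illustration: per-thread random streams that stay uncorrelated because each
// thread uses a distinct subsequence of the same generator family.
__global__ void initStates(curandState *states, unsigned long long seed)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    curand_init(seed, id, 0, &states[id]);           // same seed, distinct subsequence
}

__global__ void drawRandomParams(curandState *states, float *rotation, float *scale)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    curandState s = states[id];
    rotation[id] = (curand_uniform(&s) - 0.5f) * 0.2f;   // small random angle (hypothetical range)
    scale[id]    = 0.9f + 0.2f * curand_uniform(&s);     // scale factor near 1.0 (hypothetical range)
    states[id]   = s;
}

CURAND also ships an MTGP32 generator, which is itself a Mersenne Twister variant designed for GPUs, if a Mersenne-Twister-family stream is required.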

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
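The sketch below shows the structure-of-arrays layout implied by storing the descriptor components separately: a warp that reads one component for consecutive descriptors touches successive addresses, so the load coalesces. All names are illustrative, and the pointers are assumed to reference device memory.

// Illustrative structure-of-arrays layout for descriptor components.
struct WPLTemplateSoA {
    float *x, *y;      // descriptor positions
    float *r, *theta;  // polar coordinates
    float *phi;        // segment orientation
    float *w;          // weight
    int    count;      // number of descriptors
};

__global__ void readPhiCoalesced(WPLTemplateSoA tpl, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < tpl.count)
        out[i] = tpl.phi[i];   // consecutive threads read consecutive floats
}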

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as a feature for human recognition. In the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. The normalized result is largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and orientation θ(x, y) = arctan(dy(x, y) / dx(x, y)) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within the cell casts a weighted vote for an orientation bin based on the values found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
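A minimal sketch of the per-pixel gradient and 0-180 degree binning described above, assuming a single-channel float image, central differences, and 9 orientation bins (assumptions for illustration, not values taken from this report):

#define NUM_BINS 9   // assumed bin count: 0-180 degrees, 20 degrees per bin

// Per-pixel gradient magnitude and unsigned orientation bin.
__global__ void hogGradients(const float *img, int w, int h,
                             float *mag, int *bin)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;

    float dx = img[y * w + (x + 1)] - img[y * w + (x - 1)];
    float dy = img[(y + 1) * w + x] - img[(y - 1) * w + x];

    float m   = sqrtf(dx * dx + dy * dy);                 // vote weight
    float ang = atan2f(dy, dx) * 180.0f / 3.14159265f;    // (-180, 180]
    if (ang < 0.0f) ang += 180.0f;                        // opposite directions merge

    int idx  = y * w + x;
    mag[idx] = m;
    bin[idx] = min((int)(ang / (180.0f / NUM_BINS)), NUM_BINS - 1);
}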

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available. MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and other computational tasks. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable). Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing, and editing user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format, and additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter and bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of GPU structures, to narrow the search range and increase matching efficiency. We developed the WPL descriptor to incorporate mask information and make the representation more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even across the threads in the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


An image may be defined as a two-dimensional function f(x, y), where 'x' and 'y' are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) represents the intensity or gray level of the image at that point.

A digital image is one for which both the coordinates and the amplitude values of f are finite, discrete quantities. Hence a digital image is composed of a finite number of elements, each of which has a particular location and value; these elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, and it can be considered as a matrix whose row and column indices identify a point on the image and whose corresponding matrix element value identifies the gray level at that point.

One of the first applications of digital images was in the newspaper

industry when pictures were first sent by submarine cable between London

and New York Introduction of the Bartlane cable picture transmission

system in the early 1920s reduced the time required to transport a picture

across the Atlantic from more than a week to less than three hours

FIG

121 PREPROCESSING

In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible; this article is about general techniques that apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.

Image processing refers to the processing of a 2-D picture by a computer. Basic definitions: an image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:

Image processing (image in -> image out)

Image analysis (image in -> measurements out)

Image understanding (image in -> high-level description out)

An image may be considered to contain sub-images sometimes referred

to as regions-of-interest ROIs or simply regions This concept reflects the

fact that images frequently contain collections of objects each of which can

be the basis for a region In a sophisticated image processing system it

should be possible to apply specific image processing operations to selected

regions Thus one part of an image (region) might be processed to suppress

motion blur while another part might be processed to improve colour

rendition

Most image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample or pixel is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, in which a machine, computer, or piece of software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3-D full-body magnetic resonance scans).

In modern sciences and technologies, images also gain much broader scope due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific and experimental data); examples include microarray data in genetic research and real-time multi-asset portfolio trading in finance. Before processing, an image is converted into a digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed; this processing may be image enhancement, image restoration, or image compression.

122 IMAGE ENHANCEMENT

It refers to the accentuation or sharpening of image features such as boundaries or contrast to make a graphic display more useful for display and analysis. This process does not increase the inherent information content in the data. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.

123 IMAGE RESTORATION

It is concerned with filtering the observed image to minimize the

effect of degradations Effectiveness of image restoration depends on the

extent and accuracy of the knowledge of degradation process as well as on

filter design Image restoration differs from image enhancement in that the

latter is concerned with more extraction or accentuation of image features

124 IMAGE COMPRESSION

It is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computed tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression - CCITT Group 3 & Group 4

Still image compression - JPEG

Video image compression - MPEG

125 SEGMENTATION

In computer vision image segmentation is the process of

partitioning a digital image into multiple segments (sets of pixels also

known as super pixels) The goal of segmentation is to simplify andor

change the representation of an image into something that is more

meaningful and easier to analyze Image segmentation is typically used to

locate objects and boundaries (lines curves etc) in images More precisely

image segmentation is the process of assigning a label to every pixel in an

image such that pixels with the same label share certain visual

characteristics

The result of image segmentation is a set of segments that

collectively cover the entire image or a set of contours extracted from the

image (see edge detection) Each of the pixels in a region are similar with

respect to some characteristic or computed property such as

colour intensity or texture Adjacent regions are significantly different

with respect to the same characteristic(s) When applied to a stack of

images typical in medical imaging the resulting contours after image

segmentation can be used to create 3D reconstructions with the help of

interpolation algorithms like marching cubes

126 IMAGE RESTORATION

Image restoration, like enhancement, improves the quality of an image, but all its operations are mainly based on known or measured degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion. It is used to correct images for known degradations.

127 FUNDAMENTAL STEPS

Image acquisition to acquire a digital image

Image preprocessing to improve the image in ways that increases the

chances for success of the other processes

Image segmentation to partition an input image into its constituent parts or

objects

Image representation to convert the input data to a form suitable for

computer processing

Image description to extract features that result in some quantitative

information of interest or features that are basic for differentiating one

class of objects from another

Image recognition to assign a label to an object based on the

information provided by its descriptors

Image interpretation to assign meaning to an ensemble of recognized

objects

Knowledge about a problem domain is coded into an image processing

system in the form of a Knowledge database

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude.

Digitization of the spatial coordinates (xy) is called image sampling

Amplitude digitization is called gray-level quantization

The storage and processing requirements increase rapidly with the spatial

resolution and the number of gray levels

Example A 256 gray-level image of size 256x256 occupies 64K bytes of

memory

Images of very low spatial resolution produce a checkerboard effect

The use of insufficient number of gray levels in smooth areas of a digital

image results in false contouring

14 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based images). Some of the most common file formats are:

GIF - Graphics Interchange Format, an 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG - Joint Photographic Experts Group, a very efficient (i.e., much information per byte), destructively compressed 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).

TIFF - Tagged Image File Format, the standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS - PostScript, a standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD - Adobe Photoshop Document, a dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP - bitmap file format.

15 TYPE OF IMAGES

Images are of four types:

1 Binary image

2 Gray scale image

3 Color image

4 Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level; this means that each pixel is stored as a single bit, i.e., a 0 or 1. Such images are also known as black-and-white or B&W images.

152 GRAY SCALE IMAGE

In an 8-bit grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grayscale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive

colour with the r g and b receptors in our retinas RGB uses additive colour

mixing and is the basic colour model used in television or any other

medium that projects colour with light It is the basic colour model used in

computers and for web graphics but it cannot be used for print production

The secondary colours of RGB ndash cyan magenta and yellow ndash are formed

by mixing two of the primary colours (red green or blue) and excluding the

third colour Red and green combine to make yellow green and blue to

make cyan and blue and red form magenta The combination of red green

and blue in full intensity makes white

In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK The 4-colour CMYK model used in printing lays down

overlapping layers of varying percentages of transparent cyan (C) magenta

(M) and yellow (Y) inks In addition a layer of black (K) ink can be added

The CMYK model uses the subtractive colour model

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a colormap matrix. The pixel values in the array are direct indices into the colormap. By convention, this documentation uses the variable name X to refer to the array and map to refer to the colormap. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression. When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements in which every element, a color, is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-

access frame buffer described in 1975 by Kajiya Sutherland and Cheadle

This supported a palette of 256 36-bit RGB colors

16 Applications of image processing

Interest in digital image processing methods stems from 2 principal

application areas

1) Improvement of pictorial information for human interpretation

2) Processing of scene data for autonomous machine perception

In the second application area, interest focuses on procedures for extracting from an image information in a form suitable for computer processing.

Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, and so on.

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Of these three methods, the SURF method achieves the best accuracy; it takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1 Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2 The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3 When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the

most challenging problems that has been actively researched in recent

years It is well known that many state-of-the-arts still face recognition

algorithms perform well when constrained (frontal well illuminated high-

resolution sharp and full) face images are acquired However their

performance degrades significantly when the test images contain variations

that are not present in the training images In this paper we highlight some

of the key issues in remote face recognition We define the remote face

recognition as one where faces are several tens of meters (10-250m) from

the cameras We then describe a remote face database which has been

acquired in an unconstrained outdoor maritime environment Recognition

performance of a subset of existing still image-based face recognition

algorithms is evaluated on the remote face data set Further we define the

remote re-identification problem as matching a subject at one location with

candidate sets acquired at a different location and over time in remote

conditions We provide preliminary experimental results on remote re-

identification It is demonstrated that in addition to applying a good

classification algorithm finding features that are robust to variations

mentioned above and developing statistical models which can account for

these variations are very important for remote face recognition

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanded biometric data collected by various sectors

of government and industry for identification and verification purposes

how to manage and process such Big Data draws great concern Even

though modern processors are equipped with more cores and memory

capacity it still requires careful design in order to utilize the hardware

resource effectively and the power consumption efficiently This research

addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the

benchmark to capture its execution behavior The results show that data

loading and memory access incurs great performance overhead and

motivates us to move the biometrics computation to high-performance

architecture

Modern iris recognition algorithms can be computationally intensive

yet are designed for traditional sequential processing elements such as a

personal computer However a parallel processing alternative using field

programmable gate arrays (FPGAs) offers an opportunity to speed up iris

recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallelizable portion of a modern iris recognition algorithm, is parallelized on an FPGA

system We will present a performance comparison of the parallelized

algorithm on the FPGA system to a traditional CPU-based version The

parallelized template generation outperforms an optimized C++ code

version determining the information content of an iris approximately 324

times faster

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.

A new biometric indicator based on the patterns of conjunctival

vasculature is proposed Conjunctival vessels can be observed on the visible

part of the sclera that is exposed to the outside world These vessels

demonstrate rich and specific details in visible light and can be easily

photographed using a regular digital camera In this paper we discuss

methods for conjunctival imaging preprocessing and feature extraction in

order to derive a suitable conjunctival vascular template for biometric

authentication Commensurate classification methods along with the

observed accuracy are discussed Experimental results suggest the potential

of using conjunctival vasculature as a biometric measure Identification of

a person based on some unique set of features is an important task The

human identification is possible with several biometric systems and sclera

recognition is one of the promising biometrics The sclera is the white

portion of the human eye The vein pattern seen in the sclera region is

unique to each person Thus the sclera vein pattern is a well suited

biometric technology for human identification The existing methods used

for sclera recognition have some drawbacks like only frontal looking

images are preferred for matching and rotation variance is another problem

These problems are completely eliminated in the proposed system by using

two feature extraction techniques They are Histogram of Oriented

Gradients (HOG) and converting the image into polar form using the

bilinear interpolation technique These two features help the proposed

system to become illumination invariant and rotation invariant The

experimentation is done with the help of UBIRIS database The

experimental result shows that the proposed sclera recognition method can

achieve better accuracy than the previous methods

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.

The graphics processing unit (GPU) has become an integral part of

todayrsquos mainstream computing systems Over the past six years there has

been a marked increase in the performance and capabilities of GPUs The

modern GPU is not only a powerful graphics engine but also a highly

parallel programmable processor featuring peak arithmetic and memory

bandwidth that substantially outpaces its CPU counterpart. The GPU's

rapid increase in both programmability and capability has spawned a

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification The verification and

identification performance of the propose algorithms is validated and

compared with other algorithms using the CASIA Version 3 ICE 2005 and

UBIRIS iris databases

18 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method built on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL: our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research we first discuss why the naïve parallel approach would not work. We then propose a new sclera descriptor, the Y-shape sclera feature, for an efficient registration method that speeds up the mapping scheme; introduce the "weighted polar line (WPL) descriptor", which is better suited to parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

191 PROPOSED SYSTEM ADVANTAGES

1. To improve efficiency, in this research we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed; the matching result in this stage helps filter out image pairs with low similarity.

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that sclera vein recognition is promising for human identification. Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Of these three methods, the SURF method achieves the best accuracy; it takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the work can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (or GPGPUs, general-purpose graphics processing units) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing that can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs have been used to extract features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search Park et al

designed the performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather the goal is to develop a parallel sclera matching solution for

Sclera vein recognition using our sequential line-descriptor method using

the CUDA GPU architecture However the parallelization strategies

developed in this research can be applied to design parallel approaches for

other Sclera vein recognition methods and help parallelize general pattern

recognition methods Based on the matching approach in there are three

challenges to map the task of sclera feature matching to GPU

1) Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature, for an efficient registration method that speeds up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor", which is better suited to parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new

approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop the implementation schemes to map our algorithms onto CUDA (Section 7). Section 2 gives a brief introduction to sclera vein recognition, Section 8 presents experiments using the proposed system, and Section 9 draws conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition

Several methods have been designed for sclera segmentation Crihalmeanu

et al presented a semi-automated system for sclera segmentation They

used a clustering algorithm to classify the color eye images into three

clusters - sclera iris and background Later on Crihalmeanu and Ross

designed a segmentation approach based on a normalized sclera index

measure which includes coarse sclera segmentation pupil region

segmentation and fine sclera segmentation Zhou et al developed a skin

tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's thresholding-based method for grayscale images

After sclera segmentation it is necessary to enhance and extract the sclera

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches including Speed Up Robust

Features (SURF) which is based on interest-point detection minutiae

detection which is based on minutiae points on the vasculature structure

and direct correlation matching which relies on image registration Zhou et

al designed a line descriptor based feature registration and matching

method

The proposed sclera recognition consists of five steps which include

sclera segmentation vein pattern enhancement feature extraction feature

matching and matching decision Fig 2 shows the block diagram of sclera

recognition Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly used for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to determine whether the person is correctly identified or not. This procedure is done in the feature

matching step and ultimately makes the matching decision By using the

proposed feature extraction methods and matching techniques the human

identification is more accurate than the existing studies In the proposed

method two features of an image are drawn out

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig shows the steps of segmentation

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. The filter operates only on grayscale images, so if the image is in color it is first converted to grayscale and then passed through the Sobel filter to detect the glare area. Fig 4 shows the result of the glare area detection

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of the sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. When the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way non-sclera areas are wiped out

223 IRIS AND EYELID REFINEMENT

The top and underside of the sclera regions are the limits of the

sclera area And then that upper eyelid lower eyelid and iris boundaries are

refined These altogether are the unwanted portion for recognition In order

to eliminate these effects refinement is done in the footstep of the detection

of sclera area Fig shows after the Otsursquos thresholding process and iris and

eyelid refinement to detect right sclera area In the same way the left sclera

area is detected using this method

FIG

In the segmentation process all images are not perfectly segmented

Hence feature extraction and matching are needed to reduce the

segmentation fault The vein patterns in the sclera area are not visible in the

segmentation process To get vein patterns more visible vein pattern

enhancement is to be performed

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999) In the case of retinal biometrics an

especial optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this

modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi surface structure of the ocular

veins makes them hard to reproduce as a physical artifact Besides being a

stand-alone biometric modality we anticipate that the addition of

conjunctival biometrics will enhance the performance of current iris-based

biometric system in the following ways

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the

availability for simultaneous computation Below Figure demonstrates the

possibility for parallel directional filtering Since the filter is computed over

different portions of the input image, the computation can be performed in

parallel (denoted by Elements below) In addition individual parallelization

of each element of Filtering can also be performed A detailed discussion of

our proposed parallelization is outside the scope of this paper

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA

VEIN

2251 RECOGNITION METHOD

The matching segment of the line-descriptor based method is a

bottleneck with regard to matching speed In this section we briefly

describe the Line Descriptor-based sclera vein recognition method After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points The line descriptor is used to describe the segments in the vein

structure Figure 2 shows a visual description of the line descriptor Each

segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as

FIG
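The component equations themselves did not survive this transcript; a hedged reconstruction, consistent with the definitions of fline(x), (xl, yl) and (xi, yi) given just below, is:

\theta = \tan^{-1}\!\left(\frac{y_l - y_i}{x_l - x_i}\right), \qquad
r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
\phi = \tan^{-1}\!\left(\frac{d f_{line}(x)}{dx}\bigg|_{x = x_l}\right), \qquad
S = (\theta,\; r,\; \phi)^{T}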

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the

vascular patterns a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns For

the registration algorithm it randomly chooses two points ndash one from the

test template and one from the target template It also randomly chooses a

scaling factor and a rotation value based on a priori knowledge of the

database Using these values it calculates a fitness value for the registration

using these parameters

After sclera template registration each line segment in the test

template is compared to the line segments in the target template for

matches In order to reduce the effect of segmentation errors we created the

weighting image (Figure 3) from the sclera mask by setting interior pixels

in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0

The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), Dmatch is the matching distance threshold, and Φmatch is the matching angle threshold

The total matching score M is the sum of the individual matching scores

divided by the maximum matching score for the minimal set between the

test and target template. That is, whichever of the test or target templates has fewer points, the sum of its descriptors' weights sets the maximum score that can be attained
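The score equations did not survive this transcript either; a hedged reconstruction consistent with the description above, with wi and wj the weights of the two descriptors taken from the weighting image, is:

m(S_i, S_j) =
\begin{cases}
w_i\, w_j, & \text{if } d(S_i, S_j) \le D_{match} \text{ and } |\phi_i - \phi_j| \le \Phi_{match} \\
0, & \text{otherwise}
\end{cases}
\qquad
M = \frac{\sum_{(i,j)\ \text{matched}} m(S_i, S_j)}{\min\left(\sum_i w_i,\ \sum_j w_j\right)}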

FIG

FIG

FIG

FIG

The Y shape branches do not change with the movement of the eye; they are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in

the original template, we search for the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, this set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera

There are two ways to measure both the orientation and the relationship of every branch of the Y shape vessels: one is to use the angle of every branch to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach we employed the second method. As Figure 6

shows ϕ1 ϕ2 and ϕ3 denote the angle between each branch and the radius

from the pupil center. Even when head tilt, eye movement or camera zoom occurs at the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift- and scale-invariant feature

vector is defined as y(ϕ1 ϕ2 ϕ3 x y) The Y-shape descriptor is generated

with reference to the iris center Therefore it is automatically aligned to the

iris centers. It is a rotational- and scale-invariant descriptor.
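A minimal C++/CUDA-style sketch of this feature vector is given below, assuming a simple array-of-structures form; the struct name, the helper function name and the degree normalization are illustrative and are not taken from the original implementation.

#include <cmath>

// Y shape descriptor y(phi1, phi2, phi3, x, y): three branch angles measured
// against the radial direction from the pupil center, plus the branch-point center.
struct YShapeDescriptor {
    float phi1, phi2, phi3;  // branch angles relative to the iris radial direction (degrees)
    float x, y;              // center position of the Y shape branch point
};

// Angle between a branch direction (bx, by) and the radial direction from the
// pupil center (px, py) to the branch point (cx, cy), returned in degrees.
static float branchToRadialAngle(float cx, float cy, float bx, float by,
                                 float px, float py)
{
    float branchAngle = std::atan2(by - cy, bx - cx);   // branch orientation
    float radialAngle = std::atan2(cy - py, cx - px);   // pupil-center-to-branch direction
    float d = (branchAngle - radialAngle) * 180.0f / 3.14159265f;
    while (d < 0.0f)    d += 360.0f;                    // normalize to [0, 360)
    while (d >= 360.0f) d -= 360.0f;
    return d;
}

Because every angle is measured against the radial direction from the pupil center, comparing two such descriptors needs no explicit rotation alignment.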

226 WPL SCLERA DESCRIPTOR

As we discussed in Section 22, the line

descriptor is extracted from the skeleton of vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment a line descriptor is created to record the center and orientation of

the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To be tolerant of such error, the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging since the mask files are large in size and will occupy the GPU memory and slow down the data transfer. During matching registration, a RANSAC-type algorithm was used to randomly select corresponding descriptors, and the transform parameters between them were used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce heavy data transfer and computation we designed the

weighted polar line (WPL) descriptor structure which includes the

information of mask and can be automatically aligned We extracted the

relationship of geometric feature of descriptors and store them as a new

descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU only once.
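A minimal CPU-side sketch of this one-time weight computation is shown below. It assumes a binary mask stored row-major with nonzero values inside the sclera and a hypothetical boundary distance d; the exact boundary test used in the original work is not given in this transcript.

// Returns 0 for a descriptor outside the sclera mask, 0.5 near the mask
// boundary (within d pixels), and 1 for an interior descriptor.
static float descriptorWeight(const unsigned char *mask, int width, int height,
                              int x, int y, int d)
{
    if (x < 0 || y < 0 || x >= width || y >= height || mask[y * width + x] == 0)
        return 0.0f;
    // If any pixel within a (2d+1) x (2d+1) window falls outside the mask,
    // the descriptor is considered to be near the sclera boundary.
    for (int dy = -d; dy <= d; ++dy) {
        for (int dx = -d; dx <= d; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || ny < 0 || nx >= width || ny >= height ||
                mask[ny * width + nx] == 0)
                return 0.5f;
        }
    }
    return 1.0f;
}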

The calculated result was saved as a component of the descriptor. The sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5 or 1. To align two templates, when a

template is shifted to another location along the line connecting their

centers, all the descriptors of that template will be transformed. It would be faster if the two templates had similar reference points: if we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since the templates share a similar reference point. Every feature vector of the template is a set of

line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching these two parts are assigned to threads in

different warps to allow different deformation The multiprocessor in

CUDA manages threads in groups of 32 parallel threads called warps We

reorganized the descriptors from the same side and saved

FIG

FIG

them in contiguous addresses. This meets the requirement of coalesced memory access on the GPU.
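One plausible layout for this reorganization is a structure of arrays in which left-side and right-side descriptors occupy separate contiguous runs, as sketched below; the struct name and field grouping are illustrative only.

// WPL descriptors stored as a structure of arrays (SoA). Descriptors from the
// left half and the right half of the sclera are stored in separate contiguous
// runs so that the warps assigned to each side read consecutive words.
struct WplTemplateSoA {
    int   numLeft, numRight;  // descriptor counts for the left / right halves
    float *x, *y;             // rectangular coordinates of segment centers
    float *r, *theta;         // polar coordinates relative to the iris center
    float *phi;               // dominant orientation of each segment
    float *w;                 // weight: 0, 0.5 or 1 (mask information)
};
// Left-side descriptors occupy indices [0, numLeft) and right-side descriptors
// occupy [numLeft, numLeft + numRight), so a warp working on one side makes
// coalesced accesses into each array.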

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting, so the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 4.0 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline todayrsquos GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that todayrsquos GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global "texture" memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities. After this step, some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of the two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distance respectively; tϕ is a distance threshold; and txy is the threshold to restrict the searching area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar line from the pupil center in descriptor i

The number of matched pairs ni and the distance di between Y shape branch centers are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study; Ni and Nj are the total numbers of feature vectors in templates i and j respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. A sclera with a high matching score is passed to the next, more precise matching process.
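A hedged sketch of this coarse matching stage as a CUDA kernel is shown below, reusing the YShapeDescriptor type sketched earlier. One thread scores the test template against one target template; since the exact form of Eq. (2) is not reproduced in this transcript, the final score line is only a stand-in fusion of the matched-pair count and the average center distance.

__device__ float yAngleDist(const YShapeDescriptor &a, const YShapeDescriptor &b)
{
    float d1 = a.phi1 - b.phi1, d2 = a.phi2 - b.phi2, d3 = a.phi3 - b.phi3;
    return sqrtf(d1 * d1 + d2 * d2 + d3 * d3);   // distance of angle elements, in the spirit of (3)
}

__device__ float yCenterDist(const YShapeDescriptor &a, const YShapeDescriptor &b)
{
    float dx = a.x - b.x, dy = a.y - b.y;
    return sqrtf(dx * dx + dy * dy);             // distance of branch centers, in the spirit of (4)
}

// One thread scores the test template against one target template.
__global__ void coarseYShapeMatch(const YShapeDescriptor *test, int nTest,
                                  const YShapeDescriptor *targets,
                                  const int *targetOffset, const int *targetCount,
                                  int numTargets, float *score,
                                  float tPhi, float tXY, float alpha)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numTargets) return;
    const YShapeDescriptor *tgt = targets + targetOffset[t];
    int nTgt = targetCount[t];
    int ni = 0;
    float sumDist = 0.0f;
    for (int i = 0; i < nTest; ++i) {
        for (int j = 0; j < nTgt; ++j) {
            float d = yCenterDist(test[i], tgt[j]);
            if (d < tXY && yAngleDist(test[i], tgt[j]) < tPhi) {
                ++ni;
                sumDist += d;
                break;                            // count each test descriptor at most once
            }
        }
    }
    float avgDist = (ni > 0) ? sumDist / ni : tXY;
    // Stand-in for the fusion of Eq. (2): more matched branches and a smaller
    // average center distance give a higher score; alpha (30) balances the two terms.
    score[t] = (ni - avgDist / alpha) / fminf((float)nTest, (float)nTgt);
}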

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because: (1) when acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape; (2) the sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte, Tta is the target template and staj is the jth WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.

Δsk is the shift value of the two descriptors, defined as the offset between their center positions. We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. Their shift offsets are recorded as the possible registration shift factors Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
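A host-side sketch of Algorithm 2 under these assumptions is given below. The WplDescriptor and Offset types are illustrative, the per-quad sampling is omitted, and the "smallest standard deviation" criterion is read here as choosing the candidate offset with the smallest root-mean-square deviation from the other candidates.

#include <vector>
#include <cmath>
#include <cstdlib>

struct WplDescriptor { float x, y, r, theta, phi, w; };
struct Offset { float dx, dy; };

// Sketch of Algorithm 2 (shift parameter search): sample test descriptors,
// pair each with its nearest target descriptor, and keep the candidate offset
// that agrees best with the other candidates.
Offset searchShift(const std::vector<WplDescriptor> &test,
                   const std::vector<WplDescriptor> &target, int nSamples)
{
    std::vector<Offset> candidates;
    for (int s = 0; s < nSamples; ++s) {
        const WplDescriptor &te = test[std::rand() % test.size()];
        int best = 0; float bestD = 1e30f;
        for (size_t j = 0; j < target.size(); ++j) {
            float dx = target[j].x - te.x, dy = target[j].y - te.y;
            float d = dx * dx + dy * dy;
            if (d < bestD) { bestD = d; best = (int)j; }
        }
        candidates.push_back({target[best].x - te.x, target[best].y - te.y});
    }
    Offset bestOff = candidates[0]; float bestDev = 1e30f;
    for (const Offset &c : candidates) {
        float dev = 0.0f;
        for (const Offset &o : candidates)
            dev += (c.dx - o.dx) * (c.dx - o.dx) + (c.dy - o.dy) * (c.dy - o.dy);
        dev = std::sqrt(dev / candidates.size());
        if (dev < bestDev) { bestDev = dev; bestOff = c; }
    }
    return bestOff;
}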

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters we iterate N times to generate these parameters; in our experiment we set the number of iterations to 512.
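A host-side sketch of Algorithm 3, reusing the types and headers from the previous sketch, is shown below. The rotation and scale ranges and the use of std::rand are illustrative; the original work draws these parameters from a priori knowledge of the database and, on the GPU, from per-thread random streams.

struct AffineParams { float theta, scale, dx, dy; };

static int countMatches(const std::vector<WplDescriptor> &test,
                        const std::vector<WplDescriptor> &target,
                        const AffineParams &p, float beta)
{
    int n = 0;
    float c = std::cos(p.theta), s = std::sin(p.theta);
    for (const WplDescriptor &te : test) {
        // scale, rotate about the iris center (origin of the descriptor frame), then shift
        float x = p.scale * (c * te.x - s * te.y) + p.dx;
        float y = p.scale * (s * te.x + c * te.y) + p.dy;
        for (const WplDescriptor &ta : target) {
            float ddx = ta.x - x, ddy = ta.y - y;
            if (ddx * ddx + ddy * ddy < beta * beta) { ++n; break; }
        }
    }
    return n;
}

AffineParams searchAffine(const std::vector<WplDescriptor> &test,
                          const std::vector<WplDescriptor> &target,
                          int iterations, float beta)
{
    AffineParams best = {0.0f, 1.0f, 0.0f, 0.0f};
    int bestCount = -1;
    for (int it = 0; it < iterations; ++it) {                       // 512 iterations in the text
        AffineParams p;
        p.theta = ((std::rand() % 2001) - 1000) / 1000.0f * 0.1f;   // small rotation (radians), assumed range
        p.scale = 0.9f + (std::rand() % 2001) / 10000.0f;           // scale in [0.9, 1.1], assumed range
        const WplDescriptor &te = test[std::rand() % test.size()];
        const WplDescriptor &ta = target[std::rand() % target.size()];
        p.dx = ta.x - te.x; p.dy = ta.y - te.y;                     // shift from a random descriptor pair
        int n = countMatches(test, target, p, beta);
        if (n > bestCount) { bestCount = n; best = p; }
    }
    return best;
}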

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)) T(tr(optm)shift) S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimum score of the two transformed results divided by the minimum matching score for the test template and the target template.
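A hedged host-side sketch of Algorithm 4 in the same style follows. The weighted scoring and normalization mirror the description above, but the exact equation is not reproduced in this transcript, so this is only one plausible reading.

// Sketch of Algorithm 4: apply the optimized registration parameters and
// accumulate a weighted matching score. Dmatch and alpha (5 in the text) are
// the distance and orientation tolerances.
static float registerAndMatch(const std::vector<WplDescriptor> &test,
                              const std::vector<WplDescriptor> &target,
                              const AffineParams &p, float Dmatch, float alpha)
{
    float score = 0.0f, maxTest = 0.0f, maxTarget = 0.0f;
    for (const WplDescriptor &ta : target) maxTarget += ta.w;
    float c = std::cos(p.theta), s = std::sin(p.theta);
    for (const WplDescriptor &te : test) {
        maxTest += te.w;
        float x = p.scale * (c * te.x - s * te.y) + p.dx;
        float y = p.scale * (s * te.x + c * te.y) + p.dy;
        for (const WplDescriptor &ta : target) {
            float ddx = ta.x - x, ddy = ta.y - y;
            if (ddx * ddx + ddy * ddy < Dmatch * Dmatch &&
                std::fabs(ta.phi - te.phi) < alpha) {
                score += te.w * ta.w;           // weighted contribution of a matched pair
                break;
            }
        }
    }
    return score / std::fmin(maxTest, maxTarget);  // normalize by the smaller template
}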

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction, multiple data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those multiprocessors.

There are multiple memory spaces in the CUDA memory hierarchy

registers, local memory, shared memory, global memory, constant memory and texture memory. Registers and shared memory are on-chip and fast to access, while local memory actually resides in off-chip device memory. Only shared memory can be accessed by other threads within the same block; however, shared memory capacity is limited. Global memory, constant memory and texture memory are off-chip and accessible by all threads, and accessing them is much more time consuming. Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access latency. To hide this latency, on-chip memory should be used preferentially rather than global memory. When global memory access occurs, threads in the same warp should access consecutive words to achieve coalescing.

Shared memory is much faster than the local and global memory spaces. But shared memory is organized into banks which are equal in size; if two memory requests from different threads within a warp fall in the same memory bank, the accesses will be serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU. These kernels differ in computational density; thus we map them to the GPU with various mapping strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU. The thread and block numbers are both set to 1024, which means we can match our test template with up to 1024 × 1024 target templates at the same time.
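A sketch of this coarse-grained launch is shown below, assuming the device arrays for the test template, the packed target templates and the output scores have already been allocated and filled; the threshold values follow the settings quoted earlier.

// Launch configuration for the coarse kernel: one thread per target template,
// up to 1024 blocks of 1024 threads, i.e. up to 1024 x 1024 templates per launch.
// dTest, dTargets, dTargetOffset, dTargetCount and dScore are device pointers
// assumed to be allocated and filled beforehand.
int threadsPerBlock = 1024;
int blocks = (numTargets + threadsPerBlock - 1) / threadsPerBlock;   // at most 1024
coarseYShapeMatch<<<blocks, threadsPerBlock>>>(dTest, nTest, dTargets,
                                               dTargetOffset, dTargetCount,
                                               numTargets, dScore,
                                               30.0f /* tPhi */, 675.0f /* tXY */,
                                               30.0f /* alpha */);
cudaDeviceSynchronize();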

Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column in Figure 11 shows, we assign a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there are no data exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new result. The final result is saved at the first address, which has the same variable name as the first intermediate result.
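A generic shared-memory reduction in this spirit, with pairwise sums whose stride doubles at every step, is sketched below; it assumes a power-of-two block size and is not the exact indexing scheme of the original implementation.

// Each thread writes its partial matching result to shared memory; the block
// then reduces the values pairwise, doubling the stride each step, so the
// block-level sum ends up in partial[0] (the same location as the first result).
__global__ void blockSum(const float *partialIn, float *blockOut, int n)
{
    extern __shared__ float partial[];           // blockDim.x floats, passed at launch
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;
    partial[tid] = (gid < n) ? partialIn[gid] : 0.0f;
    __syncthreads();
    for (int stride = 1; stride < blockDim.x; stride *= 2) {
        if (tid % (2 * stride) == 0)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockOut[blockIdx.x] = partial[0];
}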

252 MAPPING INSIDE BLOCK

In the shift argument search, there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, where every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest neighbor searching step, we choose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter set search to a thread, and every thread randomly generated a set of parameters and tried them independently. The iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
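The dynamically created Mersenne Twister parameters are produced offline and are not shown here. As an alternative illustration of uncorrelated per-thread random streams on the GPU, the sketch below uses the cuRAND library, giving every thread its own subsequence; this is a substitute technique, not the dynamic-creation approach used in this work.

#include <curand_kernel.h>

// Each thread initializes its own cuRAND state with a distinct subsequence,
// then draws the rotation and scale values it needs for its parameter set.
__global__ void initRng(curandState *states, unsigned long long seed, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n)
        curand_init(seed, /*subsequence=*/id, /*offset=*/0, &states[id]);
}

__global__ void drawAffineCandidates(curandState *states, float *theta,
                                     float *scale, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;
    curandState local = states[id];
    theta[id] = (curand_uniform(&local) - 0.5f) * 0.2f;   // rotation in [-0.1, 0.1] rad (assumed range)
    scale[id] = 0.9f + curand_uniform(&local) * 0.2f;     // scale in [0.9, 1.1] (assumed range)
    states[id] = local;
}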

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched

with others should not be used again In our approach a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed. Therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, data access was accelerated further.
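A sketch of this shared-memory staging at the start of a block's work is shown below; the kernel name, the array bound and the omitted target-template arguments are illustrative, and the texture binding of the target descriptors is not shown.

// Each block copies the test template's descriptor fields from global memory
// into shared memory once, cooperatively and with coalesced reads, before the
// per-thread matching loops run.
#define MAX_TEST_DESCRIPTORS 512   // illustrative upper bound on test descriptors

__global__ void fineMatchKernel(const float *testX, const float *testY,
                                const float *testPhi, const float *testW,
                                int nTest /* , target template arguments ... */)
{
    __shared__ float sx[MAX_TEST_DESCRIPTORS];
    __shared__ float sy[MAX_TEST_DESCRIPTORS];
    __shared__ float sphi[MAX_TEST_DESCRIPTORS];
    __shared__ float sw[MAX_TEST_DESCRIPTORS];

    for (int i = threadIdx.x; i < nTest; i += blockDim.x) {
        sx[i] = testX[i];   sy[i] = testY[i];
        sphi[i] = testPhi[i];   sw[i] = testW[i];
    }
    __syncthreads();
    // ... each thread now matches its fraction of target descriptors against
    // the shared-memory copy of the test template ...
}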

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It is primarily applied to object detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
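Written out, these are the standard gradient magnitude and orientation:

m(x, y) = \sqrt{d_x(x, y)^2 + d_y(x, y)^2}, \qquad
\theta(x, y) = \tan^{-1}\!\left(\frac{d_y(x, y)}{d_x(x, y)}\right)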

Orientation binning is the second step of HOG. This method is used to create the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C, C++, FORTRAN, Java, COM,

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematica

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveX controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g. PINs and passwords), government applications have used token-based systems (e.g. ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method, which employs a two-stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep big simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible; this section covers general techniques that apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.

Image processing refers to the processing of a 2D picture by a computer. Basic definitions:

An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g. brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:

Image processing (image in -> image out)

Image analysis (image in -> measurements out)

Image understanding (image in -> high-level description out)

An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system, it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur while another part might be processed to improve colour rendition.

Most image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample, or pixel, is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, out of which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g. videos or 3D full-body magnetic resonance scans).

In modern sciences and technologies, images also gain much broader scope due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research or real-time multi-asset portfolio trading in finance. Before processing, an image is converted into a digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing technique may be image enhancement, image restoration, or image compression.

122 IMAGE ENHANCEMENT

It refers to accentuation, or sharpening, of image features such as boundaries or contrast to make a graphic display more useful for display and analysis. This process does not increase the inherent information content of the data. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.

123 IMAGE RESTORATION

It is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with more extraction or accentuation of image features.

124 IMAGE COMPRESSION

It is concerned with minimizing the number of bits required to represent an image. Applications of compression are in broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission for educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression – CCITT Group 3 & Group 4

Still image compression – JPEG

Video image compression – MPEG

125 SEGMENTATION

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture, while adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, as is typical in medical imaging, the contours resulting from image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms such as marching cubes.
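As a toy illustration of the per-pixel labelling idea (not the segmentation method used in this project), the sketch below assigns each pixel one of two labels by comparing its intensity against a fixed threshold; the kernel name and the threshold are arbitrary.

    // Minimal sketch: threshold-based labelling of an 8-bit grayscale image.
    // Each thread labels one pixel; label 1 = foreground, 0 = background.
    __global__ void thresholdLabel(const unsigned char* img, unsigned char* label,
                                   int width, int height, unsigned char t)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height) {
            int idx = y * width + x;
            label[idx] = (img[idx] > t) ? 1 : 0;
        }
    }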

126 IMAGE RESTORATION

Image restoration, like enhancement, improves the qualities of an image, but all the operations are mainly based on known or measured degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion, and to correct images for known degradations.

127 FUNDAMENTAL STEPS

Image acquisition: to acquire a digital image.

Image preprocessing: to improve the image in ways that increase the chances for success of the other processes.

Image segmentation: to partition an input image into its constituent parts or objects.

Image representation: to convert the input data to a form suitable for computer processing.

Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.

Image recognition: to assign a label to an object based on the information provided by its descriptors.

Image interpretation: to assign meaning to an ensemble of recognized objects.

Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling, and amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels. For example, a 256 gray-level image of size 256x256 occupies 64K bytes of memory. Images of very low spatial resolution produce a checkerboard effect, and the use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.
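As a quick check of that 64K figure, the storage of an uncompressed image is rows x columns x bits-per-pixel / 8; the helper below is purely illustrative and reproduces the example above.

    #include <cstdio>

    // Bytes needed for an uncompressed image; 256 gray levels correspond to 8 bits per pixel.
    static long imageBytes(int rows, int cols, int bitsPerPixel)
    {
        return (long)rows * cols * bitsPerPixel / 8;
    }

    int main()
    {
        // 256 x 256 pixels, 256 gray levels (8 bits/pixel) -> 65536 bytes = 64 KB.
        printf("%ld bytes\n", imageBytes(256, 256, 8));
        return 0;
    }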

14 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:

GIF – Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG – Joint Photographic Experts Group. A very efficient (i.e. much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).

TIFF – Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS – PostScript, a standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD – Adobe PhotoShop Document, a dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP – bitmap file format.

15 TYPE OF IMAGES

Images are of four types:

1. Binary image

2. Gray scale image

3. Color image

4. Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level, which means that each pixel is stored as a single bit, i.e. a 0 or 1. The names black-and-white and B&W are also used.

152 GRAY SCALE IMAGE

In an 8-bit grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black and white image, but the name emphasizes that such an image will also include many shades of grey.

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive colour, with the r, g, and b receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB – cyan, magenta, and yellow – are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green, and blue in full intensity makes white.
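To make the additive model concrete, the sketch below converts an interleaved RGB image to an 8-bit grayscale image using the common Rec. 601 luminance weights; it is a generic illustration, not code from this project.

    // Minimal sketch: per-pixel RGB -> grayscale conversion on the GPU.
    // Input is interleaved 8-bit RGB; output is one 8-bit value per pixel.
    __global__ void rgbToGray(const unsigned char* rgb, unsigned char* gray, int numPixels)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < numPixels) {
            float r = rgb[3 * i + 0];
            float g = rgb[3 * i + 1];
            float b = rgb[3 * i + 2];
            // Rec. 601 luminance weighting.
            gray[i] = (unsigned char)(0.299f * r + 0.587f * g + 0.114f * b);
        }
    }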

In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK: The 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks; in addition, a layer of black (K) ink can be added. The CMYK model uses subtractive colour mixing.

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer, described in 1975 by Kajiya, Sutherland, and Cheadle, which supported a palette of 256 36-bit RGB colors.

16 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation.

2) Processing of scene data for autonomous machine perception.

In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speed Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Of these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate the valid overlapping area of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size, so they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, it still requires careful design in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet are designed for traditional sequential processing elements, such as a personal computer. However, a parallel processing alternative using field programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system to become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature is extracted using the 1-D log polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

18 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition, based on our sequential line-descriptor method and the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

191 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, we propose a new descriptor, the Y shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out non-matching pairs before refined matching.

2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result of this stage helps filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque and white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speed Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Of these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometrics recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. designed a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform which outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies would be needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition, based on our sequential line-descriptor method, using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping area of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size, so they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model. In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop the implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation; they used a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin-tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speed Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points of the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics that are elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly used for circular or quasi-circular shapes of objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure below shows the steps of segmentation.

FIG

Glare area detection: The glare area is a small bright area near the pupil or iris; this is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images, so if the image is in color it must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG
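The glare detection step relies on the Sobel gradient operator; the kernel below is a generic 3x3 Sobel gradient-magnitude sketch (not the project's exact implementation), whose output could then be thresholded to flag bright, high-gradient glare pixels.

    // Minimal sketch: 3x3 Sobel gradient magnitude for an 8-bit grayscale image.
    __global__ void sobelMagnitude(const unsigned char* in, float* mag, int w, int h)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;

        // Horizontal and vertical Sobel responses.
        float gx = -in[(y-1)*w + (x-1)] + in[(y-1)*w + (x+1)]
                   - 2.0f*in[y*w + (x-1)] + 2.0f*in[y*w + (x+1)]
                   - in[(y+1)*w + (x-1)] + in[(y+1)*w + (x+1)];
        float gy = -in[(y-1)*w + (x-1)] - 2.0f*in[(y-1)*w + x] - in[(y-1)*w + (x+1)]
                   + in[(y+1)*w + (x-1)] + 2.0f*in[(y+1)*w + x] + in[(y+1)*w + (x+1)];
        mag[y*w + x] = sqrtf(gx*gx + gy*gy);
    }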

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are eliminated.
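For reference, a compact host-side sketch of Otsu's method is given below: it picks the threshold that maximizes the between-class variance of the grayscale histogram. This is a textbook version, offered only as an illustration of the technique named above.

    // Minimal sketch of Otsu's threshold selection from a 256-bin histogram.
    static int otsuThreshold(const unsigned int hist[256], unsigned long total)
    {
        double sumAll = 0.0;
        for (int i = 0; i < 256; ++i) sumAll += (double)i * hist[i];

        double sumB = 0.0, wB = 0.0, bestVar = 0.0;
        int bestT = 0;
        for (int t = 0; t < 256; ++t) {
            wB += hist[t];                    // weight of the background class
            if (wB == 0.0) continue;
            double wF = (double)total - wB;   // weight of the foreground class
            if (wF == 0.0) break;
            sumB += (double)t * hist[t];
            double mB = sumB / wB;            // background mean
            double mF = (sumAll - sumB) / wF; // foreground mean
            double between = wB * wF * (mB - mF) * (mB - mF);
            if (between > bestVar) { bestVar = between; bestT = t; }
        }
        return bestT;
    }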

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; all of these are unwanted portions for recognition. In order to eliminate these effects, refinement is performed following the detection of the sclera area. The figure shows the result after Otsu's thresholding and the iris and eyelid refinement used to detect the right sclera area; the left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), palm (Lin and Fan, 2004), and retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.

Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g. due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility for parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching stage of the line-descriptor based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as

FIG

Here f_line(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.
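A schematic of that RANSAC-style search is sketched below. The descriptor fields, the parameter ranges, and the fitness function are placeholders for illustration; only the overall loop structure (random correspondence and transform hypothesis, keep the best-scoring one) follows the description above.

    #include <cmath>
    #include <cstdlib>

    struct LineDescriptor { float x, y, phi; };   // hypothetical fields: segment center and orientation
    struct Transform { float dx, dy, scale, rot; };

    // Placeholder fitness: count test descriptors whose transformed centers land
    // near some target descriptor (5-pixel radius, an assumed tolerance).
    static float fitness(const LineDescriptor* test, int nTest,
                         const LineDescriptor* target, int nTarget, const Transform& t)
    {
        float c = std::cos(t.rot), s = std::sin(t.rot);
        int hits = 0;
        for (int i = 0; i < nTest; ++i) {
            float tx = t.scale * (c * test[i].x - s * test[i].y) + t.dx;
            float ty = t.scale * (s * test[i].x + c * test[i].y) + t.dy;
            for (int j = 0; j < nTarget; ++j) {
                float ddx = tx - target[j].x, ddy = ty - target[j].y;
                if (ddx * ddx + ddy * ddy < 25.0f) { ++hits; break; }
            }
        }
        return (float)hits;
    }

    // RANSAC-style search: hypothesize a transform from one random correspondence
    // plus randomly drawn scale/rotation (ranges are assumed priors), keep the best.
    static Transform ransacRegister(const LineDescriptor* test, int nTest,
                                    const LineDescriptor* target, int nTarget, int iters)
    {
        Transform best = {0.0f, 0.0f, 1.0f, 0.0f};
        float bestScore = -1.0f;
        for (int k = 0; k < iters; ++k) {
            const LineDescriptor& a = test[std::rand() % nTest];
            const LineDescriptor& b = target[std::rand() % nTarget];
            Transform t;
            t.scale = 0.9f + 0.2f * (std::rand() / (float)RAND_MAX);
            t.rot   = -0.1f + 0.2f * (std::rand() / (float)RAND_MAX);
            float c = std::cos(t.rot), s = std::sin(t.rot);
            t.dx = b.x - t.scale * (c * a.x - s * a.y);   // translation implied by the chosen pair
            t.dy = b.y - t.scale * (s * a.x + c * a.y);
            float score = fitness(test, nTest, target, nTarget, t);
            if (score > bestScore) { bestScore = score; best = t; }
        }
        return best;
    }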

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
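A direct transcription of that per-segment rule is sketched below; the threshold values and the use of the mask weights as a product are assumptions made for illustration, not the report's exact constants or formula.

    #include <cmath>

    struct SegDescriptor { float x, y, phi, w; };   // center, orientation, mask weight

    // Pairwise matching score sketch: descriptors match when their centers are
    // within Dmatch and their orientations are within PhiMatch; the contribution
    // is taken here as the product of the two mask weights (an assumption).
    static float segmentScore(const SegDescriptor& si, const SegDescriptor& sj,
                              float Dmatch, float PhiMatch)
    {
        float dx = si.x - sj.x, dy = si.y - sj.y;
        float d = std::sqrt(dx * dx + dy * dy);
        float dphi = std::fabs(si.phi - sj.phi);
        return (d <= Dmatch && dphi <= PhiMatch) ? si.w * sj.w : 0.0f;
    }

The total score M would then accumulate these values over matched pairs and divide by the sum of descriptor weights of the smaller template, as described above.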

FIG

FIG

FIG

FIG

Even under movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y shape vessel: one is to use the angle of every branch to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.
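A possible in-memory form of that feature vector, together with the branch-angle computation relative to the pupil-center radius, is sketched below; the struct layout and function names are illustrative only.

    #include <cmath>

    struct YDescriptor { float phi1, phi2, phi3; float x, y; };   // y(phi1, phi2, phi3, x, y)

    // Angle between a branch and the radial direction from the pupil center to the
    // branch point, in radians. branchAngle is the branch's own orientation in the
    // image plane; (px, py) is the pupil center and (bx, by) the branch point.
    static float branchToRadialAngle(float branchAngle, float px, float py, float bx, float by)
    {
        const float PI = 3.14159265f;
        float radial = std::atan2(by - py, bx - px);
        float d = branchAngle - radial;
        while (d > PI)  d -= 2.0f * PI;    // wrap into [-pi, pi] so the measure
        while (d < -PI) d += 2.0f * PI;    // stays rotation invariant
        return d;
    }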

As discussed in the overview of the line descriptor-based method above, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments, and for each segment a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, a mask file is used.

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.

The mask file indicates whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy the GPU memory and slow down the data transfer. During matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed, and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weight of descriptors that lie beyond the sclera is set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.

The calculated result is saved as a component of the descriptor. The descriptor of the sclera then becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the values 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points. If we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other, since they have a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line passing through the iris center, denoted as θ; the distance between the segment center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.

The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses; this meets the requirement for coalesced memory access on the GPU.

FIG

FIG
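One way to realize that layout is a structure-of-arrays (SoA) arrangement, so that consecutive threads in a warp read consecutive elements of the same field. The sketch below is illustrative; the field names simply mirror the WPL descriptor s(x, y, r, θ, ɸ, w).

    // Structure-of-arrays layout for WPL descriptors (one entry per line segment).
    // Descriptors from the same side of the sclera are stored contiguously so a
    // warp reads consecutive x[i], y[i], ... values: coalesced global loads.
    struct WplTemplateSoA {
        float* x;       // segment center, rectangular coordinates
        float* y;
        float* r;       // distance from the pupil center
        float* theta;   // angle to the reference line through the iris center
        float* phi;     // dominant orientation of the segment
        float* w;       // mask weight: 0, 0.5, or 1
        int    count;
    };

    __global__ void shiftDescriptors(WplTemplateSoA t, float dx, float dy)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < t.count) {
            t.x[i] += dx;   // neighbouring threads touch neighbouring addresses
            t.y[i] += dy;
        }
    }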

After reorganizing the structure of descriptors and adding mask information

into the new descriptor the computation on the mask file is not needed on

the GPU It was very fast to match with this feature because it does not

need to reregister the templates every time after shifting Thus the cost of

data transfer and computation on GPU will be reduced Matching on the

new descriptor the shift parameters generator in Figure 4 is then simplified

as Figure 9

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly more capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-purpose computing on the GPU: mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology is different between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline described above, concentrating on its programmable aspects:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as texture on future passes through the graphics pipeline.

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments described below solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, to the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.
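The sketch below is a minimal example of that model in CUDA: the computation domain is defined as a grid of threads, each thread computes one output value, and the output buffer in global memory can feed a later kernel. The kernel and function names are illustrative.

    #include <cuda_runtime.h>

    // Each thread computes one element of the output (SPMD model).
    __global__ void saxpy(const float* x, const float* y, float* out, float a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = a * x[i] + y[i];   // gather reads, scatter write
    }

    void runOnce(const float* dX, const float* dY, float* dOut, float a, int n)
    {
        // The computation domain of interest, expressed as a structured grid of threads.
        int block = 256;
        int grid = (n + block - 1) / block;
        saxpy<<<grid, block>>>(dX, dY, dOut, a, n);
        cudaDeviceSynchronize();
        // dOut now resides in global memory and can be used as input to a later kernel.
    }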

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result of this stage helps filter out image pairs with low similarities; after this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here ytei and ytaj are the Y shape descriptors of test template Tte and target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4); ni and di are the matched descriptor pairs' number and their centers' distance, respectively; tϕ is a distance threshold and txy is the threshold that restricts the searching area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we searched the areas nearby all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the searching range and time. The distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar from the pupil center in descriptor i.

The number of matched pairs ni and the distance between Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study. Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. The sclera with a high matching score will be passed to the next, more precise matching process.
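As a rough, self-contained sketch of this coarse comparison (written under assumptions, not the project's exact Algorithm 1), the function below counts descriptor pairs whose angle and center distances fall under tϕ and txy and then fuses the pair count with the average center distance. The YShape struct and the exact fusion expression are stand-ins for the descriptor and for equation (2).

#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical Y-shape descriptor: three branch angles plus the branch center.
struct YShape { float phi[3]; float x, y; };

// Illustrative Stage I comparison: count descriptor pairs whose angle distance
// (Euclidean distance of the three-angle vectors) and center distance fall
// under tPhi and tXy, then fuse pair count and average center distance.
float yShapeScore(const std::vector<YShape>& te, const std::vector<YShape>& ta,
                  float tPhi, float tXy, float alpha)
{
    int n = 0;
    float dSum = 0.0f;
    for (const YShape& a : te) {
        for (const YShape& b : ta) {
            float dphi = std::sqrt((a.phi[0] - b.phi[0]) * (a.phi[0] - b.phi[0]) +
                                   (a.phi[1] - b.phi[1]) * (a.phi[1] - b.phi[1]) +
                                   (a.phi[2] - b.phi[2]) * (a.phi[2] - b.phi[2]));
            float dxy = std::hypot(a.x - b.x, a.y - b.y);
            if (dphi < tPhi && dxy < tXy) { ++n; dSum += dxy; }
        }
    }
    if (n == 0) return 0.0f;
    float avgD = dSum / n;
    if (avgD <= 0.0f) avgD = 1e-6f;   // guard against a zero average distance
    // Assumed fusion: more matched pairs and smaller average distance give a
    // higher score, normalized by the smaller template size (stand-in for (2)).
    return n / (alpha * avgD * std::min(te.size(), ta.size()));
}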

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because: when acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape; and the sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte; Tta is the target template and staj is the jth WPL descriptor of Tta; d(stek, staj) is the Euclidean distance of descriptors stek and staj; and Δsk is the shift value of the two descriptors, i.e., the offset between their centers.

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
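The following self-contained sketch captures the idea behind Algorithm 2 under stated assumptions: each sampled test center contributes the offset to its nearest target center as a candidate shift, and the candidate that deviates least from the others is kept, standing in for the smallest-standard-deviation selection. The Pt type and the selection rule are illustrative only.

#include <cmath>
#include <limits>
#include <vector>

struct Pt { float x, y; };   // hypothetical descriptor center

// Illustrative shift-parameter search: collect the offset from each test center
// to its nearest target center, then return the candidate offset that deviates
// least from all the others (a stand-in for the standard-deviation criterion).
Pt findShift(const std::vector<Pt>& te, const std::vector<Pt>& ta)
{
    std::vector<Pt> cand;
    for (const Pt& p : te) {
        float best = std::numeric_limits<float>::max();
        Pt off{0, 0};
        for (const Pt& q : ta) {
            float d = std::hypot(p.x - q.x, p.y - q.y);
            if (d < best) { best = d; off = {q.x - p.x, q.y - p.y}; }
        }
        cand.push_back(off);
    }
    Pt bestOff{0, 0};
    float bestSpread = std::numeric_limits<float>::max();
    for (const Pt& c : cand) {
        float spread = 0.0f;
        for (const Pt& o : cand) spread += std::hypot(c.x - o.x, c.y - o.y);
        if (spread < bestSpread) { bestSpread = spread; bestOff = c; }
    }
    return bestOff;
}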

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined the same as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined as (7). To search for the optimized transform parameters, we iterated N times to generate these parameters; in our experiment, we set the iteration count to 512.
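A hedged, self-contained sketch in the spirit of Algorithm 3 is shown below: each iteration draws a random rotation and scale, derives a shift from a random test descriptor's nearest neighbor, applies the combined scale-rotation transform plus shift, and keeps the parameter set with the most matches within β pixels. The parameter ranges, types and seed are assumptions for the example.

#include <cmath>
#include <random>
#include <vector>

struct P2 { float x, y; };

// Illustrative affine-parameter search: N random (rotation, scale, shift) sets,
// counting how many transformed test centers land within beta of a target center.
void affineSearch(const std::vector<P2>& te, const std::vector<P2>& ta,
                  int N, float beta,
                  float& bestTheta, float& bestScale, P2& bestShift)
{
    std::mt19937 rng(1234);
    std::uniform_real_distribution<float> dTheta(-0.2f, 0.2f);   // radians (assumed range)
    std::uniform_real_distribution<float> dScale(0.9f, 1.1f);    // assumed range
    std::uniform_int_distribution<size_t> pick(0, te.size() - 1);

    int bestCount = -1;
    for (int it = 0; it < N; ++it) {
        float th = dTheta(rng), sc = dScale(rng);
        // Shift candidate: offset from a random test point to its nearest target point.
        const P2& p = te[pick(rng)];
        P2 sh{0, 0}; float best = 1e30f;
        for (const P2& q : ta) {
            float d = std::hypot(p.x - q.x, p.y - q.y);
            if (d < best) { best = d; sh = {q.x - p.x, q.y - p.y}; }
        }
        int count = 0;
        for (const P2& t : te) {
            // x' = S * R * x + shift (2-D version of the S, R, T matrices in (7))
            float xr = sc * (t.x * std::cos(th) - t.y * std::sin(th)) + sh.x;
            float yr = sc * (t.x * std::sin(th) + t.y * std::cos(th)) + sh.y;
            for (const P2& q : ta)
                if (std::hypot(xr - q.x, yr - q.y) < beta) { ++count; break; }
        }
        if (count > bestCount) { bestCount = count; bestTheta = th; bestScale = sc; bestShift = sh; }
    }
}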

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters attained from Algorithms 2 and 3; and R(θ(optm))T(tr(optm)shift)S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned by the programmer into threads and mapped onto those processors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers, local memory and shared memory are on-chip, and it costs only a little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory and texture memory are off-chip memories that are accessible by all threads, and accessing these memories is very time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To completely hide the latency of the small instruction set, we should use on-chip memory preferentially rather than global memory. When a global memory access occurs, threads in the same warp should access consecutive words to achieve coalescing.

Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
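To make the second guideline concrete, the two illustrative kernels below contrast a coalesced access pattern with a strided one; the kernel names and arguments are arbitrary and not part of the project's code.

// Coalesced: thread i touches element i, so a warp reads 32 consecutive words
// that the hardware can combine into a small number of transactions.
__global__ void coalescedCopy(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: thread i touches element i * stride, scattering the warp's requests
// across many memory segments and serializing the accesses.
__global__ void stridedCopy(const float* in, float* out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i * stride < n) out[i * stride] = in[i * stride];
}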

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density; thus we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work, we use an NVIDIA C2070 as our GPU. The thread and block counts are both set to 1024, which means we can match our test template with up to 1024×1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which a section of descriptors is processed by one thread. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there are no data exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown at the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
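The block-level summation described above follows a standard shared-memory tree pattern; the kernel below is a generic textbook version of that pattern (not the project's exact kernel), with the kernel and buffer names assumed for illustration.

// Generic block-level tree reduction in shared memory: each thread loads one
// intermediate result, then pairs of partial sums are combined in
// log2(blockDim.x) steps; the block total ends up in element 0.
__global__ void blockSum(const float* partial, float* blockTotals, int n)
{
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    s[tid] = (i < n) ? partial[i] : 0.0f;
    __syncthreads();

    // Assumes blockDim.x is a power of two.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockTotals[blockIdx.x] = s[0];
}

A launch such as blockSum<<<numBlocks, threads, threads * sizeof(float)>>>(partial, totals, n) supplies the dynamic shared-memory size; the per-block totals can then be combined by a second pass or on the CPU.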

252 MAPPING INSIDE BLOCK

In the shift-argument search, there are two schemes we can choose to map the task: mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads; or assigning a single possible shift offset to each thread, so that all the threads compute independently and only the final results need to be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread, and every thread randomly generated a set of parameters and tried them independently. The generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
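As a point of reference only, the sketch below shows how per-thread, uncorrelated random streams can alternatively be obtained with NVIDIA's cuRAND device API by giving every thread its own subsequence. The parameter ranges are assumed, and this stands in for, rather than reproduces, the dynamically created Mersenne Twister generators used in this work.

#include <curand_kernel.h>

// Alternative illustration using cuRAND: curand_init() with a distinct
// subsequence per thread yields uncorrelated streams for each thread.
__global__ void initRng(curandState* states, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curand_init(seed, tid, 0, &states[tid]);
}

__global__ void drawAffineParams(curandState* states, float* theta, float* scale, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    curandState local = states[tid];
    theta[tid] = -0.2f + 0.4f * curand_uniform(&local);   // assumed rotation range
    scale[tid] = 0.9f + 0.2f * curand_uniform(&local);    // assumed scale range
    states[tid] = local;                                  // save the advanced state
}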

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to process different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

FIG

FIG

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and the data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data. In our kernels, we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
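A simplified sketch of this layout strategy is shown below: descriptor components are kept in separate global arrays (a structure-of-arrays layout) so that accesses coalesce, and the test template is staged into shared memory once per block. The struct, array names and fixed size limit are assumptions for illustration.

// Structure-of-arrays layout for WPL descriptors: one array per component, so a
// warp reading component k for descriptors i..i+31 accesses consecutive words.
struct WplSoA { float *x, *y, *r, *theta, *phi, *w; int count; };

#define MAX_TEST_DESC 256   // assumed upper bound on test-template descriptors

__global__ void wplDistance(WplSoA test, WplSoA target, float* out)
{
    __shared__ float tx[MAX_TEST_DESC], ty[MAX_TEST_DESC];

    // Stage the test template into shared memory with a coalesced, strided copy
    // (assumes test.count <= MAX_TEST_DESC).
    for (int i = threadIdx.x; i < test.count; i += blockDim.x) {
        tx[i] = test.x[i];
        ty[i] = test.y[i];
    }
    __syncthreads();

    // Each thread handles one target descriptor and records its nearest test descriptor.
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= target.count) return;
    float qx = target.x[j], qy = target.y[j], best = 1e30f;
    for (int i = 0; i < test.count; ++i) {
        float dx = qx - tx[i], dy = qy - ty[i];
        float d = dx * dx + dy * dy;
        if (d < best) best = d;
    }
    out[j] = sqrtf(best);
}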

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients (HOG) is a feature descriptor. It was primarily applied in the design of target detection; in this paper, it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. Then the combination of the different histograms of the different cells represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y) as m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).
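As a rough, self-contained illustration (not the project's implementation), the CUDA sketch below computes the per-pixel gradient magnitude and orientation defined above and accumulates the magnitude-weighted orientation votes into a 9-bin cell histogram, anticipating the binning step described next. The bin count, one-block-per-cell mapping and kernel name are assumptions.

#define HOG_BINS 9   // assumed bin count over 0..180 degrees

// Per-pixel gradient magnitude/orientation and a per-cell orientation histogram,
// with one thread block per cell; magnitude-weighted votes are accumulated in
// shared memory (assumes blockDim.x >= HOG_BINS).
__global__ void hogCellHistogram(const float* img, int W, int H,
                                 float* cellHist /* numCells x HOG_BINS */)
{
    __shared__ float hist[HOG_BINS];
    if (threadIdx.x < HOG_BINS && threadIdx.y == 0) hist[threadIdx.x] = 0.0f;
    __syncthreads();

    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x > 0 && x < W - 1 && y > 0 && y < H - 1) {
        float dx = img[y * W + (x + 1)] - img[y * W + (x - 1)];
        float dy = img[(y + 1) * W + x] - img[(y - 1) * W + x];
        float mag = sqrtf(dx * dx + dy * dy);
        float ang = atan2f(dy, dx) * 57.29578f;          // radians -> degrees
        if (ang < 0.0f) ang += 180.0f;                   // fold into [0, 180)
        int bin = min((int)(ang / (180.0f / HOG_BINS)), HOG_BINS - 1);
        atomicAdd(&hist[bin], mag);                      // magnitude-weighted vote
    }
    __syncthreads();

    int cell = blockIdx.y * gridDim.x + blockIdx.x;
    if (threadIdx.x < HOG_BINS && threadIdx.y == 0)
        cellHist[cell * HOG_BINS + threadIdx.x] = hist[threadIdx.x];
}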

Orientation binning is the second step of HOG. This method is utilized to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientation of the picture elements. If the images have any illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code, files and data

Interactive tools for iterative exploration, design and problem solving

Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

You can use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotation, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effect, light source locations, and transparency.

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW Discrete Fourier Transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step to achieve parallel computation efficiency using a GPU. A work flow with high arithmetic intensity to hide the memory access latency was designed to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


regions. Thus, one part of an image (region) might be processed to suppress motion blur, while another part might be processed to improve colour rendition.

Most usually, image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample or pixel is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, in which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).

In modern sciences and technologies, images also gain much broader scope due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research or real-time multi-asset portfolio trading in finance. Before processing an image, it is converted into a digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing technique may be image enhancement, image restoration, or image compression.

122 IMAGE ENHANCEMENT

It refers to the accentuation or sharpening of image features, such as boundaries or contrast, to make a graphic display more useful for display and analysis. This process does not increase the inherent information content in the data. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.

123 IMAGE RESTORATION

It is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with more extraction or accentuation of image features.

124 IMAGE COMPRESSION

It is concerned with minimizing the number of bits required to represent an image. Applications of compression are in broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression - CCITT Group 3 & Group 4

Still image compression - JPEG

Video image compression - MPEG

125 SEGMENTATION

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.

126 IMAGE RESTORATION

Image restoration, like enhancement, improves the qualities of an image, but all the operations are mainly based on known or measured degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion. It is used to correct images for known degradations.

127 FUNDAMENTAL STEPS

Image acquisition to acquire a digital image

Image preprocessing to improve the image in ways that increase the chances for success of the other processes

Image segmentation to partition an input image into its constituent parts or objects

Image representation to convert the input data to a form suitable for

computer processing

Image description to extract features that result in some quantitative

information of interest or features that are basic for differentiating one

class of objects from another

Image recognition to assign a label to an object based on the

information provided by its descriptors

Image interpretation to assign meaning to an ensemble of recognized

objects

Knowledge about a problem domain is coded into an image processing

system in the form of a Knowledge database

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude.

Digitization of the spatial coordinates (x, y) is called image sampling. Amplitude digitization is called gray-level quantization.

The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.

Example: a 256 gray-level image of size 256x256 occupies 64K bytes of memory.

Images of very low spatial resolution produce a checkerboard effect. The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.

14 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:

GIF - Graphics Interchange Format: an 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG - Joint Photographic Experts Group: a very efficient (i.e., much information per byte), destructively compressed 24-bit (16 million colours) bitmap format. Widely used, especially for web and Internet (bandwidth-limited) use.

TIFF - Tagged Image File Format: the standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS - Postscript: a standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD - Adobe Photoshop Document: a dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP - bitmap file format.

15 TYPE OF IMAGES

Images are 4 types

1 Binary image

2 Gray scale image

3 Color image

4 Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically, the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e., a 0 or 1. The names black-and-white and B&W are also used for such images.

152 GRAY SCALE IMAGE

In an 8-bit grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black and white image, but the name emphasizes that such an image will also include many shades of grey.

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive colour, with the r, g and b receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB, cyan, magenta and yellow, are formed by mixing two of the primary colours (red, green or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green and blue at full intensity makes white.

In Photoshop, using the 'screen' mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK: The 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M) and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a colormap matrix. The pixel values in the array are direct indices into the colormap. By convention, this documentation uses the variable name X to refer to the array and map to refer to the colormap. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements, in which every element, a color, is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only their index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer, described in 1975 by Kajiya, Sutherland and Cheadle. This supported a palette of 256 36-bit RGB colors.

16 Applications of image processing

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation.

2) Processing of scene data for autonomous machine perception.

In the second application area, interest focuses on procedures for extracting from an image information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

17 EXISTING SYSTEM

In their work, Crihalmeanu and Ross proposed three approaches, a Speed Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size, and they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, it still requires careful design in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye. The vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are completely eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system to become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proenca and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a non-ideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

18 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naive parallel approach would not work. We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

191 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, in this research we propose a new descriptor, the Y shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. In their work, Crihalmeanu and Ross proposed three approaches, a Speed Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (as an abbreviation of general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs have been used to extract the features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. evaluated the performance of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods, and therefore may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme would need different strategies for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on this matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, will occupy GPU memory, and slow down data transfer. In addition, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.

Note that it is relatively straightforward to port our C program for CUDA to an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, an OpenCL implementation of our approach should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we also developed implementation schemes to map our algorithms into CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition; in Section 8, we report experiments using the proposed system; and in Section 9, we draw conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation that used a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's thresholding-based method for grayscale images.

After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded-Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of features are extracted in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To make the computation more efficient, the image data are converted to polar form, which is mainly suited to circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This comparison is done in the feature matching step, which ultimately produces the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out.

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. 3 shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. The filter operates only on grayscale images, so if the image is in color it is first converted to grayscale, and the Sobel filter is then applied to detect the glare area. Fig. 4 shows the result of the glare area detection.
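As an illustration of this step, the sketch below computes the Sobel gradient magnitude of a grayscale image and flags high-gradient pixels as glare candidates. It is a minimal CUDA kernel written only for illustration; the report's own implementation performs this step in MATLAB, and the threshold value is an assumed parameter.

// Minimal sketch: Sobel gradient magnitude used as a glare-candidate detector.
// 'in' is a grayscale image of size width x height; 'glareMask' is set to 1
// where the gradient magnitude exceeds 'threshold' (an assumed parameter).
__global__ void sobelGlareKernel(const unsigned char *in, unsigned char *glareMask,
                                 int width, int height, float threshold)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    // 3x3 Sobel operators in the x and y directions.
    float gx = -in[(y-1)*width + (x-1)] + in[(y-1)*width + (x+1)]
               - 2.0f*in[y*width + (x-1)] + 2.0f*in[y*width + (x+1)]
               - in[(y+1)*width + (x-1)] + in[(y+1)*width + (x+1)];
    float gy = -in[(y-1)*width + (x-1)] - 2.0f*in[(y-1)*width + x] - in[(y-1)*width + (x+1)]
               + in[(y+1)*width + (x-1)] + 2.0f*in[(y+1)*width + x] + in[(y+1)*width + (x+1)];

    float mag = sqrtf(gx*gx + gy*gy);
    glareMask[y*width + x] = (mag > threshold) ? 1 : 0;
}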

FIG

Sclera Area Estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be located in the right and center positions, and the correct right sclera area should be located in the left and center positions. In this way, non-sclera areas are eliminated.

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid, and iris boundaries are then refined, since all of these are unwanted portions for recognition. To eliminate their effects, refinement is performed after the detection of the sclera area. The figure shows the result after Otsu's thresholding and the iris and eyelid refinement used to detect the right sclera area; the left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999) In the case of retinal biometrics an

especial optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.

Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portion of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by "Elements" below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this report.

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)ᵀ. The individual components of the line descriptor are calculated as

FIG

Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels of the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated by the matching function, where S_i and S_j are two segment descriptors, m(S_i, S_j) is the matching score between segments S_i and S_j, d(S_i, S_j) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), D_match is the matching distance threshold, and the matching angle threshold bounds the difference of the segments' orientations. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
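For concreteness, a minimal sketch of this pairwise matching score is given below in C-style code. The structure fields and the parameters dMatch and phiMatch are assumptions standing in for the distance and angle thresholds described above, and taking the score of a matched pair as the product of the two weights is one plausible choice consistent with the weighted total score just described, not a formula reproduced from the report.

#include <math.h>

// Hypothetical line-segment descriptor: polar position of the segment center
// relative to the iris center plus the segment's dominant orientation.
typedef struct {
    float theta;  // angle of the segment center about the iris center
    float r;      // distance of the segment center from the iris center
    float phi;    // dominant orientation of the segment
    float w;      // weight from the weighting image (0, 0.5, or 1)
} LineDescriptor;

// Euclidean distance between the two segment centers (converted to Cartesian).
static float centerDistance(const LineDescriptor *a, const LineDescriptor *b)
{
    float ax = a->r * cosf(a->theta), ay = a->r * sinf(a->theta);
    float bx = b->r * cosf(b->theta), by = b->r * sinf(b->theta);
    return sqrtf((ax - bx) * (ax - bx) + (ay - by) * (ay - by));
}

// Pairwise matching score: two segments match when their centers are close and
// their orientations are similar; the returned score is the product of the two
// weights (an assumed form consistent with the description above).
static float segmentMatchScore(const LineDescriptor *si, const LineDescriptor *sj,
                               float dMatch, float phiMatch)
{
    if (centerDistance(si, sj) <= dMatch && fabsf(si->phi - sj->phi) <= phiMatch)
        return si->w * sj->w;
    return 0.0f;
}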

FIG

FIG

FIG

FIG

movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of Y-shape vessels: one is to use the angle of every branch with respect to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 remain quite stable. To tolerate errors in the pupil center calculation from the segmentation step, we also recorded the center position (x, y) of the Y-shape branches as auxiliary parameters. Our rotation-, shift-, and scale-invariant feature vector is therefore defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center and is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line

SCLERA DESCRIPTOR As we discussed in the Section 22 the line

descriptor is extracted from the skeleton of vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment a line descriptor is created to record the center and orientation of

the segment This descriptor is expressed as s(x yɸ) where (x y) is the

position of the center and ɸ is its orientation Because of the limitation of

segmentation accuracy the descriptor in the boundary of sclera area might

not be accurate and may contain spur edges resulting from the iris eyelid

andor eyelashes To be tolerant of such error the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

Is designed to indicate whether a line segment belongs to the edge of the

sclera or not However in GPU application using the mask is a challenging

since the mask files are large in size and will occupy the GPU memory and

slow down the data transfer When matching the registration RANSAC

type algorithm was used to randomly select the corresponding descriptors

and the transform parameter between them was used to generate the

template transform affine matrix After every templates transform the mask

data should also be transformed and new boundary should be calculated to

evaluate the weight of the transformed descriptor This results in too many

convolutions in processor unit

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, and only once.

The calculated result is saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all descriptors of that template must be transformed. This is faster if the two templates share a similar reference point: if we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.

The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as the eyeball moves left, the sclera patterns in the left part of the eye may be compressed while those in the right part are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow for different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them in contiguous addresses, which meets the requirement of coalesced memory access on the GPU.

FIG

FIG
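The sketch below illustrates one way such a reorganization can look in code: the WPL descriptor fields are stored as separate arrays (a structure-of-arrays layout), with the descriptors of the left half and the right half of the sclera packed into separate contiguous ranges so that consecutive threads in a warp read consecutive addresses. The struct and field names are illustrative assumptions, not the report's actual data layout.

// Structure-of-arrays layout for the WPL descriptors s(x, y, r, theta, phi, w).
// Storing each field contiguously lets thread k of a warp read element base + k
// of the same array, which coalesces into a single memory transaction.
struct WPLTemplateSoA {
    float *x, *y;        // Cartesian center coordinates (precomputed on the CPU)
    float *r, *theta;    // polar coordinates relative to the iris center
    float *phi;          // dominant orientation of the segment
    float *w;            // weight: 0, 0.5, or 1
    int    numLeft;      // descriptors [0, numLeft) belong to the left half
    int    numTotal;     // descriptors [numLeft, numTotal) belong to the right half
};

// Example of a coalesced read inside a kernel: consecutive threads access
// consecutive descriptors of the same side of the sclera.
__global__ void exampleCoalescedRead(const WPLTemplateSoA t, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < t.numTotal)
        out[i] = t.x[i] * t.w[i];   // adjacent threads touch adjacent addresses
}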

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, no computation on the mask file is needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as shown in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65 k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-Purpose Computing on the GPU: Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as a texture on future passes through the graphics pipeline.

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities, although some false positive matches may remain after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching with scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here y_te,i and y_ta,j are the Y-shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance of the two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and their center distances, respectively; tϕ is a distance threshold; and t_xy is the threshold that restricts the search area. We set tϕ to 30 and t_xy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕ_i,j is the angle between the j-th branch and the radius from the pupil center in descriptor i.

The number of matched pairs n_i and the distance between the Y-shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
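A minimal sketch of this coarse matching stage is given below, with one CUDA thread comparing the test template against one target template (the mapping described later in Section 2.5). The data layout and the greedy nearest-branch search follow the description above, but the exact forms of Eqs. (2)-(4) are not reproduced in this report, so the fused score here (matched count and mean center distance combined through the factor alpha) should be read as an assumption, not the report's actual equation.

// Y-shape branch descriptor y(phi1, phi2, phi3, x, y); the layout is illustrative.
struct YDescriptor { float phi1, phi2, phi3, x, y; };

__device__ float angleDistance(const YDescriptor &a, const YDescriptor &b)
{
    // Euclidean distance between the branch-angle vectors (cf. Eq. (3)).
    float d1 = a.phi1 - b.phi1, d2 = a.phi2 - b.phi2, d3 = a.phi3 - b.phi3;
    return sqrtf(d1 * d1 + d2 * d2 + d3 * d3);
}

// One thread scores the test template against one target template.
__global__ void yShapeCoarseMatch(const YDescriptor *test, int nTest,
                                  const YDescriptor *targets, const int *targetOffsets,
                                  const int *targetCounts, float *scores,
                                  float tPhi, float tXY, float alpha, int numTargets)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // target template index
    if (t >= numTargets) return;
    const YDescriptor *target = targets + targetOffsets[t];
    int nTarget = targetCounts[t];

    int   matched = 0;
    float sumDist = 0.0f;
    for (int i = 0; i < nTest; ++i) {
        for (int j = 0; j < nTarget; ++j) {
            float dxy = hypotf(test[i].x - target[j].x, test[i].y - target[j].y);
            if (dxy > tXY) continue;                 // restrict the search area
            if (angleDistance(test[i], target[j]) < tPhi) {
                ++matched;
                sumDist += dxy;
                break;                               // count each test branch once
            }
        }
    }
    // Fuse matched count and mean center distance (assumed form of Eq. (2)).
    float denom = fminf((float)nTest, (float)nTarget);
    scores[t] = (matched == 0) ? 0.0f
              : (matched / denom) * (alpha / (alpha + sumDist / matched));
}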

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more detail of the sclera vessel structure than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because: (a) when acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, since the eyeball is spherical in shape; and (b) the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.

1) SHIFT PARAMETER SEARCH

As we discussed before, segmentation may not be accurate, and as a result the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection from the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template, s_te,i is the i-th WPL descriptor of T_te, T_ta is the target template, s_ta,i is the i-th WPL descriptor of T_ta, and d(s_te,k, s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j.

Δs_k is the shift value between two descriptors, defined as the offset of their centers. We first randomly select an equal number of segment descriptors s_te,k in the test template T_te from each quadrant and find their nearest neighbors s_ta,j in the target template T_ta. Their shift offsets are recorded as candidate registration shift factors Δs_k. The final registration offset is Δs_optim, the candidate with the smallest standard deviation among these candidate offsets.
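The following host-side sketch mirrors Algorithm 2 as described above: a sample of test descriptors is paired with their nearest neighbours in the target template, the candidate offsets are collected, and one offset is kept based on the spread of the candidates. It is a simplified, sequential illustration with assumed container and field names, and the final selection rule (the candidate closest to the mean offset) is an assumed reading of "smallest standard deviation", not the report's exact CUDA implementation.

#include <cmath>
#include <vector>

struct WPLDescriptor { float x, y, r, theta, phi, w; };
struct Offset { float dx, dy; };

// Sketch of Algorithm 2: estimate the shift between a test and a target template
// from the offsets of sampled descriptors to their nearest neighbours.
Offset estimateShift(const std::vector<WPLDescriptor> &test,
                     const std::vector<WPLDescriptor> &target,
                     const std::vector<int> &sampleIdx)   // descriptors sampled per quadrant
{
    std::vector<Offset> candidates;
    for (int k : sampleIdx) {
        int best = 0; float bestD = 1e30f;
        for (size_t j = 0; j < target.size(); ++j) {
            float d = std::hypot(test[k].x - target[j].x, test[k].y - target[j].y);
            if (d < bestD) { bestD = d; best = (int)j; }
        }
        candidates.push_back({target[best].x - test[k].x, target[best].y - test[k].y});
    }
    // Keep the candidate closest to the mean offset, i.e. the one giving the
    // smallest spread among the candidates (assumed selection rule).
    float mx = 0.0f, my = 0.0f;
    for (const Offset &o : candidates) { mx += o.dx; my += o.dy; }
    mx /= candidates.size(); my /= candidates.size();
    Offset bestOff = candidates[0]; float bestSpread = 1e30f;
    for (const Offset &o : candidates) {
        float s = std::hypot(o.dx - mx, o.dy - my);
        if (s < bestSpread) { bestSpread = s; bestOff = o; }
    }
    return bestOff;
}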

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance to its nearest neighbor s_ta,j in T_ta. We then transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithm 2; tr_shift^(it), θ^(it), and tr_scale^(it) are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameter sets; in our experiment, we set the number of iterations to 512.
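For reference, one plausible realization of applying the composed transform R(θ)·T(tr_shift)·S(tr_scale) to a descriptor center is sketched below. The matrix in Eq. (7) and the order of composition are not reproduced in the report, so this ordering is an assumption made only to illustrate the idea.

// Apply scale, then shift, then rotation about the iris center to a descriptor
// center (x, y). The composition order is assumed for illustration only.
__host__ __device__ inline void applyAffine(float x, float y,
                                            float scale, float shiftX, float shiftY,
                                            float theta, float *outX, float *outY)
{
    float sx = scale * x + shiftX;           // S then T
    float sy = scale * y + shiftY;
    float c = cosf(theta), s = sinf(theta);  // R
    *outX = c * sx - s * sy;
    *outY = s * sx + c * sy;
}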

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm), and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)) T(tr_shift(optm)) S(tr_scale(optm)) is the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radial direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA GPU consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and can be accessed with very little latency; local memory, despite its name, resides in off-chip device memory. Only shared memory can be accessed by other threads within the same block, and it is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is much more time consuming.

Constant memory and texture memory are read-only, cached memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be used preferentially rather than global memory; when global memory accesses do occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall into the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them onto the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work, we use an NVIDIA C2070 as our GPU, with the numbers of threads and blocks both set to 1024. That means we can match our test template against up to 1024 × 1024 target templates at the same time.
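In code, this coarse-grained mapping amounts to a launch configuration of the following kind; it reuses the yShapeCoarseMatch kernel and YDescriptor type from the illustrative sketch in Section 2.4.1, and the array names are assumptions.

// Host-side launch: one thread per target template, as described above.
void launchCoarseMatch(const YDescriptor *dTest, int nTest,
                       const YDescriptor *dTargets, const int *dTargetOffsets,
                       const int *dTargetCounts, float *dScores,
                       float tPhi, float tXY, float alpha, int numTargets)
{
    int threadsPerBlock = 1024;                       // as stated in the text
    int numBlocks = (numTargets + threadsPerBlock - 1) / threadsPerBlock;
    yShapeCoarseMatch<<<numBlocks, threadsPerBlock>>>(dTest, nTest, dTargets,
                                                      dTargetOffsets, dTargetCounts,
                                                      dScores, tPhi, tXY, alpha,
                                                      numTargets);
    cudaDeviceSynchronize();                          // wait for the scores
}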

Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread corresponds to a set of descriptors of this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads have completed their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11: first, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
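A minimal sketch of this in-block combination step is shown below as a standard shared-memory tree reduction over the per-thread partial scores; it follows the same pairing pattern over strides of 2, 4, 8, ... described above, with the block total left in the first element. It is an illustrative helper, not the report's exact kernel.

// Tree-style combination of per-thread partial results inside one block.
// partial[threadIdx.x] holds each thread's intermediate score; after the loop,
// the block's total is in partial[0], matching the scheme described above.
__device__ void blockSum(float *partial)   // 'partial' lives in shared memory
{
    for (int stride = 1; stride < blockDim.x; stride *= 2) {
        __syncthreads();
        if ((threadIdx.x % (2 * stride)) == 0 &&
            threadIdx.x + stride < blockDim.x)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
    }
    __syncthreads();
}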

252 MAPPING INSIDE BLOCK

In the shift parameter search, there are two schemes we can choose for mapping the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to each thread, so that all threads compute independently except that the final results must be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to one thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. For generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

FIG

FIG

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.
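Returning to the per-thread random-parameter generation discussed above, the sketch below shows one common way to give every thread its own random sequence using cuRAND. It uses cuRAND's default per-thread generator (XORWOW) with distinct sequence numbers as a stand-in for the dynamically created Mersenne Twister parameters described in the text, so it illustrates the intent rather than the exact generator used in the report; the parameter ranges are assumptions.

#include <curand_kernel.h>

// Each thread initializes its own cuRAND state with a distinct sequence number,
// then draws an independent (rotation, shift, scale) candidate per iteration.
__global__ void initRandomStates(curandState *states, unsigned long long seed)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    curand_init(seed, /*sequence=*/id, /*offset=*/0, &states[id]);
}

__global__ void drawAffineCandidates(curandState *states,
                                     float *theta, float *shiftX, float *shiftY,
                                     float *scale, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;
    curandState local = states[id];
    theta[id]  = (curand_uniform(&local) - 0.5f) * 0.2f;   // assumed small ranges
    shiftX[id] = (curand_uniform(&local) - 0.5f) * 20.0f;
    shiftY[id] = (curand_uniform(&local) - 0.5f) * 20.0f;
    scale[id]  = 0.9f + curand_uniform(&local) * 0.2f;
    states[id] = local;                                    // save state for reuse
}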

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database up front, without considering when each template will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
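A host-side sketch of part of this arrangement is given below: the target descriptor arrays are copied to the GPU once before matching begins, and a kernel stages the (much smaller) test template in shared memory. The function and array names are illustrative assumptions, and texture binding is omitted.

#include <cuda_runtime.h>

// Inside a matching kernel, the small test template is staged in shared memory.
__global__ void matchKernel(const float *testX, const float *testY, int nTest)
{
    extern __shared__ float sTest[];              // 2 * nTest floats, sized at launch
    for (int i = threadIdx.x; i < nTest; i += blockDim.x) {
        sTest[i]         = testX[i];              // x coordinates
        sTest[nTest + i] = testY[i];              // y coordinates
    }
    __syncthreads();
    // ... matching against the target descriptors follows here ...
}

// Host side: copy the whole target descriptor set to device memory once, before
// matching, so no host-to-device transfer happens inside the matching loop.
void uploadTargets(const float *hX, const float *hY, const float *hW,
                   int totalDescriptors, float **dX, float **dY, float **dW)
{
    size_t bytes = totalDescriptors * sizeof(float);
    cudaMalloc(dX, bytes);  cudaMemcpy(*dX, hX, bytes, cudaMemcpyHostToDevice);
    cudaMalloc(dY, bytes);  cudaMemcpy(*dY, hY, bytes, cudaMemcpyHostToDevice);
    cudaMalloc(dW, bytes);  cudaMemcpy(*dW, hW, bytes, cudaMemcpyHostToDevice);
}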

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied to object detection; in this work it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients d_x(x, y) and d_y(x, y) as m(x, y) = sqrt(d_x(x, y)² + d_y(x, y)²) and θ(x, y) = arctan(d_y(x, y) / d_x(x, y)).

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strengths must be locally normalized, which requires grouping the cells into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
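A minimal CUDA sketch of the per-pixel part of this computation is given below: central-difference gradients, magnitude, unsigned orientation over 0-180 degrees, and a 9-bin cell histogram accumulated with atomic adds. The bin count and cell size are assumed parameters; the report's own HOG feature is computed in MATLAB, so this is only an illustration of the technique.

#define HOG_BINS 9   // assumed number of orientation bins over 0-180 degrees

// Compute gradient magnitude and unsigned orientation, then vote the magnitude
// into the orientation histogram of the cell containing the pixel.
__global__ void hogCellHistograms(const float *img, float *cellHist,
                                  int width, int height, int cellSize, int cellsPerRow)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    float dx = img[y * width + (x + 1)] - img[y * width + (x - 1)];
    float dy = img[(y + 1) * width + x] - img[(y - 1) * width + x];
    float mag = sqrtf(dx * dx + dy * dy);               // m(x, y)
    float ang = atan2f(dy, dx) * 57.29578f;             // orientation in degrees
    if (ang < 0.0f) ang += 180.0f;                      // opposite directions count the same
    if (ang >= 180.0f) ang -= 180.0f;

    int bin  = (int)(ang / (180.0f / HOG_BINS));
    int cell = (y / cellSize) * cellsPerRow + (x / cellSize);
    atomicAdd(&cellHist[cell * HOG_BINS + bin], mag);   // magnitude-weighted vote
}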

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises.

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java™, COM, and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but it can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java technology that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files, image, sound, and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
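The snapshots above correspond to a processing chain that can be sketched in MATLAB roughly as follows. This is only an outline under stated assumptions: the file name, the Gabor parameters, and the interactive ROI selection are illustrative, and the gabor/imgaborfilt calls assume a recent Image Processing Toolbox; the project's actual code may differ.

% Sketch of the snapshot pipeline (assumptions noted above)
rgb   = imread('eye.jpg');           % original sclera image (file name assumed)
gray  = rgb2gray(rgb);               % grey-scale conversion
level = graythresh(gray);            % Otsu's threshold
bw    = im2bw(gray, level);          % binary image
edges = edge(gray, 'sobel');         % edge map used alongside Otsu's thresholding

roi = roipoly(gray);                 % interactively select the sclera region of interest
scleraPart = gray;
scleraPart(~roi) = 0;                % keep only the selected ROI

enh = adapthisteq(scleraPart);       % enhancement of the sclera image

gb   = gabor([4 8], [0 45 90 135]);  % Gabor filter bank (wavelengths/orientations assumed)
feat = imgaborfilt(enh, gb);         % feature extraction: one response plane per filter

% Matching would compare 'feat' against the enrolled templates in the database
% and display MATCHED or NOT MATCHED.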

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate mask information and make the matching more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.

123 IMAGE RESTORATION

It is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process, as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with the accentuation or extraction of image features.

124 IMAGE COMPRESSION

It is concerned with minimizing the number of bits required to represent an image. Applications of compression are in broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.

Text compression – CCITT GROUP 3 & GROUP 4

Still image compression – JPEG

Video image compression – MPEG

125 SEGMENTATION

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.

126 IMAGE RESTORATION

Image restoration, like enhancement, improves the quality of an image, but all the operations are mainly based on known or measured degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion. It is used to correct images for known degradations.

127 FUNDAMENTAL STEPS

Image acquisition: to acquire a digital image.

Image preprocessing: to improve the image in ways that increase the chances for success of the other processes.

Image segmentation: to partition an input image into its constituent parts or objects.

Image representation: to convert the input data to a form suitable for computer processing.

Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.

Image recognition: to assign a label to an object based on the information provided by its descriptors.

Image interpretation: to assign meaning to an ensemble of recognized objects.

Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling, and amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.

Example: a 256 gray-level image of size 256x256 occupies 64K bytes of memory.

Images of very low spatial resolution produce a checkerboard effect. The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.

14 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:

GIF – Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG – Joint Photographic Experts Group. A very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).

TIFF – Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS – PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD – Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP – bitmap file format.

15 TYPE OF IMAGES

Images are of four types:

1 Binary image

2 Gray scale image

3 Color image

4 Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e., a 0 or 1. Such images are also known as black-and-white (B&W) images.

152 GRAY SCALE IMAGE

In an (8-bit) grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grey-scale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive colour, with the R, G, and B receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB – cyan, magenta, and yellow – are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green, and blue in full intensity makes white.

In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK: The 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into a color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle. This supported a palette of 256 36-bit RGB colors.

16 Applications of image processing

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation;

2) Processing of scene data for autonomous machine perception.

In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches – a Speed Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching – for feature registration and matching. Within these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size and will preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the

most challenging problems that has been actively researched in recent

years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their

performance degrades significantly when the test images contain variations

that are not present in the training images In this paper we highlight some

of the key issues in remote face recognition We define the remote face

recognition as one where faces are several tens of meters (10-250m) from

the cameras We then describe a remote face database which has been

acquired in an unconstrained outdoor maritime environment Recognition

performance of a subset of existing still image-based face recognition

algorithms is evaluated on the remote face data set Further we define the

remote re-identification problem as matching a subject at one location with

candidate sets acquired at a different location and over time in remote

conditions We provide preliminary experimental results on remote re-

identification It is demonstrated that in addition to applying a good

classification algorithm finding features that are robust to variations

mentioned above and developing statistical models which can account for

these variations are very important for remote face recognition

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanded biometric data collected by various sectors

of government and industry for identification and verification purposes

how to manage and process such Big Data draws great concern Even

though modern processors are equipped with more cores and memory

capacity it still requires careful design in order to utilize the hardware

resource effectively and the power consumption efficiently This research

addresses this issue by investigating the workload characteristics of

biometric applications. Taking Daugman's iris matching algorithm, which

has been proven to be the most reliable iris matching method as a case

study we conduct performance profiling and binary instrumentation on the

benchmark to capture its execution behavior The results show that data

loading and memory access incurs great performance overhead and

motivates us to move the biometrics computation to high-performance

architecture

Modern iris recognition algorithms can be computationally intensive

yet are designed for traditional sequential processing elements such as a

personal computer However a parallel processing alternative using field

programmable gate arrays (FPGAs) offers an opportunity to speed up iris

recognition. Within the means of this project, iris template generation with directional filtering, which is a computationally expensive yet parallelizable portion of a modern iris recognition algorithm, is parallelized on an FPGA

system We will present a performance comparison of the parallelized

algorithm on the FPGA system to a traditional CPU-based version The

parallelized template generation outperforms an optimized C++ code

version determining the information content of an iris approximately 324

times faster

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival

vasculature is proposed Conjunctival vessels can be observed on the visible

part of the sclera that is exposed to the outside world These vessels

demonstrate rich and specific details in visible light and can be easily

photographed using a regular digital camera In this paper we discuss

methods for conjunctival imaging preprocessing and feature extraction in

order to derive a suitable conjunctival vascular template for biometric

authentication Commensurate classification methods along with the

observed accuracy are discussed Experimental results suggest the potential

of using conjunctival vasculature as a biometric measure Identification of

a person based on some unique set of features is an important task The

human identification is possible with several biometric systems and sclera

recognition is one of the promising biometrics The sclera is the white

portion of the human eye The vein pattern seen in the sclera region is

unique to each person Thus the sclera vein pattern is a well suited

biometric technology for human identification The existing methods used

for sclera recognition have some drawbacks like only frontal looking

images are preferred for matching and rotation variance is another problem

These problems are completely eliminated in the proposed system by using

two feature extraction techniques They are Histogram of Oriented

Gradients (HOG) and converting the image into polar form using the

bilinear interpolation technique These two features help the proposed

system to become illumination invariant and rotation invariant The

experimentation is done with the help of UBIRIS database The

experimental result shows that the proposed sclera recognition method can

achieve better accuracy than the previous methods

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of

today's mainstream computing systems. Over the past six years, there has

been a marked increase in the performance and capabilities of GPUs The

modern GPU is not only a powerful graphics engine but also a highly

parallel programmable processor featuring peak arithmetic and memory

bandwidth that substantially outpaces its CPU counterpart. The GPU's

rapid increase in both programmability and capability has spawned a

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and

UBIRIS iris databases

18 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution built on our sequential line-descriptor method using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor – the Y-shape sclera feature-based efficient registration method – to speed up the mapping scheme, introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue, and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

191PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, in this research we propose a new descriptor – the Y-shape descriptor – which can greatly help improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque and white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches – a Speed Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching – for feature registration and matching. Within these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (also written GPGPUs, general-purpose graphics processing units) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU was used to extract the features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies would be needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size and will preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor – the Y-shape sclera feature-based efficient registration method – to speed up the mapping scheme (Section 4), introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue (Section 5), and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA. We then developed the implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present some experiments using the proposed system. In Section 9, we draw some conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition. Several methods have been designed for sclera segmentation. Crihalmeanu et al. presented a semi-automated system for sclera segmentation: they used a clustering algorithm to classify the color eye images into three clusters – sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's-thresholding-based method for grayscale images.

After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region-growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step,

Crihalmeanu and Ross proposed

three registration and matching approaches, including Speed Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good accuracy for identification. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly used for circular or quasi-circular object shapes. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, the human identification is more accurate than in the existing studies. In the proposed method, two features of an image are extracted, as sketched below.
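A minimal MATLAB sketch of these two feature extractors is given below. It assumes the Computer Vision Toolbox function extractHOGFeatures, a segmented grayscale sclera image named sclera, and a known pupil center (xc, yc); the sampling grid sizes are assumed values, not the project's actual settings.

% HOG features of the segmented sclera region
hogFeat = extractHOGFeatures(sclera, 'CellSize', [8 8]);

% Cartesian-to-polar conversion by bilinear interpolation around the pupil center
nR = 64;  nTheta = 180;  rMax = 100;            % assumed sampling grid
[T, R] = meshgrid(linspace(0, 2*pi, nTheta), linspace(1, rMax, nR));
Xq = xc + R.*cos(T);                            % query points in image coordinates
Yq = yc + R.*sin(T);
polarSclera = interp2(double(sclera), Xq, Yq, 'linear', 0);  % bilinear interpolation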

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare area detection: The glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images; if the image is in color, it must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection, and a sketch of this step follows the figure.

FIG
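A possible MATLAB realization of the glare-detection step is sketched below; the brightness threshold is an assumed value and the variable img stands for the 8-bit input eye image.

% Glare (bright spot) detection sketch
if size(img, 3) == 3
    g = rgb2gray(img);                % Sobel edge detection operates on grayscale input
else
    g = img;
end
glareEdges = edge(g, 'sobel');        % Sobel filter highlights the glare boundary
glareMask  = (g > 240) & imfill(glareEdges, 'holes');   % bright pixels inside closed edges (threshold assumed)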

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. Left and right sclera areas are selected based on the iris boundaries. When the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out. A sketch of this step is given below.

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; these are altogether unwanted portions for recognition. In order to eliminate these effects, refinement is done following the detection of the sclera area. Fig. shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area. In the same way, the left sclera area is detected using this method.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation. To make the vein patterns more visible, vein pattern enhancement is performed, for example as in the sketch below.
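Following the enhancement techniques cited in Section 221 (CLAHE on the green plane and a bank of directional Gabor filters), a hedged MATLAB sketch of this enhancement step is shown below; the clip limit, wavelength, and orientations are assumed parameter values.

% Vein-pattern enhancement sketch
green    = img(:,:,2);                          % green plane usually shows veins with best contrast
claheImg = adapthisteq(green, 'ClipLimit', 0.02);

gb  = gabor(6, 0:30:150);                       % directional Gabor bank (parameters assumed)
mag = imgaborfilt(claheImg, gb);                % one magnitude plane per orientation
enhancedVeins = max(mag, [], 3);                % keep the strongest directional response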

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

225 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line-descriptor-based sclera vein recognition method. After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points The line descriptor is used to describe the segments in the vein

structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle relative to a reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG

Here, f_line(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points – one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value, based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0. A sketch of this weighting image follows.
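Building such a weighting image from the binary sclera mask can be sketched in MATLAB as follows; the width d of the boundary band is an assumed parameter.

% Weighting image: 1 inside the sclera mask, 0.5 near its boundary, 0 outside
d     = 10;                                       % assumed boundary band width in pixels
inner = imerode(scleraMask, strel('disk', d));    % pixels at least d away from the boundary
W = zeros(size(scleraMask));
W(scleraMask) = 0.5;                              % near-boundary sclera pixels
W(inner)      = 1.0;                              % interior sclera pixels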

The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and the matching angle threshold constrains the orientation difference. The total matching score, M, is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained. A simplified sketch of this scoring is given below.
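In the sketch below, each descriptor row is taken to be [x y phi w], and the per-segment score is approximated by the weight of the matched test descriptor; Dmatch and angleMatch stand for the two thresholds described above. This is a simplification under those assumptions, not the project's exact scoring code.

function M = matchScore(testDesc, targetDesc, Dmatch, angleMatch)
% testDesc, targetDesc: N-by-4 matrices of line descriptors [x y phi w]
score = 0;
for i = 1:size(testDesc, 1)
    d    = hypot(targetDesc(:,1) - testDesc(i,1), targetDesc(:,2) - testDesc(i,2));
    dAng = abs(targetDesc(:,3) - testDesc(i,3));
    if any(d <= Dmatch & dAng <= angleMatch)
        score = score + testDesc(i,4);       % matched segment contributes its weight
    end
end
% normalize by the maximum attainable score of the smaller template
maxScore = min(sum(testDesc(:,4)), sum(targetDesc(:,4)));
M = score / maxScore;
end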

FIG

FIG

FIG

FIG

movement of the eye. Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera. One way to detect such branches is sketched below.
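In the MATLAB sketch below, the neighbourhood radius and the angle-grouping tolerance are assumed parameters, segCenters and segAngles are taken from the line-segment descriptors, and the test for "two types of angle values" is approximated crudely by the spread of the neighbour angles.

% Y-shape branch detection sketch
radius = 20;          % assumed "regular distance" for the nearest-neighbour search
tol    = 15;          % assumed angle-grouping tolerance in degrees
yFeatures = [];
for i = 1:size(segCenters, 1)
    d   = hypot(segCenters(:,1) - segCenters(i,1), segCenters(:,2) - segCenters(i,2));
    nbr = find(d > 0 & d < radius);
    if numel(nbr) < 2, continue; end
    ang = segAngles(nbr);
    if max(ang) - min(ang) > tol              % crude proxy for two distinct angle groups
        yFeatures(end+1,:) = [segCenters(i,:), min(ang), max(ang)]; %#ok<AGROW>
    end
end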

There are two ways to measure both the orientation and the relationship of every branch of Y-shape vessels: one is to use the angles of every branch to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angle between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms at the image acquisition step, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y-shape branches as

auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

226 WPL SCLERA DESCRIPTOR

As we discussed in Section 225, the line

descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy the GPU memory and slow down the data transfer. During matching, a RANSAC-type registration algorithm is used to randomly select the corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the relationships of the geometric features of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of the descriptors that lie beyond the sclera are set to 0, those near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, descriptor weights were calculated on their own mask by the CPU, only once.

The calculation result is saved as a component of the descriptor. The descriptor of the sclera then changes to s(x, y, ɸ, w), where w denotes the weight of the point and its value may be 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. It would be faster if two templates had similar reference points. If we use the center of the iris as the reference point, when two templates are compared the correspondences will automatically be aligned to each other, since they share the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess, as sketched below.
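This CPU preprocessing step can be sketched in MATLAB as below, where segCenters and segAngles come from the line descriptors, (xc, yc) is the iris/pupil center, and W is the weighting image from the earlier sketch; all of these names are assumptions.

% WPL descriptor construction sketch: s = (x, y, r, theta, phi, w)
dx    = segCenters(:,1) - xc;
dy    = segCenters(:,2) - yc;
r     = hypot(dx, dy);                        % distance from segment center to pupil center
theta = atan2(dy, dx);                        % angle to the reference line through the center
idx   = sub2ind(size(W), round(segCenters(:,2)), round(segCenters(:,1)));
w     = W(idx);                               % weight looked up once, on the CPU
WPL   = [segCenters, r, theta, segAngles, w];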

The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads, called warps. We reorganized the descriptors from the same sides and saved them in continuous addresses. This meets the requirement of coalesced memory access on the GPU.

FIG

FIG

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask files is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus, the cost of data transfer and computation on the GPU is reduced. For matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as shown in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly more capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders.

The hardware must support shader programs of at least 65 k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-purpose computing on the GPU: Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology is different between graphics and general-purpose use. Harris provides an excellent description of this mapping process.

We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described earlier, concentrating on the programmable aspects of this pipeline:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as a texture on future passes through the graphics pipeline.

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves the exact same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

23 2 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.
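As a simple illustration of this structured-grid style (this sketch is our own and is not code from the original system; the kernel, buffer names, and grid sizes are assumptions), the CUDA program below assigns one thread per grid point, gathers the values of a point's neighbors from global memory, and writes the updated value back:

// Minimal sketch of the "structured grid of threads" model: one thread per
// grid point, gather reads of the neighbors, and a write of the new value.
#include <cuda_runtime.h>

__global__ void stepKernel(const float *in, float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    // Gather: read the current state of this point and its four neighbors.
    float c = in[idx];
    float l = (x > 0)          ? in[idx - 1]     : c;
    float r = (x < width - 1)  ? in[idx + 1]     : c;
    float u = (y > 0)          ? in[idx - width] : c;
    float d = (y < height - 1) ? in[idx + width] : c;
    // Write the next state back to global memory.
    out[idx] = 0.2f * (c + l + r + u + d);
}

int main(void)
{
    const int W = 256, H = 256;
    float *dIn, *dOut;
    cudaMalloc((void **)&dIn,  W * H * sizeof(float));
    cudaMalloc((void **)&dOut, W * H * sizeof(float));
    cudaMemset(dIn, 0, W * H * sizeof(float));

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    stepKernel<<<grid, block>>>(dIn, dOut, W, H);   // one thread per grid point
    cudaDeviceSynchronize();

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}

Swapping the roles of the two buffers between steps (or, where the access pattern allows it, writing back into the same buffer for in-place algorithms) gives the flexibility mentioned above.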

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities. After this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale- and translation-invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta respectively, dϕ is the Euclidean distance of the angle elements of the descriptor vectors defined in (3), and dxy is the Euclidean distance of two descriptor centers defined in (4). ni and di are the number of matched descriptor pairs and their center distance respectively, tϕ is a distance threshold, and txy is the threshold that restricts the searching area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the searching range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs ni and the distance between the Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score will be passed to the next, more precise matching process.
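As an illustration of this coarse stage (this is our own sequential sketch, not the project's Algorithm 1; the descriptor fields, thresholds, and the exact fusion formula of Eq. (2) are assumptions), the following C-style routine counts matched Y-shape descriptor pairs and fuses the count with the average center distance into a single score:

// Illustrative sketch only: coarse matching of Y-shape descriptors.
// Each descriptor is assumed to hold three branch angles and a center.
#include <math.h>

typedef struct { float phi[3]; float x; float y; } YShapeDesc;

/* tPhi and tXY are the angle-distance and center-distance thresholds,
 * alpha is the fusion factor; the final fusion below is a placeholder,
 * not the report's Eq. (2). */
float coarseScore(const YShapeDesc *te, int Ni,
                  const YShapeDesc *ta, int Nj,
                  float tPhi, float tXY, float alpha)
{
    int   n    = 0;     /* number of matched descriptor pairs */
    float dSum = 0.0f;  /* summed center distances of matched pairs */

    for (int i = 0; i < Ni; ++i) {
        for (int j = 0; j < Nj; ++j) {
            float dPhi = 0.0f;
            for (int k = 0; k < 3; ++k) {
                float d = te[i].phi[k] - ta[j].phi[k];
                dPhi += d * d;
            }
            dPhi = sqrtf(dPhi);
            float dXY = hypotf(te[i].x - ta[j].x, te[i].y - ta[j].y);
            if (dPhi < tPhi && dXY < tXY) {   /* angles and centers close enough */
                ++n;
                dSum += dXY;
            }
        }
    }
    if (n == 0) return 0.0f;
    /* More matched pairs and a smaller average distance give a higher score. */
    float dAvg = dSum / n;
    return ((float)n + alpha / (dAvg + 1e-6f)) / (float)(Ni < Nj ? Ni : Nj);
}

On the GPU, one thread evaluates such a routine for one test-target template pair, which is how Algorithm 1 is mapped to coarse-grained subtasks in the block mapping described later.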

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH

As we discussed before, segmentation may not be accurate, and as a result the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter. Here Tte is the test template and stei is the ith WPL descriptor of Tte; Tta is the target template and staj is the jth WPL descriptor of Tta; d(stek, staj) is the Euclidean distance of descriptors stek and staj; and Δsk is the shift value between the two descriptors, defined as the offset between their centers.

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quadrant and find their nearest neighbors staj in the target template Tta. The shift offset of each pair is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, the candidate with the smallest standard deviation among these candidate offsets.
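The following compact sketch (our own illustration, not Algorithm 2 itself; the data layout is assumed, and we approximate "smallest standard deviation among the candidates" by picking the candidate offset closest to the mean offset) shows the idea of selecting the final shift:

// Illustrative sketch: choose a registration shift from candidate offsets,
// one candidate per sampled test descriptor and its nearest target neighbor.
#include <math.h>

typedef struct { float dx; float dy; } Shift;

Shift selectShift(const Shift *cand, int n)
{
    float mx = 0.0f, my = 0.0f;
    for (int k = 0; k < n; ++k) { mx += cand[k].dx; my += cand[k].dy; }
    mx /= (float)n;                       /* mean candidate offset */
    my /= (float)n;

    int best = 0;
    float bestDev = INFINITY;
    for (int k = 0; k < n; ++k) {         /* candidate deviating least from the rest */
        float dev = hypotf(cand[k].dx - mx, cand[k].dy - my);
        if (dev < bestDev) { bestDev = dev; best = k; }
    }
    return cand[best];
}

In the CUDA implementation, each candidate offset is instead evaluated by its own thread (see the mapping discussion in 252), and only the final selection is shared.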

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined the same as in Algorithm 2; trshift(it), θ(it), and trscale(it) are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ(it)), T(trshift(it)), and S(trscale(it)) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment we set the number of iterations to 512.
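For illustration only (the exact form of the report's matrix in (7) is not reproduced here; the composition order and parameter names below are assumptions), a shift-rotation-scale transform can be composed and applied to a descriptor center like this:

// Illustrative sketch: compose a transform from shift, rotation and uniform
// scale (T * R * S) and apply it to a 2-D descriptor center.
#include <math.h>

typedef struct { float m[3][3]; } Mat3;

static Mat3 makeTransform(float shiftX, float shiftY, float theta, float scale)
{
    float c = cosf(theta), s = sinf(theta);
    Mat3 t = {{
        { scale * c, -scale * s, shiftX },
        { scale * s,  scale * c, shiftY },
        { 0.0f,       0.0f,      1.0f   }
    }};
    return t;
}

static void applyTransform(const Mat3 *t, float x, float y,
                           float *outX, float *outY)
{
    *outX = t->m[0][0] * x + t->m[0][1] * y + t->m[0][2];
    *outY = t->m[1][0] * x + t->m[1][1] * y + t->m[1][2];
}

Each CUDA thread in the affine-matrix-generation kernel draws its own random (shiftX, shiftY, theta, scale) set, builds such a matrix, transforms the test descriptors, and counts matches against the target template.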

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined the same as in Algorithms 2 and 3; θ(optm), trshift(optm), trscale(optm), and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(trshift(optm)), and S(trscale(optm)) form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and target template.
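As a small illustration of the per-pair match decision described above (our own sketch; the descriptor field names are assumptions), a registered test descriptor is accepted as matching a target descriptor when both the center distance and the orientation difference are small:

// Illustrative sketch: match decision for a pair of WPL line-segment
// descriptors after registration.  beta is the distance threshold (20 pixels
// in the report) and alpha bounds the orientation difference (set to 5).
#include <math.h>
#include <stdbool.h>

typedef struct { float x, y; float phi; float w; } WplDesc;

bool descriptorsMatch(const WplDesc *te, const WplDesc *ta,
                      float beta, float alpha)
{
    float d    = hypotf(te->x - ta->x, te->y - ta->y);  /* center distance */
    float dPhi = fabsf(te->phi - ta->phi);              /* orientation gap */
    return (d < beta) && (dPhi < alpha);
}

The descriptor weight w, which marks whether a descriptor lies at the edge of the sclera, would then be folded into the accumulated score.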

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned by the programmer into threads that are mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and it takes little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing these memories is very time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access latency. To completely hide this latency with the small instruction set, we should preferentially use on-chip memory rather than global memory. When a global memory access occurs, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into equally sized banks. If two memory requests from different threads within a warp fall into the same bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
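As a small illustration of the coalescing point (these kernel fragments are our own examples, not part of the project's code), consecutive threads should touch consecutive global memory words; a strided pattern breaks this and multiplies the number of memory transactions:

// Illustrative kernel fragments contrasting coalesced and strided access.
__global__ void coalescedCopy(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];                  // thread k reads word k: accesses coalesce
}

__global__ void stridedCopy(const float *in, float *out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[(i * stride) % n];   // neighboring threads touch far-apart words
}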

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there is no data exchange requirement between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, every other thread computes the sum of a consecutive pair of results; then, recursively, the first thread of every group of i (= 4, 8, 16, 32, 64, ...) threads computes the sum of the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
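A minimal CUDA sketch of this tree-style in-block summation (our own simplification, assuming the block size is a power of two and one partial result per thread) is shown below; after the last step the block's total sits in the first shared-memory slot:

// Illustrative kernel: tree-based summation of per-thread partial results
// within one block; launch with blockDim.x * sizeof(float) bytes of dynamic
// shared memory.
__global__ void blockSum(const float *partial, float *blockResult)
{
    extern __shared__ float s[];              // one slot per thread
    int tid = threadIdx.x;
    s[tid] = partial[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    // Pairwise sums, then sums of pairs of pairs, and so on.
    for (int step = 1; step < blockDim.x; step *= 2) {
        if (tid % (2 * step) == 0)
            s[tid] += s[tid + step];
        __syncthreads();
    }
    if (tid == 0)
        blockResult[blockIdx.x] = s[0];       // total left at the first address
}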

252 MAPPING INSIDE BLOCK

In the shift parameter search there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to each thread, so that all the threads compute independently, except that the final result must be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. However, even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again.

FIG

FIG

In our approach, a flag variable denoting whether a line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step. Our solution is to use a single thread in a block to process this matching.

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
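A short sketch of this staging idea (our own example; the descriptor structure and kernel shape are assumptions, and the real kernels accumulate registration and matching scores rather than a simple count) is:

// Illustrative kernel: stage the test template's descriptors in shared memory
// once per block, then let each thread match its slice of the target template
// against the staged copy.  Launch with nTest * sizeof(Desc) bytes of dynamic
// shared memory per block.
struct Desc { float x, y, r, theta, phi, w; };

__global__ void matchKernel(const Desc *testDesc, int nTest,
                            const Desc *targetDesc, int nTarget,
                            int *matchCount, float beta)
{
    extern __shared__ Desc sTest[];              // staged test template
    for (int i = threadIdx.x; i < nTest; i += blockDim.x)
        sTest[i] = testDesc[i];                  // cooperative load
    __syncthreads();

    int local = 0;
    for (int j = threadIdx.x; j < nTarget; j += blockDim.x) {
        // Read the test template from fast shared memory, not global memory.
        for (int i = 0; i < nTest; ++i) {
            float dx = sTest[i].x - targetDesc[j].x;
            float dy = sTest[i].y - targetDesc[j].y;
            if (dx * dx + dy * dy < beta * beta) { ++local; break; }
        }
    }
    atomicAdd(matchCount, local);                // crude result accumulation
}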

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It was primarily applied in object detection; in this paper it is applied as a feature for human recognition. In the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
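The standard way to compute these two quantities from the directional gradients (shown here as a small C sketch consistent with the usual HOG formulation; the function name is ours) is:

// Gradient magnitude and orientation at a pixel, from horizontal and vertical
// gradients dx and dy; the orientation is folded into [0, 180) degrees because
// opposite directions count as the same bin.
#include <math.h>

void hogGradient(float dx, float dy, float *magnitude, float *orientationDeg)
{
    *magnitude = sqrtf(dx * dx + dy * dy);
    float deg = atan2f(dy, dx) * (180.0f / 3.14159265f);
    if (deg < 0.0f) deg += 180.0f;        /* unsigned gradient */
    *orientationDeg = deg;
}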

Orientation binning is the second step of HOG; it is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the bins of gradient orientation are spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strengths must be locally normalized. For that, cells are grouped together into larger blocks; these blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications and distribute your MATLAB algorithms and applications.

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java™, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and other computational areas. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter and bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Privium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the computation more suitable for parallel processing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step to achieve parallel computation efficiency using a GPU. A work flow with high arithmetic intensity, which hides the memory access latency, was designed to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep big simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


The result of image segmentation is a set of segments that

collectively cover the entire image or a set of contours extracted from the

image (see edge detection) Each of the pixels in a region are similar with

respect to some characteristic or computed property such as

colour intensity or texture Adjacent regions are significantly different

with respect to the same characteristic(s) When applied to a stack of

images typical in medical imaging the resulting contours after image

segmentation can be used to create 3D reconstructions with the help of

interpolation algorithms like marching cubes

126 IMAGE RESTORATION

Image restoration like enhancement improves the qualities of image

but all the operations are mainly based on known measured or

degradations of the original image Image restorations are used to restore

images with problems such as geometric distortion improper focus

repetitive noise and camera motion It is used to correct images for known

degradations

127 FUNDAMENTAL STEPS

Image acquisition to acquire a digital image

Image preprocessing to improve the image in ways that increases the

chances for success of the other processes

Image segmentation to partitions an input image into its constituent parts or

objects

Image representation to convert the input data to a form suitable for

computer processing

Image description to extract features that result in some quantitative

information of interest or features that are basic for differentiating one

class of objects from another

Image recognition to assign a label to an object based on the

information provided by its descriptors

Image interpretation to assign meaning to an ensemble of recognized

objects

Knowledge about a problem domain is coded into an image processing

system in the form of a Knowledge database

13 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x,y) must be digitized both spatially and in amplitude.

Digitization of the spatial coordinates (xy) is called image sampling

Amplitude digitization is called gray-level quantization

The storage and processing requirements increase rapidly with the spatial

resolution and the number of gray levels

Example A 256 gray-level image of size 256x256 occupies 64K bytes of

memory

Images of very low spatial resolution produce a checkerboard effect

The use of insufficient number of gray levels in smooth areas of a digital

image results in false contouring

14 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:

GIF - Graphical Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG - Joint Photographic Experts Group. A very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).

TIFF - Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS - PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD - Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP - bitmap file format.

15 TYPE OF IMAGES

Images are 4 types

1 Binary image

2 Gray scale image

3 Color image

4 Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e., a 0 or 1, hence the names black-and-white and B&W.

152 GRAY SCALE IMAGE

In a (8-bit) grayscale image each picture element has an assigned intensity

that ranges from 0 to 255 A grey scale image is what people normally call

a black and white image but the name emphasizes that such an image will

also include many shades of grey

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive

colour with the r g and b receptors in our retinas RGB uses additive colour

mixing and is the basic colour model used in television or any other

medium that projects colour with light It is the basic colour model used in

computers and for web graphics but it cannot be used for print production

The secondary colours of RGB ndash cyan magenta and yellow ndash are formed

by mixing two of the primary colours (red green or blue) and excluding the

third colour Red and green combine to make yellow green and blue to

make cyan and blue and red form magenta The combination of red green

and blue in full intensity makes white

In Photoshop, using the "screen" mode for the different layers in an

image will make the intensities mix together according to the additive

colour mixing model This is analogous to stacking slide images on top of

each other and shining light through them

FIG

CMYK The 4-colour CMYK model used in printing lays down

overlapping layers of varying percentages of transparent cyan (C) magenta

(M) and yellow (Y) inks In addition a layer of black (K) ink can be added

The CMYK model uses the subtractive colour model

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix The

pixel values in the array are direct indices into a color map By convention

this documentation uses the variable name X to refer to the array and map

to refer to the color map In computing indexed color is a technique to

manage digital images colors in a limited fashion in order to save

computer memory and file storage while speeding up display refresh and

file transfers It is a form of vector quantization compression

When an image is encoded in this way color information is not

directly carried by the image pixel data but is stored in a separate piece of

data called a palette an array of color elements in which every element a

color is indexed by its position within the array The image pixels do not

contain the full specification of its color but only its index in the palette

This technique is sometimes referred as pseudocolor or indirect color as

colors are addressed indirectly

Perhaps the first device that supported palette colors was a random-

access frame buffer described in 1975 by Kajiya Sutherland and Cheadle

This supported a palette of 256 36-bit RGB colors

16 Applications of image processing

Interest in digital image processing methods stems from 2 principal

application areas

1) Improvement of pictorial information for human interpretation

2) Processing of scene data for autonomous machine perception

In the second application area interest focuses on procedures for

extracting from an image

Information in a form suitable for computer processing

Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches, a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, which costs about 1.2 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1 Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer Also some of processing on the mask files will

involve convolution which is difficult to improve its performance on the

scalar process unit on CUDA

2 The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3 When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the

most challenging problems that has been actively researched in recent

years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp, and full) face images are acquired. However, their

performance degrades significantly when the test images contain variations

that are not present in the training images In this paper we highlight some

of the key issues in remote face recognition We define the remote face

recognition as one where faces are several tens of meters (10-250m) from

the cameras We then describe a remote face database which has been

acquired in an unconstrained outdoor maritime environment Recognition

performance of a subset of existing still image-based face recognition

algorithms is evaluated on the remote face data set Further we define the

remote re-identification problem as matching a subject at one location with

candidate sets acquired at a different location and over time in remote

conditions We provide preliminary experimental results on remote re-

identification It is demonstrated that in addition to applying a good

classification algorithm finding features that are robust to variations

mentioned above and developing statistical models which can account for

these variations are very important for remote face recognition

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanded biometric data collected by various sectors

of government and industry for identification and verification purposes

how to manage and process such Big Data draws great concern Even

though modern processors are equipped with more cores and memory

capacity it still requires careful design in order to utilize the hardware

resource effectively and the power consumption efficiently This research

addresses this issue by investigating the workload characteristics of

biometric applications. Taking Daugman's iris matching algorithm, which

has been proven to be the most reliable iris matching method as a case

study we conduct performance profiling and binary instrumentation on the

benchmark to capture its execution behavior The results show that data

loading and memory access incurs great performance overhead and

motivates us to move the biometrics computation to high-performance

architecture

Modern iris recognition algorithms can be computationally intensive

yet are designed for traditional sequential processing elements such as a

personal computer However a parallel processing alternative using field

programmable gate arrays (FPGAs) offers an opportunity to speed up iris

recognition Within the means of this project iris template generation with

directional filtering which is acomputationally expensive yet parallel

portion of a modern iris recognition algorithm is parallelized on an FPGA

system We will present a performance comparison of the parallelized

algorithm on the FPGA system to a traditional CPU-based version The

parallelized template generation outperforms an optimized C++ code

version determining the information content of an iris approximately 324

times faster

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.

A new biometric indicator based on the patterns of conjunctival

vasculature is proposed Conjunctival vessels can be observed on the visible

part of the sclera that is exposed to the outside world These vessels

demonstrate rich and specific details in visible light and can be easily

photographed using a regular digital camera In this paper we discuss

methods for conjunctival imaging preprocessing and feature extraction in

order to derive a suitable conjunctival vascular template for biometric

authentication Commensurate classification methods along with the

observed accuracy are discussed Experimental results suggest the potential

of using conjunctival vasculature as a biometric measure Identification of

a person based on some unique set of features is an important task The

human identification is possible with several biometric systems and sclera

recognition is one of the promising biometrics The sclera is the white

portion of the human eye The vein pattern seen in the sclera region is

unique to each person Thus the sclera vein pattern is a well suited

biometric technology for human identification The existing methods used

for sclera recognition have some drawbacks like only frontal looking

images are preferred for matching and rotation variance is another problem

These problems are completely eliminated in the proposed system by using

two feature extraction techniques They are Histogram of Oriented

Gradients (HOG) and converting the image into polar form using the

bilinear interpolation technique These two features help the proposed

system to become illumination invariant and rotation invariant The

experimentation is done with the help of UBIRIS database The

experimental result shows that the proposed sclera recognition method can

achieve better accuracy than the previous methods

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.

The graphics processing unit (GPU) has become an integral part of

todayrsquos mainstream computing systems Over the past six years there has

been a marked increase in the performance and capabilities of GPUs The

modern GPU is not only a powerful graphics engine but also a highly

parallel programmable processor featuring peak arithmetic and memory

bandwidth that substantially outpaces its CPU counterpart The GPUrsquos

rapid increase in both programmability and capability has spawned a

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified

Mumford-Shah functional. Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification The verification and

identification performance of the propose algorithms is validated and

compared with other algorithms using the CASIA Version 3 ICE 2005 and

UBIRIS iris databases

18 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor - the Y shape sclera feature-based efficient registration method - to speed up the mapping scheme, introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue, and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

191PROPOSED SYSTEM ADVANTAGES

1 To improve the efficiency in this research we propose a new descriptor

mdash the Y shape descriptor which can greatly help improve the efficiency of

the coarse registration of two images and can be used to filter out some

non-matching pairs before refined matching

2 We propose the coarse-to-fine two-stage matching process In the first

stage we matched two images coarsely using the Y-shape descriptors

which is very fast to match because no registration was needed The

matching result in this stage can help filter out image pairs with low

similarities

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque and white outer layer of the eye The blood

vessel structure of sclera is formed randomly and is unique to each person

which can be used for humanrsquos identification Several researchers have

designed different Sclera vein recognition methods and have shown that it

is promising to use Sclera vein recognition for human identification In

Crihalmeanu and Ross proposed three approaches Speed Up Robust

Features (SURF)-based method minutiae detection and direct correlation

matching for feature registration and matching Within these three methods

the SURF method achieves the best accuracy It takes an average of 15

seconds1 using the SURF method to per- form a one-to-one matching Zhou

et al proposed line descriptor-based method for sclera vein recognition

The matching step (including registration) is the most time-consuming step

in this sclera vein recognition system which costs about 12 seconds to

perform a one-to-one matching Both speed was calculated using a PC with

Intelreg Coretrade 2 Duo 24GHz processors and 4 GB DRAM Currently

Sclera vein recognition algorithms are designed using central processing

unit (CPU)-based systems

As discussed CPU-based systems are designed as sequential

processing devices which may not be efficient in data processing where the

data can be parallelized Because of large time consumption in the matching

step Sclera vein recognition using sequential-based method would be very

challenging to be implemented in a real time biometric system especially

when there is a large number of templates in the database for matching. GPUs (GPGPUs, as an abbreviation of General-Purpose Graphics Processing Units)

are now popularly used for parallel computing to improve the

computational processing speed and efficiency The highly parallel

structure of GPUs makes them more effective than CPUs for data

processing where processing can be performed in parallel GPUs have been

widely used in biometrics recognition such as speech recognition text

detection handwriting recognition and face recognition In iris

recognition GPU was used to extract the features construct descriptors

and match templates

GPUs are also used for object retrieval and image search Park et al

designed the performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, which is difficult to accelerate on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.

Note that it is relatively straightforward to port our C program for CUDA to an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor – the Y-shape sclera feature-based efficient registration method – to speed up the mapping scheme (Section 4), introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue (Section 5), and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then developed the implementation schemes that map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw some conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition

Several methods have been designed for sclera segmentation. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify color eye images into three clusters - sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's thresholding-based method for grayscale images.

After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, the human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out.

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. The filter operates only on grayscale images, so if the image is in color it is first converted to grayscale and then passed to the Sobel filter to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG
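As a rough illustration of this step, the sketch below shows a minimal CUDA kernel that computes the Sobel gradient magnitude of a grayscale image and flags pixels as glare candidates. The kernel name, the gradient threshold, and the brightness cutoff of 200 are illustrative assumptions, not values taken from the report.

// Minimal sketch: Sobel-based glare candidate detection (threshold values assumed).
__global__ void sobelGlareKernel(const unsigned char *gray, unsigned char *glare,
                                 int width, int height, float threshold /* assumed */)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;

    int idx = y * width + x;
    // 3x3 neighborhood
    float p00 = gray[idx - width - 1], p01 = gray[idx - width], p02 = gray[idx - width + 1];
    float p10 = gray[idx - 1],                                  p12 = gray[idx + 1];
    float p20 = gray[idx + width - 1], p21 = gray[idx + width], p22 = gray[idx + width + 1];

    // Sobel gradients in x and y
    float gx = (p02 + 2.0f * p12 + p22) - (p00 + 2.0f * p10 + p20);
    float gy = (p20 + 2.0f * p21 + p22) - (p00 + 2.0f * p01 + p02);
    float mag = sqrtf(gx * gx + gy * gy);

    // Mark strong edges around bright pixels as glare candidates
    glare[idx] = (mag > threshold && gray[idx] > 200) ? 255 : 0;
}

A typical launch would cover the image with 16 × 16 thread blocks; the exact configuration is a design choice, not something the report specifies.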

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are eliminated.

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined, since these are all unwanted portions for recognition. In order to eliminate their effects, refinement is done following the detection of the sclera area. The figure shows the result after Otsu's thresholding and after iris and eyelid refinement to detect the right sclera area. The left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi surface structure of the ocular

veins makes them hard to reproduce as a physical artifact Besides being a

stand-alone biometric modality we anticipate that the addition of

conjunctival biometrics will enhance the performance of current iris-based

biometric system in the following ways

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the

availability for simultaneous computation Below Figure demonstrates the

possibility for parallel directional filtering Since the filter is computed over

different portions of the input image the computation can be computed in

parallel (denoted by Elements below) In addition individual parallelization

of each element of Filtering can also be performed A detailed discussion of

our proposed parallelization is outside the scope of this paper

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA

VEIN

2251 RECOGNITION METHOD

The matching segment of the line-descriptor based method is a

bottleneck with regard to matching speed In this section we briefly

describe the Line Descriptor-based sclera vein recognition method After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points The line descriptor is used to describe the segments in the vein

structure Figure 2 shows a visual description of the line descriptor Each

segment is described by three quantities: the segment's angle relative to a reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as

FIG
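The equations themselves did not survive in this copy; the following LaTeX reconstruction is based on the definitions given in the surrounding text and on Zhou et al.'s line-descriptor formulation, and should be read as a best-effort restatement rather than a verbatim copy of the original formulas.

% Reconstructed line-descriptor components (assumed form)
\theta = \arctan\!\left(\frac{y_l - y_i}{x_l - x_i}\right), \qquad
r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
\phi = \arctan\!\left(\left.\frac{d f_{line}(x)}{dx}\right|_{x = x_l}\right)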

Here, fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points – one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows (a reconstruction is given after this paragraph), where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), Dmatch is the matching distance threshold, and θmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
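The score equations are missing from this copy; a hedged LaTeX reconstruction consistent with the description above (weights wi, wj from the weighting image, thresholds Dmatch and θmatch) is:

% Reconstructed segment matching score and total score (assumed form)
m(S_i, S_j) =
\begin{cases}
  w_i\, w_j, & d(S_i, S_j) \le D_{match} \ \text{and}\ |\phi_i - \phi_j| \le \theta_{match} \\
  0,         & \text{otherwise}
\end{cases}
\qquad
M = \frac{\sum_{i,j} m(S_i, S_j)}{\min\!\left(\sum_i w_i^{test},\ \sum_j w_j^{target}\right)}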

FIG

FIG

FIG

FIG

movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred as a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angle of every branch with respect to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branch as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

2252 WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size, occupy GPU memory, and slow down the data transfer. During matching, a RANSAC-type registration algorithm was used to randomly select corresponding descriptors, and the transform parameters between them were used to generate the template transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions on the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, descriptor weights were calculated on their own mask by the CPU, and only once.

The calculated result was saved as a component of the descriptor. The descriptor of the sclera thus changes to s(x, y, ɸ, w), where w denotes the weight of the point and may take the values 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. This is faster if the two templates share a similar reference point. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they have the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector then becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them in contiguous addresses. This meets the requirement of coalesced memory access on the GPU.
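To illustrate the idea of storing same-side descriptors contiguously for coalesced access, here is a small structure-of-arrays sketch in CUDA C++. The struct and field names are illustrative assumptions; the report does not specify its actual data layout.

#include <cuda_runtime.h>

// Structure-of-arrays layout for WPL descriptors s(x, y, r, theta, phi, w).
// Descriptors from the left half are stored first, then those from the right
// half, so that consecutive threads in a warp read consecutive addresses.
struct WPLTemplateSoA {           // names are illustrative
    float *x, *y, *r, *theta, *phi, *w;
    int   numLeft;                // descriptors [0, numLeft) belong to the left half
    int   numTotal;               // descriptors [numLeft, numTotal) belong to the right half
};

// Example of a coalesced read: thread i of a warp loads descriptor i of its side.
__global__ void loadSideKernel(const WPLTemplateSoA t, int sideOffset, int sideCount,
                               float *outX, float *outY)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= sideCount) return;
    int d = sideOffset + i;       // contiguous within one side -> coalesced access
    outX[i] = t.x[d];
    outY[i] = t.y[d];
}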

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, no computation on the mask file is needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus, the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 4.0 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline todayrsquos GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that todayrsquos GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global "texture" memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation
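As a concrete, deliberately simple illustration of this programming model, the sketch below defines a structured grid of threads, lets each thread gather two values from global memory, and scatter one result back. It is a generic example of the model described above, not code from this project.

#include <cuda_runtime.h>
#include <cstdio>

// Each thread computes one element: gather from a and b, scatter to c.
__global__ void addKernel(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // position in the thread grid
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    addKernel<<<(n + 255) / 256, 256>>>(a, b, c, n);  // the grid defines the computation domain
    cudaDeviceSynchronize();
    printf("c[0] = %f\n", c[0]);                      // expect 3.0

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}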

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity, although some false positive matches may remain after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here, ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the searching area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the searching range and time. The distance between two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar radius from the pupil center in descriptor i.

The number of matched pairs ni and the distance di between the centers of matched Y-shape branches are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as (2). Here, α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise, matching process.
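The listing of Algorithm 1 is not reproduced here; the sketch below is a hedged CUDA reconstruction of the coarse Y-shape matching stage as described above. Each thread compares the test template against one target template, counts branch pairs whose angle distance is below tϕ and whose centers are within txy, and fuses the count and the average center distance into a score. The struct and kernel names, and the exact fusion expression standing in for Eq. (2), are assumptions based on the text.

// Hedged sketch of Stage-I coarse matching on Y-shape descriptors.
// One thread compares the test template against one target template.
struct YDesc { float phi1, phi2, phi3, x, y; };          // y(phi1, phi2, phi3, x, y)

__device__ float angleDist(const YDesc &a, const YDesc &b)  // cf. Eq. (3)
{
    float d1 = a.phi1 - b.phi1, d2 = a.phi2 - b.phi2, d3 = a.phi3 - b.phi3;
    return sqrtf(d1 * d1 + d2 * d2 + d3 * d3);
}

__global__ void yShapeMatchKernel(const YDesc *test, int nTest,
                                  const YDesc *targets, const int *targetOffset,
                                  const int *targetCount, int numTemplates,
                                  float tPhi, float tXY, float alpha, float *score)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;       // one target template per thread
    if (t >= numTemplates) return;

    const YDesc *target = targets + targetOffset[t];
    int nTarget = targetCount[t];
    int matched = 0;
    float distSum = 0.0f;

    for (int i = 0; i < nTest; ++i) {
        for (int j = 0; j < nTarget; ++j) {
            float dx = test[i].x - target[j].x, dy = test[i].y - target[j].y;
            float dxy = sqrtf(dx * dx + dy * dy);        // cf. Eq. (4)
            if (dxy < tXY && angleDist(test[i], target[j]) < tPhi) {
                ++matched;
                distSum += dxy;
            }
        }
    }
    // Assumed fusion of match count and average center distance (stands in for Eq. (2)).
    float avgDist = (matched > 0) ? distSum / matched : tXY;
    score[t] = matched / (float)min(nTest, nTarget) - avgDist / alpha;
}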

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because: when acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, since the eyeball is spherical in shape; and the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte, Tta is the target template and staj is the j-th WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance between descriptors stek and staj. Δsk is the shift value of the two descriptors, defined as the offset between their center coordinates.

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. The shift offset between them is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
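A hedged host-side sketch of this shift search (Algorithm 2 as paraphrased above) follows. The brute-force nearest-neighbor search, the sampling count, and the reading of "smallest standard deviation" as "candidate offset closest to the mean of all candidates" are simplifying assumptions.

#include <vector>
#include <cmath>
#include <cstdlib>

struct WPL { float x, y, r, theta, phi, w; };            // s(x, y, r, theta, phi, w)
struct Offset { float dx, dy; };

// Hedged sketch of the shift-parameter search: sample test descriptors, pair each
// with its nearest target descriptor, record the offset, and keep the offset that
// deviates least from the mean of all candidates.
Offset searchShift(const std::vector<WPL> &test, const std::vector<WPL> &target, int samples)
{
    std::vector<Offset> cand;
    for (int s = 0; s < samples && !test.empty() && !target.empty(); ++s) {
        const WPL &te = test[std::rand() % test.size()];
        // brute-force nearest neighbor in the target template
        const WPL *best = &target[0];
        float bestD = 1e30f;
        for (const WPL &ta : target) {
            float d = std::hypot(te.x - ta.x, te.y - ta.y);
            if (d < bestD) { bestD = d; best = &ta; }
        }
        cand.push_back({best->x - te.x, best->y - te.y});
    }
    if (cand.empty()) return {0.0f, 0.0f};

    Offset mean{0.0f, 0.0f};
    for (const Offset &o : cand) { mean.dx += o.dx; mean.dy += o.dy; }
    mean.dx /= cand.size(); mean.dy /= cand.size();

    Offset bestOff = cand[0];
    float bestDev = 1e30f;
    for (const Offset &o : cand) {
        float dev = std::hypot(o.dx - mean.dx, o.dy - mean.dy);
        if (dev < bestDev) { bestDev = dev; bestOff = o; }
    }
    return bestOff;   // candidate shift used to register the test template
}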

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here, stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment, we set the iteration count to 512.
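Equation (7) is not reproduced in this copy. A hedged LaTeX reconstruction of the usual homogeneous 2-D rotation, translation, and scaling matrices that the text composes would be:

% Assumed form of the transform matrices composed in Eq. (7)
R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},\quad
T(t_x, t_y) = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix},\quad
S(s) = \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{bmatrix},\qquad
\mathbf{s}' = R(\theta)\, T(t_x, t_y)\, S(s)\, \mathbf{s}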

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here, stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters attained from Algorithms 2 and 3; and R(θ(optm)) T(tr(optm)shift) S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA GPU consists of many streaming multiprocessors (SMs); the parallel part of the program must be partitioned into threads by the programmer and mapped onto these multiprocessors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take very little time to access (local memory, despite its name, resides in off-chip device memory). Only shared memory can be accessed by other threads within the same block, and only a limited amount of shared memory is available. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is much more time consuming.

Constant memory and texture memory are read-only but cached. Mapping algorithms onto CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:

If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be used preferentially rather than global memory; when global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computational density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks. We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work, we use an NVIDIA C2070 as our GPU, and the thread and block counts are set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their respective descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
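The sketch below shows a conventional shared-memory tree reduction in CUDA that sums per-thread partial results within a block. It illustrates the same idea as the pairwise prefix-sum scheme described above, though the exact indexing scheme used in this work may differ.

// Block-wide sum of per-thread partial matching scores via shared-memory reduction.
// BLOCK_SIZE is assumed to be a power of two (e.g., 256).
#define BLOCK_SIZE 256

__global__ void blockSumKernel(const float *partial, float *blockSums, int n)
{
    __shared__ float buf[BLOCK_SIZE];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    buf[tid] = (i < n) ? partial[i] : 0.0f;   // load one partial result per thread
    __syncthreads();

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = buf[0];   // final sum lands at the first address
}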

252 MAPPING INSIDE BLOCK

In the shift-parameter search, there are two schemes we could choose for mapping the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads; or

Assigning a single possible shift offset to each thread, so that all the threads compute independently, except that the final results must be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to be run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is therefore to use a single thread in a block to process the matching.

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the contiguous kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
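To make the shared-memory point concrete, here is a minimal sketch of a kernel that cooperatively stages the test template's descriptors into shared memory before a per-thread matching loop. The capacity constant, the names, and the crude score accumulation are assumptions for illustration only.

struct WPLDesc { float x, y, r, theta, phi, w; };

#define MAX_TEST_DESC 512   // assumed upper bound that fits in shared memory

__global__ void matchWithSharedTest(const WPLDesc *testGlobal, int nTest,
                                    const WPLDesc *target, int nTarget, float *score)
{
    __shared__ WPLDesc testShared[MAX_TEST_DESC];

    // Cooperative copy of the test template into on-chip shared memory.
    for (int i = threadIdx.x; i < nTest && i < MAX_TEST_DESC; i += blockDim.x)
        testShared[i] = testGlobal[i];
    __syncthreads();

    // Each thread then matches its share of target descriptors against the
    // on-chip copy of the test template (matching details omitted here).
    float partial = 0.0f;
    for (int j = threadIdx.x; j < nTarget; j += blockDim.x)
        for (int i = 0; i < nTest && i < MAX_TEST_DESC; ++i)
            partial += (fabsf(testShared[i].phi - target[j].phi) < 5.0f) ? 1.0f : 0.0f;

    atomicAdd(score, partial);   // crude accumulation for illustration only
}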

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily used in object detection; in this work it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then forms the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating an intensity measure over a block and using this value to normalize all cells within the block. This normalization makes the descriptor largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
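The corresponding formulas, reconstructed here in LaTeX from the standard HOG definition (the report's own equation numbering is not preserved), are:

m(x, y) = \sqrt{d_x(x, y)^2 + d_y(x, y)^2}, \qquad
\theta(x, y) = \arctan\!\left(\frac{d_y(x, y)}{d_x(x, y)}\right)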

Orientation binning is the second step of HOG; it creates the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin determined in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, then the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
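As an illustration of the binning step, the function below computes a 9-bin orientation histogram for one HOG cell, with the gradient magnitude as the vote weight and unsigned orientation over [0, 180) degrees. The 8-pixel cell size and 9 bins are conventional HOG choices, not values taken from this report, and the cell is assumed not to touch the image border.

#include <vector>
#include <cmath>

// Hedged sketch: magnitude-weighted orientation histogram for one HOG cell.
std::vector<float> cellHistogram(const std::vector<float> &gray, int width,
                                 int cellX, int cellY, int cellSize = 8, int bins = 9)
{
    std::vector<float> hist(bins, 0.0f);
    for (int y = cellY; y < cellY + cellSize; ++y) {
        for (int x = cellX; x < cellX + cellSize; ++x) {
            // central-difference gradients (caller keeps the cell away from the border)
            float dx = gray[y * width + (x + 1)] - gray[y * width + (x - 1)];
            float dy = gray[(y + 1) * width + x] - gray[(y - 1) * width + x];
            float mag = std::sqrt(dx * dx + dy * dy);
            float ang = std::atan2(dy, dx) * 180.0f / 3.14159265f;  // (-180, 180]
            if (ang < 0.0f) ang += 180.0f;                          // fold to unsigned orientation
            int bin = static_cast<int>(ang / (180.0f / bins)) % bins;
            hist[bin] += mag;                                       // magnitude-weighted vote
        }
    }
    return hist;
}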

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available. MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages, such as C, C++, FORTRAN, Java™, COM, and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable). Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases, MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveX® controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of the GPU structure. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which dramatically reduces data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, which hides the memory access latency, was designed to partition the computation task across the heterogeneous CPU-GPU system and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal, Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


1.3 A SIMPLE IMAGE MODEL

To be suitable for computer processing, an image f(x,y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x,y) is called image sampling, and amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.

Example: a 256 gray-level image of size 256x256 occupies 64K bytes of memory.
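As a quick check of this figure, the requirement is simply width x height x (bits per pixel) / 8. The short C program below is for illustration only (it is not part of the project code) and reproduces the 64K value.

#include <stdio.h>

/* Storage needed for an uncompressed grayscale image:
   width * height * bits_per_pixel / 8 bytes. */
int main(void)
{
    const unsigned long width = 256, height = 256;
    const unsigned long bits_per_pixel = 8;   /* 256 gray levels = 8 bits */

    unsigned long bytes = width * height * bits_per_pixel / 8;
    printf("%lu bytes (= %lu KB)\n", bytes, bytes / 1024);  /* 65536 bytes = 64 KB */
    return 0;
}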

Images of very low spatial resolution produce a checkerboard effect. The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.

1.4 IMAGE FILE FORMATS

There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based or 'images'). Some of the most common file formats are:

GIF – Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG – Joint Photographic Experts Group. A very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).

TIFF – Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS – PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD – Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.

BMP – Bitmap file format.

1.5 TYPES OF IMAGES

Images are of four types:

1. Binary image

2. Gray scale image

3. Color image

4. Indexed image

1.5.1 BINARY IMAGES

A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e., a 0 or 1. Such images are also referred to by the names black-and-white and B&W.

1.5.2 GRAY SCALE IMAGE

In an 8-bit grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.

FIG

1.5.3 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive

colour with the r g and b receptors in our retinas RGB uses additive colour

mixing and is the basic colour model used in television or any other

medium that projects colour with light It is the basic colour model used in

computers and for web graphics but it cannot be used for print production

The secondary colours of RGB – cyan, magenta and yellow – are formed by mixing two of the primary colours (red, green or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green and blue in full intensity makes white.

In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK: The 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M) and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.

1.5.4 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into a color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only an index into the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland and Cheadle. This supported a palette of 256 36-bit RGB colors.
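The palette idea can be pictured in a few lines of C: the image stores one small index per pixel, and the palette maps that index to a full RGB triple. The snippet below is only an illustration of the concept, not code from this project.

#include <stdint.h>
#include <stdio.h>

typedef struct { uint8_t r, g, b; } RGB;              /* one palette entry */

/* Resolve the true color of pixel (x, y) of an indexed image. */
static RGB lookup(const uint8_t *indices, const RGB *palette,
                  int width, int x, int y)
{
    uint8_t idx = indices[y * width + x];              /* 1 byte per pixel */
    return palette[idx];                               /* color from palette */
}

int main(void)
{
    RGB palette[256] = { {0, 0, 0}, {255, 255, 255}, {255, 0, 0} };
    uint8_t img[4] = { 0, 1, 2, 1 };                   /* 2x2 indexed image */
    RGB c = lookup(img, palette, 2, 1, 0);
    printf("pixel(1,0) = (%u, %u, %u)\n", c.r, c.g, c.b);
    return 0;
}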

1.6 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation

2) Processing of scene data for autonomous machine perception

In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

1.7 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches, a Speed Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, which costs about 1.2 seconds to perform a one-to-one matching. Both speeds were measured using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

1.7.1 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size and will occupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the

most challenging problems that has been actively researched in recent

years. It is well known that many state-of-the-art still face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp and full) face images are acquired. However, their

performance degrades significantly when the test images contain variations

that are not present in the training images In this paper we highlight some

of the key issues in remote face recognition We define the remote face

recognition as one where faces are several tens of meters (10-250m) from

the cameras We then describe a remote face database which has been

acquired in an unconstrained outdoor maritime environment Recognition

performance of a subset of existing still image-based face recognition

algorithms is evaluated on the remote face data set Further we define the

remote re-identification problem as matching a subject at one location with

candidate sets acquired at a different location and over time in remote

conditions We provide preliminary experimental results on remote re-

identification It is demonstrated that in addition to applying a good

classification algorithm finding features that are robust to variations

mentioned above and developing statistical models which can account for

these variations are very important for remote face recognition

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

With the rapidly expanded biometric data collected by various sectors

of government and industry for identification and verification purposes

how to manage and process such Big Data draws great concern Even

though modern processors are equipped with more cores and memory

capacity it still requires careful design in order to utilize the hardware

resource effectively and the power consumption efficiently This research

addresses this issue by investigating the workload characteristics of

biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case

study we conduct performance profiling and binary instrumentation on the

benchmark to capture its execution behavior The results show that data

loading and memory access incurs great performance overhead and

motivates us to move the biometrics computation to high-performance

architecture

Modern iris recognition algorithms can be computationally intensive

yet are designed for traditional sequential processing elements such as a

personal computer However a parallel processing alternative using field

programmable gate arrays (FPGAs) offers an opportunity to speed up iris

recognition. Within the means of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA

system We will present a performance comparison of the parallelized

algorithm on the FPGA system to a traditional CPU-based version The

parallelized template generation outperforms an optimized C++ code

version determining the information content of an iris approximately 324

times faster

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival

vasculature is proposed Conjunctival vessels can be observed on the visible

part of the sclera that is exposed to the outside world These vessels

demonstrate rich and specific details in visible light and can be easily

photographed using a regular digital camera In this paper we discuss

methods for conjunctival imaging preprocessing and feature extraction in

order to derive a suitable conjunctival vascular template for biometric

authentication Commensurate classification methods along with the

observed accuracy are discussed Experimental results suggest the potential

of using conjunctival vasculature as a biometric measure Identification of

a person based on some unique set of features is an important task The

human identification is possible with several biometric systems and sclera

recognition is one of the promising biometrics The sclera is the white

portion of the human eye The vein pattern seen in the sclera region is

unique to each person Thus the sclera vein pattern is a well suited

biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem.

These problems are completely eliminated in the proposed system by using

two feature extraction techniques They are Histogram of Oriented

Gradients (HOG) and converting the image into polar form using the

bilinear interpolation technique These two features help the proposed

system to become illumination invariant and rotation invariant The

experimentation is done with the help of UBIRIS database The

experimental result shows that the proposed sclera recognition method can

achieve better accuracy than the previous methods

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified

Mumford–Shah functional. Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification The verification and

identification performance of the proposed algorithms is validated and

compared with other algorithms using the CASIA Version 3 ICE 2005 and

UBIRIS iris databases

1.8 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing, to mitigate the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

1.9.1 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, in this research we propose a new descriptor, the Y shape descriptor, which can greatly help improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

2.1 INTRODUCTION

The sclera is the opaque and white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches, a Speed Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, which costs about 1.2 seconds to perform a one-to-one matching. Both speeds were measured using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed CPU-based systems are designed as sequential

processing devices which may not be efficient in data processing where the

data can be parallelized Because of large time consumption in the matching

step Sclera vein recognition using sequential-based method would be very

challenging to be implemented in a real time biometric system especially

when there is a large number of templates in the database for matching. GPUs (General-Purpose Graphics Processing Units, GPGPUs) are now popularly used for parallel computing to improve the

computational processing speed and efficiency The highly parallel

structure of GPUs makes them more effective than CPUs for data

processing where processing can be performed in parallel GPUs have been

widely used in biometrics recognition such as speech recognition text

detection handwriting recognition and face recognition In iris

recognition GPU was used to extract the features construct descriptors

and match templates

GPUs are also used for object retrieval and image search Park et al

designed the performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather the goal is to develop a parallel sclera matching solution for

Sclera vein recognition using our sequential line-descriptor method using

the CUDA GPU architecture However the parallelization strategies

developed in this research can be applied to design parallel approaches for

other Sclera vein recognition methods and help parallelize general pattern

recognition methods Based on the matching approach in there are three

challenges to map the task of sclera feature matching to GPU

1) Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will occupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
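As a minimal illustration of the CUDA-to-OpenCL correspondence mentioned above (these are not the project's actual kernels), the sketch below shows the thread/block indexing a CUDA kernel uses; in OpenCL the same roles are played by work-items and work-groups.

#include <cuda_runtime.h>

/* Each thread handles one element; blockIdx/threadIdx in CUDA correspond
   to the work-group id / local work-item id in OpenCL. */
__global__ void scaleScores(const float *in, float *out, int n, float alpha)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* global index */
    if (i < n)                       /* guard against the last partial block */
        out[i] = alpha * in[i];
}

void launchScale(const float *d_in, float *d_out, int n, float alpha)
{
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scaleScores<<<blocks, threadsPerBlock>>>(d_in, d_out, n, alpha);
}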

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing, to mitigate the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then developed the implementation schemes to map our algorithms into CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we report experiments using the proposed system. In Section 9, we draw some conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition

Several methods have been designed for sclera segmentation. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify the color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's thresholding-based method for grayscale images.

After sclera segmentation it is necessary to enhance and extract the sclera

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches including Speed Up Robust

Features (SURF) which is based on interest-point detection minutiae

detection which is based on minutiae points on the vasculature structure

and direct correlation matching which relies on image registration Zhou et

al designed a line descriptor based feature registration and matching

method

The proposed sclera recognition consists of five steps which include

sclera segmentation vein pattern enhancement feature extraction feature

matching and matching decision Fig 2 shows the block diagram of sclera

recognition Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of vein pattern in the sclera region of an eye image To

become more computationally efficient the data of the image are converted

to the polar form It is mainly used for circular or quasi circular shape of

object. These two characteristics are extracted from all the images in the database and compared with the features of the query image to determine whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the

proposed feature extraction methods and matching techniques the human

identification is more accurate than the existing studies In the proposed

method two features of an image are drawn out

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in the sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images; if the image is in color, it first needs a conversion to a grayscale image, which is then passed to the Sobel filter to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of the sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. Left and right sclera areas are selected based on the iris boundaries. When the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are eliminated.
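For reference, a compact, generic implementation of Otsu's thresholding on an 8-bit grayscale region is sketched below (it is not the code used in the project); it selects the threshold that maximizes the between-class variance of the histogram.

#include <stdint.h>
#include <stddef.h>

/* Return the Otsu threshold (0..255) for an 8-bit grayscale image or ROI. */
int otsu_threshold(const uint8_t *pixels, size_t n)
{
    double hist[256] = {0};
    for (size_t i = 0; i < n; ++i) hist[pixels[i]] += 1.0;

    double total = (double)n, sum_all = 0.0;
    for (int t = 0; t < 256; ++t) sum_all += t * hist[t];

    double w_bg = 0.0, sum_bg = 0.0, best_var = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t) {
        w_bg += hist[t];                    /* background pixel count */
        if (w_bg == 0.0) continue;
        double w_fg = total - w_bg;         /* foreground pixel count */
        if (w_fg == 0.0) break;

        sum_bg += t * hist[t];
        double mu_bg = sum_bg / w_bg;
        double mu_fg = (sum_all - sum_bg) / w_fg;
        double between = w_bg * w_fg * (mu_bg - mu_fg) * (mu_bg - mu_fg);
        if (between > best_var) { best_var = between; best_t = t; }
    }
    return best_t;   /* pixels above best_t are kept as potential sclera */
}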

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; all of these are unwanted portions for recognition. In order to eliminate their effects, refinement is done following the detection of the sclera area. The figure shows the result after the Otsu's thresholding process and the iris and eyelid refinement to detect the right sclera area. In the same way, the left sclera area is detected using this method.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after the segmentation process, so vein pattern enhancement is performed to make them more visible.

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan, 2004) and retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the availability for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching stage of the line-descriptor based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as

FIG
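The equations themselves are not reproduced in this copy of the report. Based on the definitions given in the following paragraph, a plausible reconstruction (an assumption, not the original formulas) is:

\theta = \arctan\frac{y_l - y_i}{x_l - x_i}, \qquad
r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
\phi = \arctan\!\left(\frac{d f_{line}(x)}{dx}\Big|_{x = x_l}\right)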

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as shown below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), Dmatch is the matching distance threshold, and θmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
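The matching-score equations are likewise not reproduced here. A reconstruction consistent with the description above, treating w_i and w_j as the weights of the two segments (an assumption), is:

m(S_i, S_j) =
\begin{cases}
w_i \, w_j, & d(S_i, S_j) \le D_{match} \ \text{and} \ |\phi_i - \phi_j| \le \theta_{match} \\
0, & \text{otherwise}
\end{cases}

M = \frac{\sum_{\text{matched pairs}} m(S_i, S_j)}
         {\min\big(\sum_{S_i \in T_{test}} w_i, \ \sum_{S_j \in T_{target}} w_j\big)}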

FIG

FIG

FIG

FIG

Even with the movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, this set may be inferred as a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y shape vessels: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms at the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotational- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To be tolerant of such errors, a mask file is used.

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

The mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy the GPU memory and slow down the data transfer. During matching and registration, a RANSAC-type algorithm is used to randomly select the corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, the descriptors' weights were calculated on their own mask by the CPU only once.

The calculated result was saved as a component of the descriptor, so the descriptor of the sclera changes to s(x, y, ɸ, w), where w denotes the weight of the point and its value may be 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template will be transformed. It is faster if the two templates have a similar reference point. If we use the center of the iris as the reference point, then when two templates are compared the correspondences will automatically be aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line passing through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left part of the sclera patterns may be compressed while the right part is stretched.
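A compact way to picture the two descriptors described above is as plain C structures; the field names below are illustrative, not taken from the project code.

/* Y shape descriptor: three branch angles measured against the iris-radial
   direction, plus the branch-center position used to tolerate pupil-center
   error: y(phi1, phi2, phi3, x, y). */
typedef struct {
    float phi1, phi2, phi3;   /* branch angles w.r.t. the radial direction */
    float x, y;               /* center of the Y-shape branch */
} YShapeDescriptor;

/* WPL descriptor after CPU preprocessing: polar values (r, theta) plus the
   precomputed rectangular position, the segment orientation phi and the
   mask weight w (0, 0.5 or 1): s(x, y, r, theta, phi, w). */
typedef struct {
    float x, y;               /* rectangular center, precomputed on the CPU */
    float r, theta;           /* distance/angle relative to the iris center */
    float phi;                /* dominant orientation of the segment */
    float w;                  /* mask weight: 0, 0.5 or 1 */
} WPLDescriptor;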

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at continuous addresses, which meets the requirement of coalesced memory access on the GPU.

FIG

FIG
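One way to realize this continuous-address layout is a structure-of-arrays arrangement in which the descriptors of each half of the sclera occupy contiguous blocks, so consecutive threads of a warp read consecutive words. The fragment below is only a sketch of that idea, not the project's data layout.

/* Structure-of-arrays layout: left-half descriptors first, right-half
   descriptors after them, each field in its own contiguous array. */
struct WPLTemplateSoA {
    float *x, *y, *r, *theta, *phi, *w;   /* each array: nLeft + nRight entries */
    int    nLeft;                         /* descriptors 0..nLeft-1: left half  */
    int    nRight;                        /* remaining descriptors: right half  */
};

__global__ void copyOrientations(const WPLTemplateSoA t, float *outPhi)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    int total = t.nLeft + t.nRight;
    if (k < total)
        outPhi[k] = t.phi[k];   /* consecutive threads read consecutive words */
}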

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, the computation on the mask file is not needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus, the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 4.0 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

GENERAL-PURPOSE COMPUTING ON THE GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as texture on future passes through

the graphics pipeline

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities. After this step, it is still possible to have some false positive matches. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
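Putting the two stages together, the host-side control flow can be sketched as follows. All types and functions here are illustrative stand-ins for the four CUDA kernels named above; only the ordering (coarse Y-shape filtering first, then shift search, affine search and final WPL matching for the surviving candidates) is the point.

typedef struct { int id; } Template;
typedef struct { float dx, dy; } ShiftParams;
typedef struct { float theta, scale, dx, dy; } AffineParams;

/* Stand-ins for the real kernels. */
static float matchYShape(const Template *a, const Template *b)
{ (void)a; (void)b; return 1.0f; }
static ShiftParams searchShiftParams(const Template *a, const Template *b)
{ (void)a; (void)b; ShiftParams s = {0.0f, 0.0f}; return s; }
static AffineParams searchAffineParams(const Template *a, const Template *b)
{ (void)a; (void)b; AffineParams p = {0.0f, 1.0f, 0.0f, 0.0f}; return p; }
static float matchWPLFine(const Template *a, const Template *b,
                          ShiftParams s, AffineParams p)
{ (void)a; (void)b; (void)s; (void)p; return 1.0f; }

/* Coarse-to-fine control flow: Stage I filters, Stage II registers and matches. */
void matchTwoStage(const Template *test, const Template *targets, int n,
                   float coarseThreshold, float *finalScores)
{
    for (int i = 0; i < n; ++i) {
        float coarse = matchYShape(test, &targets[i]);              /* Stage I */
        if (coarse < coarseThreshold) { finalScores[i] = 0.0f; continue; }
        ShiftParams  s = searchShiftParams(test, &targets[i]);      /* Stage II */
        AffineParams p = searchAffineParams(test, &targets[i]);
        finalScores[i] = matchWPLFine(test, &targets[i], s, p);     /* final */
    }
}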

2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distances, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs ni and the distance between Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. The sclera with a high matching score is passed to the next, more precise matching process.
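The sketch below illustrates the coarse Y-shape matching just described: count branch pairs whose angle distance is below t_phi within the search radius t_xy, accumulate their center distances, and then fuse the count and average distance. The names are illustrative, and the final fusion line is only a placeholder because Eq. (2) is not reproduced in this copy.

#include <math.h>

typedef struct { float phi1, phi2, phi3, x, y; } YDesc;   /* simplified Y descriptor */

static float angleDist(const YDesc *a, const YDesc *b)
{
    float d1 = a->phi1 - b->phi1, d2 = a->phi2 - b->phi2, d3 = a->phi3 - b->phi3;
    return sqrtf(d1 * d1 + d2 * d2 + d3 * d3);   /* Euclidean distance of angles */
}

float yShapeScore(const YDesc *test, int Ni, const YDesc *target, int Nj,
                  float t_phi, float t_xy, float alpha)
{
    int   n = 0;          /* number of matched branch pairs   */
    float dsum = 0.0f;    /* sum of matched center distances  */

    for (int i = 0; i < Ni; ++i)
        for (int j = 0; j < Nj; ++j) {
            float dx = test[i].x - target[j].x, dy = test[i].y - target[j].y;
            float dxy = sqrtf(dx * dx + dy * dy);
            if (dxy < t_xy && angleDist(&test[i], &target[j]) < t_phi) {
                ++n;
                dsum += dxy;
            }
        }
    if (n == 0) return 0.0f;

    float avgDist = dsum / (float)n;
    int   minN = (Ni < Nj) ? Ni : Nj;
    /* Placeholder fusion of matched count and average distance; the exact
       form of Eq. (2) is not reproduced here. */
    return ((float)n / (float)minN) * (alpha / (alpha + avgDist));
}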

2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because: (1) when acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape; and (2) the sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte; Tta is the target template and staj is the j-th WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek and staj. Δsk is the shift value of the two descriptors, defined as the offset between their center positions.

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quadrant and find the nearest neighbor staj of each in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
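A host-side sketch of this shift search is given below: sample test descriptors, record the offset to the nearest target descriptor as a candidate shift, and keep the candidate with the smallest deviation (interpreted here as the candidate closest to the mean offset). All names are illustrative.

#include <stdlib.h>

typedef struct { float x, y; } Pt;

Pt searchShift(const Pt *test, int nTest, const Pt *target, int nTarget, int samples)
{
    Pt cand[256];
    Pt zero = {0.0f, 0.0f};
    if (nTest <= 0 || nTarget <= 0 || samples <= 0) return zero;
    if (samples > 256) samples = 256;

    for (int s = 0; s < samples; ++s) {
        const Pt *p = &test[rand() % nTest];
        float best = 1e30f;
        Pt off = {0.0f, 0.0f};
        for (int j = 0; j < nTarget; ++j) {            /* nearest neighbour */
            float dx = target[j].x - p->x, dy = target[j].y - p->y;
            float d = dx * dx + dy * dy;
            if (d < best) { best = d; off.x = dx; off.y = dy; }
        }
        cand[s] = off;                                 /* candidate shift */
    }

    Pt mean = {0.0f, 0.0f};
    for (int s = 0; s < samples; ++s) { mean.x += cand[s].x; mean.y += cand[s].y; }
    mean.x /= samples; mean.y /= samples;

    int bestIdx = 0; float bestDev = 1e30f;            /* smallest deviation */
    for (int s = 0; s < samples; ++s) {
        float dx = cand[s].x - mean.x, dy = cand[s].y - mean.y;
        float dev = dx * dx + dy * dy;
        if (dev < bestDev) { bestDev = dev; bestIdx = s; }
    }
    return cand[bestIdx];
}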

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined the same as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale transform parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters; in our experiment, we set the number of iterations to 512.
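The composed transform of Algorithm 3 can be pictured as applying scale, rotation and shift to a descriptor center; the helper below (illustrative, with an assumed order of composition) applies one candidate parameter set (theta, shift, scale) to a point, which is what each iteration does before counting the pairs that match within beta = 20 pixels.

#include <math.h>

typedef struct { float x, y; } Point2;

/* p' = T(shift) * R(theta) * S(scale) * p   (order of composition is an assumption). */
Point2 applyCandidate(Point2 p, float theta, float scale, Point2 shift)
{
    float xs = p.x * scale, ys = p.y * scale;            /* S(scale) */
    float xr = xs * cosf(theta) - ys * sinf(theta);      /* R(theta) */
    float yr = xs * sinf(theta) + ys * cosf(theta);
    Point2 out = { xr + shift.x, yr + shift.y };         /* T(shift) */
    return out;
}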

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters attained from Algorithms 2 and 3; R(θ(optm)) T(tr(optm)shift) S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned by the programmer into threads that are mapped onto those multiprocessors.

There are multiple memory spaces in the CUDA memory hierarchy

register local memory shared memory global memory constant memory

and texture memory. Register, local memory and shared memory are on-chip, and accessing these memories takes little time. Only

shared memory can be accessed by other threads within the same block

However there is only limited availability of shared memory Global

memory, constant memory and texture memory are off-chip memories accessible by all threads, and it is very time consuming to access these memories.

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in terms of access. To completely hide the latency of small instruction sets, we should use on-chip memory preferentially rather than global memory. When global memory access occurs, threads in the same warp should access the words in sequence to achieve coalescence.

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory request from different thread within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread, and one thread compares one pair of templates. In our work we use an NVIDIA C2070 as our GPU. The numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.
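A skeletal CUDA kernel illustrating this one-thread-per-target-template mapping is sketched below; the packed descriptor layout and the similarity test are placeholders, not the report's actual Y-shape comparison:

// Each thread scores the single test template against one target template.
__global__ void coarseMatchKernel(const float* testDesc, int nTest,
                                  const float* targetDesc,      // all target descriptors, packed
                                  const int* targetOffset,      // start index of each target template
                                  const int* targetCount,       // descriptor count of each target
                                  float* score, int nTargets)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;            // one target template per thread
    if (tid >= nTargets) return;

    const float* tgt = targetDesc + targetOffset[tid];
    int nTgt = targetCount[tid];

    int matched = 0;                                            // placeholder similarity measure
    for (int i = 0; i < nTest; ++i)
        for (int j = 0; j < nTgt; ++j)
            if (fabsf(testDesc[i] - tgt[j]) < 1e-2f) { ++matched; break; }

    score[tid] = (float)matched / (float)nTest;                 // coarse matching score
}

// With 1024 threads per block and up to 1024 blocks, one launch scores the test
// template against up to 1024 x 1024 target templates:
// coarseMatchKernel<<<(nTargets + 1023) / 1024, 1024>>>(dTest, nTest, dTgt, dOff, dCnt, dScore, nTargets);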

Algorithms 2-4 will be partitioned into fine-grained subtasks, each of which processes a section of descriptors in one thread. As the lower portion of the

middle column shows in Figure 11 we assigned a target template to one

block. Inside a block, one thread corresponds to a set of descriptors in this

template This partition makes every block execute independently and there

are no data exchange requirements between different blocks When all

threads complete their corresponding descriptor fractions, the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of the intermediate results, which is shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads

compute the prefix sum on the new result The final result will be saved in

the first address which has the same variable name as the first intermediate

result
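A minimal in-block sketch of this reduction idea follows, written as a standard shared-memory tree sum that mirrors the pairwise scheme described above (the report's exact indexing may differ); it assumes the block size is a power of two:

// Combines per-thread partial results inside one block; after the loop,
// partial[0] holds the block total in the same slot as the first result.
__global__ void blockSumKernel(const float* in, float* blockSum, int n)
{
    extern __shared__ float partial[];                 // one slot per thread
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    partial[tid] = (gid < n) ? in[gid] : 0.0f;         // this thread's intermediate result
    __syncthreads();

    // Pairs first (stride 1), then every 4th, 8th, 16th, ... thread accumulates.
    for (int stride = 1; stride < blockDim.x; stride *= 2) {
        if (tid % (2 * stride) == 0)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    if (tid == 0) blockSum[blockIdx.x] = partial[0];   // final result of this block
}

// Example launch for 1024 intermediate results per block:
// blockSumKernel<<<numBlocks, 1024, 1024 * sizeof(float)>>>(dIn, dOut, n);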

252 MAPPING INSIDE BLOCK

In shift argument searching, there are two schemes we can choose to map the task:

1) Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

2) Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift searching. In the affine matrix generator, we mapped an entire parameter-set search to a thread, and every thread randomly generated a set of parameters and tried them independently. The generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore it is hard to parallelize a single twister state update step among

several execution threads. To make sure that thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel.

But even "very different" (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in

shared memory. To share the flags, all the threads in a block would have to wait for a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target template set from the database without considering when the templates will be processed. Therefore there is no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) were stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data. In our kernel we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
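A sketch of the layout described above follows: descriptor components kept in separate arrays (structure of arrays) so that consecutive threads read consecutive addresses, with the single test template staged into shared memory once per block. Names, sizes and the distance test are illustrative assumptions only:

struct WplSoA {                                   // one array per descriptor component
    const float *x, *y, *r, *theta, *phi, *w;
};

#define MAX_TEST 256                              // assumed upper bound on test descriptors

__global__ void fineMatchKernel(WplSoA test, int nTest,
                                WplSoA target, int nTarget, float* blockScore)
{
    __shared__ float tx[MAX_TEST], ty[MAX_TEST];  // test template staged on-chip once
    for (int i = threadIdx.x; i < nTest; i += blockDim.x) {
        tx[i] = test.x[i];
        ty[i] = test.y[i];
    }
    __syncthreads();

    // Each thread walks its own slice of the target descriptors; because the
    // components are stored separately, these global reads are coalesced.
    float localBest = 1e30f;
    for (int j = threadIdx.x; j < nTarget; j += blockDim.x) {
        float gx = target.x[j], gy = target.y[j];
        for (int i = 0; i < nTest; ++i) {
            float dx = gx - tx[i], dy = gy - ty[i];
            localBest = fminf(localBest, dx * dx + dy * dy);
        }
    }

    // A block-wide reduction (as sketched earlier) would combine the localBest
    // values into one score per target template; omitted here for brevity.
    if (threadIdx.x == 0) blockScore[blockIdx.x] = localBest;   // placeholder write
}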

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It is primarily applied to target detection. In this paper it is applied as a feature for human recognition. In the sclera region the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image.

To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y) as m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).

Orientation binning is the second step of HOG. This method is used to create the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin corresponding to the orientation found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientation of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
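A compact sketch of the cell-histogram step just described (magnitude-weighted votes into unsigned orientation bins over 0-180 degrees); the cell size, bin count and array layout are illustrative assumptions:

#include <cmath>
#include <vector>

// Build one magnitude-weighted orientation histogram per cell.
// img is a grayscale image of size w x h; cell is the cell width in pixels.
std::vector<float> hogCellHistograms(const std::vector<float>& img,
                                     int w, int h, int cell = 8, int bins = 9)
{
    int cx = w / cell, cy = h / cell;
    std::vector<float> hist(cx * cy * bins, 0.0f);

    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            float dx = img[y * w + (x + 1)] - img[y * w + (x - 1)];   // x gradient
            float dy = img[(y + 1) * w + x] - img[(y - 1) * w + x];   // y gradient
            float mag = std::sqrt(dx * dx + dy * dy);                 // m(x, y)
            float ang = std::atan2(dy, dx) * 180.0f / 3.14159265f;    // orientation
            if (ang < 0.0f)    ang += 180.0f;                         // fold to [0, 180)
            if (ang >= 180.0f) ang -= 180.0f;
            int bin = (int)(ang / (180.0f / bins));                   // unsigned orientation bin
            int cxi = x / cell, cyi = y / cell;
            if (cxi < cx && cyi < cy)
                hist[(cyi * cx + cxi) * bins + bin] += mag;           // magnitude-weighted vote
        }
    }
    return hist;   // block-level contrast normalization would follow
}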

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces and interfacing with programs written in other languages, including C, C++, Java and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages, such as C, C++, FORTRAN, Java™, COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java, ActiveX or .NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveX® controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers, adding annotations, LaTeX equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges) and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


BMP - bitmap file format

15 TYPE OF IMAGES

Images are 4 types

1 Binary image

2 Gray scale image

3 Color image

4 Indexed image

151 BINARY IMAGES

A binary image is a digital image that has only two possible values for

each pixel Typically the two colors used for a binary image are black and

white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e. a 0 or 1. The names black-and-white and B&W are also used for this type of image.

152 GRAY SCALE IMAGE

In a (8-bit) grayscale image each picture element has an assigned intensity

that ranges from 0 to 255 A grey scale image is what people normally call

a black and white image but the name emphasizes that such an image will

also include many shades of grey

FIG

153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive

colour with the r g and b receptors in our retinas RGB uses additive colour

mixing and is the basic colour model used in television or any other

medium that projects colour with light It is the basic colour model used in

computers and for web graphics but it cannot be used for print production

The secondary colours of RGB ndash cyan magenta and yellow ndash are formed

by mixing two of the primary colours (red green or blue) and excluding the

third colour Red and green combine to make yellow green and blue to

make cyan and blue and red form magenta The combination of red green

and blue in full intensity makes white

In Photoshop, using the "screen" mode for the different layers in an

image will make the intensities mix together according to the additive

colour mixing model This is analogous to stacking slide images on top of

each other and shining light through them

FIG

CMYK The 4-colour CMYK model used in printing lays down

overlapping layers of varying percentages of transparent cyan (C) magenta

(M) and yellow (Y) inks In addition a layer of black (K) ink can be added

The CMYK model uses the subtractive colour model

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix The

pixel values in the array are direct indices into a color map By convention

this documentation uses the variable name X to refer to the array and map

to refer to the color map. In computing, indexed color is a technique to manage the colors of digital images in a limited fashion in order to save

computer memory and file storage while speeding up display refresh and

file transfers It is a form of vector quantization compression

When an image is encoded in this way color information is not

directly carried by the image pixel data but is stored in a separate piece of

data called a palette an array of color elements in which every element a

color is indexed by its position within the array The image pixels do not

contain the full specification of its color but only its index in the palette

This technique is sometimes referred to as pseudocolor or indirect color, as

colors are addressed indirectly

Perhaps the first device that supported palette colors was a random-

access frame buffer described in 1975 by Kajiya Sutherland and Cheadle

This supported a palette of 256 36-bit RGB colors

16 Applications of image processing

Interest in digital image processing methods stems from 2 principal

application areas

1) Improvement of pictorial information for human interpretation

2) Processing of scene data for autonomous machine perception

In the second application area interest focuses on procedures for

extracting from an image

information in a form suitable for computer processing.

Examples include automatic character recognition industrial machine

vision for product assembly and inspection, military reconnaissance,

automatic processing of fingerprints etc

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches - a Speed Up Robust Features (SURF)-based method, minutiae detection and direct correlation matching - for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, which costs about 1.2 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1 Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer Also some of processing on the mask files will

involve convolution which is difficult to improve its performance on the

scalar process unit on CUDA

2 The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3 When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the

most challenging problems that has been actively researched in recent

years It is well known that many state-of-the-arts still face recognition

algorithms perform well when constrained (frontal well illuminated high-

resolution sharp and full) face images are acquired However their

performance degrades significantly when the test images contain variations

that are not present in the training images In this paper we highlight some

of the key issues in remote face recognition We define the remote face

recognition as one where faces are several tens of meters (10-250m) from

the cameras We then describe a remote face database which has been

acquired in an unconstrained outdoor maritime environment Recognition

performance of a subset of existing still image-based face recognition

algorithms is evaluated on the remote face data set Further we define the

remote re-identification problem as matching a subject at one location with

candidate sets acquired at a different location and over time in remote

conditions We provide preliminary experimental results on remote re-

identification It is demonstrated that in addition to applying a good

classification algorithm finding features that are robust to variations

mentioned above and developing statistical models which can account for

these variations are very important for remote face recognition

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanded biometric data collected by various sectors

of government and industry for identification and verification purposes

how to manage and process such Big Data draws great concern Even

though modern processors are equipped with more cores and memory

capacity it still requires careful design in order to utilize the hardware

resource effectively and the power consumption efficiently This research

addresses this issue by investigating the workload characteristics of

biometric applications. Taking Daugman's iris matching algorithm, which

has been proven to be the most reliable iris matching method as a case

study we conduct performance profiling and binary instrumentation on the

benchmark to capture its execution behavior The results show that data

loading and memory access incurs great performance overhead and

motivates us to move the biometrics computation to high-performance

architecture

Modern iris recognition algorithms can be computationally intensive

yet are designed for traditional sequential processing elements such as a

personal computer However a parallel processing alternative using field

programmable gate arrays (FPGAs) offers an opportunity to speed up iris

recognition Within the means of this project iris template generation with

directional filtering, which is a computationally expensive yet parallel

portion of a modern iris recognition algorithm is parallelized on an FPGA

system We will present a performance comparison of the parallelized

algorithm on the FPGA system to a traditional CPU-based version The

parallelized template generation outperforms an optimized C++ code

version determining the information content of an iris approximately 324

times faster

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival

vasculature is proposed Conjunctival vessels can be observed on the visible

part of the sclera that is exposed to the outside world These vessels

demonstrate rich and specific details in visible light and can be easily

photographed using a regular digital camera In this paper we discuss

methods for conjunctival imaging preprocessing and feature extraction in

order to derive a suitable conjunctival vascular template for biometric

authentication Commensurate classification methods along with the

observed accuracy are discussed Experimental results suggest the potential

of using conjunctival vasculature as a biometric measure Identification of

a person based on some unique set of features is an important task The

human identification is possible with several biometric systems and sclera

recognition is one of the promising biometrics The sclera is the white

portion of the human eye The vein pattern seen in the sclera region is

unique to each person Thus the sclera vein pattern is a well suited

biometric technology for human identification The existing methods used

for sclera recognition have some drawbacks like only frontal looking

images are preferred for matching and rotation variance is another problem

These problems are completely eliminated in the proposed system by using

two feature extraction techniques They are Histogram of Oriented

Gradients (HOG) and converting the image into polar form using the

bilinear interpolation technique These two features help the proposed

system to become illumination invariant and rotation invariant The

experimentation is done with the help of UBIRIS database The

experimental result shows that the proposed sclera recognition method can

achieve better accuracy than the previous methods

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of

todayrsquos mainstream computing systems Over the past six years there has

been a marked increase in the performance and capabilities of GPUs The

modern GPU is not only a powerful graphics engine but also a highly

parallel programmable processor featuring peak arithmetic and memory

bandwidth that substantially outpaces its CPU counterpart. The GPU's

rapid increase in both programmability and capability has spawned a

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified

Mumford–Shah functional. Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification The verification and

identification performance of the propose algorithms is validated and

compared with other algorithms using the CASIA Version 3 ICE 2005 and

UBIRIS iris databases

18 PROPOSED METHOD

We proposed a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition, based on our sequential line-descriptor method, using the CUDA GPU architecture. CUDA is a highly

parallel multithreaded many-core processor with tremendous

computational power

It supports not only a traditional graphics pipeline but also computation

on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in a data-parallel model.

In this research we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor – the Y-shape sclera feature-based efficient registration method – to speed up the mapping scheme, introduce the "weighted polar line (WPL) descriptor" that would be better

suited for parallel computing to mitigate the mask size issue and develop

our coarse to fine two-stage matching process to dramatically improve the

matching speed These new approaches make the parallel processing

possible and efficient

191PROPOSED SYSTEM ADVANTAGES

1 To improve the efficiency in this research we propose a new descriptor

- the Y-shape descriptor - which can greatly help improve the efficiency of

the coarse registration of two images and can be used to filter out some

non-matching pairs before refined matching

2 We propose the coarse-to-fine two-stage matching process In the first

stage we matched two images coarsely using the Y-shape descriptors

which is very fast to match because no registration was needed The

matching result in this stage can help filter out image pairs with low

similarities

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque and white outer layer of the eye The blood

vessel structure of sclera is formed randomly and is unique to each person

which can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches - a Speed Up Robust Features (SURF)-based method, minutiae detection and direct correlation matching - for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, which costs about 1.2 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing

unit (CPU)-based systems

As discussed CPU-based systems are designed as sequential

processing devices which may not be efficient in data processing where the

data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs)

are now popularly used for parallel computing to improve the

computational processing speed and efficiency The highly parallel

structure of GPUs makes them more effective than CPUs for data

processing where processing can be performed in parallel GPUs have been

widely used in biometrics recognition such as speech recognition text

detection handwriting recognition and face recognition In iris

recognition GPU was used to extract the features construct descriptors

and match templates

GPUs are also used for object retrieval and image search Park et al

designed the performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather the goal is to develop a parallel sclera matching solution for

Sclera vein recognition using our sequential line-descriptor method using

the CUDA GPU architecture However the parallelization strategies

developed in this research can be applied to design parallel approaches for

other Sclera vein recognition methods and help parallelize general pattern

recognition methods Based on the matching approach in there are three

challenges to map the task of sclera feature matching to GPU

1) Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer Also some of processing on the mask files will

involve convolution which is difficult to improve its performance on the

scalar process unit on CUDA

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance New designs are

necessary to help narrow down the search range. In summary, naïve

implementation of the algorithms in parallel would not work efficiently

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in a data-parallel model.

In this research we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor – the Y-shape sclera feature-based efficient registration method – to speed up the mapping scheme (Section 4), introduce the "weighted polar line (WPL) descriptor"

that would be better suited for parallel computing to mitigate the mask size

issue (Section 5) and develop our coarse to fine two-stage matching

process to dramatically improve the matching speed (Section 6) These new

approaches make the parallel processing possible and efficient However it

is non-trivial to implement these algorithms in CUDA We then developed

the implementation schemes to map our algorithms into CUDA (Section 7)

In the Section 2 we give brief introduction of Sclera vein recognition In

the Section 8 we performed some experiments using the proposed system

In the Section 9 we draw some conclusions

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition

Several methods have been designed for sclera segmentation Crihalmeanu

et al. presented a semi-automated system for sclera segmentation. They

used a clustering algorithm to classify the color eye images into three

clusters - sclera iris and background Later on Crihalmeanu and Ross

designed a segmentation approach based on a normalized sclera index

measure which includes coarse sclera segmentation pupil region

segmentation and fine sclera segmentation Zhou et al developed a skin

tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's thresholding-based method for grayscale images.

After sclera segmentation it is necessary to enhance and extract the sclera

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches including Speed Up Robust

Features (SURF) which is based on interest-point detection minutiae

detection which is based on minutiae points on the vasculature structure

and direct correlation matching which relies on image registration Zhou et

al designed a line descriptor based feature registration and matching

method

The proposed sclera recognition consists of five steps which include

sclera segmentation vein pattern enhancement feature extraction feature

matching and matching decision Fig 2 shows the block diagram of sclera

recognition Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of vein pattern in the sclera region of an eye image To

become more computationally efficient the data of the image are converted

to the polar form It is mainly used for circular or quasi circular shape of

object These two characteristics are extracted from all the images in the

database and compared with the features of the query image whether the

person is correctly identified or not This procedure is done in the feature

matching step and ultimately makes the matching decision By using the

proposed feature extraction methods and matching techniques the human

identification is more accurate than the existing studies In the proposed

method two features of an image are drawn out

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images; if the image is color, then it needs to be converted to a grayscale image before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG
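A minimal sketch of such a glare check follows; the report does not give details, so the combination of the Sobel gradient magnitude with a brightness test, and both threshold values, are assumptions:

#include <cmath>
#include <cstdint>
#include <vector>

// Sobel gradient magnitude on a grayscale image; bright, high-gradient pixels
// (typically near the pupil or iris) are flagged as glare candidates.
std::vector<uint8_t> glareMask(const std::vector<uint8_t>& g, int w, int h,
                               float gradThresh = 200.0f, uint8_t brightThresh = 230)
{
    std::vector<uint8_t> mask(w * h, 0);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            int gx = -g[(y-1)*w + x-1] + g[(y-1)*w + x+1]
                     - 2*g[y*w + x-1]  + 2*g[y*w + x+1]
                     - g[(y+1)*w + x-1] + g[(y+1)*w + x+1];
            int gy = -g[(y-1)*w + x-1] - 2*g[(y-1)*w + x] - g[(y-1)*w + x+1]
                     + g[(y+1)*w + x-1] + 2*g[(y+1)*w + x] + g[(y+1)*w + x+1];
            float mag = std::sqrt((float)(gx*gx + gy*gy));
            if (mag > gradThresh && g[y*w + x] > brightThresh)
                mask[y*w + x] = 255;            // likely glare pixel
        }
    return mask;
}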

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of the sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. When the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way non-sclera areas are wiped out.
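For reference, a minimal sketch of Otsu's thresholding on an 8-bit region of interest (the standard between-class-variance maximization; not the report's exact code):

#include <cstdint>
#include <vector>

// Returns the Otsu threshold (0-255) that maximizes between-class variance.
int otsuThreshold(const std::vector<uint8_t>& roi)
{
    long hist[256] = {0};
    for (uint8_t v : roi) ++hist[v];                 // intensity histogram

    long total = (long)roi.size();
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += (double)i * hist[i];

    double sumB = 0.0, maxVar = -1.0;
    long wB = 0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];                               // background weight
        if (wB == 0) continue;
        long wF = total - wB;                        // foreground weight
        if (wF == 0) break;
        sumB += (double)t * hist[t];
        double mB = sumB / wB;                       // background mean
        double mF = (sumAll - sumB) / wF;            // foreground mean
        double varBetween = (double)wB * wF * (mB - mF) * (mB - mF);
        if (varBetween > maxVar) { maxVar = varBetween; best = t; }
    }
    return best;                                     // pixels above 'best' are sclera candidates
}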

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid and iris boundaries are refined. These are all unwanted portions for recognition. In order to eliminate their effects, refinement is done following the detection of the sclera area. Fig. shows the result after the Otsu's thresholding process and iris and eyelid refinement to detect the right sclera area. In the same way the left sclera area is detected using this method.

FIG

In the segmentation process, not all images are perfectly segmented. Hence feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after the segmentation process. To make the vein patterns more visible, vein pattern enhancement is performed.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan, 2004) and retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill,

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.

Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portion of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine its potential for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is applied over different portions of the input image, the computation can be carried out in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching stage of the line-descriptor-based method is the bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, the vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to a reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database, and then calculates a fitness value for the registration using these parameters.
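For reference, a minimal MATLAB sketch of the three descriptor components is given below, under the assumption that the segment pixel coordinates segX/segY and the iris centre (xi, yi) are already known (placeholder names); the exact forms are those of the equations shown above.

p  = polyfit(segX, segY, 2);            % polynomial approximation f_line(x) of the segment
xl = mean(segX);  yl = polyval(p, xl);  % centre point (xl, yl) of the line segment

theta = atan2(yl - yi, xl - xi);        % angle of the centre with respect to the iris centre
r     = hypot(xl - xi, yl - yi);        % distance of the centre to the iris centre
phi   = atan(polyval(polyder(p), xl));  % dominant orientation: slope of f_line at xl

S = [theta; r; phi];                    % line descriptor for this segment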

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created a weighting image (Figure 3) from the sclera mask by setting interior pixels of the sclera mask to 1, pixels within some distance of the mask boundary to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as shown below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
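A hedged MATLAB sketch of this matching score follows. The descriptor structs, Dmatch and PhiMatch are placeholder names; the thresholding and the weight-based normalisation mirror the description above, not the exact equations (which appear only as figures here).

function M = matchScore(test, target, Dmatch, PhiMatch)
    % test/target: structs with fields x, y, phi, w (one element per segment)
    total = 0;
    for i = 1:numel(test.x)
        for j = 1:numel(target.x)
            d    = hypot(test.x(i) - target.x(j), test.y(i) - target.y(j));
            dPhi = abs(test.phi(i) - target.phi(j));
            if d < Dmatch && dPhi < PhiMatch
                total = total + test.w(i) * target.w(j);   % weighted contribution of this match
                break;                                     % each test segment is matched at most once
            end
        end
    end
    % Normalise by the maximum attainable score of the smaller template.
    M = total / min(sum(test.w), sum(target.w));
end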

FIG

FIG

FIG

FIG

Y-shape branches are observed to be a stable feature under movement of the eye and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angle of every branch with respect to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 remain quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center and is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As discussed in Section 22, the line

SCLERA DESCRIPTOR As we discussed in the Section 22 the line

descriptor is extracted from the skeleton of vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment a line descriptor is created to record the center and orientation of

the segment This descriptor is expressed as s(x yɸ) where (x y) is the

position of the center and ɸ is its orientation Because of the limitation of

segmentation accuracy the descriptor in the boundary of sclera area might

not be accurate and may contain spur edges resulting from the iris eyelid

andor eyelashes To be tolerant of such error the mask file

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy GPU memory and slow down the data transfer. When matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of each transformed descriptor. This results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extract the geometric relationships of the descriptors and store them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weight of descriptors beyond the sclera is set to 0, descriptors near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, the descriptors' weights are calculated on their own mask by the CPU, and only once.

The calculated result is saved as a component of the descriptor, so the sclera descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the values 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. This is faster if the two templates share a similar reference point. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they have the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in the CPU preprocessing.
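A hedged MATLAB sketch of this CPU-side preprocessing is given below: the weight is read from the sclera mask once (1 inside, 0.5 near the boundary, 0 outside) and the polar components (r, θ) are stored alongside the rectangular ones. scleraMask, segX, segY, segPhi, the iris centre (xi, yi), and the 5-pixel boundary band are all placeholder assumptions.

distToEdge = bwdist(~scleraMask);            % distance of each sclera pixel to the mask boundary
border     = 5;                              % illustrative "near the boundary" band, in pixels

n   = numel(segX);
wpl = struct('x', segX, 'y', segY, 'r', zeros(1, n), 'theta', zeros(1, n), ...
             'phi', segPhi, 'w', zeros(1, n));

for k = 1:n
    wpl.r(k)     = hypot(segX(k) - xi, segY(k) - yi);   % polar radius w.r.t. the iris centre
    wpl.theta(k) = atan2(segY(k) - yi, segX(k) - xi);   % polar angle w.r.t. the iris centre

    px = round(segX(k));  py = round(segY(k));
    if ~scleraMask(py, px)
        wpl.w(k) = 0;                        % descriptor outside the sclera
    elseif distToEdge(py, px) < border
        wpl.w(k) = 0.5;                      % descriptor near the sclera boundary
    else
        wpl.w(k) = 1;                        % interior descriptor
    end
end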

The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses, which meets the requirement for coalesced memory access on the GPU.

FIG

FIG

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65 k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and as GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

GENERAL-PURPOSE COMPUTING ON THE GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.

We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline described earlier, concentrating on the programmable aspects of this pipeline:

The programmer specifies geometry that covers a region on the screen.

The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as a texture on future passes through the graphics pipeline.

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. Newer programming environments solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity. After this step, some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distances, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs ni and the distances di between the Y-shape branch centers are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed on to the next, more precise matching process.
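A hedged MATLAB sketch of this coarse Y-shape match follows. Yte and Yta are assumed to be n-by-5 matrices of descriptors (ϕ1, ϕ2, ϕ3, x, y), and tPhi, tXY, and alpha correspond to the values quoted above (30, 675, and 30). The exact fusion rule of Eq. (2) is not reproduced in this report, so the final score combination below is only an illustrative stand-in.

function score = yShapeScore(Yte, Yta, tPhi, tXY, alpha)
    nMatched = 0;  dSum = 0;
    for i = 1:size(Yte, 1)
        for j = 1:size(Yta, 1)
            dxy = norm(Yte(i, 4:5) - Yta(j, 4:5));      % distance of the branch centres
            if dxy > tXY, continue; end                 % restrict the search area
            dphi = norm(Yte(i, 1:3) - Yta(j, 1:3));     % distance of the three branch angles
            if dphi < tPhi
                nMatched = nMatched + 1;
                dSum = dSum + dxy;
                break;
            end
        end
    end
    if nMatched == 0
        score = 0;                                      % no branch pair matched
    else
        % Fuse the match count and the average centre distance (stand-in for Eq. (2)).
        score = nMatched / min(size(Yte, 1), size(Yta, 1)) - (dSum / nMatched) / alpha;
    end
end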

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at different gaze angles, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers: the episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.

1) SHIFT PARAMETER SEARCH: As discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template, stei is the ith WPL descriptor of Tte, Tta is the target template, staj is the jth WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance between descriptors stek and staj.

Δsk is the shift value between the two descriptors, defined as the offset between their center positions.

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quadrant and find the nearest neighbor staj of each in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final registration offset factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
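The sketch below is one plausible MATLAB reading of this shift search, not the project's exact Algorithm 2: it samples test descriptors, pairs each with its nearest target descriptor, and keeps the candidate offset that deviates least from the bulk of the candidates. teXY, taXY, and nSamples are placeholder names.

function sOpt = shiftSearch(teXY, taXY, nSamples)
    idx  = randperm(size(teXY, 1), min(nSamples, size(teXY, 1)));
    cand = zeros(numel(idx), 2);                                % candidate shift offsets
    for k = 1:numel(idx)
        d = sum(bsxfun(@minus, taXY, teXY(idx(k), :)).^2, 2);   % squared distances to all target centres
        [~, j] = min(d);                                        % nearest neighbour in the target template
        cand(k, :) = taXY(j, :) - teXY(idx(k), :);              % offset recorded as a shift candidate
    end
    % Keep the candidate closest to the mean offset (least deviation).
    dev = sum(bsxfun(@minus, cand, mean(cand, 1)).^2, 2);
    [~, best] = min(dev);
    sOpt = cand(best, :);
end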

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the offset from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters. In our experiment we set the iteration count to 512.
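The following MATLAB fragment sketches one such iteration: it builds a combined rotation/shift/scale matrix in homogeneous coordinates and counts descriptor pairs that land within β = 20 pixels of a target descriptor. The random parameter ranges, the multiplication order of R, T, and S, and the names teXY, taXY, and sOpt are all assumptions for illustration; Eq. (7) itself is shown only as a figure above.

th = (rand - 0.5) * pi/9;          % random rotation, roughly +/- 10 degrees (example range)
sc = 0.9 + 0.2 * rand;             % random scale around 1 (example range)
tx = sOpt(1);  ty = sOpt(2);       % shift taken from the nearest-neighbour offset of Algorithm 2

R  = [cos(th) -sin(th) 0;  sin(th) cos(th) 0;  0 0 1];   % rotation by th
T  = [1 0 tx;  0 1 ty;  0 0 1];                          % shift by (tx, ty)
Sc = [sc 0 0;  0 sc 0;  0 0 1];                          % isotropic scale by sc

A   = R * T * Sc;                                        % combined transform (order assumed here)
pts = A * [teXY'; ones(1, size(teXY, 1))];               % transform all test descriptor centres
pts = pts(1:2, :)';

beta = 20;  matched = 0;
for k = 1:size(pts, 1)
    d = sqrt(sum(bsxfun(@minus, taXY, pts(k, :)).^2, 2));
    if min(d) < beta
        matched = matched + 1;                           % this descriptor pair counts as matched
    end
end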

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm))·T(tr(optm)shift)·S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned by the programmer into threads and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and it takes little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.

Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency as much as possible, on-chip memory should be used in preference to global memory. When global memory access does occur, threads in the same warp should access consecutive words to achieve coalescing.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If the addresses of memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computational density; thus we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of the descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.

252 MAPPING INSIDE BLOCK

In the shift-parameter search there are two schemes we can choose for mapping the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all the threads compute independently until the final result must be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we choose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed over all threads. The challenge in this step is that the randomly generated numbers might be correlated among threads. In generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It was primarily developed for target detection; in this work it is applied as a feature for human recognition. In the sclera region, the vein patterns form the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient-orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG
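The MATLAB sketch below follows the steps described above (Sobel gradients, unsigned orientation over 0-180 degrees, 9 bins per 8x8 cell, magnitude-weighted voting). I is assumed to be a grayscale sclera image, and block normalisation is omitted for brevity.

I   = im2double(I);
dx  = imfilter(I, fspecial('sobel')');   % gradient along x
dy  = imfilter(I, fspecial('sobel'));    % gradient along y
mag = hypot(dx, dy);                     % gradient magnitude m(x, y)
ang = mod(atan2d(dy, dx), 180);          % unsigned orientation, 0-180 degrees

cellSize = 8;  nBins = 9;  binWidth = 180 / nBins;
rows = floor(size(I, 1) / cellSize);
cols = floor(size(I, 2) / cellSize);
hog  = zeros(rows, cols, nBins);

for r = 1:rows
    for c = 1:cols
        rr = (r-1)*cellSize + (1:cellSize);
        cc = (c-1)*cellSize + (1:cellSize);
        b  = min(floor(ang(rr, cc) / binWidth) + 1, nBins);  % orientation bin of each pixel
        m  = mag(rr, cc);
        for k = 1:nBins
            hog(r, c, k) = sum(m(b == k));   % magnitude-weighted histogram vote
        end
    end
end
hogDescriptor = hog(:)';                     % concatenated cell histograms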

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

32 FEATURES OF MATLAB

High-level language for technical computing.

Development environment for managing code, files, and data.

Interactive tools for iterative exploration, design, and problem solving.

Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration.

2-D and 3-D graphics functions for visualizing data.

Tools for building custom graphical user interfaces.

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel.

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) and knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required. The MATLAB environment also handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
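As a hedged illustration of how the Parallel Computing Toolbox mentioned above could spread one-to-many template matching over CPU workers, the fragment below uses parfor; matchScore, testTemplate, targetTemplates, and the thresholds are placeholders standing in for this project's own matching routine and database.

scores = zeros(1, numel(targetTemplates));
parfor k = 1:numel(targetTemplates)
    % Each worker matches the probe template against one target template.
    scores(k) = matchScore(testTemplate, targetTemplates{k}, Dmatch, PhiMatch);
end
[bestScore, bestIndex] = max(scores);    % best-matching template in the database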

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP and for network installations.

Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data.

Readability and navigation improvements to warning and error messages in the MATLAB command window.

Automatic variable and function renaming in the MATLAB Editor.

Developing Algorithms and Applications

MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.

Development Tools

MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:

MATLAB Editor: provides standard editing and debugging features, such as setting breakpoints and single stepping.

Code Analyzer: checks your code for problems and recommends modifications to maximize performance and maintainability.

MATLAB Profiler: records the time spent executing each line of code.

Directory Reports: scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.

Designing Graphical User Interfaces

The interactive tool GUIDE (Graphical User Interface Development Environment) is used to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including:

Interpolating and decimating

Extracting sections of data, scaling, and averaging

Thresholding and smoothing

Correlation, Fourier analysis, and filtering

1-D peak, valley, and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualize vectors of data with 2-D plotting functions that create:

Line, area, bar, and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include:

Surface, contour, and mesh plots

Image plots

Cone, slice, stream, and isosurface plots

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.

MATLAB provides the following types of functions for performing mathematical operations and analyzing data:

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method designed to take advantage of GPU structures. We developed the WPL descriptor to incorporate the mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, was used to partition the computation task across the heterogeneous CPU-GPU system, down to the individual threads on the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


153 COLOR IMAGE

The RGB colour model relates very closely to the way we perceive colour, with the r, g, and b receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production.

The secondary colours of RGB (cyan, magenta, and yellow) are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue make cyan, and blue and red form magenta. The combination of red, green, and blue at full intensity makes white.

In Photoshop, using the "screen" mode for the different layers in an image makes the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.

FIG

CMYK: The 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses subtractive colour mixing.

154 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique for managing the colors of digital images in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements in which every element, a color, is indexed by its position within the array. The image pixels do not contain the full specification of their colors, but only their indices into the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle. This supported a palette of 256 36-bit RGB colors.

16 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation.

2) Processing of scene data for autonomous machine perception.

In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches, namely a Speeded-Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were measured using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB of DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size, will occupy GPU memory, and will slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where the faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. The recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system with a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye. The vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using bilinear interpolation. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a non-ideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

1.8 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method and implemented on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard thread and block in CUDA as work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make parallel processing possible and efficient.

1.9.1 PROPOSED SYSTEM ADVANTAGES

1. To improve efficiency, in this research we propose a new descriptor, the Y-shape descriptor, which can greatly help improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose a coarse-to-fine two-stage matching process. In the first stage, two images are matched coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

2.1 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system; it costs about 1.2 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform and outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies are needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and can help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. However, the mask files are large in size; they occupy GPU memory and slow down data transfer. In addition, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard thread and block in CUDA as work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop the implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly used for circular or quasi-circular shapes. These two characteristics are extracted from all the images in the database and compared with the features of the query image to determine whether the person is correctly identified. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.

FIG

Glare area detection: The glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images; if the image is in color, it must first be converted to grayscale and then passed through the Sobel filter to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are eliminated.

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; these are all unwanted portions for recognition. In order to eliminate their effects, refinement is performed after the detection of the sclera area. The figure shows the result after Otsu's thresholding and after iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of the fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine, multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus, the descriptor is S = (θ, r, ɸ)ᵀ. The individual components of the line descriptor are calculated as

FIG

Here, fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.
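To make the descriptor computation concrete, the following C++ sketch computes (θ, r, ɸ) in a way consistent with the description above; since Eqs. (6)–(8) are not reproduced in this report, the structure and function names here are illustrative assumptions.

    #include <cmath>

    // A line-segment descriptor S = (theta, r, phi)^T as described above.
    struct LineDescriptor {
        double theta;  // angle of the segment center with respect to the iris center
        double r;      // distance from the segment center to the iris center
        double phi;    // dominant angular orientation of the segment
    };

    // (xl, yl): center of the line segment; (xi, yi): detected iris center.
    // slopeAtCenter: d fline/dx evaluated at xl, where fline is the polynomial
    // approximation of the segment (e.g., from a least-squares fit).
    LineDescriptor makeDescriptor(double xl, double yl,
                                  double xi, double yi,
                                  double slopeAtCenter) {
        LineDescriptor s;
        s.theta = std::atan2(yl - yi, xl - xi);  // angle to the reference axis through the iris center
        s.r     = std::hypot(xl - xi, yl - yi);  // radial distance to the iris center
        s.phi   = std::atan(slopeAtCenter);      // orientation of the segment itself
        return s;
    }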

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6–8), Dmatch is the matching distance threshold, and θmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
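The following C++ sketch illustrates this scoring scheme. Because the exact equation is not reproduced above, the pairwise score is assumed here to be the product of the two segment weights when the distance and orientation thresholds are satisfied; the names and thresholds are illustrative.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct Segment {
        double x, y;  // center of the segment
        double phi;   // dominant orientation
        double w;     // weight from the weighting image (0, 0.5, or 1)
    };

    // Pairwise score: non-zero only if the centers are within Dmatch pixels and the
    // orientations differ by at most angleMatch. The product of the two weights is
    // an assumption; the report does not reproduce the exact equation.
    double pairScore(const Segment& si, const Segment& sj, double Dmatch, double angleMatch) {
        double d = std::hypot(si.x - sj.x, si.y - sj.y);
        if (d <= Dmatch && std::fabs(si.phi - sj.phi) <= angleMatch)
            return si.w * sj.w;
        return 0.0;
    }

    // Total score M: sum of the best pairwise scores, normalized by the maximum
    // attainable score of the smaller template, i.e. the sum of its weights.
    double totalScore(const std::vector<Segment>& test, const std::vector<Segment>& target,
                      double Dmatch, double angleMatch) {
        double sum = 0.0;
        for (const Segment& si : test) {
            double best = 0.0;
            for (const Segment& sj : target)
                best = std::max(best, pairScore(si, sj, Dmatch, angleMatch));
            sum += best;
        }
        double wTest = 0.0, wTarget = 0.0;
        for (const Segment& s : test)   wTest += s.w;
        for (const Segment& s : target) wTarget += s.w;
        double maxScore = std::min(wTest, wTarget);
        return maxScore > 0.0 ? sum / maxScore : 0.0;
    }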

FIG

FIG

FIG

FIG

Under movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angle of each branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template; in our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 remain quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy GPU memory and slow down data transfer. For matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weight of descriptors that lie beyond the sclera is set to 0, descriptors near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.

The calculated result is saved as a component of the descriptor. The sclera descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points. If we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in the CPU preprocessing.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them in continuous addresses. This meets the requirement of coalesced memory access on the GPU.
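The sketch below illustrates one possible structure-of-arrays layout for the reorganized descriptors; the names are illustrative, and only the idea of storing each descriptor field contiguously per side, so that a warp reads consecutive words, is taken from the description above.

    #include <cuda_runtime.h>
    #include <vector>

    // Structure-of-arrays layout for the WPL descriptors of one side (left or right)
    // of the sclera. Storing each field contiguously lets the threads of a warp read
    // consecutive words, i.e. coalesced global-memory access.
    struct DescriptorSoA {
        float *x, *y, *r, *theta, *phi, *w;  // device pointers, one array per field
        int    count;                        // number of descriptors on this side
    };

    DescriptorSoA uploadSide(const std::vector<float>& x, const std::vector<float>& y,
                             const std::vector<float>& r, const std::vector<float>& theta,
                             const std::vector<float>& phi, const std::vector<float>& w) {
        DescriptorSoA d;
        d.count = static_cast<int>(x.size());
        size_t bytes = d.count * sizeof(float);
        cudaMalloc(&d.x, bytes);     cudaMemcpy(d.x, x.data(), bytes, cudaMemcpyHostToDevice);
        cudaMalloc(&d.y, bytes);     cudaMemcpy(d.y, y.data(), bytes, cudaMemcpyHostToDevice);
        cudaMalloc(&d.r, bytes);     cudaMemcpy(d.r, r.data(), bytes, cudaMemcpyHostToDevice);
        cudaMalloc(&d.theta, bytes); cudaMemcpy(d.theta, theta.data(), bytes, cudaMemcpyHostToDevice);
        cudaMalloc(&d.phi, bytes);   cudaMemcpy(d.phi, phi.data(), bytes, cudaMemcpyHostToDevice);
        cudaMalloc(&d.w, bytes);     cudaMemcpy(d.w, w.data(), bytes, cudaMemcpyHostToDevice);
        return d;
    }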

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus, the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units. For general-purpose computing on the GPU, mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.

We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline described in Section II, concentrating on the programmable aspects of this pipeline:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as a texture on future passes through the graphics pipeline.

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, to the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.
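The following CUDA sketch is a generic illustration of this programming model (it is not part of the project code): a structured grid of threads runs the same SPMD kernel, each thread gathers its neighbors' values from one buffer and writes its result to another, and the output buffer can be reused as input on the next step.

    #include <cuda_runtime.h>

    // SPMD kernel over a structured W x H grid of threads: each thread gathers the
    // values of its four neighbors from the input buffer and writes one result to
    // the output buffer, which can be reused as input on the next step.
    __global__ void stepKernel(const float* in, float* out, int W, int H) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x <= 0 || y <= 0 || x >= W - 1 || y >= H - 1) return;  // skip the border
        int i = y * W + x;
        out[i] = 0.25f * (in[i - 1] + in[i + 1] + in[i - W] + in[i + W]);
    }

    void runStep(const float* dIn, float* dOut, int W, int H) {
        dim3 block(16, 16);                        // thread block = tile of the domain
        dim3 grid((W + 15) / 16, (H + 15) / 16);   // grid covers the whole domain
        stepKernel<<<grid, block>>>(dIn, dOut, W, H);
        cudaDeviceSynchronize();
    }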

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities. After this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here, ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of the two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distance, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs ni and the distance between the Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here, α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. Scleras with high matching scores are passed to the next, more precise matching process.
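A C++ sketch of this coarse matching stage is given below. The nearest-neighbor search and especially the score fusion are illustrative assumptions, since Eq. (2) is not reproduced in this report; the thresholds are passed in as parameters.

    #include <cmath>
    #include <vector>

    struct YShape {
        double phi1, phi2, phi3;  // branch angles w.r.t. the radial direction from the pupil center
        double x, y;              // center of the Y-shape branch (auxiliary)
    };

    // Euclidean distance between the angle triplets of two Y-shape descriptors.
    double dPhi(const YShape& a, const YShape& b) {
        return std::sqrt((a.phi1 - b.phi1) * (a.phi1 - b.phi1) +
                         (a.phi2 - b.phi2) * (a.phi2 - b.phi2) +
                         (a.phi3 - b.phi3) * (a.phi3 - b.phi3));
    }

    // Euclidean distance between the two descriptor centers.
    double dXY(const YShape& a, const YShape& b) { return std::hypot(a.x - b.x, a.y - b.y); }

    // Coarse matching of two templates on Y-shape descriptors only (no registration).
    // tPhi and tXY are the angle and search-area thresholds; alpha fuses the matched
    // count and the average center distance. The fusion below is an illustrative
    // choice, not the report's Eq. (2).
    double coarseScore(const std::vector<YShape>& test, const std::vector<YShape>& target,
                       double tPhi, double tXY, double alpha) {
        int n = 0;           // number of matched descriptor pairs
        double dSum = 0.0;   // accumulated center distance of the matched pairs
        for (const YShape& yt : test) {
            double bestD = -1.0;
            for (const YShape& ya : target) {
                if (dPhi(yt, ya) > tPhi) continue;  // branch angles must agree
                double d = dXY(yt, ya);
                if (d <= tXY && (bestD < 0.0 || d < bestD)) bestD = d;  // restrict the search area
            }
            if (bestD >= 0.0) { ++n; dSum += bestD; }
        }
        if (n == 0) return 0.0;
        double avgD = dSum / n;
        // Matched count normalized by the template sizes, reduced by the average
        // distance; alpha scales how strongly the distance lowers the score.
        return (static_cast<double>(n) / (0.5 * (test.size() + target.size()))) / (1.0 + avgD / alpha);
    }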

2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR

The line segment WPL descriptor reveals more of the vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale.

1) SHIFT PARAMETER SEARCH: As discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte, Tta is the target template and staj is the j-th WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.

Δsk is the shift value of the two descriptors. We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find the nearest neighbor staj of each in the target template Tta. The shift offset between them is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
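The following C++ sketch outlines this shift-parameter search. Sampling an equal number of descriptors from each quad is omitted for brevity, and the selection of the final offset follows one reading of the "smallest standard deviation" rule; names and details are illustrative assumptions.

    #include <cmath>
    #include <random>
    #include <vector>

    struct WPL { double x, y; };  // only the descriptor center is needed for the shift search

    // Sketch of the shift-parameter search: sample test descriptors, pair each with its
    // nearest neighbor in the target template, collect the candidate offsets, and keep
    // the offset that deviates least from the others. Assumes both templates are non-empty.
    WPL searchShift(const std::vector<WPL>& test, const std::vector<WPL>& target,
                    int samples, unsigned seed = 42) {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<size_t> pick(0, test.size() - 1);

        std::vector<WPL> candidates;
        for (int k = 0; k < samples; ++k) {
            const WPL& te = test[pick(rng)];
            const WPL* nearest = &target[0];
            double bestD = std::hypot(target[0].x - te.x, target[0].y - te.y);
            for (const WPL& ta : target) {
                double d = std::hypot(ta.x - te.x, ta.y - te.y);
                if (d < bestD) { bestD = d; nearest = &ta; }
            }
            candidates.push_back({ nearest->x - te.x, nearest->y - te.y });  // candidate offset
        }

        // Keep the candidate whose total deviation from all other candidates is smallest.
        WPL best = candidates[0];
        double bestDev = 1e300;
        for (const WPL& c : candidates) {
            double dev = 0.0;
            for (const WPL& o : candidates)
                dev += std::hypot(c.x - o.x, c.y - o.y);
            if (dev < bestDev) { bestDev = dev; best = c; }
        }
        return best;
    }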

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s(it)te and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here, stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment, we set the iteration count to 512.
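A C++ sketch of this randomized affine-parameter search is shown below. The rotation and scale ranges, the order in which the transforms are composed, and the helper names are illustrative assumptions; only the overall procedure (random parameter sets, counting matches within β pixels, keeping the best of N iterations) follows the description above.

    #include <cmath>
    #include <random>
    #include <vector>

    struct Pt { double x, y; };
    struct AffineParams { double dx, dy, theta, scale; };

    // Apply scale, rotation, and shift to a point (one possible composition of Eq. (7)).
    static Pt applyTransform(const Pt& p, const AffineParams& a) {
        double xs = p.x * a.scale, ys = p.y * a.scale;                 // scale
        double xr = xs * std::cos(a.theta) - ys * std::sin(a.theta);   // rotate
        double yr = xs * std::sin(a.theta) + ys * std::cos(a.theta);
        return { xr + a.dx, yr + a.dy };                               // shift
    }

    // Randomized search: each iteration draws a shift from a random descriptor pair plus
    // a random rotation and scale, transforms the test template, and counts descriptors
    // that land within beta pixels of some target descriptor. Ranges are assumed.
    AffineParams searchAffine(const std::vector<Pt>& test, const std::vector<Pt>& target,
                              int iterations = 512, double beta = 20.0, unsigned seed = 7) {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<size_t> pickTe(0, test.size() - 1);
        std::uniform_real_distribution<double> rot(-0.1, 0.1);   // a priori rotation range (assumed)
        std::uniform_real_distribution<double> scl(0.9, 1.1);    // a priori scale range (assumed)

        AffineParams best{0, 0, 0, 1};
        int bestMatches = -1;
        for (int it = 0; it < iterations; ++it) {
            const Pt& te = test[pickTe(rng)];
            Pt near = target[0];                                 // nearest neighbor gives the shift
            double bestD = std::hypot(target[0].x - te.x, target[0].y - te.y);
            for (const Pt& ta : target) {
                double d = std::hypot(ta.x - te.x, ta.y - te.y);
                if (d < bestD) { bestD = d; near = ta; }
            }
            AffineParams cand{ near.x - te.x, near.y - te.y, rot(rng), scl(rng) };

            int matches = 0;
            for (const Pt& p : test) {
                Pt q = applyTransform(p, cand);
                for (const Pt& ta : target)
                    if (std::hypot(q.x - ta.x, q.y - ta.y) <= beta) { ++matches; break; }
            }
            if (matches > bestMatches) { bestMatches = matches; best = cand; }
        }
        return best;
    }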

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here, stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radial direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of two ɸ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.

2.5 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those SMs. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and accessing them takes little time. Only shared memory can be accessed by other threads within the same block; however, the availability of shared memory is limited. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them can be very time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms onto CUDA to achieve efficient processing is not a trivial task. There are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide the latency of the small instruction set, on-chip memory should be used preferentially rather than global memory. When global memory access does occur, threads in the same warp should access words in sequence to achieve coalescence.

Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

2.5.1 MAPPING THE ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density; thus, we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work, we use an NVIDIA C2070 as our GPU. The thread and block numbers are both set to 1024, which means we can match our test template with up to 1024 x 1024 target templates at the same time.
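The kernel and launch configuration below sketch this one-thread-per-target-template mapping; the descriptor layout and the per-pair scoring function are placeholders, not the actual project kernels.

    #include <cuda_runtime.h>

    // Placeholder pair scoring: the Stage-I Y-shape matching described above goes here.
    __device__ float matchOnePair(const float* testDesc, int testCount,
                                  const float* targetDesc, int targetCount) {
        return 0.0f;  // placeholder body for illustration only
    }

    // Coarse-matching kernel: one thread handles one target template.
    __global__ void coarseMatchKernel(const float* testDesc, int testCount,
                                      const float* allTargetDesc, const int* targetOffsets,
                                      const int* targetCounts, int numTemplates, float* scores) {
        int t = blockIdx.x * blockDim.x + threadIdx.x;  // global thread id = target template id
        if (t >= numTemplates) return;
        const float* target = allTargetDesc + targetOffsets[t];
        scores[t] = matchOnePair(testDesc, testCount, target, targetCounts[t]);
    }

    // Launch configuration used in the report: 1024 blocks x 1024 threads, so the test
    // template can be compared against up to 1024 x 1024 target templates at once.
    void launchCoarseMatch(const float* dTest, int testCount, const float* dTargets,
                           const int* dOffsets, const int* dCounts, int numTemplates, float* dScores) {
        coarseMatchKernel<<<1024, 1024>>>(dTest, testCount, dTargets, dOffsets, dCounts,
                                          numTemplates, dScores);
    }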

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread handles a set of descriptors in that template. This partition makes every block execute independently, and there is no data exchange requirement between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new result. The final result is saved at the first address, which has the same variable name as the first intermediate result.
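The kernel below sketches a shared-memory tree reduction in the spirit of this scheme (partial results are summed in pairs, then across groups of 4, 8, 16, ..., with the block's total ending up at the first address); it assumes a power-of-two block size and is illustrative rather than the project's exact kernel.

    #include <cuda_runtime.h>

    // In-block tree reduction in shared memory. Requires blockDim.x to be a power of two.
    __global__ void blockSumKernel(const float* partial, float* blockSums, int n) {
        extern __shared__ float s[];               // one float per thread
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + tid;
        s[tid] = (i < n) ? partial[i] : 0.0f;
        __syncthreads();

        // Stride doubles each step (1, 2, 4, ...): the first thread of each group of
        // 2, 4, 8, ... threads accumulates its group's partial sum.
        for (int stride = 1; stride < blockDim.x; stride *= 2) {
            if (tid % (2 * stride) == 0)
                s[tid] += s[tid + stride];
            __syncthreads();
        }
        if (tid == 0) blockSums[blockIdx.x] = s[0];  // result saved at the first address
    }

    void launchBlockSum(const float* dPartial, float* dBlockSums, int n, int threads = 256) {
        int blocks = (n + threads - 1) / threads;
        blockSumKernel<<<blocks, threads, threads * sizeof(float)>>>(dPartial, dBlockSums, n);
    }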

2.5.2 MAPPING INSIDE A BLOCK

In the shift argument search, there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all threads compute independently and only the final results need to be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently. The generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. However, even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.

2.5.3 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data. In our kernels, we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.

FIG

2.6 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients is a feature descriptor. It was primarily applied to the design of target detection; in this paper, it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG. This method is used to create the cell histograms. Each pixel within the cell gives a weighted vote to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
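The C++ sketch below illustrates the gradient computation and magnitude-weighted orientation binning for one cell; block grouping and normalization are omitted, and the function names and the 9-bin choice are illustrative assumptions.

    #include <cmath>
    #include <vector>

    // Gradient magnitude and orientation at one pixel from the x- and y-direction
    // gradients dx and dy (e.g., from simple [-1 0 1] derivative masks).
    inline void gradient(double dx, double dy, double& magnitude, double& angleDeg) {
        const double pi = std::acos(-1.0);
        magnitude = std::sqrt(dx * dx + dy * dy);
        angleDeg  = std::atan2(dy, dx) * 180.0 / pi;   // (-180, 180]
        if (angleDeg < 0.0) angleDeg += 180.0;         // unsigned gradient: opposite directions count the same
    }

    // Orientation binning for one cell: every pixel votes into the histogram bin of its
    // orientation, weighted by its gradient magnitude. binCount = 9 gives 20-degree bins
    // over 0..180 degrees.
    std::vector<double> cellHistogram(const std::vector<double>& dx,
                                      const std::vector<double>& dy,
                                      int binCount = 9) {
        std::vector<double> hist(binCount, 0.0);
        double binWidth = 180.0 / binCount;
        for (size_t i = 0; i < dx.size(); ++i) {
            double m, a;
            gradient(dx[i], dy[i], m, a);
            int bin = static_cast<int>(a / binWidth) % binCount;
            hist[bin] += m;                            // magnitude-weighted vote
        }
        return hist;
    }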

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

3.1 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications and distribute your MATLAB algorithms and applications.

3.2 FEATURES OF MATLAB

High-level language for technical computing.

Development environment for managing code, files, and data.

Interactive tools for iterative exploration, design, and problem solving.

Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration.

2-D and 3-D graphics functions for visualizing data.

Tools for building custom graphical user interfaces.

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java™, COM, and Microsoft Excel.

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational applications. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
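As a small illustration of the points above (matrix-valued variables, matrix-aware operators, and the optional parallel extensions), the sketch below uses only standard MATLAB plus, in the commented lines, functions from the Parallel Computing Toolbox; it is indicative rather than part of this project's code.

A = rand(4);            % every variable is a matrix; here a 4x4 random matrix
b = ones(4, 1);         % column vector
x = A \ b;              % solve A*x = b directly
C = A' * A + eye(4);    % matrix product plus identity, no loops needed
s = sum(C(:));          % sum of all elements
% With the Parallel Computing Toolbox (if available) the same style of code
% can be spread over workers or offloaded to a GPU, for example:
% parfor k = 1:100, y(k) = sum(rand(1000,1)); end
% G = gpuArray(A);  z = gather(G * gpuArray(b));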

3.2.1 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB libraries (for example XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations.

Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data.

Readability and navigation improvements to warning and error messages in the MATLAB command window.

Automatic variable and function renaming in the MATLAB Editor.

Developing Algorithms and Applications

MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks, such as declaring variables, specifying data types and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
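A small hedged example of the loop-free style referred to above: both fragments compute the same sum of squared, scaled values, but the second needs no explicit 'for' loop.

v = 1:1000;
total = 0;                       % C-like loop version
for k = 1:numel(v)
    total = total + (2*v(k))^2;
end
total2 = sum((2*v).^2);          % equivalent vectorized one-liner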

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP) and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.

Development Tools

MATLAB includes development tools that help you implement your algorithms efficiently. These include the following.

MATLAB Editor: provides standard editing and debugging features, such as setting breakpoints and single stepping.

Code Analyzer: checks your code for problems and recommends modifications to maximize performance and maintainability.

MATLAB Profiler: records the time spent executing each line of code.

Directory Reports: scan all the files in a directory and report on code efficiency, file differences, file dependencies and code coverage.

Designing Graphical User Interfaces

The interactive tool GUIDE (Graphical User Interface Development Environment) is used to lay out, design and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including:

Interpolating and decimating

Extracting sections of data, scaling and averaging

Thresholding and smoothing

Correlation, Fourier analysis and filtering

1-D peak, valley and zero finding

Basic statistics and curve fitting

Matrix analysis
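The fragment below sketches a few of the listed operations on a synthetic 1-D signal using base MATLAB functions; the signal and its parameters are illustrative assumptions.

t  = linspace(0, 1, 200);
x  = sin(2*pi*5*t) + 0.3*randn(1, 200);    % noisy test signal
xi = interp1(t, x, linspace(0, 1, 400));   % interpolation onto a finer grid
xs = filter(ones(1, 9)/9, 1, x);           % 9-point moving-average smoothing
X  = abs(fft(x));                          % Fourier analysis
p  = polyfit(t, x, 3);                     % cubic curve fitting
mu = mean(x);  sd = std(x);                % basic statistics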

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases and external devices. You can read data from popular file formats, such as Microsoft Excel, ASCII text or binary files, image, sound and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations and legends; and drawing shapes.

2-D Plotting

Visualize vectors of data with 2-D plotting functions that create:

Line, area, bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations and transparency.

3-D plotting functions include:

Surface, contour and mesh plots

Image plots

Cone, slice, stream and isosurface plots

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.

MATLAB provides the following types of functions for performing mathematical operations and analyzing data:

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles and integers.

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

4.2 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
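For reference, the sequence of snapshots above can be approximated by a MATLAB sketch along the following lines. The file name, the ROI rectangle, the Gabor parameters and the 0.8 threshold are assumptions for illustration, and adapthisteq/imgaborfilt require the Image Processing Toolbox; this is not the exact project code.

rgb   = imread('sclera.jpg');              % assumed input eye image
gray  = rgb2gray(rgb);                     % original image -> grey scale
bw    = im2bw(gray, graythresh(gray));     % grey scale -> binary via Otsu's threshold
edges = edge(gray, 'sobel');               % edge detection
roi   = imcrop(gray, [30 40 200 120]);     % assumed sclera region of interest
roiEq = adapthisteq(roi);                  % enhancement of the sclera image
[magG, ~] = imgaborfilt(roiEq, 4, 90);     % Gabor filtering for vein features
feat   = magG(:) / norm(magG(:));          % feature vector
featDB = feat;                             % placeholder for a stored database template
score  = dot(feat, featDB);                % matching with images in database
if score > 0.8
    disp('MATCHED');
else
    disp('NOT MATCHED');
end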

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step to achieve parallel computation efficiency using a GPU. A work flow with high arithmetic intensity, which hides the memory access latency, was designed to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


1.5.4 INDEXED IMAGE

FIG

An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage digital images' colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.

When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements in which every element, a color, is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.

Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland and Cheadle, which supported a palette of 256 36-bit RGB colors.

1.6 APPLICATIONS OF IMAGE PROCESSING

Interest in digital image processing methods stems from two principal application areas:

1) Improvement of pictorial information for human interpretation.

2) Processing of scene data for autonomous machine perception.

In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

1.7 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

1.7.1 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units in CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, it still requires careful design in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.

Modern iris recognition algorithms can be computationally intensive, yet are designed for traditional sequential processing elements, such as a personal computer. However, a parallel processing alternative using field programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the means of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.

A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure.

Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye. The vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using bilinear interpolation. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine, but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.

This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a non-ideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

1.8 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor with tremendous computational power.

It supports not only a traditional graphics pipeline, but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why a naive parallel approach would not work. We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

1.9.1 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

2.1 INTRODUCTION

The sclera is the opaque and white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (as an abbreviation of general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition and face recognition. In iris recognition, the GPU was used to extract the features, construct descriptors and match templates.

GPUs are also used for object retrieval and image search. Park et al. designed the performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods. Therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor with tremendous computational power. It supports not only a traditional graphics pipeline, but also computation on non-graphical data. More importantly, it offers an easier programming platform which outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies would be needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition using our sequential line-descriptor method on the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units in CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naive parallel implementation of the algorithms would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naive parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we developed implementation schemes to map our algorithms into CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition, in Section 8 we present experiments using the proposed system, and in Section 9 we draw conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation: they used a clustering algorithm to classify color eye images into three clusters (sclera, iris and background). Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation and fine sclera segmentation. Zhou et al. developed a skin-tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's-thresholding-based method for grayscale images.

After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly used for circular or quasi-circular shapes of objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, the human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.

FIG

Glare area detection: the glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images; if the image is in color, it must first be converted to grayscale and then passed to the Sobel filter to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera area estimation: for the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. When the region of interest has been selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are discarded.
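A minimal sketch of this estimation step is given below, assuming the iris centre and radius have already been found; the file name and the 200-pixel area threshold are illustrative assumptions.

gray = rgb2gray(imread('eye.jpg'));
irisCenter = [160, 120];  irisRadius = 45;           % assumed iris parameters
leftROI  = gray(:, 1:irisCenter(1)-irisRadius);      % region left of the iris
rightROI = gray(:, irisCenter(1)+irisRadius:end);    % region right of the iris
bwLeft   = im2bw(leftROI,  graythresh(leftROI));     % Otsu's threshold per side
bwRight  = im2bw(rightROI, graythresh(rightROI));
bwLeft   = bwareaopen(bwLeft,  200);                 % keep large bright blobs only
bwRight  = bwareaopen(bwRight, 200);                 % as candidate sclera areas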

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid and iris boundaries are refined; these are all unwanted portions for recognition. In order to eliminate their effects, refinement is performed after the detection of the sclera area. The figure shows the result after Otsu's thresholding and the iris and eyelid refinement used to detect the right sclera area. The left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004) and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than cases of significant trauma, pathology or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.

Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN

2.2.5.1 RECOGNITION METHOD

The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, the vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as

FIG

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor.

In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and randomly chooses a scaling factor and a rotation value, based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration using these parameters.
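The sketch below computes the three descriptor components for a single vein segment directly from the definitions given above; the segment pixels and the iris centre are assumed example values, and the exact equations in the figure may differ in detail.

segPts = [100 80; 102 83; 104 87; 106 90; 108 94];   % assumed skeleton pixels of one segment
xi = 160;  yi = 120;                                  % assumed iris centre
xl = mean(segPts(:,1));  yl = mean(segPts(:,2));      % segment centre (xl, yl)
r     = hypot(xl - xi, yl - yi);                      % distance to the iris centre
theta = atan2(yl - yi, xl - xi);                      % angle w.r.t. the reference at the iris centre
p   = polyfit(segPts(:,1), segPts(:,2), 1);           % f_line(x), linear approximation of the segment
phi = atan(p(1));                                     % dominant orientation of the segment
S = [theta; r; phi];                                  % descriptor S = (theta, r, phi)T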

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows: where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score, M, is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
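The following function-file sketch (hypothetical name matchTemplates.m) mirrors the matching rule just described: two segments match when their centres and orientations are within the two thresholds, and the total score is normalized by the smaller template's total weight. The per-match weight min(wi, wj) is a simplifying assumption, not the exact weighting used in the line-descriptor paper.

function M = matchTemplates(test, target, Dmatch, PhiMatch)
% test/target: struct arrays with fields x, y, phi, w (one element per segment)
total = 0;
for i = 1:numel(test)
    for j = 1:numel(target)
        d    = hypot(test(i).x - target(j).x, test(i).y - target(j).y);
        dphi = abs(test(i).phi - target(j).phi);
        if d <= Dmatch && dphi <= PhiMatch
            total = total + min(test(i).w, target(j).w);   % assumed per-match weight
            break;                                         % count one match per test segment
        end
    end
end
maxScore = min(sum([test.w]), sum([target.w]));            % weight of the minimal set
M = total / maxScore;
end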

FIG

FIG

FIG

FIG

Under movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angles of every branch with respect to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angle between each branch and the radius from the pupil center. Even when the head tilts, the eye moves or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branch as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.
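A small sketch of the Y-shape feature computation described above, with assumed example coordinates and branch orientations; it measures each branch angle against the radial direction from the iris centre and appends the branch-point position.

branchPt   = [210, 140];          % assumed centre (x, y) of a Y-shaped branch
irisCenter = [160, 120];          % assumed iris/pupil centre
branchDeg  = [15; 130; 250];      % assumed branch orientations in degrees
radialDeg  = atan2d(branchPt(2) - irisCenter(2), branchPt(1) - irisCenter(1));
phi = mod(branchDeg - radialDeg, 360);         % phi1, phi2, phi3 relative to the radius
yFeature = [phi(1), phi(2), phi(3), branchPt]; % y(phi1, phi2, phi3, x, y)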

WPL SCLERA DESCRIPTOR

As discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is broken into smaller segments, and for each segment a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitations of segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To be tolerant of such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of the line segments of the vessel patterns.

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy the GPU memory and slow down the data transfer. During matching registration, a RANSAC-type algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighting image created by setting various weight values according to position: the weights of descriptors that lie beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, and only once.
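The weight assignment can be sketched as follows on a binary sclera mask; the 5-pixel border band and the file name are assumptions, and imerode/strel are Image Processing Toolbox functions.

m      = imread('scleraMask.png');                 % assumed sclera mask image
mask   = m(:,:,1) > 0;                             % binary sclera mask
border = mask & ~imerode(mask, strel('disk', 5));  % band near the sclera boundary
w = zeros(size(mask));
w(mask)   = 1;                                     % interior descriptors
w(border) = 0.5;                                   % descriptors near the boundary
                                                   % (weights stay 0 outside the sclera)
x = 120;  y = 80;                                  % assumed descriptor centre
wDescriptor = w(round(y), round(x));               % weight stored with the descriptor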

The calculated result is saved as a component of the descriptor. The sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This would be faster if the two templates had similar reference points. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted θ; the distance between the segment center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast, because the templates do not need to be re-registered every time after shifting. Thus, the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly more capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-Purpose Computing on the GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology is different between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as texture on future passes through the graphics pipeline.

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks' having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way.

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.
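To make the structure above concrete, the following is a minimal CUDA sketch (illustrative only, not taken from this project's implementation) of such an application: every thread in a structured grid runs the same SPMD program, gathers values from neighboring cells of an input buffer in global memory, and writes its result to an output buffer. The grid dimensions, buffer names, and the simple averaging rule are assumptions made for illustration.

    // Minimal SPMD-style CUDA kernel: one thread per grid point (illustrative only).
    __global__ void updateGrid(const float *in, float *out, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        // "Gather" reads from the four neighbors in global memory.
        int xm = max(x - 1, 0), xp = min(x + 1, width - 1);
        int ym = max(y - 1, 0), yp = min(y + 1, height - 1);
        float next = 0.25f * (in[y * width + xm] + in[y * width + xp] +
                              in[ym * width + x] + in[yp * width + x]);

        // Write ("scatter") the result to the output buffer in global memory.
        out[y * width + x] = next;
    }

    // Host-side launch over a structured grid of threads (sizes are assumed):
    //   dim3 block(16, 16);
    //   dim3 grid((width + 15) / 16, (height + 15) / 16);
    //   updateGrid<<<grid, block>>>(d_in, d_out, width, height);

The same buffer could be used for both reading and writing when an in-place algorithm is desired, as noted above.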

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose the coarse-to-fine two-stage matching process. In the first stage, we matched two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities. After this step, some false positive matches may still remain. In the second stage, we used the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here y_te,i and y_ta,j are the Y shape descriptors of the test template T_te and the target template T_ta respectively. d_ϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); d_xy is the Euclidean distance of two descriptor centers, defined as (4); n_i and d_i are the number of matched descriptor pairs and the distance between their centers respectively; t_ϕ is a distance threshold, and t_xy is the threshold that restricts the searching area. We set t_ϕ to 30 and t_xy to 675 in our experiment.

To match two sclera templates, we searched the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the searching range and time. The distance of two branches is defined in (3), where ϕ_ij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs n_i and the distance between Y shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study; N_i and N_j are the total numbers of feature vectors in templates i and j respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score will be passed to the next, more precise matching process.
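As a rough illustration of this coarse matching stage, the sketch below counts matched Y-shape branch pairs between two templates and fuses the match count with the average center distance into a single score. Because equations (2)-(4) are not reproduced in the text, the descriptor layout, the angle and center distance computations, and the exact fusion formula shown here are hypothetical stand-ins rather than the report's definitive implementation.

    #include <cmath>

    // Simplified Y shape descriptor: three branch angles plus the branch center.
    struct YDescriptor { float phi[3]; float x, y; };

    // Illustrative coarse matching score for one template pair (stand-in for Algorithm 1).
    __host__ __device__ float coarseMatchScore(const YDescriptor *te, int nTe,
                                               const YDescriptor *ta, int nTa,
                                               float tPhi, float tXY, float alpha)
    {
        int matched = 0;
        float sumDist = 0.0f;
        for (int i = 0; i < nTe; ++i) {
            for (int j = 0; j < nTa; ++j) {
                // Euclidean distance of the angle elements (stand-in for eq. (3)).
                float dPhi = 0.0f;
                for (int k = 0; k < 3; ++k) {
                    float d = te[i].phi[k] - ta[j].phi[k];
                    dPhi += d * d;
                }
                dPhi = sqrtf(dPhi);
                // Euclidean distance of the two descriptor centers (stand-in for eq. (4)).
                float dx = te[i].x - ta[j].x, dy = te[i].y - ta[j].y;
                float dXY = sqrtf(dx * dx + dy * dy);
                if (dPhi < tPhi && dXY < tXY) { ++matched; sumDist += dXY; }
            }
        }
        if (matched == 0) return 0.0f;
        // Hypothetical fusion of match count and average center distance (stand-in for eq. (2)).
        float avgDist = sumDist / matched;
        float minN = (float)(nTe < nTa ? nTe : nTa);
        return (matched / minN) * (alpha / (alpha + avgDist));
    }

A template pair whose score falls below the threshold t would be discarded before the fine matching stage.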

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.

Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate. As a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter, where T_te is the test template and s_te,i is the i-th WPL descriptor of T_te; T_ta is the target template and s_ta,i is the i-th WPL descriptor of T_ta; and d(s_te,k, s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j.

Δs_k is the shift value of the two descriptors, defined as the offset between the matched pair.

We first randomly select an equal number of segment descriptors s_te,k in the test template T_te from each quad and find their nearest neighbors s_ta,j in the target template T_ta. The shift offset between them is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
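The following host-side sketch shows one way the shift search described above could proceed under these definitions: randomly sampled test descriptors are paired with their nearest target descriptors, the candidate offsets Δs_k are collected, and the offset that deviates least from the others is kept as Δs_optim. The WPL descriptor layout, the sample count, and the interpretation of "smallest standard deviation" as closeness to the mean offset are assumptions made for illustration.

    #include <cmath>
    #include <cstdlib>
    #include <vector>

    struct WplDescriptor { float x, y; };   // simplified: descriptor center only
    struct Offset { float dx, dy; };

    static int nearestNeighbor(const WplDescriptor &q, const std::vector<WplDescriptor> &ta)
    {
        int best = 0; float bestD = 1e30f;
        for (size_t j = 0; j < ta.size(); ++j) {
            float dx = ta[j].x - q.x, dy = ta[j].y - q.y;
            float d = dx * dx + dy * dy;
            if (d < bestD) { bestD = d; best = (int)j; }
        }
        return best;
    }

    Offset searchShift(const std::vector<WplDescriptor> &te,
                       const std::vector<WplDescriptor> &ta,
                       int samples /* e.g. 64, an assumed value */)
    {
        std::vector<Offset> cand;
        for (int s = 0; s < samples; ++s) {
            const WplDescriptor &q = te[std::rand() % te.size()];  // random test descriptor
            const WplDescriptor &n = ta[nearestNeighbor(q, ta)];   // its nearest target descriptor
            cand.push_back({n.x - q.x, n.y - q.y});                // candidate shift offset
        }
        // Keep the candidate closest to the mean offset (one reading of the
        // "smallest standard deviation among these candidate offsets" rule).
        float mx = 0.0f, my = 0.0f;
        for (const Offset &o : cand) { mx += o.dx; my += o.dy; }
        mx /= cand.size(); my /= cand.size();
        Offset best = cand[0]; float bestDev = 1e30f;
        for (const Offset &o : cand) {
            float dev = (o.dx - mx) * (o.dx - mx) + (o.dy - my) * (o.dy - my);
            if (dev < bestDev) { bestDev = dev; best = o; }
        }
        return best;
    }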

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te(it) and calculating the distance from its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te,i, T_te, s_ta,j, and T_ta are defined the same as in Algorithm 2; tr(it)_shift, θ(it), and tr(it)_scale are the shift, rotation, and scale transform parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)_shift), and S(tr(it)_scale) are the transform matrices defined as in (7). To search for the optimized transform parameters, we iterated N times to generate these parameters. In our experiment, we set the iteration count to 512.
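Since equation (7) itself is not reproduced in the text, the sketch below only illustrates the standard way such a transform can be composed from a rotation R(θ), a translation T(shift), and a scale S(scale) using 3 x 3 homogeneous matrices; the composition order and parameter meanings here are assumptions made for illustration.

    #include <cmath>

    struct Mat3 { float m[3][3]; };

    static Mat3 mul(const Mat3 &a, const Mat3 &b)
    {
        Mat3 r = {};
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j)
                for (int k = 0; k < 3; ++k)
                    r.m[i][j] += a.m[i][k] * b.m[k][j];
        return r;
    }

    // Assumed form of the transform in eq. (7): R(theta) * T(shift) * S(scale).
    Mat3 makeTransform(float theta, float shiftX, float shiftY, float scale)
    {
        Mat3 R = {{{cosf(theta), -sinf(theta), 0.0f},
                   {sinf(theta),  cosf(theta), 0.0f},
                   {0.0f,         0.0f,        1.0f}}};
        Mat3 T = {{{1.0f, 0.0f, shiftX},
                   {0.0f, 1.0f, shiftY},
                   {0.0f, 0.0f, 1.0f}}};
        Mat3 S = {{{scale, 0.0f, 0.0f},
                   {0.0f, scale, 0.0f},
                   {0.0f, 0.0f,  1.0f}}};
        return mul(R, mul(T, S));
    }

In each of the N iterations, one thread would build such a matrix from its randomly generated parameter set, transform the test template, and count the matched descriptor pairs.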

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j, and T_ta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)_shift), and S(tr(optm)_scale) form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ϕ values. In our experiment, we set α to 5. The total matching score is the minimum score of the two transformed results divided by the minimum matching score for the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those multiprocessors.

There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, there is only a limited amount of shared memory available. Global memory, constant memory, and texture memory are off-chip memory and accessible by all threads, and it is very time consuming to access these memories.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access latency. To hide this latency, we should preferentially use on-chip memory rather than global memory. When global memory access does occur, threads in the same warp should access consecutive words so the accesses coalesce.

Shared memory is much faster than the local and global memory spaces. But shared memory is organized into banks which are equal in size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses will be serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
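As a small illustration of the coalescing point above (not code from this project), in the kernel below consecutive threads of a warp read consecutive words, so the hardware can combine the reads of one warp into a few wide memory transactions; a strided pattern would break this and serialize the accesses.

    // Coalesced access: thread i reads element i, so neighboring threads touch
    // neighboring addresses in global memory.
    __global__ void scaleCoalesced(const float *in, float *out, int n, float a)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = a * in[i];
        // A strided pattern such as in[i * 32] would spread one warp's reads
        // over many memory segments and greatly reduce effective bandwidth.
    }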

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread One thread performs a pair of templates compare In our work we

use NVIDIA C2070 as our GPU Threads and blocks number is set to

1024 That means we can match our test template with up to 1024times1024

target templates at same time
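A minimal launch sketch consistent with this description is shown below; it reuses the hypothetical YDescriptor and coarseMatchScore sketch from Section 241, and the kernel name, argument list, and template storage layout are placeholders rather than the report's exact code.

    // One thread per target template (names and layout are illustrative).
    __global__ void coarseMatchKernel(const YDescriptor *test, int nTest,
                                      const YDescriptor *targets,
                                      const int *targetOffsets, const int *targetCounts,
                                      float *scores, int numTargets)
    {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t >= numTargets) return;
        scores[t] = coarseMatchScore(test, nTest,
                                     targets + targetOffsets[t], targetCounts[t],
                                     30.0f, 675.0f, 30.0f);   // thresholds quoted in the text
    }

    // Host side: 1024 blocks of 1024 threads covers up to 1024 x 1024 targets per launch.
    //   dim3 grid(1024);
    //   dim3 block(1024);
    //   coarseMatchKernel<<<grid, block>>>(dTest, nTest, dTargets, dOffsets, dCounts,
    //                                      dScores, numTargets);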

Algorithms 2-4 are partitioned into fine-grained subtasks, each of which processes a section of descriptors in one thread. As the lower portion of the middle column in Figure 11 shows, we assigned a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there is no data exchange requirement between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, which is shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
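The sketch below shows a conventional shared-memory tree reduction within one block, which is one way to realize the summation step described above; the block size and array names are assumptions, and the report's exact indexing scheme may differ.

    // Illustrative in-block sum of per-thread partial results via shared memory.
    // Assumes blockDim.x is a power of two and at most 1024.
    __global__ void blockSum(const float *partial, float *blockResult, int n)
    {
        __shared__ float buf[1024];
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + tid;
        buf[tid] = (i < n) ? partial[i] : 0.0f;
        __syncthreads();

        // Pairwise tree reduction: the stride doubles at each step (1, 2, 4, ...).
        for (int stride = 1; stride < blockDim.x; stride *= 2) {
            if (tid % (2 * stride) == 0)
                buf[tid] += buf[tid + stride];
            __syncthreads();
        }

        // The final sum lands in the first slot, as in the scheme described above.
        if (tid == 0)
            blockResult[blockIdx.x] = buf[0];
    }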

252 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose to

map task

Mapping one pair of templates to all the threads in a block and then every

thread would take charge of a fraction of descriptors and cooperation with

other threads

Assigning a single possible shift offset to a thread and all the threads will

compute independently unless the final result should be compared with

other possible offset

Due to great number of sum and synchronization operations in every

nearest neighbor searching step we choose the second method to parallelize

shift searching In affine matrix generator we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads The challenge of this step is the randomly generated

numbers might be correlated among threads In the step of rotation and

scale registration generating we used the Mersenne Twister pseudorandom

number generator because it can use bitwise arithmetic and have long

period
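For illustration only, the sketch below uses cuRAND's per-thread device API to give each thread its own decorrelated random stream for proposing rotation and scale parameters. The report itself relies on dynamically created Mersenne Twister parameters (discussed next), so this default-generator version is a simplified stand-in, and the parameter ranges are invented for the example.

    #include <curand_kernel.h>

    // Each thread gets its own RNG state; using the thread id as the subsequence
    // number gives every thread a different stream.
    __global__ void initRng(curandState *states, unsigned long long seed)
    {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        curand_init(seed, id, 0, &states[id]);
    }

    // Each thread proposes one random (theta, scale) candidate; ranges are illustrative.
    __global__ void proposeParams(curandState *states, float *theta, float *scale, int n)
    {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        if (id >= n) return;
        curandState local = states[id];
        theta[id] = (curand_uniform(&local) - 0.5f) * 0.2f;   // small rotation in radians
        scale[id] = 0.9f + 0.2f * curand_uniform(&local);     // scale in roughly [0.9, 1.1)
        states[id] = local;
    }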

The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to be run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory.

FIG

FIG

To share the flags, all the threads in a block would have to synchronize at every query step. Our solution is to use a single thread in a block to process the matching.

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory; the data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed. Therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data. In our kernels, we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
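A minimal sketch of the shared-memory staging described here (reusing the simplified WplDescriptor from the earlier sketch, with an assumed upper bound on the number of test descriptors) could look as follows; each block cooperatively copies the test template once and then reuses it from fast on-chip memory.

    #define MAX_TEST_DESC 256   // assumed upper bound on test template descriptors

    __global__ void fineMatchKernel(const WplDescriptor *test, int nTest,
                                    const WplDescriptor *targets, float *scores)
    {
        __shared__ WplDescriptor sTest[MAX_TEST_DESC];

        // Threads of the block cooperatively load the test template (coalesced copy).
        for (int i = threadIdx.x; i < nTest; i += blockDim.x)
            sTest[i] = test[i];
        __syncthreads();

        // ... each thread now matches its share of the target descriptors against
        // sTest, reading the test template from shared memory instead of global memory ...
    }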

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It was primarily applied to target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG. This step is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin determined in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of the gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
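As a hedged illustration of the gradient step (the report does not list its code, and the image layout here is an assumption), the kernel below computes the per-pixel gradient magnitude and unsigned orientation with the standard relations m = sqrt(dx^2 + dy^2) and θ = atan2(dy, dx), folded into the 0-180 degree range that the binning step above consumes.

    // Illustrative per-pixel HOG gradient computation using central differences.
    __global__ void hogGradients(const float *img, float *mag, float *ang,
                                 int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;

        float dx = img[y * width + (x + 1)] - img[y * width + (x - 1)];
        float dy = img[(y + 1) * width + x] - img[(y - 1) * width + x];

        mag[y * width + x] = sqrtf(dx * dx + dy * dy);

        // Unsigned orientation in [0, 180): opposite directions share the same bin.
        float theta = atan2f(dy, dx) * 57.29578f;   // radians to degrees
        if (theta < 0.0f)    theta += 180.0f;
        if (theta >= 180.0f) theta -= 180.0f;
        ang[y * width + x] = theta;
    }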

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, FORTRAN, Java™, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats, such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step to achieve parallel computation efficiency on a GPU. A work flow with high arithmetic intensity to hide the memory access latency was designed to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


1) Improvement of pictorial information for human interpretation

2) Processing of scene data for autonomous machine perception

In the second application area, interest focuses on procedures for extracting from an image information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.

17 EXISTING SYSTEM

Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Within these three methods, the SURF method achieves the best accuracy. It takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, which costs about 12 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

171 DISADVANTAGES OF EXISTING SYSTEM

1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size and will preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the

most challenging problems that has been actively researched in recent

years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their

performance degrades significantly when the test images contain variations

that are not present in the training images In this paper we highlight some

of the key issues in remote face recognition We define the remote face

recognition as one where faces are several tens of meters (10-250m) from

the cameras We then describe a remote face database which has been

acquired in an unconstrained outdoor maritime environment Recognition

performance of a subset of existing still image-based face recognition

algorithms is evaluated on the remote face data set Further we define the

remote re-identification problem as matching a subject at one location with

candidate sets acquired at a different location and over time in remote

conditions We provide preliminary experimental results on remote re-

identification It is demonstrated that in addition to applying a good

classification algorithm finding features that are robust to variations

mentioned above and developing statistical models which can account for

these variations are very important for remote face recognition

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanded biometric data collected by various sectors

of government and industry for identification and verification purposes

how to manage and process such Big Data draws great concern Even

though modern processors are equipped with more cores and memory

capacity it still requires careful design in order to utilize the hardware

resource effectively and the power consumption efficiently This research

addresses this issue by investigating the workload characteristics of

biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the

benchmark to capture its execution behavior The results show that data

loading and memory access incurs great performance overhead and

motivates us to move the biometrics computation to high-performance

architecture

Modern iris recognition algorithms can be computationally intensive

yet are designed for traditional sequential processing elements such as a

personal computer However a parallel processing alternative using field

programmable gate arrays (FPGAs) offers an opportunity to speed up iris

recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallelizable portion of a modern iris recognition algorithm, is parallelized on an FPGA

system We will present a performance comparison of the parallelized

algorithm on the FPGA system to a traditional CPU-based version The

parallelized template generation outperforms an optimized C++ code

version determining the information content of an iris approximately 324

times faster

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival

vasculature is proposed Conjunctival vessels can be observed on the visible

part of the sclera that is exposed to the outside world These vessels

demonstrate rich and specific details in visible light and can be easily

photographed using a regular digital camera In this paper we discuss

methods for conjunctival imaging preprocessing and feature extraction in

order to derive a suitable conjunctival vascular template for biometric

authentication Commensurate classification methods along with the

observed accuracy are discussed Experimental results suggest the potential

of using conjunctival vasculature as a biometric measure Identification of

a person based on some unique set of features is an important task The

human identification is possible with several biometric systems and sclera

recognition is one of the promising biometrics The sclera is the white

portion of the human eye The vein pattern seen in the sclera region is

unique to each person Thus the sclera vein pattern is a well suited

biometric technology for human identification The existing methods used

for sclera recognition have some drawbacks like only frontal looking

images are preferred for matching and rotation variance is another problem

These problems are completely eliminated in the proposed system by using

two feature extraction techniques They are Histogram of Oriented

Gradients (HOG) and converting the image into polar form using the

bilinear interpolation technique These two features help the proposed

system to become illumination invariant and rotation invariant The

experimentation is done with the help of UBIRIS database The

experimental result shows that the proposed sclera recognition method can

achieve better accuracy than the previous methods

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of

todayrsquos mainstream computing systems Over the past six years there has

been a marked increase in the performance and capabilities of GPUs The

modern GPU is not only a powerful graphics engine but also a highly

parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification The verification and

identification performance of the propose algorithms is validated and

compared with other algorithms using the CASIA Version 3 ICE 2005 and

UBIRIS iris databases

18 PROPOSED METHOD

We proposed a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition, built on our sequential line-descriptor method, using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by taking into account the different syntax for various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard thread and block in CUDA as work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor, the Y shape sclera feature, for an efficient registration method to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

191 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, in this research we propose a new descriptor, the Y shape descriptor, which can greatly help improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque and white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have

designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Within these three methods, the SURF method achieves the best accuracy. It takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, which costs about 12 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (or, for general-purpose use, GPGPUs, General Purpose Graphics Processing Units) are now popularly used for parallel computing to improve the computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs were used to extract the features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search Park et al

designed the performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather the goal is to develop a parallel sclera matching solution for

Sclera vein recognition using our sequential line-descriptor method using

the CUDA GPU architecture However the parallelization strategies

developed in this research can be applied to design parallel approaches for

other Sclera vein recognition methods and help parallelize general pattern

recognition methods Based on the matching approach in there are three

challenges to map the task of sclera feature matching to GPU

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size and will preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance New designs are

necessary to help narrow down the search range In summary naiumlve

implementation of the algorithms in parallel would not work efficiently

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by taking into account the different syntax for various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard thread and block in CUDA as work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y shape sclera feature, for an efficient registration method to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop the implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition, in Section 8 we present experiments using the proposed system, and in Section 9 we draw conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition. Several methods have been designed for sclera segmentation. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify the color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's thresholding-based method for grayscale images.

After sclera segmentation it is necessary to enhance and extract the sclera

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches including Speed Up Robust

Features (SURF) which is based on interest-point detection minutiae

detection which is based on minutiae points on the vasculature structure

and direct correlation matching which relies on image registration Zhou et

al designed a line descriptor based feature registration and matching

method

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To make the computation more efficient, the image data are converted to polar form, which is mainly suited to circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is performed in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out.

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. The filter operates only on grayscale images, so a color image must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG
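The Sobel-based glare detection described above can be expressed as a simple per-pixel CUDA kernel. The sketch below is illustrative only: it assumes an 8-bit grayscale image stored row-major, and a hypothetical rule that flags a pixel as glare when it is both very bright and lies on a strong gradient (the report does not give the exact threshold values or decision rule).

// Illustrative sketch only: per-pixel Sobel gradients used to flag small,
// bright glare regions near the pupil/iris. Image is 8-bit grayscale, row-major.
__global__ void glareDetect(const unsigned char* img, unsigned char* glareMask,
                            int width, int height,
                            float gradThresh, unsigned char brightThresh)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    // 3x3 Sobel operators in the x and y directions.
    int gx = -img[(y-1)*width + (x-1)] + img[(y-1)*width + (x+1)]
             - 2*img[y*width + (x-1)]  + 2*img[y*width + (x+1)]
             - img[(y+1)*width + (x-1)] + img[(y+1)*width + (x+1)];
    int gy = -img[(y-1)*width + (x-1)] - 2*img[(y-1)*width + x] - img[(y-1)*width + (x+1)]
             + img[(y+1)*width + (x-1)] + 2*img[(y+1)*width + x] + img[(y+1)*width + (x+1)];

    float mag = sqrtf((float)(gx*gx + gy*gy));

    // Assumed rule: a glare pixel is both very bright and on a strong edge.
    glareMask[y*width + x] = (img[y*width + x] > brightThresh && mag > gradThresh) ? 255 : 0;
}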

Sclera Area Estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center positions. In this way, non-sclera areas are eliminated.
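Otsu's method picks the threshold that maximizes the between-class variance of the ROI histogram. The host-side sketch below shows the standard algorithm over a 256-bin histogram; it is a generic illustration, not the report's exact implementation, and the function name is an assumption.

#include <cstdint>
#include <cstddef>

// Illustrative host-side Otsu threshold over an 8-bit ROI (standard algorithm):
// choose the gray level that maximizes the between-class variance.
int otsuThreshold(const uint8_t* roi, size_t n)
{
    double hist[256] = {0.0};
    for (size_t i = 0; i < n; ++i) hist[roi[i]] += 1.0;
    for (int i = 0; i < 256; ++i) hist[i] /= (double)n;      // normalize to probabilities

    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double wB = 0.0, sumB = 0.0, bestVar = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];                     // weight of the "background" class
        if (wB <= 0.0) continue;
        double wF = 1.0 - wB;              // weight of the "foreground" class
        if (wF <= 0.0) break;
        sumB += t * hist[t];
        double mB = sumB / wB;             // background mean
        double mF = (sumAll - sumB) / wF;  // foreground mean
        double betweenVar = wB * wF * (mB - mF) * (mB - mF);
        if (betweenVar > bestVar) { bestVar = betweenVar; bestT = t; }
    }
    return bestT;   // pixels above/below this value give the candidate sclera regions
}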

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Next, the upper eyelid, lower eyelid, and iris boundaries are refined; all of these are unwanted portions for recognition. To eliminate their effect, refinement is performed following the detection of the sclera area. The figure shows the result after Otsu's thresholding and iris and eyelid refinement for detecting the right sclera area; the left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.

Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by "Elements" below). In addition, each element of the filtering can itself be parallelized. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG
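A minimal CUDA sketch of this idea is shown below: each thread computes one output pixel of one directional filter response, so different image portions (and different filter orientations, launched separately) proceed in parallel. The kernel name, the coefficient array, and the filter size K are assumptions for illustration; the report does not list the actual Gabor parameters.

// Illustrative sketch: one thread computes one output pixel for one directional
// filter (e.g., one orientation of a Gabor bank). Coefficients are K x K.
__global__ void directionalFilter(const float* img, float* out,
                                  const float* coeff, int K,
                                  int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int r = K / 2;
    if (x < r || y < r || x >= width - r || y >= height - r) return;

    float acc = 0.0f;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx)
            acc += coeff[(dy + r) * K + (dx + r)] * img[(y + dy) * width + (x + dx)];

    out[y * width + x] = acc;
}

Each filter orientation can be launched as an independent kernel (or as a third grid dimension), which corresponds to the independent "Elements" in the figure.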

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA

VEIN

2251 RECOGNITION METHOD

The matching stage of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG
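The figure placeholder above stands for the component equations, which did not survive extraction. A plausible reconstruction, consistent with the symbol definitions that follow (our hedged reading of the line-descriptor method, not a verbatim copy of the report's equations), is:

\theta = \tan^{-1}\!\left(\frac{y_l - y_i}{x_l - x_i}\right), \qquad
r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
\phi = \tan^{-1}\!\left(\left.\frac{d f_{line}(x)}{dx}\right|_{x = x_l}\right)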

Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points - one from the test template and one from the target template - as well as a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows, where S_i and S_j are two segment descriptors, m(S_i, S_j) is the matching score between segments S_i and S_j, d(S_i, S_j) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), D_match is the matching distance threshold, and ɸ_match is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
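A minimal sketch of the per-segment rule just described is given below. It assumes (as our reading of the text) that a pair matches when the center distance is below D_match and the orientation difference is below the angle threshold, and that a matched pair contributes the product of the two weights; the struct and function names are illustrative.

// Illustrative per-segment matching rule. Descriptors carry center (x, y),
// orientation phi, and weight w (0, 0.5, or 1, from the weighting image).
struct Segment { float x, y, phi, w; };

__device__ float matchScore(const Segment& si, const Segment& sj,
                            float Dmatch, float phiMatch)
{
    float dx = si.x - sj.x;
    float dy = si.y - sj.y;
    float d    = sqrtf(dx * dx + dy * dy);     // center distance d(Si, Sj)
    float dphi = fabsf(si.phi - sj.phi);       // orientation difference
    // Matched pairs contribute the product of their weights; others contribute 0.
    return (d <= Dmatch && dphi <= phiMatch) ? si.w * sj.w : 0.0f;
}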

FIG

FIG

FIG

FIG

movement of the eye. Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when head tilt, eye movement, or camera zoom occurs at the image acquisition step, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors in the pupil center calculation from the segmentation step, we also recorded the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images

(Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To tolerate such errors, the mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not.

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.

However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy GPU memory and slow down data transfer. During matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weight of descriptors outside the sclera is set to 0, descriptors near the sclera boundary are weighted 0.5, and interior descriptors are set to 1. In our work, descriptor weights were calculated on their own mask by the CPU, and only once.

The calculated result was saved as a component of the descriptor, so the sclera descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the values 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. This is faster if the two templates have similar reference points. If we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in the CPU preprocessing.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.
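One way to realize this layout is a structure-of-arrays, sketched below under our own naming assumptions: each component of s(x, y, r, θ, ɸ, w) lives in its own contiguous device array, with left-side and right-side descriptors kept in separate runs so that a warp handling one side reads consecutive addresses.

// Illustrative structure-of-arrays layout for the WPL descriptors.
// Descriptors [0, numLeft) belong to the left half, the rest to the right half,
// so warps assigned to one side access consecutive (coalesced) addresses.
struct WplTemplateSoA {
    float *x, *y, *r, *theta, *phi, *w;   // device pointers, each of length numDesc
    int numLeft;                          // count of left-side descriptors
    int numDesc;                          // total descriptors in the template
};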

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units. For general-purpose computing on the GPU, mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline described earlier, concentrating on its programmable aspects:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as texture on future passes through the graphics pipeline.

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of the fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step when computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments described later solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, to the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.
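A minimal CUDA illustration of this thread-grid, gather/scatter model is shown below: every thread updates one grid point by gathering its neighbors from global memory and scattering the result back. The kernel and the 1-D stencil are generic assumptions used only to make the model concrete.

// Minimal SPMD example of the model described above: one thread per grid point,
// gathering from neighbors in global memory and scattering the result back.
__global__ void gatherScatterStep(const float* curr, float* next, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= 0 || i >= n - 1) return;                  // skip boundary points
    next[i] = 0.25f * curr[i - 1] + 0.5f * curr[i] + 0.25f * curr[i + 1];
}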

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity, although some false positive matches may remain after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching procedure is listed as Algorithm 1.

FIG

Here y_te_i and y_ta_j are the Y-shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance of two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and their center distances, respectively; tϕ is a distance threshold; and t_xy is the threshold that restricts the search area. We set tϕ to 30 and t_xy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕ_ij is the angle between the j-th branch and the radial line from the pupil center in descriptor i.

The number of matched pairs n_i and the distance between the Y-shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
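The sketch below illustrates the Stage I search loop as described above: pairs are restricted to the same (left/right) half, gated by the spatial threshold t_xy and the angle-distance threshold tϕ, and the matched count and center distances are accumulated. The struct and function names are illustrative, and the final fusion with α, N_i, and N_j (Eq. (2)) is deliberately left as a placeholder because the report does not reproduce that equation.

// Illustrative Stage I (Y-shape) matching: count matched branch pairs and sum
// their center distances; score fusion per Eq. (2) is left to the caller.
struct YDesc { float phi1, phi2, phi3, x, y; int side; };   // side: 0 = left, 1 = right

__device__ float angleDist(const YDesc& a, const YDesc& b)
{
    float d1 = a.phi1 - b.phi1, d2 = a.phi2 - b.phi2, d3 = a.phi3 - b.phi3;
    return sqrtf(d1 * d1 + d2 * d2 + d3 * d3);
}

__device__ void stage1Match(const YDesc* test, int nTest,
                            const YDesc* target, int nTarget,
                            float tphi, float txy,
                            int* nMatched, float* sumDist)
{
    int n = 0; float sum = 0.0f;
    for (int i = 0; i < nTest; ++i) {
        for (int j = 0; j < nTarget; ++j) {
            if (test[i].side != target[j].side) continue;   // restrict search to same half
            float dx = test[i].x - target[j].x, dy = test[i].y - target[j].y;
            float dxy = sqrtf(dx * dx + dy * dy);
            if (dxy > txy) continue;                        // spatial threshold t_xy
            if (angleDist(test[i], target[j]) < tphi) {     // angle threshold t_phi
                ++n; sum += dxy;
            }
        }
    }
    *nMatched = n;
    *sumDist  = sum;   // fuse with alpha, N_i, N_j according to Eq. (2) afterwards
}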

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because: (1) when acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, since the eyeball is spherical in shape; and (2) the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection from the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te_i is the i-th WPL descriptor of T_te; T_ta is the target template and s_ta_i is the i-th WPL descriptor of T_ta; d(s_te_k, s_ta_j) is the Euclidean distance of descriptors s_te_k and s_ta_j; and Δs_k is the shift value of the two descriptors.

We first randomly select an equal number of segment descriptors s_te_k in the test template T_te from each quadrant and find their nearest neighbors s_ta_j in the target template T_ta. Their shift offsets are recorded as the possible registration shift factors Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
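A host-side sketch of this selection step is given below. It assumes our reading of "smallest standard deviation": among the candidate offsets (nearest-neighbor displacements of the sampled descriptors), pick the one that deviates least from the mean of all candidates. The types and function name are illustrative only.

#include <vector>
#include <cmath>

struct Offset { float dx, dy; };

// Illustrative selection of the shift parameter: candidates are the
// displacements from sampled test descriptors to their nearest target
// descriptors; return the candidate closest to the mean offset.
Offset selectShift(const std::vector<Offset>& candidates)
{
    Offset mean{0.0f, 0.0f};
    for (const Offset& c : candidates) { mean.dx += c.dx; mean.dy += c.dy; }
    mean.dx /= candidates.size();          // assumes a non-empty candidate set
    mean.dy /= candidates.size();

    Offset best = candidates.front();
    float bestDev = 1e30f;
    for (const Offset& c : candidates) {
        float dev = std::hypot(c.dx - mean.dx, c.dy - mean.dy);
        if (dev < bestDev) { bestDev = dev; best = c; }
    }
    return best;                           // Δs_optim in the notation above
}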

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance to its nearest neighbor s_ta_j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m^(it). Here s_te_i, T_te, s_ta_j, and T_ta are defined as in Algorithm 2; tr_shift^(it), θ^(it), and tr_scale^(it) are the shift, rotation, and scale transform parameters generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment we set the iteration count to 512.
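Equation (7) itself is not reproduced in this report. A plausible form for the composed shift, rotation, and scale transform in homogeneous coordinates (an assumption based on the standard affine decomposition, not a verbatim copy of (7)) is:

R(\theta)=\begin{bmatrix}\cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1\end{bmatrix},\quad
T(t)=\begin{bmatrix}1 & 0 & t_x\\ 0 & 1 & t_y\\ 0 & 0 & 1\end{bmatrix},\quad
S(s)=\begin{bmatrix}s & 0 & 0\\ 0 & s & 0\\ 0 & 0 & 1\end{bmatrix},\qquad
s' = R(\theta)\,T(t)\,S(s)\,s

where s is a descriptor position written in homogeneous coordinates and s' is its transformed position.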

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te_i, T_te, s_ta_j, and T_ta are defined as in Algorithms 2 and 3; θ^(optm), tr_shift^(optm), tr_scale^(optm), and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; R(θ^(optm)) T(tr_shift^(optm)) S(tr_scale^(optm)) is the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program must be partitioned into threads by the programmer and mapped onto those multiprocessors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in a limited amount. Local memory, global memory, constant memory, and texture memory reside off-chip and are accessible by all threads, but accessing them is very time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms onto CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, we should preferentially use on-chip memory rather than global memory; when global memory access does occur, threads in the same warp should access consecutive words to achieve coalescing.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted into a different kernel on the GPU. These kernels differ in computation density, so we map them onto the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partitioning among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one template-pair comparison. In our work, we use an NVIDIA C2070 as our GPU. The numbers of threads and blocks are both set to 1024, which means we can match our test template with up to 1024 × 1024 target templates at the same time.
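The launch pattern below sketches this coarse-grained mapping: one thread scores one test-vs-target pair, with 1024 threads per block and up to 1024 blocks as stated in the text. The flattened feature array and the squared-distance score are placeholders, not the report's actual comparison.

// Illustrative coarse-grained mapping: one thread scores one template pair.
__global__ void coarseMatch(const float* testFeat, const float* targetFeat,
                            int featLen, int numTargets, float* scores)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // index of the target template
    if (t >= numTargets) return;

    float score = 0.0f;
    for (int k = 0; k < featLen; ++k) {
        float d = testFeat[k] - targetFeat[t * featLen + k];
        score += d * d;                              // placeholder similarity measure
    }
    scores[t] = score;
}

// Host side: coarseMatch<<<1024, 1024>>>(dTest, dTargets, featLen, numTargets, dScores);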

Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread handles a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first thread of each group of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
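A standard shared-memory tree reduction in the spirit of this scheme is sketched below: pairs are summed first, the stride doubles each step, and the block total ends up in the first element. It assumes a power-of-two block size; kernel and variable names are our own.

// Illustrative in-block tree reduction: the sum of the intermediate results
// ends up in buf[0] (the "first address"), then is written out per block.
__global__ void blockSum(const float* partial, float* blockTotals, int n)
{
    extern __shared__ float buf[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    buf[tid] = (i < n) ? partial[i] : 0.0f;
    __syncthreads();

    for (int stride = 1; stride < blockDim.x; stride *= 2) {
        if (tid % (2 * stride) == 0)        // "first" thread of each group of 2*stride
            buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockTotals[blockIdx.x] = buf[0];
}

// Launch with the shared-memory size set to blockDim.x * sizeof(float), e.g.:
// blockSum<<<numBlocks, threads, threads * sizeof(float)>>>(dPartial, dTotals, n);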

252 MAPPING INSIDE BLOCK

In the shift parameter search, there are two schemes we can choose to map the task:

Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assign a single possible shift offset to each thread, so that all threads compute independently until the final results are compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag variable denoting whether a line has been matched is stored in shared memory.

FIG

FIG

To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.
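As a point of comparison only (the report uses the offline dynamic-creation tool described above, not this), cuRAND ships a GPU-oriented Mersenne Twister variant (MTGP32) whose host API can fill a device buffer with uniform random numbers that a parameter search could draw from. The function below is an illustrative sketch with assumed names.

#include <curand.h>

// Illustrative alternative to the offline dynamic-creation tool: cuRAND's
// MTGP32 generator fills a device buffer with uniform values in (0, 1].
void fillRandomParams(float* devBuf, size_t count, unsigned long long seed)
{
    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_MTGP32);
    curandSetPseudoRandomGeneratorSeed(gen, seed);
    curandGenerateUniform(gen, devBuf, count);
    curandDestroyGenerator(gen);
}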

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the cache configuration to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
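The host-side setup below sketches this strategy with standard CUDA runtime calls: the whole target set is allocated and copied once before matching, and the cache configuration is set to favor shared memory (our interpretation of "set the system cache to the lowest value"). Names and sizes are placeholders.

#include <cuda_runtime.h>

// Illustrative host-side setup: upload the entire target-template set once,
// so no host-device traffic occurs during matching, and prefer shared memory
// over L1 cache.
float* uploadTargets(const float* hostTemplates, size_t totalFloats)
{
    float* devTemplates = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&devTemplates), totalFloats * sizeof(float));
    cudaMemcpy(devTemplates, hostTemplates, totalFloats * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);   // favor shared memory
    return devTemplates;
}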

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
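Written out (the orientation symbol was garbled in the source; these are the standard HOG definitions implied by the sentence above):

m(x,y) = \sqrt{d_x(x,y)^2 + d_y(x,y)^2}, \qquad
\theta(x,y) = \tan^{-1}\!\frac{d_y(x,y)}{d_x(x,y)}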

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell contributes a weighted vote to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientation of the picture elements. If the image has illumination or contrast changes, the gradient strengths must be locally normalized, so cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases, MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveX® controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers, adding annotations, LaTeX equations, and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method that employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make the descriptor more suitable for parallel computing, which dramatically reduces data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.

involve convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements; there is no uniform mapping scheme applicable to all these stages.

3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.

LITERATURE SURVEY

1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database acquired in an unconstrained outdoor maritime environment. The recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models that can account for these variations are very important for remote face recognition.

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.

With the rapidly expanded biometric data collected by various sectors

of government and industry for identification and verification purposes

how to manage and process such Big Data draws great concern Even

though modern processors are equipped with more cores and memory

capacity it still requires careful design in order to utilize the hardware

resource effectively and the power consumption efficiently This research

addresses this issue by investigating the workload characteristics of

biometric application Taking Daugman's iris matching algorithm which

has been proven to be the most reliable iris matching method as a case

study we conduct performance profiling and binary instrumentation on the

benchmark to capture its execution behavior The results show that data

loading and memory access incurs great performance overhead and

motivates us to move the biometrics computation to high-performance

architecture

Modern iris recognition algorithms can be computationally intensive

yet are designed for traditional sequential processing elements such as a

personal computer However a parallel processing alternative using field

programmable gate arrays (FPGAs) offers an opportunity to speed up iris

recognition Within the means of this project iris template generation with

directional filtering which is a computationally expensive yet parallel

portion of a modern iris recognition algorithm is parallelized on an FPGA

system We will present a performance comparison of the parallelized

algorithm on the FPGA system to a traditional CPU-based version The

parallelized template generation outperforms an optimized C++ code

version determining the information content of an iris approximately 324

times faster

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.

A new biometric indicator based on the patterns of conjunctival

vasculature is proposed Conjunctival vessels can be observed on the visible

part of the sclera that is exposed to the outside world These vessels

demonstrate rich and specific details in visible light and can be easily

photographed using a regular digital camera In this paper we discuss

methods for conjunctival imaging preprocessing and feature extraction in

order to derive a suitable conjunctival vascular template for biometric

authentication Commensurate classification methods along with the

observed accuracy are discussed Experimental results suggest the potential

of using conjunctival vasculature as a biometric measure Identification of

a person based on some unique set of features is an important task The

human identification is possible with several biometric systems and sclera

recognition is one of the promising biometrics The sclera is the white

portion of the human eye The vein pattern seen in the sclera region is

unique to each person Thus the sclera vein pattern is a well suited

biometric technology for human identification The existing methods used

for sclera recognition have some drawbacks like only frontal looking

images are preferred for matching and rotation variance is another problem

These problems are completely eliminated in the proposed system by using

two feature extraction techniques They are Histogram of Oriented

Gradients (HOG) and converting the image into polar form using the

bilinear interpolation technique These two features help the proposed

system to become illumination invariant and rotation invariant The

experimentation is done with the help of UBIRIS database The

experimental result shows that the proposed sclera recognition method can

achieve better accuracy than the previous methods

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

The graphics processing unit (GPU) has become an integral part of

today's mainstream computing systems. Over the past six years there has

been a marked increase in the performance and capabilities of GPUs The

modern GPU is not only a powerful graphics engine but also a highly

parallel programmable processor featuring peak arithmetic and memory

bandwidth that substantially outpaces its CPU counterpart. The GPU's

rapid increase in both programmability and capability has spawned a

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified

Mumford–Shah functional. Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification The verification and

identification performance of the proposed algorithms is validated and

compared with other algorithms using the CASIA Version 3 ICE 2005 and

UBIRIS iris databases

18 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method and the CUDA GPU architecture. CUDA is a highly

parallel multithreaded many-core processor with tremendous

computational power

It supports not only a traditional graphics pipeline but also computation

on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor – the Y-shape sclera feature-based efficient registration method – to speed up the mapping scheme, introduce the "weighted polar line (WPL) descriptor" that would be better suited for parallel computing to mitigate the mask size issue, and develop

our coarse to fine two-stage matching process to dramatically improve the

matching speed These new approaches make the parallel processing

possible and efficient

191 PROPOSED SYSTEM ADVANTAGES

1 To improve the efficiency in this research we propose a new descriptor

mdash the Y shape descriptor which can greatly help improve the efficiency of

the coarse registration of two images and can be used to filter out some

non-matching pairs before refined matching

2 We propose the coarse-to-fine two-stage matching process In the first

stage we matched two images coarsely using the Y-shape descriptors

which is very fast to match because no registration was needed The

matching result in this stage can help filter out image pairs with low

similarities

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque and white outer layer of the eye The blood

vessel structure of sclera is formed randomly and is unique to each person

which can be used for human identification Several researchers have

designed different Sclera vein recognition methods and have shown that it

is promising to use Sclera vein recognition for human identification In

Crihalmeanu and Ross proposed three approaches Speed Up Robust

Features (SURF)-based method minutiae detection and direct correlation

matching for feature registration and matching Within these three methods

the SURF method achieves the best accuracy It takes an average of 15

seconds using the SURF method to perform a one-to-one matching. Zhou

et al proposed line descriptor-based method for sclera vein recognition

The matching step (including registration) is the most time-consuming step

in this sclera vein recognition system which costs about 12 seconds to

perform a one-to-one matching. Both speeds were calculated using a PC with Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently,

Sclera vein recognition algorithms are designed using central processing

unit (CPU)-based systems

As discussed CPU-based systems are designed as sequential

processing devices which may not be efficient in data processing where the

data can be parallelized Because of large time consumption in the matching

step Sclera vein recognition using sequential-based method would be very

challenging to be implemented in a real time biometric system especially

when there is large number of templates in the database for matching GPUs

(here used as an abbreviation for general-purpose graphics processing units, GPGPUs)

are now popularly used for parallel computing to improve the

computational processing speed and efficiency The highly parallel

structure of GPUs makes them more effective than CPUs for data

processing where processing can be performed in parallel GPUs have been

widely used in biometrics recognition such as speech recognition text

detection handwriting recognition and face recognition In iris

recognition GPU was used to extract the features construct descriptors

and match templates

GPUs are also used for object retrieval and image search Park et al

designed the performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather the goal is to develop a parallel sclera matching solution for

Sclera vein recognition using our sequential line-descriptor method using

the CUDA GPU architecture However the parallelization strategies

developed in this research can be applied to design parallel approaches for

other Sclera vein recognition methods and help parallelize general pattern

recognition methods Based on the matching approach in there are three

challenges to map the task of sclera feature matching to GPU

1) Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance New designs are

necessary to help narrow down the search range In summary naiumlve

implementation of the algorithms in parallel would not work efficiently

Note it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor – the Y-shape sclera feature-based efficient registration method – to speed up the mapping scheme (Section 4), introduce the "weighted polar line (WPL) descriptor"

that would be better suited for parallel computing to mitigate the mask size

issue (Section 5) and develop our coarse to fine two-stage matching

process to dramatically improve the matching speed (Section 6) These new

approaches make the parallel processing possible and efficient However it

is non-trivial to implement these algorithms in CUDA We then developed

the implementation schemes to map our algorithms into CUDA (Section 7)

In the Section 2 we give brief introduction of Sclera vein recognition In

the Section 8 we performed some experiments using the proposed system

In the Section 9 we draw some conclusions

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition

Several methods have been designed for sclera segmentation Crihalmeanu

et al presented a semi-automated system for sclera segmentation They

used a clustering algorithm to classify the color eye images into three

clusters - sclera iris and background Later on Crihalmeanu and Ross

designed a segmentation approach based on a normalized sclera index

measure which includes coarse sclera segmentation pupil region

segmentation and fine sclera segmentation Zhou et al developed a skin

tone plus "white color"-based voting method for sclera segmentation in color images, and Otsu's thresholding-based method for grayscale images.

After sclera segmentation it is necessary to enhance and extract the sclera

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches including Speed Up Robust

Features (SURF) which is based on interest-point detection minutiae

detection which is based on minutiae points on the vasculature structure

and direct correlation matching which relies on image registration Zhou et

al designed a line descriptor based feature registration and matching

method

The proposed sclera recognition consists of five steps which include

sclera segmentation vein pattern enhancement feature extraction feature

matching and matching decision Fig 2 shows the block diagram of sclera

recognition Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of vein pattern in the sclera region of an eye image To

become more computationally efficient the data of the image are converted

to the polar form It is mainly used for circular or quasi circular shape of

object These two characteristics are extracted from all the images in the

database and compared with the features of the query image whether the

person is correctly identified or not This procedure is done in the feature

matching step and ultimately makes the matching decision By using the

proposed feature extraction methods and matching techniques the human

identification is more accurate than the existing studies In the proposed

method two features of an image are drawn out

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in the sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection Glare area means a small bright area near

pupil or iris This is the unwanted portion on the eye image Sobel filter is

applied to detect the glare area present in the iris or pupil. It runs only on grayscale images; if the image is in color, it needs a conversion to grayscale, and after that the Sobel filter is applied to

detect the glare area Fig 4 shows the result of the glare area detection
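As a rough sketch of this step (not the report's exact implementation), the CUDA kernel below computes the Sobel gradient magnitude of a grayscale image and flags bright pixels with strong local gradients as glare candidates; the kernel name, image layout, and threshold values are illustrative assumptions.

// Hypothetical sketch: Sobel gradient magnitude plus a brightness test for glare candidates.
__global__ void glareCandidates(const unsigned char* gray, unsigned char* glare,
                                int width, int height,
                                float gradThresh, unsigned char brightThresh)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;

    int idx = y * width + x;
    // 3x3 Sobel responses in the x and y directions.
    int gx = -gray[idx - width - 1] + gray[idx - width + 1]
             - 2 * gray[idx - 1]     + 2 * gray[idx + 1]
             - gray[idx + width - 1] + gray[idx + width + 1];
    int gy = -gray[idx - width - 1] - 2 * gray[idx - width] - gray[idx - width + 1]
             + gray[idx + width - 1] + 2 * gray[idx + width] + gray[idx + width + 1];

    float mag = sqrtf((float)(gx * gx + gy * gy));

    // A glare candidate is a bright pixel with a strong gradient nearby.
    glare[idx] = (mag > gradThresh && gray[idx] > brightThresh) ? 255 : 0;
}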

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of the sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. When the region of interest is selected, then apply Otsu's thresholding for

obtaining the potential sclera areas The correct left sclera area should be

placed in the right and center positions and correct right sclera area should

be placed in the left and center In this way non sclera areas are wiped out

223 IRIS AND EYELID REFINEMENT

The top and underside of the sclera regions are the limits of the

sclera area And then that upper eyelid lower eyelid and iris boundaries are

refined These altogether are the unwanted portion for recognition In order

to eliminate these effects, refinement is done following the detection of the sclera area. Fig. shows the result after the Otsu's thresholding process and iris and eyelid refinement to detect the right sclera area. In the same way the left sclera

area is detected using this method

FIG

In the segmentation process all images are not perfectly segmented

Hence feature extraction and matching are needed to reduce the

segmentation fault The vein patterns in the sclera area are not visible in the

segmentation process To get vein patterns more visible vein pattern

enhancement is to be performed

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY Conjunctival vasculature can be captured with commercial off

the shelf digital cameras under normal lighting conditions making this

modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi surface structure of the ocular

veins makes them hard to reproduce as a physical artifact Besides being a

stand-alone biometric modality we anticipate that the addition of

conjunctival biometrics will enhance the performance of current iris-based

biometric system in the following ways

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the

availability for simultaneous computation Below Figure demonstrates the

possibility for parallel directional filtering Since the filter is computed over

different portions of the input image the computation can be computed in

parallel (denoted by Elements below) In addition individual parallelization

of each element of Filtering can also be performed A detailed discussion of

our proposed parallelization is outside the scope of this paper

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA

VEIN

2251 RECOGNITION METHOD

The matching segment of the line-descriptor based method is a

bottleneck with regard to matching speed In this section we briefly

describe the Line Descriptor-based sclera vein recognition method After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points The line descriptor is used to describe the segments in the vein

structure Figure 2 shows a visual description of the line descriptor Each

segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor

are calculated as

FIG
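Consistent with the definitions given next (the polynomial approximation fline(x), the segment center (xl, yl), and the iris center (xi, yi)), the three components can be written as follows; this is a reconstruction from those definitions rather than a verbatim copy of the omitted equations:

\theta = \tan^{-1}\!\left(\frac{y_l - y_i}{x_l - x_i}\right), \qquad
r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
\phi = \tan^{-1}\!\left(\left.\frac{d f_{\mathrm{line}}(x)}{dx}\right|_{x_l}\right)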

Here fline (x) is the polynomial approximation of the line segment (xl yl )

is the center point of the line segment (xi yi ) is the center of the detected

iris and S is the line descriptor In order to register the segments of the

vascular patterns a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns For

the registration algorithm it randomly chooses two points ndash one from the

test template and one from the target template It also randomly chooses a

scaling factor and a rotation value based on a priori knowledge of the

database Using these values it calculates a fitness value for the registration

using these parameters

After sclera template registration each line segment in the test

template is compared to the line segments in the target template for

matches In order to reduce the effect of segmentation errors we created the

weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score m(Si, Sj) for two segment descriptors is calculated from these quantities, where Si and Sj are two segment descriptors, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold.

The total matching score M is the sum of the individual matching scores

divided by the maximum matching score for the minimal set between the

test and target template That is one of the test or target templates has fewer

points and thus the sum of its descriptors weight sets the maximum score

that can be attained
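A minimal host-side sketch of this scoring rule is given below, assuming that a matched pair contributes the product of the two segments' weights and an unmatched pair contributes zero; the struct fields, function names, and that particular contribution are assumptions standing in for the omitted equation.

#include <algorithm>
#include <cmath>
#include <vector>

struct Segment { float x, y, phi, w; };   // center, orientation, mask weight

// Hypothetical per-pair score: weights multiplied when distance and angle agree.
float segmentScore(const Segment& si, const Segment& sj, float Dmatch, float PhiMatch)
{
    float d    = std::hypot(si.x - sj.x, si.y - sj.y);
    float dphi = std::fabs(si.phi - sj.phi);
    return (d <= Dmatch && dphi <= PhiMatch) ? si.w * sj.w : 0.0f;
}

// Total score M: sum of the best per-segment scores, normalized by the maximum
// attainable score of the template with the smaller total descriptor weight.
float totalScore(const std::vector<Segment>& test, const std::vector<Segment>& target,
                 float Dmatch, float PhiMatch)
{
    float sum = 0.0f;
    for (const Segment& si : test) {
        float best = 0.0f;
        for (const Segment& sj : target)
            best = std::max(best, segmentScore(si, sj, Dmatch, PhiMatch));
        sum += best;
    }
    float wTest = 0.0f, wTarget = 0.0f;
    for (const Segment& s : test)   wTest   += s.w;
    for (const Segment& s : target) wTarget += s.w;
    return sum / std::min(wTest, wTarget);   // the minimal set bounds the maximum score
}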

FIG

FIG

FIG

FIG

During the movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in

the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, this set may be inferred as a Y-shape structure, and the line segment angles would be

recorded as a new feature of the sclera

There are two ways to measure both orientation and relationship of

every branch of Y-shape vessels: one is to use the angles of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the

template In our approach we employed the second method As Figure 6

shows ϕ1 ϕ2 and ϕ3 denote the angle between each branch and the radius

from the pupil center. Even when head tilt, eye movement, or camera zoom occurs at the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable.

To tolerate errors from the pupil center calculation in the segmentation step

we also recorded the center position (x y) of the Y shape branches as

auxiliary parameters So in our rotation shift and scale invariant feature

vector is defined as y(ϕ1 ϕ2 ϕ3 x y) The Y-shape descriptor is generated

with reference to the iris center Therefore it is automatically aligned to the

iris centers. It is a rotational- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 22, the line descriptor is extracted from the skeleton of the vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment a line descriptor is created to record the center and orientation of

the segment This descriptor is expressed as s(x, y, ɸ) where (x, y) is the

position of the center and ɸ is its orientation Because of the limitation of

segmentation accuracy the descriptor in the boundary of sclera area might

not be accurate and may contain spur edges resulting from the iris eyelid

and/or eyelashes To be tolerant of such error the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging

since the mask files are large in size and will occupy the GPU memory and

slow down the data transfer. When matching, a RANSAC-type registration algorithm was used to randomly select the corresponding descriptors, and the transform parameter between them was used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce heavy data transfer and computation we designed the

weighted polar line (WPL) descriptor structure which includes the

information of mask and can be automatically aligned We extracted the

relationship of geometric feature of descriptors and store them as a new

descriptor We use a weighted image created via setting various weight

values according to their positions. The weights of those descriptors that are beyond the sclera are set to 0, those near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, descriptor weights were calculated on their own mask by the CPU only once.

The calculation result was saved as a component of the descriptor. The descriptor of the sclera will change to s(x, y, ɸ, w), where w denotes the weight of the point, and the value may be 0, 0.5, or 1. To align two templates, when a

centers all the descriptors of that template will be transformed It would be

faster if two templates have similar reference points If we use the center of

the iris as the reference point when two templates are compared the

correspondence will automatically be aligned to each other since they have

the similar reference point Every feature vector of the template is a set of

line segment descriptors composed of three variable (Figure 8) the

segment angle to the reference line which went through the iris center

denoted as θ the distance between the segments center and pupil center

which is denoted as r the dominant angular orientation of the segment

denoted as ɸ. To minimize the GPU computing, we also convert the descriptor value from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right

parts of sclera in an eye may have different registration parameters For

example as an eyeball moves left left part sclera patterns of the eye may be

compressed while the right part sclera patterns are stretched
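A small sketch of what such a precomputed descriptor could look like, with the polar components (r, θ) derived once on the CPU from the rectangular center taken relative to the iris center; the struct and function names are illustrative, not taken from the report.

#include <cmath>

// Weighted polar line (WPL) descriptor: s(x, y, r, theta, phi, w).
struct WPLDescriptor {
    float x, y;      // segment center, rectangular coordinates relative to the iris center
    float r, theta;  // the same center in polar coordinates
    float phi;       // dominant orientation of the segment
    float w;         // mask weight: 0, 0.5 or 1
};

// CPU preprocessing: fill in both representations so the GPU never recomputes them.
WPLDescriptor makeDescriptor(float segX, float segY, float irisX, float irisY,
                             float phi, float w)
{
    WPLDescriptor d;
    d.x = segX - irisX;               // rectangular coordinates w.r.t. the iris center
    d.y = segY - irisY;
    d.r = std::sqrt(d.x * d.x + d.y * d.y);
    d.theta = std::atan2(d.y, d.x);   // angle to the reference line through the iris center
    d.phi = phi;
    d.w = w;
    return d;
}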

In parallel matching these two parts are assigned to threads in

different warps to allow different deformation The multiprocessor in

CUDA manages threads in groups of 32 parallel threads called warps We

reorganized the descriptors from the same sides and saved

FIG

FIG

them in continuous addresses This would meet the requirement of coalesced

memory access in GPU

After reorganizing the structure of descriptors and adding mask information

into the new descriptor the computation on the mask file is not needed on

the GPU It was very fast to match with this feature because it does not

need to reregister the templates every time after shifting Thus the cost of

data transfer and computation on GPU will be reduced Matching on the

new descriptor the shift parameters generator in Figure 4 is then simplified

as Figure 9

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 40 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-Purpose Computing on the GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that today's GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasks' having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation
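A minimal CUDA sketch of this structure, using a toy grid-averaging step in place of a real simulation: each thread owns one grid point, gathers its neighbors from the input buffer, and writes the updated value to the output buffer. The kernel name, grid size, and block shape are assumptions.

// SPMD program over a structured grid of threads.
__global__ void gridStep(const float* in, float* out, int nx, int ny)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= nx - 1 || y >= ny - 1) return;

    int i = y * nx + x;
    // Gather from the previous state of the four neighbors, write the new value.
    out[i] = 0.25f * (in[i - 1] + in[i + 1] + in[i - nx] + in[i + nx]);
}

// Host side: the computation domain is expressed directly as a grid of threads.
// dim3 block(16, 16);
// dim3 grid((nx + 15) / 16, (ny + 15) / 16);
// gridStep<<<grid, block>>>(d_in, d_out, nx, ny);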

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities After this step it is still possible for

some false positive matches. In the second stage we used the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4); ni and di are the number of matched descriptor pairs and their center distance, respectively; tϕ is a distance threshold; and txy is the threshold to restrict the searching area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as (2). Here α is a factor to fuse the matching score, which was set to 30 in our study; Ni and Nj are the total numbers of feature vectors in templates i and j separately. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. The sclera with a high matching score will be passed to the next, more precise matching process.
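The sketch below illustrates this coarse stage sequentially with the stated thresholds (tϕ = 30, txy = 675) and fusion factor α = 30; the descriptor fields, the particular fusion formula, and the omission of the left/right half restriction are simplifying assumptions rather than the report's equations (2)-(4).

#include <algorithm>
#include <cmath>
#include <vector>

// Y-shape descriptor: three branch angles to the iris radial direction plus the branch center.
struct YDescriptor { float phi1, phi2, phi3, x, y; };

float angleDist(const YDescriptor& a, const YDescriptor& b)   // cf. Eq. (3)
{
    return std::sqrt((a.phi1 - b.phi1) * (a.phi1 - b.phi1) +
                     (a.phi2 - b.phi2) * (a.phi2 - b.phi2) +
                     (a.phi3 - b.phi3) * (a.phi3 - b.phi3));
}

float centerDist(const YDescriptor& a, const YDescriptor& b)   // cf. Eq. (4)
{
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Coarse matching score between a test and a target template (cf. Eq. (2), assumed form).
float coarseScore(const std::vector<YDescriptor>& test, const std::vector<YDescriptor>& target,
                  float tPhi = 30.0f, float tXY = 675.0f, float alpha = 30.0f)
{
    int matched = 0;
    float sumDist = 0.0f;
    for (const YDescriptor& yi : test)
        for (const YDescriptor& yj : target)
            if (centerDist(yi, yj) <= tXY && angleDist(yi, yj) <= tPhi) {
                ++matched;
                sumDist += centerDist(yi, yj);
                break;                               // one match per test branch
            }
    if (matched == 0) return 0.0f;
    float avgDist = sumDist / matched;
    // Fuse the (normalized) match count with the average center distance.
    float nNorm = (float)matched / (float)std::min(test.size(), target.size());
    return nNorm * alpha / (alpha + avgDist);
}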

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because: when acquiring an eye image at a different gaze angle, the vessel structure will appear to nonlinearly shrink or extend, since the eyeball is spherical in shape; and the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), with slight differences among the movements of these layers. Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate As a result the detected iris center could

not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte; Tta is the target template and stai is the ith WPL descriptor of Tta; d(stek, staj) is the Euclidean distance of descriptors stek and staj; and Δsk is the shift value of the two descriptors, defined as the offset between their centers.

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. The shift offset between them is recorded as the possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
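A simplified sequential sketch of this shift search follows; the quadrant-balanced sampling is omitted, the sample count is an assumption, and "smallest standard deviation" is interpreted here as the candidate offset closest to the mean of all candidates.

#include <cmath>
#include <cstdlib>
#include <vector>

struct WPL { float x, y, r, theta, phi, w; };
struct Shift { float dx, dy; };

// Nearest target descriptor (by center distance) to a given test descriptor.
static const WPL& nearest(const WPL& s, const std::vector<WPL>& target)
{
    size_t best = 0;
    float bestD = 1e30f;
    for (size_t j = 0; j < target.size(); ++j) {
        float d = std::hypot(s.x - target[j].x, s.y - target[j].y);
        if (d < bestD) { bestD = d; best = j; }
    }
    return target[best];
}

// Algorithm 2 (sketch): candidate shifts are offsets between sampled test descriptors
// and their nearest neighbors; keep the candidate with the smallest spread.
Shift searchShift(const std::vector<WPL>& test, const std::vector<WPL>& target, int samples = 64)
{
    if (test.empty() || target.empty()) return {0.0f, 0.0f};
    std::vector<Shift> cand;
    for (int k = 0; k < samples; ++k) {
        const WPL& ste = test[std::rand() % test.size()];
        const WPL& sta = nearest(ste, target);
        cand.push_back({sta.x - ste.x, sta.y - ste.y});   // candidate Δs_k
    }
    Shift mean{0.0f, 0.0f};
    for (const Shift& s : cand) { mean.dx += s.dx; mean.dy += s.dy; }
    mean.dx /= cand.size(); mean.dy /= cand.size();

    Shift best = cand[0];
    float bestDev = 1e30f;
    for (const Shift& s : cand) {
        float dev = std::hypot(s.dx - mean.dx, s.dy - mean.dy);
        if (dev < bestDev) { bestDev = dev; best = s; }   // Δs_optim
    }
    return best;
}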

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration we count the number of matched descriptor pairs from the transformed template and the target template. The factor β is involved to determine if a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined the same as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the parameters of the shift, rotation, and scale transforms generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined as (7). To search for the optimized transform parameters we iterated N times to generate these parameters. In our experiment we set the iteration count to 512.
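The sketch below shows how one candidate (scale, rotation, shift) set could be generated, applied, and scored by the number of descriptor pairs falling within β = 20 pixels; the parameter ranges, the transform order, and the way the shift is drawn from a random descriptor pair are assumptions for illustration.

#include <cmath>
#include <cstdlib>
#include <vector>

struct WPL2 { float x, y, w; };

// 2-D point transform: scale, then rotate, then shift (assumed composition order).
static void applySRT(float x, float y, float scale, float theta, float tx, float ty,
                     float& ox, float& oy)
{
    float xs = x * scale, ys = y * scale;
    ox = xs * std::cos(theta) - ys * std::sin(theta) + tx;
    oy = xs * std::sin(theta) + ys * std::cos(theta) + ty;
}

// Count test descriptors landing within beta pixels of some target descriptor.
static int countMatches(const std::vector<WPL2>& test, const std::vector<WPL2>& target,
                        float scale, float theta, float tx, float ty, float beta)
{
    int m = 0;
    for (const WPL2& s : test) {
        float x, y;
        applySRT(s.x, s.y, scale, theta, tx, ty, x, y);
        for (const WPL2& t : target)
            if (std::hypot(x - t.x, y - t.y) <= beta) { ++m; break; }
    }
    return m;
}

// Algorithm 3 (sketch): try N random parameter sets and keep the best-scoring one.
void searchAffine(const std::vector<WPL2>& test, const std::vector<WPL2>& target,
                  float& bestScale, float& bestTheta, float& bestTx, float& bestTy,
                  int N = 512, float beta = 20.0f)
{
    if (test.empty() || target.empty()) return;
    int bestCount = -1;
    for (int it = 0; it < N; ++it) {
        float scale = 0.9f + 0.2f * (std::rand() / (float)RAND_MAX);                    // assumed range
        float theta = (-5.0f + 10.0f * (std::rand() / (float)RAND_MAX)) * 3.1415926f / 180.0f;
        const WPL2& ste = test[std::rand() % test.size()];
        const WPL2& sta = target[std::rand() % target.size()];
        float tx = sta.x - ste.x, ty = sta.y - ste.y;    // shift drawn from a random descriptor pair
        int m = countMatches(test, target, scale, theta, tx, ty, beta);
        if (m > bestCount) {
            bestCount = m; bestScale = scale; bestTheta = theta; bestTx = tx; bestTy = ty;
        }
    }
}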

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters attained from Algorithms 2 and 3; and R(θ(optm)) T(tr(optm)shift) S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of two ɸ values. In our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU A CUDA consists of many streaming

multiprocessors (SM) where the parallel part of the program should be

partitioned into threads by the programmer and mapped into those threads

There are multiple memory spaces in the CUDA memory hierarchy

register local memory shared memory global memory constant memory

and texture memory. Registers and shared memory are on-chip, and it takes little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, there is only limited availability of shared memory. Global memory, constant memory, and texture memory are off-chip memory and accessible by all threads, but they are very time consuming to access.

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in term of access To

completely hide the latency of the small instructions set we should use on-

chip memory preferentially rather than global memory When global

memory access occurs threads in same warp should access the words in

sequence to achieve coalescence

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory request from different thread within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread. One thread performs the comparison of a pair of templates. In our work we use an NVIDIA C2070 as our GPU. The thread and block numbers are set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
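A sketch of this coarse-grained mapping: one thread per target template, with the thread index selecting which template of the preloaded database is compared against the test template. The packed template layout and the placeholder similarity function stand in for the actual Y-shape matching kernel.

// Placeholder per-pair similarity standing in for the Y-shape matching described above.
__device__ float compareTemplates(const float* test, const float* target, int len)
{
    float s = 0.0f;
    for (int i = 0; i < len; ++i) s -= fabsf(test[i] - target[i]);
    return s;
}

__global__ void matchAll(const float* test, const float* targets,   // targets packed back-to-back
                         int descPerTemplate, float* scores, int numTargets)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per target template
    if (t >= numTargets) return;
    scores[t] = compareTemplates(test, targets + (size_t)t * descPerTemplate, descPerTemplate);
}

// Host side, mirroring the 1024 x 1024 configuration mentioned above:
// matchAll<<<1024, 1024>>>(d_test, d_targets, descPerTemplate, d_scores, numTargets);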

Algorithms 2-4 will be partitioned into fine-grained subtasks, in which a section of descriptors is processed by one thread. As the lower portion of the

middle column shows in Figure 11 we assigned a target template to one

block Inside a block one thread corresponds a set of descriptors in this

template This partition makes every block execute independently and there

are no data exchange requirements between different blocks When all

threads complete their responding descriptor fractions the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of intermediate results, which is shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads

compute the prefix sum on the new result The final result will be saved in

the first address which has the same variable name as the first intermediate

result
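The per-block combination can be sketched as a standard shared-memory tree reduction, which plays the same role as the pairwise summing scheme described above (the total ends up in the first slot); the block size of 256 and the float data type are assumptions.

__global__ void blockSum(const float* partial, float* blockTotals, int n)
{
    __shared__ float buf[256];                      // assumes blockDim.x == 256
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    buf[tid] = (i < n) ? partial[i] : 0.0f;
    __syncthreads();

    // Tree reduction over strides 128, 64, ..., 1; the sum ends up in buf[0].
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockTotals[blockIdx.x] = buf[0];
}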

252 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose to

map the task:

Mapping one pair of templates to all the threads in a block, where every thread would take charge of a fraction of the descriptors and cooperate with the other threads.

Assigning a single possible shift offset to a thread, where all the threads compute independently, except that the final result should be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift searching. In the affine matrix generator we mapped an entire parameter-set search to a thread, and every thread randomly generated a set of parameters and tried them independently. The generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to be processed with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait for a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.
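As a sketch of the per-thread random parameter generation, the snippet below uses cuRAND's per-thread generator states as a stand-in for the dynamically created Mersenne Twister parameters described above: every thread receives its own subsequence, so the scale and rotation values it draws are uncorrelated with those of other threads. The search ranges are assumptions.

#include <curand_kernel.h>

__global__ void initStates(curandState* states, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Same seed, different subsequence per thread -> statistically independent streams.
    curand_init(seed, tid, 0, &states[tid]);
}

__global__ void drawParameters(curandState* states, float* scale, float* theta)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState local = states[tid];
    scale[tid] = 0.9f + 0.2f * curand_uniform(&local);            // assumed range 0.9 .. 1.1
    theta[tid] = (curand_uniform(&local) - 0.5f) * 0.17453292f;   // roughly +/- 5 degrees
    states[tid] = local;                                          // persist the updated state
}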

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target template set from the database without considering when the templates will be processed. Therefore there was no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) were stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slower way to get data. In our kernel we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
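The storage scheme implied above can be sketched as follows: descriptor components held in separate structure-of-arrays device buffers so that consecutive threads read consecutive addresses, and the test template staged into shared memory once per block. The array names, the fixed test-template bound, and the toy distance accumulation are assumptions.

#define TEST_MAX 256   // assumed upper bound on test-template descriptors

// Structure-of-arrays layout for the WPL descriptors of all target templates.
struct WPLArrays { const float *x, *y, *r, *theta, *phi, *w; };

__global__ void fineMatch(WPLArrays targets, int descPerTemplate,
                          const float* testX, const float* testY, int testLen,
                          float* scores)   // scores assumed zero-initialized
{
    __shared__ float sx[TEST_MAX], sy[TEST_MAX];   // assumes testLen <= TEST_MAX

    // Stage the test template into shared memory with coalesced reads.
    for (int i = threadIdx.x; i < testLen; i += blockDim.x) {
        sx[i] = testX[i];
        sy[i] = testY[i];
    }
    __syncthreads();

    // One block per target template; each thread handles a slice of its descriptors.
    int base = blockIdx.x * descPerTemplate;
    float partial = 0.0f;
    for (int j = threadIdx.x; j < descPerTemplate; j += blockDim.x) {
        float tx = targets.x[base + j];    // consecutive threads read consecutive addresses
        float ty = targets.y[base + j];
        float best = 1e30f;
        for (int i = 0; i < testLen; ++i) {
            float dx = sx[i] - tx, dy = sy[i] - ty;
            float d = dx * dx + dy * dy;
            if (d < best) best = d;
        }
        partial += best;
    }
    // Combine the per-thread partials (the reduction shown earlier could be used instead).
    atomicAdd(&scores[blockIdx.x], partial);
}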

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients is a feature descriptor. It is primarily applied in the design of target detection; in this paper it is applied as the feature for human recognition. In the sclera region the vein patterns are the edges of an image, so HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the different histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG. This method is utilized to create the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation; the gradient magnitude is used as the weight. The cells are rectangular in form. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientation of the picture elements. If the images have any illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
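A compact host-side sketch of these two steps for a single cell, using central differences for dx and dy and nine bins over 0-180 degrees; the cell size and bin count are common choices, not values taken from the report.

#include <cmath>
#include <vector>

// Gradient magnitude and (unsigned) orientation histogram for one cell of a
// grayscale image stored row-major as width x height floats.
std::vector<float> cellHOG(const std::vector<float>& img, int width, int height,
                           int cx, int cy, int cellSize = 8, int bins = 9)
{
    std::vector<float> hist(bins, 0.0f);
    for (int y = cy; y < cy + cellSize; ++y) {
        for (int x = cx; x < cx + cellSize; ++x) {
            if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) continue;
            float dx = img[y * width + (x + 1)] - img[y * width + (x - 1)];
            float dy = img[(y + 1) * width + x] - img[(y - 1) * width + x];
            float mag = std::sqrt(dx * dx + dy * dy);            // m(x, y)
            float ang = std::atan2(dy, dx) * 180.0f / 3.14159265f;
            if (ang < 0.0f) ang += 180.0f;                       // fold opposite directions together
            int bin = (int)(ang / (180.0f / bins)) % bins;
            hist[bin] += mag;                                    // magnitude-weighted vote
        }
    }
    return hist;
}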

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations,

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
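The matrix-first semantics and the Parallel Computing Toolbox constructs mentioned above can be illustrated with a short sketch; the matrix sizes and the loop body are arbitrary, and the gpuArray part assumes a CUDA-capable GPU is available.

% Minimal sketch of whole-matrix operators, parfor, and gpuArray offloading.
A = rand(1000);                 % a 1000-by-1000 matrix
B = rand(1000);
C = A * B + 2 * eye(1000);      % operators act on whole matrices at once

s = zeros(1, 8);
parfor k = 1:8                  % independent iterations run on parallel workers
    s(k) = sum(sum(A .^ k));
end

Ag = gpuArray(A);               % move data to a CUDA-capable GPU
Bg = gpuArray(B);
Cg = Ag * Bg;                   % the same expression now runs on the GPU
C_host = gather(Cg);            % copy the result back to host memory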

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
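As a small illustration of the direct Java access described above, standard Java classes can be instantiated and used from the MATLAB command line; the keys and values below are arbitrary.

% Minimal sketch of calling Java classes directly from MATLAB.
scores = java.util.HashMap;         % instantiate a Java object
scores.put('subject01', 0.87);      % call its methods with MATLAB data
s = scores.get('subject01');

names = java.util.ArrayList;
names.add('sclera');
n = names.size();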

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
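The following short comparison illustrates this point; the function being evaluated is arbitrary.

% An explicit loop versus the equivalent one-line vectorized expression.
x = 0:0.01:2*pi;

y1 = zeros(size(x));            % loop version
for k = 1:numel(x)
    y1(k) = sin(x(k)) * exp(-x(k)/2);
end

y2 = sin(x) .* exp(-x/2);       % vectorized version: one line, no loop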

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
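A minimal sketch of the programmatic alternative to GUIDE is shown below; the figure name, button label, and image file name are placeholders.

% Building a simple GUI programmatically.
f  = figure('Name', 'Sclera Demo', 'NumberTitle', 'off');
ax = axes('Parent', f, 'Position', [0.1 0.3 0.8 0.65]);
uicontrol('Parent', f, 'Style', 'pushbutton', 'String', 'Load image', ...
          'Units', 'normalized', 'Position', [0.4 0.05 0.2 0.1], ...
          'Callback', @(src, evt) imshow(imread('eye_image.jpg'), 'Parent', ax));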

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
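A few illustrative calls for the formats listed above are given below; the file names are placeholders.

num = xlsread('scores.xlsx');          % Microsoft Excel spreadsheet
img = imread('eye_image.jpg');         % image file
fid = fopen('template.bin', 'r');      % low-level binary I/O
raw = fread(fid, inf, 'uint8');
fclose(fid);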

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter and bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
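The snapshots above correspond to a processing chain of grayscale conversion, Otsu binarization, ROI selection, enhancement, and Gabor-based feature extraction. The MATLAB sketch below walks through the same chain; the file name, ROI rectangle, and Gabor parameters are assumed values, not the report's exact settings.

rgb   = imread('eye_image.jpg');
gray  = rgb2gray(rgb);                       % grayscale image
level = graythresh(gray);                    % Otsu's threshold
bw    = im2bw(gray, level);                  % binary image used for segmentation

roiRect = [120 80 200 150];                  % assumed [x y width height]
sclera  = imcrop(gray, roiRect);             % region of interest (sclera part)
enhanced = adapthisteq(sclera);              % contrast enhancement (CLAHE)

features = [];
for theta = 0:30:150                         % bank of directional Gabor filters
    mag = imgaborfilt(enhanced, 4, theta);   % requires a recent Image Processing Toolbox
    features = [features; mean(mag(:)) std(mag(:))]; %#ok<AGROW>
end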

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step to achieve parallel computation efficiency using a GPU. A work flow with high arithmetic intensity to hide the memory access latency was designed to partition the computation task across the heterogeneous system of CPU and GPU, even down to the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


classification algorithm finding features that are robust to variations

mentioned above and developing statistical models which can account for

these variations are very important for remote face recognition

2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security

With the rapidly expanded biometric data collected by various sectors

of government and industry for identification and verification purposes

how to manage and process such Big Data draws great concern Even

though modern processors are equipped with more cores and memory

capacity it still requires careful design in order to utilize the hardware

resource effectively and the power consumption efficiently This research

addresses this issue by investigating the workload characteristics of

biometric applications. Taking Daugman's iris matching algorithm, which

has been proven to be the most reliable iris matching method as a case

study we conduct performance profiling and binary instrumentation on the

benchmark to capture its execution behavior The results show that data

loading and memory access incurs great performance overhead and

motivates us to move the biometrics computation to high-performance

architecture

Modern iris recognition algorithms can be computationally intensive

yet are designed for traditional sequential processing elements such as a

personal computer However a parallel processing alternative using field

programmable gate arrays (FPGAs) offers an opportunity to speed up iris

recognition Within the means of this project iris template generation with

directional filtering, which is a computationally expensive yet parallelizable

portion of a modern iris recognition algorithm is parallelized on an FPGA

system We will present a performance comparison of the parallelized

algorithm on the FPGA system to a traditional CPU-based version The

parallelized template generation outperforms an optimized C++ code

version determining the information content of an iris approximately 324

times faster

3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.

A new biometric indicator based on the patterns of conjunctival

vasculature is proposed Conjunctival vessels can be observed on the visible

part of the sclera that is exposed to the outside world These vessels

demonstrate rich and specific details in visible light and can be easily

photographed using a regular digital camera In this paper we discuss

methods for conjunctival imaging preprocessing and feature extraction in

order to derive a suitable conjunctival vascular template for biometric

authentication Commensurate classification methods along with the

observed accuracy are discussed Experimental results suggest the potential

of using conjunctival vasculature as a biometric measure Identification of

a person based on some unique set of features is an important task The

human identification is possible with several biometric systems and sclera

recognition is one of the promising biometrics The sclera is the white

portion of the human eye The vein pattern seen in the sclera region is

unique to each person Thus the sclera vein pattern is a well suited

biometric technology for human identification The existing methods used

for sclera recognition have some drawbacks like only frontal looking

images are preferred for matching and rotation variance is another problem

These problems are completely eliminated in the proposed system by using

two feature extraction techniques They are Histogram of Oriented

Gradients (HOG) and converting the image into polar form using the

bilinear interpolation technique These two features help the proposed

system to become illumination invariant and rotation invariant The

experimentation is done with the help of UBIRIS database The

experimental result shows that the proposed sclera recognition method can

achieve better accuracy than the previous methods

4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.

The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification The verification and

identification performance of the proposed algorithms is validated and

compared with other algorithms using the CASIA Version 3 ICE 2005 and

UBIRIS iris databases

18 PROPOSED METHOD

Proposed a new parallel sclera vein recognition method using a two-

stage parallel approach for registration and matching A parallel sclera

matching solution for Sclera vein recognition using our sequential line-

descriptor method using the CUDA GPU architecture CUDA is a highly

parallel multithreaded many-core processor with tremendous

computational power

It supports not only a traditional graphics pipeline but also computation

on non-graphical data It is relatively straightforward to implement our C

program for CUDA on AMD-based GPU using Open CL Our CUDA

kernels can be directly converted to Open CL kernels by concerning

different syntax for various keywords and built-in functions The mapping

strategy is also effective in Open CL if we regard thread and block in

CUDA as work item and work-group in Open CL Most of our optimization

techniques such as coalesced memory access and prefix sum can work in

Open CL too Moreover since CUDA is a data parallel architecture the

implementation of our approach by Open CL should be programmed in

data-parallel model

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor," which is better

our coarse to fine two-stage matching process to dramatically improve the

matching speed These new approaches make the parallel processing

possible and efficient

191 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, in this research we propose a new descriptor, the Y shape descriptor, which can greatly help improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose the coarse-to-fine two-stage matching process. In the first stage, we matched two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, which makes it usable for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches: Speeded Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed CPU-based systems are designed as sequential

processing devices which may not be efficient in data processing where the

data can be parallelized Because of large time consumption in the matching

step Sclera vein recognition using sequential-based method would be very

challenging to be implemented in a real time biometric system especially

when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs)

are now popularly used for parallel computing to improve the

computational processing speed and efficiency The highly parallel

structure of GPUs makes them more effective than CPUs for data

processing where processing can be performed in parallel GPUs have been

widely used in biometrics recognition such as speech recognition text

detection handwriting recognition and face recognition In iris

recognition GPU was used to extract the features construct descriptors

and match templates

GPUs are also used for object retrieval and image search Park et al

designed the performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather the goal is to develop a parallel sclera matching solution for

Sclera vein recognition using our sequential line-descriptor method using

the CUDA GPU architecture However the parallelization strategies

developed in this research can be applied to design parallel approaches for

other Sclera vein recognition methods and help parallelize general pattern

recognition methods Based on the matching approach in there are three

challenges to map the task of sclera feature matching to GPU

1) Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer Also some of processing on the mask files will

involve convolution which is difficult to improve its performance on the

scalar process unit on CUDA

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance New designs are

necessary to help narrow down the search range. In summary, naïve

Note it is relatively straightforward to implement our C program for

CUDA on AMD-based GPU using Open CL Our CUDA kernels can be

directly converted to Open CL kernels by concerning different syntax for

various keywords and built-in functions The mapping strategy is also

effective in Open CL if we regard thread and block in CUDA as work item

and work-group in Open CL Most of our optimization techniques such as

coalesced memory access and prefix sum can work in Open CL too

Moreover since CUDA is a data parallel architecture the implementation

of our approach by Open CL should be programmed in data-parallel model

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor"

that would be better suited for parallel computing to mitigate the mask size

issue (Section 5) and develop our coarse to fine two-stage matching

process to dramatically improve the matching speed (Section 6) These new

approaches make the parallel processing possible and efficient However it

is non-trivial to implement these algorithms in CUDA We then developed

the implementation schemes to map our algorithms into CUDA (Section 7)

In the Section 2 we give brief introduction of Sclera vein recognition In

the Section 8 we performed some experiments using the proposed system

In the Section 9 we draw some conclusions

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition

Several methods have been designed for sclera segmentation Crihalmeanu

et al. presented a semi-automated system for sclera segmentation. They

used a clustering algorithm to classify the color eye images into three

clusters - sclera iris and background Later on Crihalmeanu and Ross

designed a segmentation approach based on a normalized sclera index

measure which includes coarse sclera segmentation pupil region

segmentation and fine sclera segmentation Zhou et al developed a skin

tone plus "white color"-based voting method for sclera segmentation in color images, and Otsu's thresholding-based method for grayscale images.

After sclera segmentation it is necessary to enhance and extract the sclera

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches including Speed Up Robust

Features (SURF) which is based on interest-point detection minutiae

detection which is based on minutiae points on the vasculature structure

and direct correlation matching which relies on image registration Zhou et

al designed a line descriptor based feature registration and matching

method

The proposed sclera recognition consists of five steps which include

sclera segmentation vein pattern enhancement feature extraction feature

matching and matching decision Fig 2 shows the block diagram of sclera

recognition Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of vein pattern in the sclera region of an eye image To

become more computationally efficient, the image data are converted to polar form, which is mainly used for circular or quasi-circular objects. These two characteristics are extracted from all the images in the

database and compared with the features of the query image whether the

person is correctly identified or not This procedure is done in the feature

matching step and ultimately makes the matching decision By using the

proposed feature extraction methods and matching techniques the human

identification is more accurate than the existing studies In the proposed

method two features of an image are drawn out

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images, so if the image is in color, it is first converted to grayscale and then passed to the Sobel filter to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid, and iris boundaries are then refined, since these are all unwanted portions for recognition. To eliminate their effects, refinement is performed after the detection of the sclera area. Fig. shows the result after Otsu's thresholding and after iris and eyelid refinement to detect the right sclera area. The left sclera area is detected in the same way.

FIG

In the segmentation process all images are not perfectly segmented

Hence feature extraction and matching are needed to reduce the

segmentation fault The vein patterns in the sclera area are not visible in the

segmentation process To get vein patterns more visible vein pattern

enhancement is to be performed

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999) In the case of retinal biometrics an

especial optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY Conjunctival vasculature can be captured with commercial off

the shelf digital cameras under normal lighting conditions making this

modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi surface structure of the ocular

veins makes them hard to reproduce as a physical artifact Besides being a

stand-alone biometric modality we anticipate that the addition of

conjunctival biometrics will enhance the performance of current iris-based

biometric system in the following ways

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the

availability for simultaneous computation Below Figure demonstrates the

possibility for parallel directional filtering Since the filter is computed over

different portions of the input image the computation can be computed in

parallel (denoted by Elements below) In addition individual parallelization

of each element of Filtering can also be performed A detailed discussion of

our proposed parallelization is outside the scope of this paper

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor based method is a

bottleneck with regard to matching speed In this section we briefly

describe the Line Descriptor-based sclera vein recognition method After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points The line descriptor is used to describe the segments in the vein

structure Figure 2 shows a visual description of the line descriptor Each

segment is described by three quantities the segments angle to some

reference angle at the iris center θ the segments distance to the iris center r

and the dominant angular orientation of the line segment ɸ Thus the

descriptor is S = ( θ r ɸ )T The individual components of the line descriptor

are calculated as

FIG

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns For

the registration algorithm it randomly chooses two points ndash one from the

test template and one from the target template It also randomly chooses a

scaling factor and a rotation value based on a priori knowledge of the

database Using these values it calculates a fitness value for the registration

using these parameters
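The line descriptor defined above can be sketched in MATLAB as follows; xs and ys are the pixel coordinates of one thinned vessel segment, (xi, yi) is the detected iris center, and the quadratic fit degree is an assumed choice rather than a value stated in the report.

% Minimal sketch of computing S = (theta, r, phi) for one segment.
function S = line_descriptor(xs, ys, xi, yi)
    p  = polyfit(xs, ys, 2);           % polynomial approximation fline(x)
    xl = mean(xs);                     % segment center (x)
    yl = polyval(p, xl);               % segment center (y) on the fitted curve

    theta = atan2(yl - yi, xl - xi);   % angle of the center about the iris center
    r     = hypot(xl - xi, yl - yi);   % distance of the center to the iris center
    phi   = atan(polyval(polyder(p), xl));  % dominant orientation of the segment

    S = [theta; r; phi];
end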

After sclera template registration each line segment in the test

template is compared to the line segments in the target template for

matches In order to reduce the effect of segmentation errors we created the

weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
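The following MATLAB sketch follows this description; Cte and Cta hold the descriptor center coordinates (one row per segment), Phite and Phita the segment orientations, wte and wta the weights from the weighting image, and Dmatch and PhiMatch the two thresholds. The exact weighting of a matched pair is a simplification of the report's formula.

function M = template_match_score(Cte, Phite, wte, Cta, Phita, wta, Dmatch, PhiMatch)
    score = 0;
    for i = 1:size(Cte, 1)
        for j = 1:size(Cta, 1)
            d    = norm(Cte(i, :) - Cta(j, :));       % center distance d(Si, Sj)
            dphi = abs(Phite(i) - Phita(j));          % orientation difference
            if d <= Dmatch && dphi <= PhiMatch
                score = score + min(wte(i), wta(j));  % weighted match m(Si, Sj)
                break;                                % one match per test segment
            end
        end
    end
    M = score / min(sum(wte), sum(wta));  % normalize by the minimal set's maximum score
end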

FIG

FIG

FIG

FIG

movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y shape vessels: one is to use the angles of every branch to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris centers. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 22, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is a challenge, since the mask files are large in size and will occupy the GPU memory and slow down the data transfer. For matching registration, a RANSAC-type algorithm was used to randomly select corresponding descriptors, and the transform parameter between them was used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, the descriptors' weights were calculated on their own mask by the CPU, only once.

The calculated result was saved as a component of the descriptor. The descriptor of the sclera then becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. It is faster if the two templates have similar reference points. If we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted θ; the distance between the segment center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left part of the sclera patterns of the eye may be compressed while the right part sclera patterns are stretched.
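The assembly of one WPL descriptor from a line descriptor and the sclera mask can be sketched as follows; the 5-pixel boundary band used to decide the 0.5 weight is an assumed value, not one stated in the report.

% Minimal sketch of building s(x, y, r, theta, phi, w) for one segment center.
function s = wpl_descriptor(x, y, phi, xi, yi, mask)
    r     = hypot(x - xi, y - yi);            % distance to the pupil/iris center
    theta = atan2(y - yi, x - xi);            % angle to the reference line

    inner = imerode(mask, strel('disk', 5));  % interior region of the sclera mask
    if ~mask(round(y), round(x))
        w = 0;                                % descriptor outside the sclera
    elseif inner(round(y), round(x))
        w = 1;                                % interior descriptor
    else
        w = 0.5;                              % descriptor near the sclera boundary
    end

    % both polar and rectangular forms are stored so the GPU needs no conversion
    s = [x, y, r, theta, phi, w];
end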

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 4.0 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline, today's GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that todayrsquos GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasks' having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities After this step it is still possible for

some false positive matches. In the second stage, we used the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we searched the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the searching range and time. The distance of two branches is defined in (3), where ϕ_i,j is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs n_i and the distance between Y shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score will be passed to the next, more precise matching process.
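A serial sketch of this coarse matching step is given below. The descriptor layout, the Euclidean angle distance, and the exact form of the score fusion are assumptions made for illustration; the thresholds tϕ = 30, t_xy = 675 and the fusion factor α = 30 are taken from the text above.

// Sketch of Stage-I coarse matching with Y-shape descriptors.
// Descriptor layout, distance functions and the score fusion are assumptions.
#include <algorithm>
#include <cmath>
#include <vector>

struct YShape { float phi1, phi2, phi3; float x, y; };   // y(phi1, phi2, phi3, x, y)

static float anglesDistance(const YShape &a, const YShape &b)   // d_phi, Eq. (3) style
{
    return std::sqrt((a.phi1 - b.phi1) * (a.phi1 - b.phi1) +
                     (a.phi2 - b.phi2) * (a.phi2 - b.phi2) +
                     (a.phi3 - b.phi3) * (a.phi3 - b.phi3));
}

static float centersDistance(const YShape &a, const YShape &b)  // d_xy, Eq. (4) style
{
    return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
}

// Returns a coarse similarity score for two templates (fusion form is assumed).
static float coarseYShapeScore(const std::vector<YShape> &Tte,
                               const std::vector<YShape> &Tta,
                               float tPhi = 30.0f, float tXY = 675.0f, float alpha = 30.0f)
{
    int   matched = 0;
    float sumDist = 0.0f;
    for (const YShape &yi : Tte)
        for (const YShape &yj : Tta)
            if (centersDistance(yi, yj) < tXY &&     // restrict the search area
                anglesDistance(yi, yj)  < tPhi) {    // branch angles agree
                ++matched;
                sumDist += centersDistance(yi, yj);
            }
    if (matched == 0) return 0.0f;
    float avgDist = sumDist / matched;
    float minSize = (float)std::min(Tte.size(), Tta.size());
    // Assumed fusion: more matches raise the score, a larger average distance lowers it.
    return matched / (minSize + avgDist / alpha);
}

A call such as coarseYShapeScore(testTemplate, targetTemplate) would then be compared against the threshold t to decide whether the pair proceeds to Stage II.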

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because: (1) when acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape; and (2) the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter, where T_te is the test template and s_te,i is the i-th WPL descriptor of T_te, T_ta is the target template and s_ta,i is the i-th WPL descriptor of T_ta, and d(s_te,k, s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j.

Δs_k is the shift offset between the two descriptors, defined as the difference between their center positions.

We first randomly select an equal number of segment descriptors s_te,k of the test template T_te from each quad and find each one's nearest neighbor s_ta,j in the target template T_ta. Their shift offset is recorded as a candidate registration shift factor Δs_k. The final offset registration factor is Δs_optim, the candidate with the smallest standard deviation among these offsets.
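A serial sketch of this shift search is shown below. The WPL descriptor struct, the sample count, and the way the "smallest standard deviation" criterion is applied (choosing the candidate offset closest to the mean of all candidates) are assumptions made for illustration.

// Sketch of the shift-parameter search (Algorithm 2 style). The descriptor
// layout, sample count and selection criterion are illustrative assumptions.
#include <cstdlib>
#include <vector>

struct WPL { float x, y, r, theta, phi, w; };            // s(x, y, r, theta, phi, w)

struct Offset { float dx, dy; };

static float dist2(const WPL &a, const WPL &b)
{
    return (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);
}

static Offset searchShift(const std::vector<WPL> &Tte, const std::vector<WPL> &Tta,
                          int samples = 64)
{
    std::vector<Offset> candidates;
    for (int k = 0; k < samples; ++k) {
        const WPL &s = Tte[std::rand() % Tte.size()];    // randomly chosen test descriptor
        const WPL *nearest = &Tta[0];
        for (const WPL &t : Tta)                          // nearest neighbor in the target
            if (dist2(s, t) < dist2(s, *nearest)) nearest = &t;
        candidates.push_back({nearest->x - s.x, nearest->y - s.y});
    }
    // Mean offset of all candidates.
    float mx = 0.0f, my = 0.0f;
    for (const Offset &o : candidates) { mx += o.dx; my += o.dy; }
    mx /= candidates.size(); my /= candidates.size();
    // Keep the candidate that deviates least from the others (assumed reading of
    // "smallest standard deviation among these candidate offsets").
    Offset best = candidates[0];
    float bestDev = 1e30f;
    for (const Offset &o : candidates) {
        float dev = (o.dx - mx) * (o.dx - mx) + (o.dy - my) * (o.dy - my);
        if (dev < bestDev) { bestDev = dev; best = o; }
    }
    return best;
}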

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m^(it). Here s_te,i, T_te, s_ta,j, and T_ta are defined the same as in Algorithm 2; tr^(it)_shift, θ^(it), and tr^(it)_scale are the shift, rotation, and scale transform parameters generated in the it-th iteration; and R(θ^(it)), T(tr^(it)_shift), and S(tr^(it)_scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterated N times to generate these parameters. In our experiment, we set the iteration number to 512.
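A minimal serial sketch of this random parameter search follows, assuming a standard 2-D similarity transform (rotation R and scale S applied about the origin, then the translation T); the parameter ranges and descriptor struct are illustrative assumptions, while β = 20 pixels and N = 512 are taken from the text.

// Sketch of the affine-parameter search (Algorithm 3 style). Parameter ranges,
// RNG and descriptor layout are illustrative assumptions; beta and N follow the text.
#include <cmath>
#include <cstdlib>
#include <vector>

struct WPL { float x, y, r, theta, phi, w; };

struct Params { float dx, dy, angle, scale; };

static WPL applyTransform(const WPL &s, const Params &p)   // assumed x' = S * R * x + T
{
    WPL out = s;
    float c = std::cos(p.angle), sn = std::sin(p.angle);
    out.x = p.scale * (c * s.x - sn * s.y) + p.dx;
    out.y = p.scale * (sn * s.x + c * s.y) + p.dy;
    return out;
}

static int countMatches(const std::vector<WPL> &Tte, const std::vector<WPL> &Tta,
                        const Params &p, float beta = 20.0f)
{
    int matches = 0;
    for (const WPL &s : Tte) {
        WPL ts = applyTransform(s, p);
        for (const WPL &t : Tta)
            if (std::hypot(ts.x - t.x, ts.y - t.y) < beta) { ++matches; break; }
    }
    return matches;
}

static Params searchAffine(const std::vector<WPL> &Tte, const std::vector<WPL> &Tta,
                           int N = 512)
{
    Params best{0, 0, 0, 1};
    int bestCount = -1;
    for (int it = 0; it < N; ++it) {
        // Shift candidate: offset of a randomly chosen test descriptor to its
        // nearest neighbor in the target template.
        const WPL &s = Tte[std::rand() % Tte.size()];
        const WPL *nn = &Tta[0];
        for (const WPL &t : Tta)
            if (std::hypot(s.x - t.x, s.y - t.y) < std::hypot(s.x - nn->x, s.y - nn->y))
                nn = &t;
        Params p;
        p.dx    = nn->x - s.x;
        p.dy    = nn->y - s.y;
        p.angle = ((std::rand() % 21) - 10) * 3.14159265f / 180.0f; // assumed +/-10 degrees
        p.scale = 0.9f + 0.2f * (std::rand() % 100) / 100.0f;       // assumed 0.9 to 1.1
        int m = countMatches(Tte, Tta, p);
        if (m > bestCount) { bestCount = m; best = p; }             // keep the maximum m(it)
    }
    return best;
}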

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined from Algorithms 2 and 3, the test template will be registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j, and T_ta are defined the same as in Algorithms 2 and 3; θ_optm, tr_optm,shift, tr_optm,scale, and Δs_optim are the registration parameters attained from Algorithms 2 and 3; and R(θ_optm) T(tr_optm,shift) S(tr_optm,scale) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.
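A compact sketch of the per-pair decision inside this final matching step is shown below. The descriptor struct, the distance threshold, and the way matched weights are accumulated and normalized are assumptions for illustration; the orientation check α = 5 is taken from the text, and a matched target segment is not reused, as described later in the mapping discussion.

// Sketch of the per-pair test used in the final registration-and-matching step.
// Struct layout, distance threshold and weight accumulation are assumptions.
#include <cmath>
#include <vector>

struct WPL { float x, y, r, theta, phi, w; };

// A transformed test descriptor ts matches target descriptor t when their centers
// are close and their orientations differ by less than alpha (set to 5 in the text).
static bool pairMatches(const WPL &ts, const WPL &t, float distThresh, float alpha = 5.0f)
{
    float d = std::hypot(ts.x - t.x, ts.y - t.y);
    return d < distThresh && std::fabs(ts.phi - t.phi) < alpha;
}

// Accumulate the weights of matched descriptors and normalize by the smaller
// template's total weight (assumed normalization).
static float matchScore(const std::vector<WPL> &registeredTte, const std::vector<WPL> &Tta,
                        float distThresh)
{
    float matchedWeight = 0.0f, weightTte = 0.0f, weightTta = 0.0f;
    std::vector<bool> used(Tta.size(), false);       // a matched segment is not reused
    for (const WPL &t : Tta) weightTta += t.w;
    for (const WPL &s : registeredTte) {
        weightTte += s.w;
        for (size_t j = 0; j < Tta.size(); ++j)
            if (!used[j] && pairMatches(s, Tta[j], distThresh)) {
                used[j] = true;
                matchedWeight += 0.5f * (s.w + Tta[j].w);
                break;
            }
    }
    float denom = std::fmin(weightTte, weightTta);
    return denom > 0.0f ? matchedWeight / denom : 0.0f;
}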

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto those multiprocessors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in a limited amount. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, but accessing them is very time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access. To hide this latency as much as possible, we should preferentially use on-chip memory rather than global memory. When global memory access occurs, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks which are equal in size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses will be serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
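To illustrate the coalescing rule above, a minimal sketch follows (kernel and array names are assumptions): when consecutive threads read consecutive elements, a warp's requests coalesce into few memory transactions, whereas a strided access pattern scatters them across many transactions.

// Illustration of coalesced vs. strided global memory access (names are assumptions).
#include <cuda_runtime.h>

// Coalesced: thread i reads element i, so a warp touches consecutive words.
__global__ void coalescedCopy(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: thread i reads element i * stride, so a warp touches scattered words
// and the hardware needs many more memory transactions.
__global__ void stridedCopy(const float *in, float *out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i * stride < n) out[i] = in[i * stride];
}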

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.
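A minimal sketch of this coarse-grained launch is given below; the kernel and array names, the flat descriptor storage with per-template offsets, and the simplified center-only test are illustrative assumptions (t_xy = 675 is taken from the Stage-I description).

// One thread per target template (coarse-grained Stage-I kernel); names assumed.
#include <cuda_runtime.h>

struct YShape { float phi1, phi2, phi3, x, y; };

__global__ void coarseMatchAll(const YShape *templates, const int *offsets,
                               const int *counts, int numTemplates,
                               const YShape *test, int testCount, float *scores)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // one thread = one target template
    if (t >= numTemplates) return;

    const YShape *target = templates + offsets[t];   // this template's descriptors
    int matched = 0;
    for (int i = 0; i < testCount; ++i)
        for (int j = 0; j < counts[t]; ++j) {
            float dx = test[i].x - target[j].x, dy = test[i].y - target[j].y;
            if (dx * dx + dy * dy < 675.0f * 675.0f) ++matched;   // t_xy from the text
        }
    scores[t] = (float)matched;                      // coarse score written per template
}

// Host-side launch: 1024 blocks x 1024 threads covers up to 1024 * 1024 templates:
// coarseMatchAll<<<1024, 1024>>>(dTemplates, dOffsets, dCounts, n, dTest, m, dScores);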

Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assigned a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there is no data exchange requirement between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, which is shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved in the first address, which has the same variable name as the first intermediate result.
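As an illustration of this in-block summation, a minimal shared-memory reduction kernel is sketched below (the kernel name and the fixed block size of 256 are assumptions). The indexing pattern differs slightly from the odd-numbered-thread scheme described above, but the effect is the same: the block's total ends up in the first element.

// Minimal in-block parallel sum over per-thread partial results (names assumed).
#include <cuda_runtime.h>

__global__ void blockSum(const float *partial, float *blockResults, int n)
{
    __shared__ float buf[256];                      // one slot per thread in the block
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;
    buf[tid] = (gid < n) ? partial[gid] : 0.0f;     // load this thread's partial result
    __syncthreads();

    // Tree reduction: at each step the first half of the active threads adds in
    // the value of its partner, so the running sum collapses toward buf[0].
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockResults[blockIdx.x] = buf[0]; // final result in the first address
}

// Launch example (assumes blockDim.x == 256 to match the shared array size):
// blockSum<<<numBlocks, 256>>>(dPartial, dBlockResults, n);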

252 MAPPING INSIDE BLOCK

In the shift parameter search, there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared with the other candidate offsets.

Due to the great number of sum and synchronization operations in every nearest neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread, and every thread randomly generated a set of parameters and tried them independently. The generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.
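For illustration only, the sketch below shows per-thread random parameter generation using NVIDIA's cuRAND device API, which is an alternative to the dynamically created Mersenne Twister parameters discussed next; seeding each thread with its own subsequence is one common way to keep the per-thread sequences decorrelated. The kernel name and the parameter ranges are assumptions.

// Per-thread random transform parameters with cuRAND (an illustrative alternative
// to the dynamically created Mersenne Twister parameters described in the text).
#include <cuda_runtime.h>
#include <curand_kernel.h>

__global__ void generateParams(float *angles, float *scales, unsigned long long seed, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    curandState state;
    curand_init(seed, tid, 0, &state);              // distinct subsequence per thread

    // Assumed parameter ranges: rotation in [-10, 10] degrees, scale in [0.9, 1.1].
    angles[tid] = (curand_uniform(&state) * 20.0f - 10.0f) * 3.14159265f / 180.0f;
    scales[tid] = 0.9f + 0.2f * curand_uniform(&state);
}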

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators that share identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait for a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data. In our kernels, we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
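The sketch below illustrates two of the ideas above: descriptor components stored as separate arrays (structure of arrays) so that consecutive threads read successive addresses, and the test template staged into shared memory inside the kernel. All names and the fixed test-template capacity are assumptions; the 20-pixel and 5-degree thresholds are the β and α values quoted earlier.

// Structure-of-arrays layout and shared-memory staging of the test template.
// All names and sizes are illustrative assumptions.
#include <cuda_runtime.h>

#define MAX_TEST 256                                 // assumed max test-template size

__global__ void fineMatch(const float *tgtX, const float *tgtY, const float *tgtPhi,
                          int tgtCount, const float *testX, const float *testY,
                          const float *testPhi, int testCount, float *score)
{
    __shared__ float sx[MAX_TEST], sy[MAX_TEST], sphi[MAX_TEST];
    // Cooperative load of the test template into shared memory.
    for (int i = threadIdx.x; i < testCount; i += blockDim.x) {
        sx[i] = testX[i]; sy[i] = testY[i]; sphi[i] = testPhi[i];
    }
    __syncthreads();

    // Each thread scans a slice of the target descriptors; consecutive threads
    // touch consecutive addresses of tgtX/tgtY/tgtPhi, so the accesses coalesce.
    int matched = 0;
    for (int j = threadIdx.x; j < tgtCount; j += blockDim.x)
        for (int i = 0; i < testCount; ++i) {
            float dx = sx[i] - tgtX[j], dy = sy[i] - tgtY[j];
            if (dx * dx + dy * dy < 20.0f * 20.0f && fabsf(sphi[i] - tgtPhi[j]) < 5.0f)
                ++matched;
        }
    atomicAdd(score, (float)matched);                // block-level result accumulation
}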

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. Then the combination of the histograms of the different cells represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin determined in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
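A compact sketch of the gradient and cell-histogram computation is given below, assuming the usual definitions m = sqrt(dx^2 + dy^2) and θ = atan2(dy, dx) folded into [0, 180) degrees. The cell size and bin count (8 x 8 pixels, 9 bins) are common defaults, not values specified in this report.

// Sketch of HOG gradients and cell histograms. Cell size and bin count are
// common defaults (8x8, 9 bins), not values taken from this report.
#include <algorithm>
#include <cmath>
#include <vector>

// Gradient magnitude and orientation (folded to [0, 180) degrees) at pixel (x, y).
static void gradientAt(const std::vector<float> &img, int w, int h, int x, int y,
                       float &m, float &theta)
{
    float dx = img[y * w + std::min(x + 1, w - 1)] - img[y * w + std::max(x - 1, 0)];
    float dy = img[std::min(y + 1, h - 1) * w + x] - img[std::max(y - 1, 0) * w + x];
    m = std::sqrt(dx * dx + dy * dy);
    theta = std::atan2(dy, dx) * 180.0f / 3.14159265f;
    if (theta < 0.0f) theta += 180.0f;               // opposite directions are the same
}

// Histogram of one cell: each pixel votes for its orientation bin with weight m.
static std::vector<float> cellHistogram(const std::vector<float> &img, int w, int h,
                                        int cx, int cy, int cell = 8, int bins = 9)
{
    std::vector<float> hist(bins, 0.0f);
    for (int y = cy; y < cy + cell && y < h; ++y)
        for (int x = cx; x < cx + cell && x < w; ++x) {
            float m, theta;
            gradientAt(img, w, h, x, y, m, theta);
            int b = (int)(theta / (180.0f / bins)) % bins;
            hist[b] += m;                            // magnitude-weighted vote
        }
    return hist;
}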

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C, C++, FORTRAN, Java, COM,

and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involve matrices, MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for 'for' loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

You can use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers, adding annotations, LaTeX equations, and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effect, light source locations, and transparency.

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Cirean, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep big simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


3 R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.

A new biometric indicator based on the patterns of conjunctival

vasculature is proposed Conjunctival vessels can be observed on the visible

part of the sclera that is exposed to the outside world These vessels

demonstrate rich and specific details in visible light and can be easily

photographed using a regular digital camera In this paper we discuss

methods for conjunctival imaging preprocessing and feature extraction in

order to derive a suitable conjunctival vascular template for biometric

authentication Commensurate classification methods along with the

observed accuracy are discussed Experimental results suggest the potential

of using conjunctival vasculature as a biometric measure Identification of

a person based on some unique set of features is an important task The

human identification is possible with several biometric systems and sclera

recognition is one of the promising biometrics The sclera is the white

portion of the human eye The vein pattern seen in the sclera region is

unique to each person; thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem.

These problems are completely eliminated in the proposed system by using

two feature extraction techniques They are Histogram of Oriented

Gradients (HOG) and converting the image into polar form using the

bilinear interpolation technique These two features help the proposed

system to become illumination invariant and rotation invariant The

experimentation is done with the help of UBIRIS database The

experimental result shows that the proposed sclera recognition method can

achieve better accuracy than the previous methods

4 J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.

The graphics processing unit (GPU) has become an integral part of

today's mainstream computing systems Over the past six years there has

been a marked increase in the performance and capabilities of GPUs The

modern GPU is not only a powerful graphics engine but also a highly

parallel programmable processor featuring peak arithmetic and memory

bandwidth that substantially outpaces its CPU counterpart The GPU's

rapid increase in both programmability and capability has spawned a

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5 H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified

Mumford-Shah functional Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification The verification and

identification performance of the proposed algorithms is validated and

compared with other algorithms using the CASIA Version 3 ICE 2005 and

UBIRIS iris databases

18 PROPOSED METHOD

We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for different syntax for various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

191 PROPOSED SYSTEM ADVANTAGES

1 To improve the efficiency, in this research we propose a new descriptor, the Y shape descriptor, which can greatly help improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2 We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque and white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches: a Speed Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (here used in the general-purpose sense, GPGPUs: General-Purpose Graphics Processing Units) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition tasks such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs have been used to extract features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform which outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies are needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer Also some of processing on the mask files will

involve convolution which is difficult to improve its performance on the

scalar process unit on CUDA

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance New designs are

necessary to help narrow down the search range. In summary, naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for different syntax for various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then developed the implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition, in Section 8 we present experiments using the proposed system, and in Section 9 we draw some conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation: they used a clustering algorithm to classify the color eye images into three clusters, namely sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's thresholding-based method for grayscale images.

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches including Speed Up Robust

Features (SURF) which is based on interest-point detection minutiae

detection which is based on minutiae points on the vasculature structure

and direct correlation matching which relies on image registration Zhou et

al designed a line descriptor based feature registration and matching

method

The proposed sclera recognition consists of five steps which include

sclera segmentation vein pattern enhancement feature extraction feature

matching and matching decision Fig 2 shows the block diagram of sclera

recognition Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of vein pattern in the sclera region of an eye image To

become more computationally efficient the data of the image are converted

to the polar form It is mainly used for circular or quasi circular shape of

object These two characteristics are extracted from all the images in the

database and compared with the features of the query image whether the

person is correctly identified or not This procedure is done in the feature

matching step and ultimately makes the matching decision By using the

proposed feature extraction methods and matching techniques the human

identification is more accurate than the existing studies In the proposed

method two features of an image are drawn out

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris; this is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images, so if the image is in color, it first needs a conversion to grayscale, after which the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG
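A minimal sketch of Sobel-based glare detection on a grayscale image is given below. The threshold values, function names, and the combined bright-or-edge test are illustrative assumptions; the report itself performs this step in MATLAB.

// Sketch of Sobel filtering for glare detection on a grayscale image.
// Thresholds and the bright-or-edge rule are illustrative assumptions
// (the report's implementation of this step is in MATLAB).
#include <cmath>
#include <vector>

// Sobel gradient magnitude at pixel (x, y) of a w x h grayscale image.
static float sobelMag(const std::vector<float> &img, int w, int h, int x, int y)
{
    auto p = [&](int xx, int yy) {
        xx = xx < 0 ? 0 : (xx >= w ? w - 1 : xx);
        yy = yy < 0 ? 0 : (yy >= h ? h - 1 : yy);
        return img[yy * w + xx];
    };
    float gx = -p(x-1,y-1) - 2*p(x-1,y) - p(x-1,y+1) + p(x+1,y-1) + 2*p(x+1,y) + p(x+1,y+1);
    float gy = -p(x-1,y-1) - 2*p(x,y-1) - p(x+1,y-1) + p(x-1,y+1) + 2*p(x,y+1) + p(x+1,y+1);
    return std::sqrt(gx * gx + gy * gy);
}

// Mark candidate glare pixels: very bright interiors or strong Sobel responses
// at the glare boundary (assumed rule for illustration).
static std::vector<unsigned char> detectGlare(const std::vector<float> &gray, int w, int h,
                                              float edgeThresh = 200.0f,
                                              float brightThresh = 240.0f)
{
    std::vector<unsigned char> glare(w * h, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (gray[y * w + x] > brightThresh || sobelMag(gray, w, h, x, y) > edgeThresh)
                glare[y * w + x] = 1;
    return glare;
}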

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of the sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. When the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; these altogether are unwanted portions for recognition. In order to eliminate these effects, refinement is done following the detection of the sclera area. Fig. shows the result after the Otsu's thresholding process and iris and eyelid refinement to detect the right sclera area. In the same way, the left sclera area is detected using this method.

FIG

In the segmentation process all images are not perfectly segmented

Hence feature extraction and matching are needed to reduce the

segmentation fault The vein patterns in the sclera area are not visible in the

segmentation process To get vein patterns more visible vein pattern

enhancement is to be performed

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999) In the case of retinal biometrics a special optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions making this modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi surface structure of the ocular

veins makes them hard to reproduce as a physical artifact Besides being a

stand-alone biometric modality we anticipate that the addition of

conjunctival biometrics will enhance the performance of current iris-based

biometric system in the following ways

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor based method is a bottleneck with regard to matching speed. In this section we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG

Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value, based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.
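The equations referenced above are not reproduced in this report; the sketch below reconstructs the descriptor components from the written definitions (segment angle θ about the iris center, distance r to the iris center, and dominant orientation ɸ of the fitted line), so the exact formulas and the use of atan2 are assumptions.

// Reconstruction (by assumption) of the line-descriptor components S = (theta, r, phi)^T
// from the textual definitions: (xl, yl) is the segment center, (xi, yi) the iris center,
// and slopeAtCenter is the slope of the polynomial fit fline at the segment center.
#include <cmath>

struct LineDescriptor { float theta, r, phi; };

static LineDescriptor makeDescriptor(float xl, float yl,     // segment center
                                     float xi, float yi,     // detected iris center
                                     float slopeAtCenter)    // d fline / dx at (xl, yl)
{
    LineDescriptor S;
    S.theta = std::atan2(yl - yi, xl - xi);   // angle of the segment center about the iris center
    S.r     = std::hypot(xl - xi, yl - yi);   // distance of the segment center to the iris center
    S.phi   = std::atan(slopeAtCenter);       // dominant angular orientation of the segment
    return S;
}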

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as shown below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained.
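The sketch below illustrates this weighted scoring under the stated thresholds; the struct, the use of the minimum of the two weights for a matched pair, and the threshold names are assumptions based on the text rather than the report's implementation.

```cpp
// Sketch: count a pair as matched when centers are within Dmatch and orientations
// within PhiMatch, weight each match by the mask-derived weights, and normalize by
// the maximum attainable score of the smaller template.
#include <algorithm>
#include <cmath>
#include <vector>

struct Seg { float x, y, phi, w; };   // center, orientation, mask weight (0 / 0.5 / 1)

float totalMatchScore(const std::vector<Seg>& test, const std::vector<Seg>& target,
                      float Dmatch, float PhiMatch)
{
    float matched = 0.0f;
    for (const Seg& si : test)
        for (const Seg& sj : target) {
            float d = std::hypot(si.x - sj.x, si.y - sj.y);
            if (d <= Dmatch && std::fabs(si.phi - sj.phi) <= PhiMatch) {
                matched += std::min(si.w, sj.w);   // weighted individual score m(Si, Sj)
                break;                             // each test segment matches at most once
            }
        }
    float maxTest = 0.0f, maxTarget = 0.0f;
    for (const Seg& s : test)   maxTest   += s.w;
    for (const Seg& s : target) maxTarget += s.w;
    return matched / std::min(maxTest, maxTarget); // normalize by the minimal set
}
```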

FIG

FIG

FIG

FIG

Y-shape branches are observed to be a stable feature under movement of the eye and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, this set may be inferred as a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of Y-shape vessels: one is to use the angles of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angle between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms at the image acquisition step, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center and is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, a mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not.

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image, (b) vessel patterns in the sclera, (c) enhanced sclera vessel patterns, (d) centers of line segments of the vessel patterns.

However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy GPU memory and slow down the data transfer. When matching, a registration RANSAC-

type algorithm is used to randomly select the corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weights of the transformed descriptors. This results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weight of descriptors that lie outside the sclera is set to 0, those near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, the descriptors' weights were calculated on their own mask by the CPU, and only once.

The calculated result is saved as a component of the descriptor. The sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template have to be transformed. It is faster if the two templates share a similar reference point: if we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.
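A sketch of how such a WPL descriptor could be assembled on the CPU is shown below; the signed-distance input, the boundary band width, and all names are assumptions used only to illustrate the weight assignment and the polar-to-rectangular precomputation.

```cpp
// Sketch: attach the mask-derived weight (0 outside, 0.5 near the boundary, 1 inside)
// and precompute (r, theta) w.r.t. the pupil/iris center so the GPU kernels need
// neither the mask file nor trigonometry.
#include <cmath>

struct WPL { float x, y, r, theta, phi, w; };   // s(x, y, r, theta, phi, w)

WPL makeWPL(float x, float y, float phi,
            float cx, float cy,            // pupil/iris center from segmentation
            float distToScleraEdge,        // signed distance to the sclera mask boundary
            float border = 5.0f)           // "near boundary" band width (assumption)
{
    WPL s;
    s.x = x; s.y = y; s.phi = phi;
    float dx = x - cx, dy = y - cy;
    s.r     = std::sqrt(dx * dx + dy * dy);
    s.theta = std::atan2(dy, dx);
    if (distToScleraEdge < 0.0f)        s.w = 0.0f;   // outside the sclera
    else if (distToScleraEdge < border) s.w = 0.5f;   // near the boundary
    else                                s.w = 1.0f;   // interior descriptor
    return s;
}
```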

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses, which meets the requirement of coalesced memory access on the GPU.

FIG

FIG

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus, the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

GENERAL-PURPOSE COMPUTING ON THE GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology is different between graphics and general-purpose use. Harris provides an excellent description of this mapping process.

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global

"texture" memory.

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that, despite their general-purpose tasks having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in

future computation
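A minimal CUDA example of this structure is sketched below; it is a generic grid-update illustration under assumed names, not code from the report: the computation domain is expressed directly as a grid of threads, each thread runs the same SPMD program, gathers neighbor values from global memory, and writes its result to a global buffer that can feed the next pass.

```cpp
// Sketch: one thread per grid point; gather neighbors, write the new state.
__global__ void gridStep(const float* current, float* next, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // "Gather": read the four neighbors of this cell from global memory.
    int xm = max(x - 1, 0), xp = min(x + 1, w - 1);
    int ym = max(y - 1, 0), yp = min(y + 1, h - 1);
    float v = 0.25f * (current[y * w + xm] + current[y * w + xp] +
                       current[ym * w + x] + current[yp * w + x]);

    // Write: store the new state; the buffers can be swapped for the next pass.
    next[y * w + x] = v;
}

// Host-side launch over the whole domain (illustrative):
//   dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
//   gridStep<<<grid, block>>>(d_current, d_next, w, h);
```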

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities. After this step, some false positive matches may remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here y_te^i and y_ta^j are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distance, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕ_ij is the angle between the j-th branch and the radial direction from the pupil center in descriptor i.

The number of matched pairs ni and the distance between the Y-shape branches' centers di are stored as the matching result. We fuse the number of matched branches and the average distance between the matched branches' centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. A sclera with a high matching score is passed to the next, more precise matching process.
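The sketch below illustrates this Stage I comparison under the stated thresholds (tϕ = 30, txy = 675, α = 30). The data layout and, in particular, the exact fusion formula of Eq. (2) are not reproduced from the report; the score combination shown is only a plausible stand-in.

```cpp
// Sketch of coarse Y-shape matching: count descriptor pairs whose branch angles and
// centers are close enough, then fuse the matched count with the average distance.
#include <algorithm>
#include <cmath>
#include <vector>

struct YDesc { float phi1, phi2, phi3, x, y; };

float stageOneScore(const std::vector<YDesc>& test, const std::vector<YDesc>& target,
                    float tPhi = 30.0f, float tXY = 675.0f, float alpha = 30.0f)
{
    int   n = 0;          // number of matched descriptor pairs (ni)
    float dSum = 0.0f;    // accumulated center distance of matched pairs (for di)
    for (const YDesc& yi : test)
        for (const YDesc& yj : target) {
            float dphi = std::sqrt((yi.phi1 - yj.phi1) * (yi.phi1 - yj.phi1) +
                                   (yi.phi2 - yj.phi2) * (yi.phi2 - yj.phi2) +
                                   (yi.phi3 - yj.phi3) * (yi.phi3 - yj.phi3));
            float dxy  = std::hypot(yi.x - yj.x, yi.y - yj.y);
            if (dphi < tPhi && dxy < tXY) { ++n; dSum += dxy; break; }
        }
    if (n == 0) return 0.0f;
    float avgDist = dSum / n;
    // Stand-in fusion: normalized matched count plus a distance term damped by alpha.
    float score = (float)n / (float)std::min(test.size(), target.size())
                + alpha / (alpha + avgDist);
    return score;   // compared against the threshold t to discard low-similarity pairs
}
```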

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH

As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template, s_te^i is the i-th WPL descriptor of Tte, Tta is the target template, s_ta^i is the i-th WPL descriptor of Tta, and d(s_te^k, s_ta^j) is the Euclidean distance of descriptors s_te^k and s_ta^j.

Δs_k is the shift value of two descriptors, defined as

We first randomly select an equal number of segment descriptors s_te^k in the test template Tte from each quad and find the nearest neighbor s_ta^j of each in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
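A sketch of this shift search is given below (illustrative only, not the report's Algorithm 2): randomly sampled test descriptors vote with the offset to their nearest target neighbor, and the most consistent vote is kept. The sample count and the approximation of "smallest standard deviation" by "closest to the mean offset" are assumptions.

```cpp
// Sketch of the shift parameter search: nearest-neighbor voting on candidate offsets.
#include <cmath>
#include <cstdlib>
#include <vector>

struct Pt { float x, y; };

Pt estimateShift(const std::vector<Pt>& test, const std::vector<Pt>& target, int samples = 64)
{
    std::vector<Pt> offsets;
    for (int s = 0; s < samples; ++s) {
        const Pt& te = test[std::rand() % test.size()];   // random test descriptor
        float best = 1e30f; Pt d{0, 0};
        for (const Pt& ta : target) {                     // nearest neighbor in the target
            float dist = std::hypot(ta.x - te.x, ta.y - te.y);
            if (dist < best) { best = dist; d = {ta.x - te.x, ta.y - te.y}; }
        }
        offsets.push_back(d);                             // candidate shift Δs_k
    }
    Pt mean{0, 0};
    for (const Pt& d : offsets) { mean.x += d.x; mean.y += d.y; }
    mean.x /= offsets.size(); mean.y /= offsets.size();
    Pt bestOff = offsets[0]; float bestDev = 1e30f;       // keep the most consistent vote
    for (const Pt& d : offsets) {
        float dev = std::hypot(d.x - mean.x, d.y - mean.y);
        if (dev < bestDev) { bestDev = dev; bestOff = d; }
    }
    return bestOff;                                       // used as Δs_optim
}
```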

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta^j in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te^i, Tte, s_ta^j, and Tta are defined the same as in Algorithm 2; tr_shift^(it), θ^(it), and tr_scale^(it) are the shift, rotation, and scale transform parameters generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters. In our experiment, we set the number of iterations to 512.
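The random parameter search can be sketched as below (N = 512 iterations, β = 20 pixels as stated above). The parameter ranges, the random generator, and all names are assumptions; in the report this per-iteration loop is what gets mapped to one GPU thread per candidate parameter set.

```cpp
// Sketch of the affine parameter search: draw shift/rotation/scale, transform the
// test descriptors, count matches within beta pixels, keep the best parameter set.
#include <cmath>
#include <cstdlib>
#include <vector>

struct P2 { float x, y; };
struct Affine { float shiftX, shiftY, angle, scale; };

static float frand(float lo, float hi) { return lo + (hi - lo) * (std::rand() / (float)RAND_MAX); }

Affine searchAffine(const std::vector<P2>& test, const std::vector<P2>& target,
                    int N = 512, float beta = 20.0f)
{
    Affine best{0, 0, 0, 1}; int bestMatches = -1;
    for (int it = 0; it < N; ++it) {
        Affine a{frand(-20, 20), frand(-20, 20), frand(-0.1f, 0.1f), frand(0.9f, 1.1f)}; // assumed ranges
        int m = 0;
        for (const P2& p : test) {
            // Apply S(scale) * R(angle) and then the shift, in the spirit of matrix (7).
            float x = a.scale * (p.x * std::cos(a.angle) - p.y * std::sin(a.angle)) + a.shiftX;
            float y = a.scale * (p.x * std::sin(a.angle) + p.y * std::cos(a.angle)) + a.shiftY;
            for (const P2& q : target)
                if (std::hypot(q.x - x, q.y - y) < beta) { ++m; break; }
        }
        if (m > bestMatches) { bestMatches = m; best = a; }
    }
    return best;   // the optimized parameter set fed to the registration/matching step
}
```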

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te^i, Tte, s_ta^j, and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm), and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr_shift(optm)), and S(tr_scale(optm)) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
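A compressed sketch in the spirit of Algorithm 4 follows (assumed names, not the report's code): the test descriptors are transformed once with the optimal parameters, and each matched pair whose orientations differ by less than α contributes its weight to a score normalized by the smaller template.

```cpp
// Sketch: apply the optimal transform, then do weight-aware, orientation-checked matching.
#include <algorithm>
#include <cmath>
#include <vector>

struct WPLSeg { float x, y, phi, w; };

float registerAndMatch(std::vector<WPLSeg> test,                 // copied: transformed in place
                       const std::vector<WPLSeg>& target,
                       float shiftX, float shiftY, float angle, float scale,
                       float beta = 20.0f, float alpha = 5.0f)
{
    for (WPLSeg& s : test) {                                     // registration step
        float x = scale * (s.x * std::cos(angle) - s.y * std::sin(angle)) + shiftX;
        float y = scale * (s.x * std::sin(angle) + s.y * std::cos(angle)) + shiftY;
        s.x = x; s.y = y;
    }
    float score = 0.0f, maxTest = 0.0f, maxTarget = 0.0f;
    for (const WPLSeg& s : test)   maxTest   += s.w;
    for (const WPLSeg& s : target) maxTarget += s.w;
    for (const WPLSeg& si : test)                                // matching step
        for (const WPLSeg& sj : target)
            if (std::hypot(si.x - sj.x, si.y - sj.y) < beta &&
                std::fabs(si.phi - sj.phi) < alpha) {
                score += std::min(si.w, sj.w);                   // weighted matched pair
                break;
            }
    return score / std::min(maxTest, maxTarget);                 // normalize by the smaller template
}
```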

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto those processors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take little time to access; only shared memory can be accessed by other threads within the same block, and the amount of shared memory is limited. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is much more time consuming.

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in terms of access. To hide this latency, we should preferentially use on-chip memory rather than global memory. When global memory access occurs, threads in the same warp should access consecutive words to achieve coalescing.

Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work, we use an NVIDIA C2070 as our GPU. The numbers of threads and blocks are both set to 1024, which means we can match our test template with up to 1024 × 1024 target templates at the same time.
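This coarse-grained mapping can be sketched as one thread per target template, as below. The descriptor layout (five floats per Y-shape descriptor) and the toy similarity computed inside the kernel are assumptions for illustration, not the report's kernel.

```cpp
// Sketch: thread t compares the single test template against target template t.
__global__ void matchYKernel(const float* testY, int nTest,
                             const float* targetsY, const int* offsets, const int* counts,
                             int numTargets, float* scores)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // one thread <-> one target template
    if (t >= numTargets) return;

    const float* tgt = targetsY + 5 * offsets[t];    // 5 floats per Y descriptor (phi1..3, x, y)
    int matched = 0;
    for (int i = 0; i < nTest; ++i)
        for (int j = 0; j < counts[t]; ++j) {
            float dx = testY[5 * i + 3] - tgt[5 * j + 3];
            float dy = testY[5 * i + 4] - tgt[5 * j + 4];
            if (dx * dx + dy * dy < 675.0f * 675.0f) { ++matched; break; }
        }
    scores[t] = (float)matched / nTest;              // coarse similarity for this target
}

// Illustrative launch covering up to 1024 x 1024 targets:
//   matchYKernel<<<1024, 1024>>>(d_testY, nTest, d_targetsY, d_offsets, d_counts,
//                                numTargets, d_scores);
```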

Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there is no data exchange requirement between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, which is shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
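A tree-style shared-memory combination in the spirit of this scheme is sketched below (block size assumed to be a power of two; launch with blockDim.x * sizeof(float) bytes of dynamic shared memory). It is an illustration of the idea, not the report's kernel.

```cpp
// Sketch: each thread deposits its partial result, then pairwise combination with
// stride 1, 2, 4, ... leaves the block total at index 0, "the first address".
__global__ void blockSumSketch(const float* partialInput, float* blockResult, int n)
{
    extern __shared__ float partial[];               // one slot per thread
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;
    partial[tid] = (gid < n) ? partialInput[gid] : 0.0f;
    __syncthreads();

    for (int stride = 1; stride < blockDim.x; stride *= 2) {
        if (tid % (2 * stride) == 0)                 // combine consecutive pairs of results
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockResult[blockIdx.x] = partial[0];
}
```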

252 MAPPING INSIDE BLOCK

In the shift parameter search, there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we choose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter set search to a thread, and every thread randomly generates a set of parameters and tries them independently. The generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.

FIG

FIG

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
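The structure-of-arrays layout and the shared-memory staging described here can be sketched as below (illustrative names; texture binding is omitted). Consecutive threads read consecutive elements of each component array, and the test template is copied into shared memory once per block (launch with 2 * test.n * sizeof(float) bytes of dynamic shared memory).

```cpp
// Sketch: per-component arrays in global memory, test template staged in shared memory.
struct WplSoA {            // device pointers, one array per descriptor component
    float *x, *y, *r, *theta, *phi, *w;
    int n;
};

__global__ void fineMatchSketch(WplSoA test, WplSoA target, float* out)
{
    extern __shared__ float sh[];
    float* sx = sh;
    float* sy = sh + test.n;
    for (int i = threadIdx.x; i < test.n; i += blockDim.x) {
        sx[i] = test.x[i];                         // coalesced: thread i reads element i
        sy[i] = test.y[i];
    }
    __syncthreads();

    // Each thread processes its own slice of the target descriptors, reading test
    // data from fast shared memory instead of global memory.
    float acc = 0.0f;
    for (int j = threadIdx.x; j < target.n; j += blockDim.x)
        for (int i = 0; i < test.n; ++i) {
            float dx = target.x[j] - sx[i], dy = target.y[j] - sy[i];
            acc += (dx * dx + dy * dy < 400.0f) ? 1.0f : 0.0f;   // beta = 20 pixels
        }
    atomicAdd(out, acc);                           // combine partial results
}
```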

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied to object detection. In this work, it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and then using this value to normalize all cells within the block. This normalization makes the result more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y), i.e., m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).

Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientation of the picture elements. If the images have any illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
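The gradient and voting steps just described can be sketched compactly as below (plain C++, illustrative only; the 9-bin unsigned histogram and cell size handling are assumptions, and block normalization is left as a comment).

```cpp
// Sketch: per-pixel gradients cast magnitude-weighted votes into the 0-180 degree
// orientation histogram of the cell containing the pixel.
#include <algorithm>
#include <cmath>
#include <vector>

void hogCellHistograms(const std::vector<float>& img, int w, int h,
                       int cell, std::vector<float>& hist)
{
    const int bins = 9, cellsX = w / cell, cellsY = h / cell;
    hist.assign(cellsX * cellsY * bins, 0.0f);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            int cxi = x / cell, cyi = y / cell;
            if (cxi >= cellsX || cyi >= cellsY) continue;        // skip partial border cells
            float dx = img[y * w + x + 1] - img[y * w + x - 1];      // horizontal gradient
            float dy = img[(y + 1) * w + x] - img[(y - 1) * w + x];  // vertical gradient
            float mag = std::sqrt(dx * dx + dy * dy);                // m(x, y)
            float ang = std::atan2(dy, dx) * 180.0f / 3.14159265f;   // theta(x, y), degrees
            if (ang < 0.0f) ang += 180.0f;                           // unsigned 0..180 binning
            int bin = std::min(bins - 1, (int)(ang / (180.0f / bins)));
            hist[(cyi * cellsX + cxi) * bins + bin] += mag;          // magnitude-weighted vote
        }
    // Block-level contrast normalization over overlapping groups of cells would follow,
    // as described in the text.
}
```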

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for the teaching of linear algebra and numerical analysis, and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involve matrices, MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
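A minimal MEX-file sketch of this wrapper idea is shown below (illustrative, not from the report): MATLAB arrays arrive through prhs, the C/C++ routine does the work, and the result goes back through plhs. The file name and function are assumptions; it would be compiled from MATLAB with `mex scalevec.cpp` and called as `y = scalevec(x, factor)`.

```cpp
// Sketch of a MEX gateway: multiply a vector by a scalar.
#include "mex.h"

void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
    if (nrhs != 2)
        mexErrMsgTxt("Usage: y = scalevec(x, factor)");

    const double* x = mxGetPr(prhs[0]);           // input vector
    double factor   = mxGetScalar(prhs[1]);       // scalar multiplier
    mwSize n        = mxGetNumberOfElements(prhs[0]);

    plhs[0] = mxCreateDoubleMatrix(1, n, mxREAL); // output row vector
    double* y = mxGetPr(plhs[0]);
    for (mwSize i = 0; i < n; ++i)
        y[i] = factor * x[i];                     // the actual computation
}
```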

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases, MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveX controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method, which employs a two-stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 42, no. 3, pp. 571-583, May 2012.


4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.

The graphics processing unit (GPU) has become an integral part of

today's mainstream computing systems. Over the past six years, there has

been a marked increase in the performance and capabilities of GPUs The

modern GPU is not only a powerful graphics engine but also a highly

parallel programmable processor featuring peak arithmetic and memory

bandwidth that substantially outpaces its CPU counterpart. The GPU's

rapid increase in both programmability and capability has spawned a

research community that has successfully mapped a broad range of

computationally demanding complex problems to the GPU This effort in

general purpose computing on the GPU also known as GPU computing

has positioned the GPU as a compelling alternative to traditional

microprocessors in high-performance computer systems of the future We

describe the background hardware and programming model for GPU

computing summarize the state of the art in tools and techniques and

present four GPU computing successes in game physics and computational

biophysics that deliver order-of-magnitude performance gains over

optimized CPU applications

5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.

This paper proposes algorithms for iris segmentation quality

enhancement match score fusion and indexing to improve both the

accuracy and the speed of iris recognition A curve evolution approach is

proposed to effectively segment a nonideal iris image using the modified

Mumford-Shah functional. Different enhancement algorithms are

concurrently applied on the segmented iris image to produce multiple

enhanced versions of the iris image A support-vector-machine-based

learning algorithm selects locally enhanced regions from each globally

enhanced image and combines these good-quality regions to create a single

high-quality iris image Two distinct features are extracted from the high-

quality iris image The global textural feature is extracted using the 1-D log

polar Gabor transform and the local topological feature is extracted using

Euler numbers An intelligent fusion algorithm combines the textural and

topological matching scores to further improve the iris recognition

performance and reduce the false rejection rate whereas an indexing

algorithm enables fast and accurate iris identification The verification and

identification performance of the proposed algorithms is validated and

compared with other algorithms using the CASIA Version 3 ICE 2005 and

UBIRIS iris databases

18 PROPOSED METHOD

We propose a new parallel sclera vein recognition method that uses a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in a data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

191 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, in this research we propose a new descriptor, the Y-shape descriptor, which can greatly help improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches, a Speed Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were measured using a PC with Intel Core 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed CPU-based systems are designed as sequential

processing devices which may not be efficient in data processing where the

data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (an abbreviation of General-Purpose Graphics Processing Units, GPGPUs)

are now popularly used for parallel computing to improve the

computational processing speed and efficiency The highly parallel

structure of GPUs makes them more effective than CPUs for data

processing where processing can be performed in parallel GPUs have been

widely used in biometrics recognition such as speech recognition text

detection handwriting recognition and face recognition In iris

recognition GPU was used to extract the features construct descriptors

and match templates

GPUs are also used for object retrieval and image search Park et al

designed the performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather the goal is to develop a parallel sclera matching solution for

Sclera vein recognition using our sequential line-descriptor method using

the CUDA GPU architecture However the parallelization strategies

developed in this research can be applied to design parallel approaches for

other Sclera vein recognition methods and help parallelize general pattern

recognition methods Based on the matching approach in there are three

challenges to map the task of sclera feature matching to GPU

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they occupy GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in a data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching

approaches make the parallel processing possible and efficient However it

is non-trivial to implement these algorithms in CUDA We then developed

the implementation schemes to map our algorithms into CUDA (Section 7)

In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw some conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition

Several methods have been designed for sclera segmentation Crihalmeanu

et al presented a semi-automated system for sclera segmentation. They

used a clustering algorithm to classify the color eye images into three

clusters - sclera iris and background Later on Crihalmeanu and Ross

designed a segmentation approach based on a normalized sclera index

measure which includes coarse sclera segmentation pupil region

segmentation and fine sclera segmentation Zhou et al developed a skin

tone plus "white color"-based voting method for sclera segmentation in color images, and Otsu's thresholding-based method for grayscale images.

After sclera segmentation it is necessary to enhance and extract the sclera

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches, including Speeded Up Robust

Features (SURF) which is based on interest-point detection minutiae

detection which is based on minutiae points on the vasculature structure

and direct correlation matching which relies on image registration Zhou et

al designed a line descriptor based feature registration and matching

method

The proposed sclera recognition consists of five steps which include

sclera segmentation vein pattern enhancement feature extraction feature

matching and matching decision Fig 2 shows the block diagram of sclera

recognition Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of vein pattern in the sclera region of an eye image To

become more computationally efficient the data of the image are converted

to the polar form It is mainly used for circular or quasi circular shape of

object These two characteristics are extracted from all the images in the

database and compared with the features of the query image to determine whether the person is correctly identified or not. This procedure is done in the feature

matching step and ultimately makes the matching decision By using the

proposed feature extraction methods and matching techniques the human

identification is more accurate than the existing studies In the proposed

method two features of an image are drawn out

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection: the glare area is a small bright area near the pupil or iris and is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images; if the image is in color, it first needs to be converted to grayscale, after which the Sobel filter is applied to detect the glare area. Fig 4 shows the result of the glare area detection.

FIG

Sclera area estimation: for the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. When the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.

223 IRIS AND EYELID REFINEMENT

The top and underside of the sclera regions are the limits of the

sclera area And then that upper eyelid lower eyelid and iris boundaries are

refined These altogether are the unwanted portion for recognition In order

to eliminate these effects, refinement is done following the detection of the sclera area. Fig. shows the result after Otsu's thresholding and the iris and eyelid refinement used to detect the right sclera area. In the same way, the left sclera area is detected using this method.

FIG

In the segmentation process all images are not perfectly segmented

Hence feature extraction and matching are needed to reduce the

segmentation fault The vein patterns in the sclera area are not visible in the

segmentation process To get vein patterns more visible vein pattern

enhancement is to be performed

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

Practicality Conjunctival vasculature can be captured with commercial off

the shelf digital cameras under normal lighting conditions making this

modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi surface structure of the ocular

veins makes them hard to reproduce as a physical artifact Besides being a

stand-alone biometric modality we anticipate that the addition of

conjunctival biometrics will enhance the performance of current iris-based

biometric system in the following ways

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the

availability for simultaneous computation Below Figure demonstrates the

possibility for parallel directional filtering Since the filter is computed over

different portions of the input image the computation can be computed in

parallel (denoted by Elements below) In addition individual parallelization

of each element of Filtering can also be performed A detailed discussion of

our proposed parallelization is outside the scope of this paper

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA

VEIN

2251 RECOGNITION METHOD

The matching segment of the line-descriptor based method is a

bottleneck with regard to matching speed In this section we briefly

describe the Line Descriptor-based sclera vein recognition method After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points The line descriptor is used to describe the segments in the vein

structure Figure 2 shows a visual description of the line descriptor Each

segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as

FIG

Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the

vascular patterns a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns For

the registration algorithm it randomly chooses two points ndash one from the

test template and one from the target template It also randomly chooses a

scaling factor and a rotation value based on a priori knowledge of the

database Using these values it calculates a fitness value for the registration

using these parameters

After sclera template registration each line segment in the test

template is compared to the line segments in the target template for

matches In order to reduce the effect of segmentation errors we created the

weighting image (Figure 3) from the sclera mask by setting interior pixels

in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as shown below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), D_match is the matching distance threshold, and ɸ_match is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained.
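The equation itself appears only as a figure in the source; a plausible form consistent with the description above (with the weights w_i, w_j taken from the weighting image, so boundary segments contribute less) is the following LaTeX sketch, not a verbatim reproduction of the original equation:

m(S_i,S_j) =
\begin{cases}
w_i\, w_j, & \text{if } d(S_i,S_j) \le D_{match} \ \text{and} \ |\phi_i-\phi_j| \le \phi_{match},\\
0, & \text{otherwise,}
\end{cases}
\qquad
M = \frac{\sum_{i,j} m(S_i,S_j)}{\min\left(\sum_i w_i,\ \sum_j w_j\right)}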

FIG

FIG

FIG

FIG

Even with movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angles of every branch with respect to the x-axis, the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6

shows ϕ1 ϕ2 and ϕ3 denote the angle between each branch and the radius

from the pupil center. Even when the head tilts, the eye moves, or the camera zooms at the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable.

To tolerate errors from the pupil center calculation in the segmentation step

we also recorded the center position (x y) of the Y shape branches as

auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the

iris centers. It is a rotational- and scale-invariant descriptor.
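As a concrete illustration of how such a descriptor might be represented for GPU matching (a sketch with assumed names, not the report's actual code), the Y-shape feature y(ϕ1, ϕ2, ϕ3, x, y) maps naturally onto a small plain-old-data structure, with each branch angle measured against the radial direction from the pupil center:

#include <cmath>

// Hypothetical representation of the Y-shape descriptor y(phi1, phi2, phi3, x, y).
struct YShapeDescriptor {
    float phi1, phi2, phi3;  // branch angles relative to the iris radial direction
    float x, y;              // center of the Y-shape branch (auxiliary parameters)
};

// Angle of one branch measured against the radial direction from the pupil center,
// which is what makes the descriptor rotation- and scale-invariant.
__host__ __device__ inline float radialAngle(float branchAngle,
                                             float cx, float cy,   // branch center
                                             float px, float py)   // pupil center
{
    float radial = atan2f(cy - py, cx - px);  // direction from pupil to branch center
    return branchAngle - radial;
}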

2252 WPL SCLERA DESCRIPTOR

As we discussed in Section 22, the line

descriptor is extracted from the skeleton of vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment a line descriptor is created to record the center and orientation of

the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the

position of the center and ɸ is its orientation Because of the limitation of

segmentation accuracy the descriptor in the boundary of sclera area might

not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such error, the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy the GPU memory and

slow down the data transfer When matching the registration RANSAC

type algorithm was used to randomly select the corresponding descriptors

and the transform parameter between them was used to generate the

template transform affine matrix. After every template's transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weights of the transformed descriptors. This results in too many convolutions in the processing unit.

To reduce heavy data transfer and computation we designed the

weighted polar line (WPL) descriptor structure which includes the

information of mask and can be automatically aligned We extracted the

relationship of geometric feature of descriptors and store them as a new

descriptor We use a weighted image created via setting various weight

values according to their positions. The weights of those descriptors that lie outside the sclera are set to 0, those near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, descriptors' weights were calculated on their own mask by the CPU only

once

The calculation result was saved as a component of the descriptor. The descriptor of the sclera then changes to s(x, y, ɸ, w), where w denotes the weight of the point, whose value may be 0, 0.5, or 1. To align two templates, when a

template is shifted to another location along the line connecting their

centers all the descriptors of that template will be transformed It would be

faster if two templates have similar reference points If we use the center of

the iris as the reference point when two templates are compared the

correspondence will automatically be aligned to each other since they have

the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line which goes through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computing, we also convert the

descriptor value from polar coordinate to rectangular coordinate in CPU

preprocess

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right

parts of sclera in an eye may have different registration parameters For

example as an eyeball moves left left part sclera patterns of the eye may be

compressed while the right part sclera patterns are stretched

In parallel matching these two parts are assigned to threads in

different warps to allow different deformation The multiprocessor in

CUDA manages threads in groups of 32 parallel threads called warps We

reorganized the descriptor from same sides and saved

FIG

FIG

them at contiguous addresses. This meets the requirement for coalesced memory access on the GPU.
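To make the coalescing requirement concrete, the sketch below (field and function names are assumptions for illustration, not the report's code) stores the WPL descriptors s(x, y, r, θ, ɸ, w) of one template as a structure of arrays, so that the 32 threads of a warp read consecutive words of each field:

#include <cuda_runtime.h>

// Structure-of-arrays layout: thread k touches x[k], y[k], ..., so a warp's
// accesses to each field fall in consecutive addresses and coalesce.
struct WplTemplateSoA {
    float *x, *y;      // Cartesian center, precomputed on the CPU
    float *r, *theta;  // polar coordinates relative to the iris center
    float *phi;        // dominant orientation of the segment
    float *w;          // weight: 0 outside the sclera, 0.5 near the boundary, 1 interior
    int    count;      // descriptors of the left and right halves stored contiguously
};

// Allocation sketch for one template resident in GPU global memory.
inline void allocWplTemplate(WplTemplateSoA &t, int n)
{
    t.count = n;
    cudaMalloc(&t.x, n * sizeof(float));   cudaMalloc(&t.y, n * sizeof(float));
    cudaMalloc(&t.r, n * sizeof(float));   cudaMalloc(&t.theta, n * sizeof(float));
    cudaMalloc(&t.phi, n * sizeof(float)); cudaMalloc(&t.w, n * sizeof(float));
}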

After reorganizing the structure of descriptors and adding mask information

into the new descriptor the computation on the mask file is not needed on

the GPU It was very fast to match with this feature because it does not

need to reregister the templates every time after shifting Thus the cost of

data transfer and computation on the GPU will be reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 4.0 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline, today's GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global

"texture" memory.

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasks' having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation
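A minimal CUDA sketch of this model (a generic 1-D stencil used purely for illustration, not code from this project): the computation domain is a structured grid of threads, every thread runs the same SPMD program, gathers neighbor values from global memory, and writes its result back.

#include <cuda_runtime.h>

// One thread per grid point: gather the neighbors, compute the new state.
__global__ void stepKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // position in the thread grid
    if (i <= 0 || i >= n - 1) return;               // leave the domain boundary alone
    // "gather" reads from global memory; the averaging rule is just a placeholder
    out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}

// Host-side launch; on the next step the input and output buffers can be swapped.
void step(const float *d_in, float *d_out, int n)
{
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    stepKernel<<<blocks, threads>>>(d_in, d_out, n);
}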

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities. After this step, it is still possible to have some false positive matches. In the second stage, we used the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine

matrix generation and final WPL descriptor matching Overall we

partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster

and

achieve more accurate score

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here y_te,i and y_ta,j are the Y-shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of the two descriptor centers, defined in (4); ni and di are the matched descriptor pairs' number and their centers' distance, respectively; tϕ is a distance threshold; and txy is the threshold to restrict the searching area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3), where ϕ_ij is the angle between the j-th branch and the radial line from the pupil center in descriptor i.

The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score will be passed to the next, more precise matching process.
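As an illustration of how this coarse stage maps onto the GPU (one thread per target template, as Section 2.5 describes), a hedged CUDA sketch follows; the kernel name, the data layout, and the simple fusion of match count and average distance are assumptions, not the report's actual kernel.

// Coarse Y-shape matching: thread t scores the test template against target t.
// Uses the YShapeDescriptor structure sketched earlier; all names are illustrative.
__global__ void yShapeMatchKernel(const YShapeDescriptor *test, int nTest,
                                  const YShapeDescriptor *targets,
                                  const int *targetOffset, const int *targetCount,
                                  float *score, float alpha, float tPhi, float tXY,
                                  int numTargets)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numTargets) return;

    const YShapeDescriptor *tgt = targets + targetOffset[t];
    int   matched = 0;
    float distSum = 0.0f;

    for (int i = 0; i < nTest; ++i) {
        for (int j = 0; j < targetCount[t]; ++j) {
            float dx = test[i].x - tgt[j].x, dy = test[i].y - tgt[j].y;
            float dxy = sqrtf(dx * dx + dy * dy);
            if (dxy > tXY) continue;                       // restrict the search area
            float d1 = test[i].phi1 - tgt[j].phi1;
            float d2 = test[i].phi2 - tgt[j].phi2;
            float d3 = test[i].phi3 - tgt[j].phi3;
            if (sqrtf(d1 * d1 + d2 * d2 + d3 * d3) < tPhi) {
                ++matched;                                  // count the matched pair
                distSum += dxy;                             // accumulate center distance
            }
        }
    }
    // Fuse the number of matches and their average center distance into one score
    // (an assumed stand-in for Eq. (2)); targets scoring below t are discarded.
    score[t] = (matched > 0) ? (matched - distSum / (matched * alpha)) : 0.0f;
}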

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers: episclera, stroma, lamina fusca and endothelium. There are slight differences among the movements of these layers. Considering these factors, our registration employed both a single

shift transform and multi-parameter transform which combines shift

rotation and scale together

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate. As a result, the detected iris center might not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter, where T_te is the test template and s_te,i is the i-th WPL descriptor of T_te; T_ta is the target template and s_ta,i is the i-th WPL descriptor of T_ta; and d(s_te,k, s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j. Δs_k is the shift value of the two descriptors, defined as the offset between their centers.

We first randomly select an equal number of segment descriptors s_te,k in the test template T_te from each quadrant and find their nearest neighbors s_ta,j in the target template T_ta. Their shift offset is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
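One way to read the selection rule of Algorithm 2 (the report's wording is terse, so this host-side C++ sketch is an interpretation, not the actual algorithm): each sampled test descriptor votes a candidate offset, and the candidate that deviates least from the other candidates is kept as Δs_optim.

#include <cmath>
#include <vector>

struct Offset { float dx, dy; };   // candidate shift Δs_k between matched descriptors

// Return the candidate offset with the smallest deviation from the other candidates
// (assumes the candidate list is non-empty).
Offset selectShift(const std::vector<Offset>& candidates)
{
    Offset best = candidates.front();
    float bestDev = 1e30f;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        float dev = 0.0f;
        for (std::size_t j = 0; j < candidates.size(); ++j) {
            float dx = candidates[i].dx - candidates[j].dx;
            float dy = candidates[i].dy - candidates[j].dy;
            dev += dx * dx + dy * dy;
        }
        dev = std::sqrt(dev / candidates.size());
        if (dev < bestDev) { bestDev = dev; best = candidates[i]; }
    }
    return best;
}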

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te,i, T_te, s_ta,j and T_ta are defined as in Algorithm 2; tr_shift^(it), θ^(it) and tr_scale^(it) are the shift, rotation and scale transform parameters generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)) and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment we set the iteration count to 512.
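Equation (7) is not reproduced in this report; a standard homogeneous-coordinate composition consistent with the notation R(θ) T(tr_shift) S(tr_scale) would look as follows (a LaTeX sketch, not the paper's exact matrix):

\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = R(\theta)\,T(t)\,S(s)\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},\quad
R(\theta)=\begin{bmatrix}\cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1\end{bmatrix},\;
T(t)=\begin{bmatrix}1 & 0 & t_x\\ 0 & 1 & t_y\\ 0 & 0 & 1\end{bmatrix},\;
S(s)=\begin{bmatrix}s & 0 & 0\\ 0 & s & 0\\ 0 & 0 & 1\end{bmatrix}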

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm) and Δs_optim are the registration parameters attained from Algorithms 2 and 3; R(θ(optm)) T(tr_shift(optm)) S(tr_scale(optm)) is the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ɸ values. In our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto those threads.

There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip, so it takes little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, there is only a limited amount of shared memory. Global memory, constant memory and texture memory are off-chip memories accessible by all threads, and it is very time consuming to access these memories.

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in terms of access. To completely hide the latency of the small instruction set, we should preferentially use on-chip memory rather than global memory. When global memory access occurs, threads in the same warp should access the words in sequence to achieve coalescence.

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory request from different thread within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread. One thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU. The numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
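In CUDA terms, this coarse-grained partition corresponds to a launch configuration along the following lines (a sketch built on the hypothetical Stage I kernel shown earlier; the 1024×1024 geometry follows the figure quoted above, and the threshold constants echo the values quoted in the text):

// One thread per target template: a grid of 1024 blocks with 1024 threads each
// covers up to 1024*1024 target templates in a single kernel launch.
void launchCoarseMatch(const YShapeDescriptor *dTest, int nTest,
                       const YShapeDescriptor *dTargets,
                       const int *dOffsets, const int *dCounts,
                       float *dScores, int numTargets)
{
    dim3 threadsPerBlock(1024);
    dim3 numBlocks(1024);
    yShapeMatchKernel<<<numBlocks, threadsPerBlock>>>(
        dTest, nTest, dTargets, dOffsets, dCounts, dScores,
        30.0f /* alpha, as in the text */, 30.0f /* tPhi */, 675.0f /* tXY */,
        numTargets);
}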

Algorithms 2-4 will be partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column shows in Figure 11, we assigned a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this

template This partition makes every block execute independently and there

are no data exchange requirements between different blocks When all

threads complete their corresponding descriptor fractions, the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of the intermediate results, which is shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new result. The final result will be saved in

the first address which has the same variable name as the first intermediate

result
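The accumulation described above has the same effect as a conventional shared-memory tree reduction; the sketch below (illustrative, not the report's kernel) leaves each block's sum in the first element, mirroring the "saved in the first address" behavior.

// Block-level reduction of per-thread partial results in shared memory.
__global__ void reduceScores(const float *partial, float *blockSum, int n)
{
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;
    s[tid] = (i < n) ? partial[i] : 0.0f;
    __syncthreads();

    // Halve the number of active threads each step; the total ends up in s[0].
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSum[blockIdx.x] = s[0];   // result saved at the first address
}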

252 MAPPING INSIDE BLOCK

In shift argument searching, there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, where every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all threads compute independently, except that the final result must be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we choose the second method to parallelize the shift searching. In the affine matrix generator, we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore it's hard to parallelize a single twister state update step among

several execution threads To make sure that thousands of threads in the

launch grid generate uncorrelated random sequences, many simultaneous Mersenne twisters need to run with different initial states in parallel.

But even "very different" (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure In global memory the

components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) were stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data. In our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
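The shared-memory staging of the test template can be pictured as follows (a fragment with assumed names; the actual matching work and the texture binding of the target descriptors are omitted):

// Each block copies the small test template into shared memory once, so the many
// comparisons that follow read fast on-chip memory instead of global memory.
__global__ void fineMatchKernel(const float *testX, const float *testY, int nTest,
                                const float *targetX, const float *targetY, int nTarget,
                                float *result)
{
    extern __shared__ float sTest[];            // [0, nTest) = x, [nTest, 2*nTest) = y
    for (int i = threadIdx.x; i < nTest; i += blockDim.x) {
        sTest[i]         = testX[i];
        sTest[nTest + i] = testY[i];
    }
    __syncthreads();

    // ... the WPL matching for this block's target template would follow here,
    // reading the target descriptors through the texture path described above.
    if (threadIdx.x == 0 && blockIdx.x == 0)
        result[0] = sTest[0];                    // placeholder so the sketch compiles
}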

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients (HOG) is a feature descriptor. It was primarily applied to object detection; in this work it is applied as the feature for human recognition. In the sclera region, the vein patterns are the

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. Then the combination of the different histograms of the different cells represents the descriptor. To improve

accuracy histograms can be contrast normalized by calculating the intensity

from the block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
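The standard HOG formulas implied here (the report's own equation images are not reproduced) are, in LaTeX:

m(x,y)=\sqrt{d_x(x,y)^2+d_y(x,y)^2},\qquad
\theta(x,y)=\tan^{-1}\!\left(\frac{d_y(x,y)}{d_x(x,y)}\right)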

Orientation binning is the second step of HOG. This method is used to create the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig 8 depicts the edge

orientation of picture elements If the images have any illumination and

contrast changes then the gradient strength must be locally normalized For

that cells are grouped together into larger blocks These blocks are

overlapping so that each cell contributes more than once to the final

descriptor Here rectangular HOG (R-HOG) blocks are applied which are

mainly in square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C, C++, FORTRAN, Java, COM,

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveX controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employs a two-stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep big simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.

high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.

18 PROPOSED METHOD

We propose a new parallel sclera vein recognition method that employs a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method, implemented on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.

It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.

191PROPOSED SYSTEM ADVANTAGES

1 To improve the efficiency, in this research we propose a new descriptor, the Y shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.

2 We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches, Speeded Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were measured using a PC with Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPGPUs (general-purpose graphics processing units) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU was used to extract the features, construct descriptors, and match templates.

GPUs are also used for object retrieval and image search. Park et al. evaluated the performance of image processing algorithms such as linear feature extraction and multi-view stereo matching on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design. To develop an efficient parallel computing scheme, different strategies are needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition using our sequential line-descriptor method on the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and can help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we developed implementation schemes to map our algorithms into CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we report experiments using the proposed system. In Section 9, we draw conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition. Several methods have been designed for sclera segmentation. Crihalmeanu et al. presented a semi-automated system for sclera segmentation; they used a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, the human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images, so if the image is in color it must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG
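The Sobel filter referred to above convolves the grayscale image with the standard 3x3 kernels (stated here for reference; the glare threshold itself is not given in the text):

G_x = [ -1 0 +1 ; -2 0 +2 ; -1 0 +1 ],   G_y = [ -1 -2 -1 ; 0 0 0 ; +1 +2 +1 ]

The gradient magnitude sqrt((G_x * I)^2 + (G_y * I)^2) is large at the sharp bright-to-dark transitions that surround a glare spot, which is what the detection step exploits.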

Sclera Area Estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.
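For reference, Otsu's method selects the threshold t* that maximizes the between-class variance of the gray-level histogram,

sigma_b^2(t) = w_0(t) w_1(t) [ mu_0(t) - mu_1(t) ]^2,   t* = argmax_t sigma_b^2(t),

where w_0, w_1 are the class probabilities and mu_0, mu_1 the class means of the pixels below and above t. This is the standard criterion behind the thresholding step described above.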

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; all of these are unwanted portions for recognition. In order to eliminate their effects, refinement follows the detection of the sclera area. The figure shows the result after Otsu's thresholding and the iris and eyelid refinement used to detect the right sclera area. The left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after the segmentation process, so vein pattern enhancement is performed to make them more visible.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portion of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching stage of the line descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG
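The component equations appear only as a figure in this transcript. A hedged reconstruction, consistent with the verbal definitions given below (with (x_l, y_l) the segment center, (x_i, y_i) the iris center, and f_line(x) the polynomial fit of the segment), is:

θ = arctan( (y_l - y_i) / (x_l - x_i) )
r = sqrt( (x_l - x_i)^2 + (y_l - y_i)^2 )
ɸ = arctan( d f_line(x) / dx evaluated at x = x_l )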

Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels of the sclera mask to 1, pixels within some distance of the mask boundary to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as shown in the equation below, where S_i and S_j are two segment descriptors, m(S_i, S_j) is the matching score between segments S_i and S_j, d(S_i, S_j) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), D_match is the matching distance threshold, and ɸ_match is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
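The score equations themselves appear only as figures in this transcript. A hedged sketch of what they compute, using the weights w(S_i) from the weighting image and the thresholds defined above, is:

m(S_i, S_j) = (w(S_i) + w(S_j)) / 2   if d(S_i, S_j) <= D_match and |ɸ_i - ɸ_j| <= ɸ_match
m(S_i, S_j) = 0                       otherwise

M = ( sum of m(S_i, S_j) over matched pairs ) / min( sum of w over the test template, sum of w over the target template )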

FIG

FIG

FIG

FIG

Under movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y shape vessel: one is to use the angles of every branch with respect to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when head tilt, eye movement, or camera zoom occurs at the image acquisition step, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature

vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

226 WPL SCLERA DESCRIPTOR

As discussed in Section 22, the line descriptor is extracted from the skeleton of the vessel structure in binary images

(Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not.

FIG: The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel patterns in sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line segments of vessel patterns.

However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy GPU memory and

slow down the data transfer. During matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extract the geometric relationships of the descriptors and store them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, the descriptors' weights are calculated on their own mask by the CPU, and only once.

The calculated result is saved as a component of the descriptor. The sclera descriptor thus becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in the CPU preprocessing.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU. After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus, the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as in Figure 9.
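As a concrete illustration of this layout, the sketch below shows the WPL descriptor s(x, y, r, θ, ɸ, w) stored in structure-of-arrays form so that threads of a warp read consecutive addresses; the type and field names are illustrative assumptions, not taken from the report's code.

// Structure-of-arrays layout for one template's WPL descriptors (illustrative).
struct WplTemplate {
    int    n;        // number of line-segment descriptors
    float *x, *y;    // segment centers, rectangular coordinates (precomputed on CPU)
    float *r, *th;   // the same centers in polar coordinates (r, theta)
    float *phi;      // dominant orientation of each segment
    float *w;        // mask weight per descriptor: 0.0f, 0.5f, or 1.0f
};

// Weight assignment done once on the CPU, as described above; 'distToBoundary'
// and 'edgeBand' are illustrative names for the mask-derived quantities.
inline float wplWeight(bool insideSclera, float distToBoundary, float edgeBand) {
    if (!insideSclera) return 0.0f;            // descriptor beyond the sclera
    return (distToBoundary < edgeBand) ? 0.5f  // near the sclera boundary
                                       : 1.0f; // interior descriptor
}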

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 40 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline todayrsquos GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry. Each fragment is shaded by the fragment program. The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory. The resulting image can then be used as a texture on future passes through the graphics pipeline.

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread. The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities. After this step, some false positive matches may remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and the final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
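A minimal host-side sketch of how the four kernels of Figure 10 could be sequenced is shown below; the kernel names, signatures, and launch sizes are illustrative assumptions, not the report's actual code.

#include <cuda_runtime.h>

// Placeholder kernels standing in for the four CUDA kernels of Figure 10.
__global__ void coarseMatchKernel(float *score, int n)   { /* Stage I: Y-shape matching   */ }
__global__ void shiftSearchKernel(float *shift, int n)   { /* Stage II: shift transform    */ }
__global__ void affineSearchKernel(float *params, int n) { /* Stage II: affine matrix gen. */ }
__global__ void wplMatchKernel(float *finalScore, int n) { /* Stage II: WPL matching       */ }

void runPipeline(int nTemplates, int nSurvivors,
                 float *dScore, float *dShift, float *dParams, float *dFinal) {
    int threads = 256;
    int blocks  = (nTemplates + threads - 1) / threads;
    coarseMatchKernel<<<blocks, threads>>>(dScore, nTemplates);      // one thread per target template
    // ...host copies dScore back and filters out low-similarity templates...
    shiftSearchKernel<<<nSurvivors, threads>>>(dShift, nSurvivors);  // one block per surviving template
    affineSearchKernel<<<nSurvivors, 512>>>(dParams, nSurvivors);    // 512 random parameter sets per template
    wplMatchKernel<<<nSurvivors, threads>>>(dFinal, nSurvivors);
    cudaDeviceSynchronize();
}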

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching procedure is listed as Algorithm 1.

FIG

Here y_te,i and y_ta,j are the Y shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance between the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance between two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and t_xy is the threshold that restricts the search area. We set tϕ to 30 and t_xy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕ_i,j is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs n_i and the distance between the Y shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. Scleras with high matching scores are passed to the next, more precise matching process.
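A sketch of what Stage I could look like as a CUDA kernel is given below, with one thread comparing the test template against one target template (the coarse-grained mapping described in Section 251). The structure and names are illustrative; the left/right-half restriction is omitted, and the exact fusion of the match count and average distance (Eq. 2, with α = 30) is left to the host because that equation is not reproduced in this transcript.

struct YDesc { float phi1, phi2, phi3, x, y; };   // y(phi1, phi2, phi3, x, y)

__device__ float branchAngleDist(const YDesc &a, const YDesc &b) {
    float d1 = a.phi1 - b.phi1, d2 = a.phi2 - b.phi2, d3 = a.phi3 - b.phi3;
    return sqrtf(d1 * d1 + d2 * d2 + d3 * d3);    // d_phi of Eq. (3)
}

__global__ void yShapeCoarseMatch(const YDesc *test, int nTest,
                                  const YDesc *targets, const int *offset,
                                  const int *count, int nTemplates,
                                  float tPhi, float tXY,
                                  int *nMatched, float *avgDist) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;  // one target template per thread
    if (t >= nTemplates) return;
    const YDesc *tgt = targets + offset[t];
    int n = 0; float dSum = 0.0f;
    for (int i = 0; i < nTest; ++i)
        for (int j = 0; j < count[t]; ++j) {
            float dx = test[i].x - tgt[j].x, dy = test[i].y - tgt[j].y;
            float dxy = sqrtf(dx * dx + dy * dy);   // d_xy of Eq. (4)
            if (dxy < tXY && branchAngleDist(test[i], tgt[j]) < tPhi) {
                ++n; dSum += dxy;                   // matched pair and its center distance
            }
        }
    nMatched[t] = n;                                // fused into a score on the host
    avgDist[t]  = (n > 0) ? dSum / n : 0.0f;
}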

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te,i is the i-th WPL descriptor of T_te; T_ta is the target template and s_ta,i is the i-th WPL descriptor of T_ta; and d(s_te,k, s_ta,j) is the Euclidean distance between descriptors s_te,k and s_ta,j.

Δs_k is the shift value between the two descriptors, defined below.
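The defining equation is omitted from the transcript; taking descriptor centers in rectangular coordinates, a hedged reconstruction is

Δs_k = ( x_ta,j* - x_te,k ,  y_ta,j* - y_te,k ),

i.e., the offset from the sampled test descriptor s_te,k to its nearest target neighbor s_ta,j*.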

We first randomly select an equal number of segment descriptors s_te,k in the test template T_te from each quad and find each one's nearest neighbor s_ta,j* in the target template T_ta. Their shift offset is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta,j* in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m^(it). Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithm 2; tr^(it)_shift, θ^(it), and tr^(it)_scale are the shift, rotation, and scale transform parameters generated in the it-th iteration; and R(θ^(it)), T(tr^(it)_shift), and S(tr^(it)_scale) are the transform matrices defined as in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment, we set the iteration count to 512.
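The matrices of (7) are not reproduced in the transcript; in the standard form, with scale factor s, rotation angle θ, and shift vector t = (t_x, t_y), they can be written (a hedged reconstruction) as

S(s) = [ s 0 ; 0 s ],   R(θ) = [ cos θ  -sin θ ; sin θ  cos θ ],   T(t): p -> p + t,

so that a descriptor center p = (x, y) of the test template is mapped to p' = R(θ) S(s) p + t before matching.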

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching procedure is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)) T(tr(optm)_shift) S(tr(optm)_scale) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment, we set α to 5. The total matching score is the minimum of the two transformed results' scores divided by the minimal matching score of the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and fast to access; only shared memory can be accessed by other threads within the same block, and its capacity is limited. Local memory, global memory, constant memory, and texture memory reside off-chip, and accessing them is comparatively time consuming; global, constant, and texture memory are accessible by all threads.

Constant memory and texture memory are read-only, cached memories. Mapping algorithms onto CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, we should preferentially use on-chip memory rather than global memory. When global memory accesses occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work, we use an NVIDIA C2070 as our GPU. The thread and block counts are set to 1024, which means we can match our test template with up to 1024 × 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, each thread processing a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every i-th thread (i = 4, 8, 16, 32, 64, ...) computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
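The pairwise summation described above is, in effect, a block-level reduction. A standard shared-memory version is sketched below as an illustration of the idea; it is not the report's exact odd-thread scheme.

__global__ void blockReduceSum(const float *in, float *out, int n) {
    extern __shared__ float sdata[];                 // one float per thread
    unsigned int tid = threadIdx.x;
    unsigned int i   = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (i < n) ? in[i] : 0.0f;             // load intermediate results
    __syncthreads();
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];   // pairwise partial sums
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];        // block total, kept at the first address
}

// Launch example:
// blockReduceSum<<<numBlocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);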

252 MAPPING INSIDE BLOCK

In the shift parameter search, there are two schemes we can choose for mapping the task:

Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assign a single possible shift offset to a thread, so that all threads compute independently and only the final results need to be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we choose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.
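The report relies on dynamically created Mersenne Twister parameter sets; purely as an illustrative stand-in, the sketch below shows how each thread can be given its own statistically independent sub-sequence with cuRAND. The seed and the parameter ranges are assumptions, not values from the report.

#include <curand_kernel.h>

__global__ void initRng(curandState *states, unsigned long long seed, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) curand_init(seed, tid, 0, &states[tid]);   // one RNG state per thread
}

__global__ void drawShiftRotScale(curandState *states, float *shift,
                                  float *theta, float *scale, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    curandState local = states[tid];
    // Illustrative parameter ranges; the report's actual ranges are not given here.
    shift[tid] = 20.0f * (curand_uniform(&local) - 0.5f);
    theta[tid] = 0.2f  * (curand_uniform(&local) - 0.5f);
    scale[tid] = 1.0f + 0.1f * (curand_uniform(&local) - 0.5f);
    states[tid] = local;                                     // save state for reuse
}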

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, so data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this work it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients d_x(x, y) and d_y(x, y).
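In the usual HOG formulation, consistent with the description above, these are

m(x, y) = sqrt( d_x(x, y)^2 + d_y(x, y)^2 ),   θ(x, y) = arctan( d_y(x, y) / d_x(x, y) ),

with θ folded into the 0-180 degree range for the unsigned-orientation binning described next.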

Orientation binning is the second step of HOG; it is used to create the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, then the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks, which are mainly square grids, are applied. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular among scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications and distribute your MATLAB algorithms and applications.

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, FORTRAN, Java™, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

You can use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
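A minimal sketch of the programmatic route mentioned above (the figure title and callback are illustrative choices, not part of the project code):

% A small figure with one push button, created programmatically
f = figure('Name', 'Demo GUI', 'NumberTitle', 'off');
uicontrol(f, 'Style', 'pushbutton', ...
          'String', 'Click me', ...
          'Position', [20 20 100 30], ...
          'Callback', @(src, evt) disp('Button pressed'));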

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis
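A short sketch combining a few of the operations listed above (smoothing, basic statistics, and polynomial curve fitting) on synthetic data; all variable names are illustrative:

% Synthetic noisy data
t = linspace(0, 1, 200);
y = sin(2*pi*t) + 0.2*randn(size(t));
% Smoothing with a 5-point moving average
w = 5;
y_smooth = filter(ones(1, w)/w, 1, y);
% Basic statistics
m = mean(y);
s = std(y);
% Least-squares fit of a third-degree polynomial (curve fitting)
p = polyfit(t, y, 3);
y_fit = polyval(p, t);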

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
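For instance, a few of the standard access routes mentioned above (the file names here are placeholders, not files from this project):

% Read numeric data from an Excel spreadsheet (placeholder file name)
data = xlsread('measurements.xlsx');
% Read an image file and a plain-text file
img = imread('eye_sample.png');
txt = fileread('notes.txt');
% Low-level binary file I/O
fid = fopen('raw_template.bin', 'rb');
raw = fread(fid, inf, 'uint8');
fclose(fid);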

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats. You can customize plots by adding multiple axes, changing line colors and markers, adding annotations, LaTeX equations, and legends, and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter and bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large, often complex, multidimensional data. You can specify plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface
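A quick example using two of the 3-D plotting functions listed above on MATLAB's built-in peaks data:

% Surface and contour plots of the built-in peaks function
[X, Y, Z] = peaks(40);
surf(X, Y, Z);                % 3-D surface
hold on;
contour3(X, Y, Z, 20, 'k');   % contour lines overlaid in 3-D
hold off;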

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers
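A short example of the numeric computation facilities described above; the linear algebra calls are backed by LAPACK/BLAS and the FFT by FFTW:

% Linear algebra and FFT in MATLAB
A = [4 1; 1 3];
b = [1; 2];
x = A \ b;                  % solve the linear system A*x = b
e = eig(A);                 % eigenvalues of A
X = fft(randn(1, 1024));    % discrete Fourier transform of a random signal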

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
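The snapshots above correspond to the main processing steps of the implementation. A rough MATLAB sketch of those steps is given below for orientation only: the file name is a placeholder, the functions shown (rgb2gray, graythresh, im2bw, edge, roipoly, adapthisteq) are standard Image Processing Toolbox calls, and the Gabor-based feature extraction and database matching are not reproduced here.

% Rough sketch of the processing steps shown in the snapshots above
rgb   = imread('eye_sample.png');            % original sclera image (placeholder file)
gray  = rgb2gray(rgb);                       % grey scale image
bw    = im2bw(gray, graythresh(gray));       % binary image using Otsu's threshold
edges = edge(gray, 'canny');                 % edge detection
roi   = roipoly(gray);                       % interactively select the sclera region of interest
sclera = gray;
sclera(~roi) = 0;                            % keep only the selected ROI
enh   = adapthisteq(sclera);                 % enhancement of the sclera image
% Gabor-filter feature extraction and matching against the database follow,
% as described in the text; they are omitted from this sketch.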

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method, which employs a two-stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


introduce the "weighted polar line (WPL) descriptor" that would be better

suited for parallel computing to mitigate the mask size issue and develop

our coarse to fine two-stage matching process to dramatically improve the

matching speed These new approaches make the parallel processing

possible and efficient

191 PROPOSED SYSTEM ADVANTAGES

1. To improve the efficiency, in this research we propose a new descriptor, the Y shape descriptor, which can greatly help improve the efficiency of

the coarse registration of two images and can be used to filter out some

non-matching pairs before refined matching

2. We propose the coarse-to-fine two-stage matching process. In the first

stage we matched two images coarsely using the Y-shape descriptors

which is very fast to match because no registration was needed The

matching result in this stage can help filter out image pairs with low

similarities

CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque and white outer layer of the eye The blood

vessel structure of sclera is formed randomly and is unique to each person

which can be used for human identification. Several researchers have

designed different Sclera vein recognition methods and have shown that it

is promising to use Sclera vein recognition for human identification In

Crihalmeanu and Ross proposed three approaches Speed Up Robust

Features (SURF)-based method minutiae detection and direct correlation

matching for feature registration and matching Within these three methods

the SURF method achieves the best accuracy. It takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, which costs about 12 seconds to perform a one-to-one matching. Both speeds were calculated using a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently,

Sclera vein recognition algorithms are designed using central processing

unit (CPU)-based systems

As discussed CPU-based systems are designed as sequential

processing devices which may not be efficient in data processing where the

data can be parallelized Because of large time consumption in the matching

step Sclera vein recognition using sequential-based method would be very

challenging to be implemented in a real time biometric system especially

when there is a large number of templates in the database for matching. GPUs (or GPGPUs, for general-purpose graphics processing units)

are now popularly used for parallel computing to improve the

computational processing speed and efficiency The highly parallel

structure of GPUs makes them more effective than CPUs for data

processing where processing can be performed in parallel GPUs have been

widely used in biometrics recognition such as speech recognition text

detection handwriting recognition and face recognition In iris

recognition GPU was used to extract the features construct descriptors

and match templates

GPUs are also used for object retrieval and image search Park et al

designed the performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather the goal is to develop a parallel sclera matching solution for

Sclera vein recognition using our sequential line-descriptor method using

the CUDA GPU architecture However the parallelization strategies

developed in this research can be applied to design parallel approaches for

other Sclera vein recognition methods and help parallelize general pattern

recognition methods Based on the matching approach in there are three

challenges to map the task of sclera feature matching to GPU

1) Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer. Also, some of the processing on the mask files will involve convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of sclera database is far larger than the number of the

processing units on the GPU parallel matching on the GPU is still unable to

satisfy the requirement of real-time performance New designs are

necessary to help narrow down the search range. In summary, naïve

implementation of the algorithms in parallel would not work efficiently

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4), introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue (Section 5), and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA. We then developed the implementation schemes to map our algorithms into CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition. In Section 8 we perform some experiments using the proposed system. In Section 9 we draw some conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition

Several methods have been designed for sclera segmentation Crihalmeanu

et al. presented a semi-automated system for sclera segmentation. They

used a clustering algorithm to classify the color eye images into three

clusters - sclera iris and background Later on Crihalmeanu and Ross

designed a segmentation approach based on a normalized sclera index

measure which includes coarse sclera segmentation pupil region

segmentation and fine sclera segmentation Zhou et al developed a skin

tone plus "white color"-based voting method for sclera segmentation in color images and Otsu's thresholding-based method for grayscale images.

After sclera segmentation it is necessary to enhance and extract the sclera

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches including Speed Up Robust

Features (SURF) which is based on interest-point detection minutiae

detection which is based on minutiae points on the vasculature structure

and direct correlation matching which relies on image registration Zhou et

al designed a line descriptor based feature registration and matching

method

The proposed sclera recognition consists of five steps which include

sclera segmentation vein pattern enhancement feature extraction feature

matching and matching decision Fig 2 shows the block diagram of sclera

recognition Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of vein pattern in the sclera region of an eye image To

become more computationally efficient the data of the image are converted

to the polar form It is mainly used for circular or quasi circular shape of

object These two characteristics are extracted from all the images in the

database and compared with the features of the query image whether the

person is correctly identified or not This procedure is done in the feature

matching step and ultimately makes the matching decision By using the

proposed feature extraction methods and matching techniques the human

identification is more accurate than the existing studies In the proposed

method two features of an image are drawn out

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in the sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images; if the image is in color, it is first converted to grayscale and then passed to the Sobel filter to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined. These are all unwanted portions for recognition. In order to eliminate their effects, refinement is done after the detection of the sclera area. Fig. shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area. The left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999) In the case of retinal biometrics an

especial optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions making this

modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi surface structure of the ocular

veins makes them hard to reproduce as a physical artifact Besides being a

stand-alone biometric modality we anticipate that the addition of

conjunctival biometrics will enhance the performance of current iris-based

biometric system in the following ways

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the

availability for simultaneous computation Below Figure demonstrates the

possibility for parallel directional filtering Since the filter is computed over

different portions of the input image the computation can be computed in

parallel (denoted by Elements below) In addition individual parallelization

of each element of Filtering can also be performed A detailed discussion of

our proposed parallelization is outside the scope of this paper

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA

VEIN

2251 RECOGNITION METHOD

The matching segment of the line-descriptor based method is a

bottleneck with regard to matching speed In this section we briefly

describe the Line Descriptor-based sclera vein recognition method After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points The line descriptor is used to describe the segments in the vein

structure Figure 2 shows a visual description of the line descriptor Each

segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG

Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the

vascular patterns a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns For

the registration algorithm it randomly chooses two points ndash one from the

test template and one from the target template It also randomly chooses a

scaling factor and a rotation value based on a priori knowledge of the

database Using these values it calculates a fitness value for the registration

using these parameters

After sclera template registration each line segment in the test

template is compared to the line segments in the target template for

matches In order to reduce the effect of segmentation errors we created the

weighting image (Figure 3) from the sclera mask by setting interior pixels

in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), D_match is the matching distance threshold, and ɸ_match is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target template. That is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained.
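A minimal sketch of the matching rule just described, under our own reading of it; the struct field names and the way the two mask weights are combined are assumptions, and the exact weighting used in the line-descriptor method may differ:

function m = segment_match_score(Si, Sj, D_match, Phi_match)
% Matching score of two segment descriptors with fields x, y, phi and mask weight w.
% A pair scores only if both the center distance and the orientation difference
% fall below their thresholds (an assumed reading of the rule above).
    d    = hypot(Si.x - Sj.x, Si.y - Sj.y);   % Euclidean distance of the centers
    dphi = abs(Si.phi - Sj.phi);              % orientation difference
    if d <= D_match && dphi <= Phi_match
        m = Si.w * Sj.w;                      % weight by the two mask weights (assumption)
    else
        m = 0;
    end
end

For example, under this sketch segment_match_score(struct('x',10,'y',20,'phi',0.3,'w',1), struct('x',12,'y',21,'phi',0.35,'w',0.5), 5, 0.2) returns 0.5.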

FIG

FIG

FIG

FIG

Even with movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of Y shape vessels: one is to use the angle of every branch to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift- and scale-invariant feature

auxiliary parameters So in our rotation shift and scale invariant feature

vector is defined as y(ϕ1 ϕ2 ϕ3 x y) The Y-shape descriptor is generated

with reference to the iris center Therefore it is automatically aligned to the

iris centers It is a rotational- and scale- invariant descriptor V WPL

SCLERA DESCRIPTOR As we discussed in the Section 22 the line

descriptor is extracted from the skeleton of vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment a line descriptor is created to record the center and orientation of

the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such error, the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

is designed to indicate whether a line segment belongs to the edge of the

sclera or not. However, in a GPU application, using the mask is challenging since the mask files are large in size and will occupy the GPU memory and slow down the data transfer. When matching, a registration RANSAC-type algorithm was used to randomly select the corresponding descriptors, and the transform parameter between them was used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processor unit.

To reduce heavy data transfer and computation we designed the

weighted polar line (WPL) descriptor structure which includes the

information of mask and can be automatically aligned We extracted the

relationship of geometric feature of descriptors and store them as a new

descriptor. We use a weighted image created by setting various weight values according to their positions. The weights of those descriptors that are beyond the sclera are set to 0, those that are near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, descriptor weights were calculated on their own mask by the CPU only once.

The calculated result was saved as a component of the descriptor. The descriptor of the sclera then becomes s(x, y, ɸ, w), where w denotes the weight of the point and its value may be 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template will be transformed. It would be

faster if two templates have similar reference points If we use the center of

the iris as the reference point when two templates are compared the

correspondence will automatically be aligned to each other since they have

the similar reference point Every feature vector of the template is a set of

line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in the CPU preprocessing.
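A small sketch of that polar-to-rectangular preprocessing step; the numbers are arbitrary examples:

% Convert a descriptor's polar location (r, theta) relative to the iris center
% into rectangular coordinates during CPU preprocessing
iris_center = [240, 320];     % illustrative iris center
r     = 85;                   % distance of the segment center from the iris center
theta = deg2rad(40);          % angle of the segment center to the reference line
[dx, dy] = pol2cart(theta, r);
x = iris_center(1) + dx;
y = iris_center(2) + dy;
% the stored WPL descriptor then carries s = (x, y, r, theta, phi, w)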

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching these two parts are assigned to threads in

different warps to allow different deformation The multiprocessor in

CUDA manages threads in groups of 32 parallel threads called warps We

reorganized the descriptor from same sides and saved

FIG

FIG

them in continuous addresses. This would meet the requirement of coalesced memory access on the GPU.

After reorganizing the structure of descriptors and adding mask information

into the new descriptor the computation on the mask file is not needed on

the GPU It was very fast to match with this feature because it does not

need to reregister the templates every time after shifting Thus the cost of

data transfer and computation on the GPU will be reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 40 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline todayrsquos GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that todayrsquos GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global

"texture" memory.

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities After this step it is still possible for

some false positive matches In the second stage we used WPL descriptor

to register the two images for more detailed descriptor matching including

scale- and translation invariance This stage includes shift transform affine

matrix generation and final WPL descriptor matching Overall we

partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here y_te^i and y_ta^j are the Y shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4); and ni and di are the number of matched descriptor pairs and their center distance, respectively. tϕ is a distance threshold and txy is the threshold to restrict the searching area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar line from the pupil center in descriptor i. The number of matched pairs ni and the distance between Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score will be passed to the next, more precise matching process.
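The neighbor search described above can be sketched in MATLAB as follows. The two descriptor matrices are made-up examples, and computing dϕ as the Euclidean distance of the three branch angles is our reading of Eq. (3), not the equation itself:

% Coarse matching of Y-shape descriptors: rows are [phi1 phi2 phi3 x y]
Yte = [30  95 200 410 250;  40 110 215 395 300];   % illustrative test descriptors
Yta = [32  93 198 412 255;  85 150 260 300 180];   % illustrative target descriptors
t_phi = 30;  t_xy = 675;                           % thresholds used in the experiments above
n_match = 0;  d_sum = 0;
for i = 1:size(Yte, 1)
    for j = 1:size(Yta, 1)
        d_xy  = norm(Yte(i, 4:5) - Yta(j, 4:5));   % distance of the branch centers
        d_phi = norm(Yte(i, 1:3) - Yta(j, 1:3));   % distance of the branch angles
        if d_xy < t_xy && d_phi < t_phi
            n_match = n_match + 1;
            d_sum   = d_sum + d_xy;
        end
    end
end
% n_match and the average center distance d_sum / max(n_match, 1) are then
% fused into the coarse matching score of Eq. (2), which is not reproduced here.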

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to nonlinearly shrink or extend because the eyeball is spherical in shape.

The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.

Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate. As a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te^i is the ith WPL descriptor of T_te, T_ta is the target template and s_ta^i is the ith WPL descriptor of T_ta, and d(s_te^k, s_ta^j) is the Euclidean distance of descriptors s_te^k and s_ta^j.

Δs_k is the shift value of two descriptors, defined as the offset between their center positions.

We first randomly select an equal number of segment descriptors s_te^k in the test template T_te from each quad and find their nearest neighbors s_ta^j* in the target template T_ta. Their shift offset is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta^j* in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m^(it). Here s_te^i, T_te, s_ta^j and T_ta are defined the same as in Algorithm 2; tr_shift^(it), θ^(it), and tr_scale^(it) are the shift, rotation, and scale transform parameters generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterated N times to generate these parameters. In our experiment we set the iteration count to 512.
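A sketch of this randomized parameter search on the descriptor centers alone. The rotation and scale ranges below are placeholders for the a priori ranges mentioned in the text, and Pte and Pta are random stand-ins for the real descriptor centers:

% RANSAC-style search for shift, rotation and scale registration parameters
Pte = rand(200, 2) * 400;   % (x, y) centers of the test template descriptors (stand-in data)
Pta = rand(220, 2) * 400;   % (x, y) centers of the target template descriptors (stand-in data)
N = 512;  beta = 20;        % iteration count and match radius in pixels
best_matches = -1;
for it = 1:N
    k = randi(size(Pte, 1));                       % random test descriptor
    [~, j] = min(sum((Pta - Pte(k, :)).^2, 2));    % its nearest target descriptor
    shift = Pta(j, :) - Pte(k, :);                 % candidate shift
    theta = deg2rad(10 * (rand - 0.5));            % candidate rotation (assumed range)
    s     = 0.9 + 0.2 * rand;                      % candidate scale (assumed range)
    R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
    Q = s * (Pte * R') + shift;                    % transformed test descriptor centers
    matches = 0;
    for m = 1:size(Q, 1)                           % count matches within beta pixels
        if min(sum((Pta - Q(m, :)).^2, 2)) <= beta^2
            matches = matches + 1;
        end
    end
    if matches > best_matches
        best_matches = matches;
        best_param = struct('shift', shift, 'theta', theta, 'scale', s);
    end
end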

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template will be registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te^i, T_te, s_ta^j and T_ta are defined the same as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm), and Δs_optim are the registration parameters attained from Algorithms 2 and 3; and R(θ(optm)) T(tr_shift(optm)) S(tr_scale(optm)) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of two ɸ values. In our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU A CUDA consists of many streaming

multiprocessors (SM) where the parallel part of the program should be

partitioned into threads by the programmer and mapped into those threads

There are multiple memory spaces in the CUDA memory hierarchy: register, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and accessing these memories costs little time. Only shared memory can be accessed by other threads within the same block; however, there is only limited availability of shared memory. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and it is very time consuming to access them.

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in term of access To

completely hide the latency of the small instructions set we should use on-

chip memory preferentially rather than global memory When global

memory access occurs threads in same warp should access the words in

sequence to achieve coalescence

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory request from different thread within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread. One thread performs the comparison of a pair of templates. In our work we use an NVIDIA C2070 as our GPU. The numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.

Algorithms 2-4 will be partitioned into fine-grained subtasks, where each thread processes a section of descriptors. As the lower portion of the

middle column shows in Figure 11 we assigned a target template to one

block Inside a block one thread corresponds a set of descriptors in this

template This partition makes every block execute independently and there

are no data exchange requirements between different blocks When all

threads complete their responding descriptor fractions the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of intermediate results which is

show in right of Figure 11 Firstly all odd number threads compute the sum

of consecutive pairs of the results Then recursively every first of i (= 4 8

16 32 64 ) threads

compute the prefix sum on the new result The final result will be saved in

the first address which has the same variable name as the first intermediate

result

252 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose to

map task

Mapping one pair of templates to all the threads in a block and then every

thread would take charge of a fraction of descriptors and cooperation with

other threads

Assigning a single possible shift offset to a thread and all the threads will

compute independently unless the final result should be compared with

other possible offset

Due to great number of sum and synchronization operations in every

nearest neighbor searching step we choose the second method to parallelize

shift searching In affine matrix generator we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads The challenge of this step is the randomly generated

numbers might be correlated among threads In the step of rotation and

scale registration generating we used the Mersenne Twister pseudorandom

number generator because it can use bitwise arithmetic and have long

period

The Mersenne twister as most of pseudorandom generators is iterative

Therefore it's hard to parallelize a single twister state update step among

several execution threads To make sure that thousands of threads in the

launch grid generate uncorrelated random sequence many simultaneous

Mersenne twisters need to process with different initial states in parallel

But even "very different" (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

Variable denoting whether the line has been matched is stored in

shared memory To share the flags all the threads in a block should wait

synchronic operation at every query step Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure In global memory the

components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) were stored

separately This would guarantee contiguous kernels of Algorithm 2 to 4

can access their data in successive addresses Although such coalescing

access reduces the latency frequently global memory access was still a

slower way to get data In our kernel we loaded the test template to shared

memory to accelerate memory access Because the Algorithms 2 to 4

execute different number of iterations on same data the bank conflict does

not happen To maximize our texture memory space we set the system

cache to the lowest value and bonded our target descriptor to texture

memory Using this catchable memory our data access was accelerated

more

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It was primarily applied to the design of target detection; in this paper it is applied as the feature for human recognition. In the sclera region the vein patterns are the edges of an image, so HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image.

To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within a cell casts a vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The gradient orientations are binned over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientation of the picture elements. If the images have illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
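For reference, the standard HOG relations implied by the description above can be written as follows; the number of orientation bins B is not specified in the report and is left symbolic.

m(x, y) = \sqrt{d_x(x, y)^2 + d_y(x, y)^2}

\theta(x, y) = \arctan\!\left(\frac{d_y(x, y)}{d_x(x, y)}\right) \bmod 180^\circ

b(x, y) = \left\lfloor \frac{\theta(x, y)}{180^\circ / B} \right\rfloor, \qquad B = \text{number of orientation bins}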

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external applications and languages such as C, C++, Fortran, Java™, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)
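As a rough illustration of this interface, the sketch below shows a minimal C/C++ MEX gateway that scales a vector by a constant. The file name, function name, and arguments are made up for the example and are unrelated to this project.

/* scale_vector.cpp - minimal MEX gateway sketch (illustrative only).
 * Build inside MATLAB with:  mex scale_vector.cpp
 * Call as:                   y = scale_vector(x, 2.5);                    */
#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs != 2)
        mexErrMsgIdAndTxt("demo:scale_vector:nrhs", "Two inputs required.");

    const double *x = mxGetPr(prhs[0]);            /* input vector          */
    double factor   = mxGetScalar(prhs[1]);        /* scalar multiplier     */
    mwSize n        = mxGetNumberOfElements(prhs[0]);

    plhs[0] = mxCreateDoubleMatrix(1, n, mxREAL);  /* output row vector     */
    double *y = mxGetPr(plhs[0]);

    for (mwSize i = 0; i < n; ++i)                 /* the actual work       */
        y[i] = factor * x[i];
}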

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with the MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As an alternative to the MuPAD based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

The interactive tool GUIDE (Graphical User Interface Development Environment) is used to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatterbubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods. We designed the Y shape descriptor, a new feature extraction method that takes advantage of the GPU structure, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


CHAPTER 2

PROJECT DESCRIPTION

21 INTRODUCTION

The sclera is the opaque and white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches, Speed Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching, for feature registration and matching. Of these three methods, the SURF method achieves the best accuracy; it takes an average of 15 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system: it costs about 12 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.

As discussed CPU-based systems are designed as sequential

processing devices which may not be efficient in data processing where the

data can be parallelized Because of large time consumption in the matching

step Sclera vein recognition using sequential-based method would be very

challenging to be implemented in a real time biometric system especially

when there is large number of templates in the database for matching GPUs

(as abbreviation of General purpose Graphics Processing Units GPGPUs)

are now popularly used for parallel computing to improve the

computational processing speed and efficiency The highly parallel

structure of GPUs makes them more effective than CPUs for data

processing where processing can be performed in parallel GPUs have been

widely used in biometrics recognition such as speech recognition text

detection handwriting recognition and face recognition In iris

recognition GPU was used to extract the features construct descriptors

and match templates

GPUs are also used for object retrieval and image search Park et al

designed the performance evaluation of image processing algorithms such

as linear feature extraction and multi-view stereo matching on GPUs

However these approaches were designed for their specific biometric

recognition applications and feature searching methods Therefore they may

not be efficient for Sclera vein recognition Compute Unified Device

Architecture (CUDA) the computing engine of NVIDIA GPUs is used in

this research CUDA is a highly parallel multithreaded many-core

processor with tremendous computational power It supports not only a

traditional graphics pipeline but also computation on non-graphical data

More importantly it offers an easier programming platform which

outperforms its CPU counterparts in terms of peak arithmetic intensity and

memory bandwidth In this research the goal is not to develop a unified

strategy to parallelize all sclera matching methods because each method is

quite different from one another and would need customized design To

develop an efficient parallel computing scheme it would need different

strategies for different Sclera vein recognition methods

Rather the goal is to develop a parallel sclera matching solution for

Sclera vein recognition using our sequential line-descriptor method using

the CUDA GPU architecture However the parallelization strategies

developed in this research can be applied to design parallel approaches for

other Sclera vein recognition methods and help parallelize general pattern

recognition methods Based on the matching approach in there are three

challenges to map the task of sclera feature matching to GPU

1) Mask files are used to calculate valid overlapping areas of two sclera

templates and to align the templates to the same coordinate system But the

mask files are large in size and will preoccupy the GPU memory and slow

down the data transfer Also some of processing on the mask files will

involve convolution which is difficult to improve its performance on the

scalar process unit on CUDA

2) The procedure of sclera feature matching consists of a pipeline of several

computational stages with different memory and processing requirements

There is no uniform mapping scheme applicable for all these stages

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and the prefix sum, also work in OpenCL. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4), introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue (Section 5), and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA. We then developed the implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we perform some experiments using the proposed system. In Section 9, we draw some conclusions.

22 BACKGROUND OF SCLERA VEIN RECOGNITION

221 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera

segmentation feature enhancement feature extraction and feature

matching (Figure 1)

FIG

Sclera image segmentation is the first step in sclera vein recognition

Several methods have been designed for sclera segmentation. Crihalmeanu et al. presented a semi-automated system for sclera segmentation; they used a clustering algorithm to classify the color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's thresholding-based method for grayscale images.

After sclera segmentation it is necessary to enhance and extract the sclera

features since the sclera vein patterns often lack contrast and are hard to

detect Zhou et al used a bank of multi-directional Gabor filters for

vascular pattern enhancement Derakhshani et al used contrast limited

adaptive histogram equalization (CLAHE) to enhance the green color plane

of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image In the feature matching step

Crihalmeanu and Ross proposed

three registration and matching approaches including Speed Up Robust

Features (SURF) which is based on interest-point detection minutiae

detection which is based on minutiae points on the vasculature structure

and direct correlation matching which relies on image registration Zhou et

al designed a line descriptor based feature registration and matching

method

The proposed sclera recognition consists of five steps which include

sclera segmentation vein pattern enhancement feature extraction feature

matching and matching decision Fig 2 shows the block diagram of sclera

recognition Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of the vein pattern in the sclera region of an eye image. To make the computation more efficient, the image data are converted to polar form, which is mainly suited to circular or quasi-circular objects. These two characteristics are extracted from all the images in the

database and compared with the features of the query image whether the

person is correctly identified or not This procedure is done in the feature

matching step and ultimately makes the matching decision By using the

proposed feature extraction methods and matching techniques the human

identification is more accurate than the existing studies In the proposed

method two features of an image are drawn out
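For reference, the Cartesian-to-polar conversion mentioned above can be expressed with respect to the pupil/iris center (x_c, y_c) as follows; the interpolation scheme used to resample the image I(r, θ) on this grid is not detailed in the report.

r(x, y) = \sqrt{(x - x_c)^2 + (y - y_c)^2}, \qquad \theta(x, y) = \operatorname{atan2}(y - y_c,\ x - x_c)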

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure below shows the steps of segmentation.

FIG

Glare area detection: the glare area is a small bright area near the pupil or iris and is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images, so if the image is in color it must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera area estimation: for the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be located in the right and center positions, and the correct right sclera area should be located in the left and center. In this way, non-sclera areas are eliminated.
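For illustration, a host-side C++ sketch of Otsu's threshold selection on an 8-bit grayscale image is shown below. The report itself relies on MATLAB's built-in Otsu thresholding; this is a generic textbook version with assumed function and variable names.

// Otsu's threshold selection for an 8-bit grayscale image (illustrative).
#include <cstdint>
#include <cstddef>
#include <vector>

int otsuThreshold(const uint8_t *pixels, size_t n)
{
    // 1. grayscale histogram, normalized to probabilities
    std::vector<double> hist(256, 0.0);
    for (size_t i = 0; i < n; ++i) hist[pixels[i]] += 1.0;
    for (double &h : hist) h /= static_cast<double>(n);

    // 2. scan all thresholds, keep the one maximizing between-class variance
    double sumAll = 0.0;
    for (int t = 0; t < 256; ++t) sumAll += t * hist[t];

    double wB = 0.0, sumB = 0.0, bestVar = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];                               // background class weight
        if (wB <= 0.0 || wB >= 1.0) continue;
        sumB += t * hist[t];
        double mB = sumB / wB;                       // background mean
        double mF = (sumAll - sumB) / (1.0 - wB);    // foreground mean
        double varBetween = wB * (1.0 - wB) * (mB - mF) * (mB - mF);
        if (varBetween > bestVar) { bestVar = varBetween; bestT = t; }
    }
    return bestT;   // pixels above bestT are treated as candidate sclera here
}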

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid, and iris boundaries are then refined, since these are all unwanted portions for recognition. To eliminate these effects, refinement is performed after the detection of the sclera area. The figure shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way.

FIG

In the segmentation process all images are not perfectly segmented

Hence feature extraction and matching are needed to reduce the

segmentation fault The vein patterns in the sclera area are not visible in the

segmentation process To get vein patterns more visible vein pattern

enhancement is to be performed

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999) In the case of retinal biometrics an

especial optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

Practicality Conjunctival vasculature can be captured with commercial off

the shelf digital cameras under normal lighting conditions making this

modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi surface structure of the ocular

veins makes them hard to reproduce as a physical artifact Besides being a

stand-alone biometric modality we anticipate that the addition of

conjunctival biometrics will enhance the performance of current iris-based

biometric system in the following ways

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering: since the filter is computed over different portions of the input image, the computation can be carried out in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG
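A minimal CUDA sketch of such element-wise parallel directional filtering is given below: one thread computes one output pixel by convolving with a single directional (e.g., Gabor-like) kernel held in constant memory. The kernel size, border handling, and names are assumptions made for the example, not the report's implementation.

// One thread per output pixel; d_kernel holds one directional filter.
#define K 9                                  // assumed odd kernel size

__constant__ float d_kernel[K * K];          // one directional filter

__global__ void directionalFilter(const float *in, float *out,
                                  int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float acc = 0.0f;
    for (int ky = -K / 2; ky <= K / 2; ++ky)
        for (int kx = -K / 2; kx <= K / 2; ++kx) {
            int ix = min(max(x + kx, 0), width  - 1);   // clamp at borders
            int iy = min(max(y + ky, 0), height - 1);
            acc += in[iy * width + ix] *
                   d_kernel[(ky + K / 2) * K + (kx + K / 2)];
        }
    out[y * width + x] = acc;                // filter response at this pixel
}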

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA

VEIN

2251 RECOGNITION METHOD

The matching segment of the line-descriptor based method is a

bottleneck with regard to matching speed In this section we briefly

describe the Line Descriptor-based sclera vein recognition method After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points The line descriptor is used to describe the segments in the vein

structure Figure 2 shows a visual description of the line descriptor Each

segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG
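Since the equation itself appears only as a figure, a reconstruction consistent with the surrounding definitions is:

\theta = \tan^{-1}\!\frac{y_l - y_i}{x_l - x_i}, \qquad r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad \phi = \tan^{-1}\!\left(\left.\frac{d f_{line}(x)}{dx}\right|_{x_l}\right)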

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the

vascular patterns a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns For

the registration algorithm it randomly chooses two points ndash one from the

test template and one from the target template It also randomly chooses a

scaling factor and a rotation value based on a priori knowledge of the

database Using these values it calculates a fitness value for the registration

using these parameters

After sclera template registration each line segment in the test

template is compared to the line segments in the target template for

matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), Dmatch is the matching distance threshold, and θmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained.
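The score equations appear only as an image in the source; a plausible form consistent with the description above (thresholded, weight-based matching, normalized by the smaller template's total weight) is:

m(S_i, S_j) =
\begin{cases}
w_i\, w_j, & d(S_i, S_j) \le D_{match} \ \text{and}\ |\phi_i - \phi_j| \le \theta_{match},\\
0, & \text{otherwise,}
\end{cases}
\qquad
M = \frac{\sum_{(i,j)\,\text{matched}} m(S_i, S_j)}{\min\!\big(\sum_i w_i,\ \sum_j w_j\big)}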

FIG

FIG

FIG

FIG

Because they remain stable with the movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y shape vessels: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris centers. It is a rotational- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 22, the line descriptor is extracted from the skeleton of the vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment a line descriptor is created to record the center and orientation of

the segment This descriptor is expressed as s(x yɸ) where (x y) is the

position of the center and ɸ is its orientation Because of the limitation of

segmentation accuracy, the descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy the GPU memory and slow down the data transfer. During matching and registration, a RANSAC-type algorithm was used to randomly select the corresponding descriptors, and the transform parameters between them were used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weights of the transformed descriptors. This results in too many convolutions in the processing unit.

To reduce heavy data transfer and computation we designed the

weighted polar line (WPL) descriptor structure which includes the

information of the mask and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU only once.

The calculated result was saved as a component of the descriptor. The sclera descriptor then changes to s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their

centers all the descriptors of that template will be transformed It would be

faster if two templates have similar reference points If we use the center of

the iris as the reference point when two templates are compared the

correspondence will automatically be aligned to each other since they have

the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector becomes s(x y r θ ɸw) The left and right

parts of sclera in an eye may have different registration parameters For

example as an eyeball moves left left part sclera patterns of the eye may be

compressed while the right part sclera patterns are stretched

In parallel matching these two parts are assigned to threads in

different warps to allow different deformation The multiprocessor in

CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses, which meets the requirement for coalesced memory access on the GPU.

FIG

FIG

After reorganizing the structure of descriptors and adding mask information

into the new descriptor the computation on the mask file is not needed on

the GPU It was very fast to match with this feature because it does not

need to reregister the templates every time after shifting Thus the cost of

data transfer and computation on GPU will be reduced Matching on the

new descriptor the shift parameters generator in Figure 4 is then simplified

as Figure 9

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 40 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline, today's GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that today's GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global

"texture" memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities After this step it is still possible for

some false positive matches In the second stage we used WPL descriptor

to register the two images for more detailed descriptor matching including

scale- and translation invariance This stage includes shift transform affine

matrix generation and final WPL descriptor matching Overall we

partitioned the registration and matching processing into four kernels2 in

CUDA (Figure 10) matching on the Y shape descriptor shift

transformation affine matrix generation and final WSL descriptor

matching Combining these two stages the matching program can run faster

and

achieve more accurate score

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4); ni and di are the number of matched descriptor pairs and their center distance, respectively; tϕ is a distance threshold; and txy is the threshold restricting the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar line from the pupil center in descriptor i. The number of matched pairs ni and the distance between the Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded, while a sclera with a high matching score is passed to the next, more precise matching process.
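A simplified CUDA sketch of this stage-I comparison, with one thread per target template, is shown below. The descriptor layout, the Euclidean branch-angle distance, and leaving the score fusion of (2) to the host are assumptions made for the example rather than the report's exact kernel.

// One thread compares the test template against one target template,
// counting Y-shape pairs within the thresholds t_phi and t_xy.
struct YDesc { float phi1, phi2, phi3, x, y; };

__global__ void yShapeMatch(const YDesc *test, int nTest,
                            const YDesc *targets, const int *tgtOffset,
                            const int *tgtCount, int nTemplates,
                            float tPhi, float tXY,
                            int *nMatched, float *sumDist)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // target template index
    if (t >= nTemplates) return;

    const YDesc *tgt = targets + tgtOffset[t];
    int   matches = 0;
    float dist    = 0.0f;

    for (int i = 0; i < nTest; ++i)
        for (int j = 0; j < tgtCount[t]; ++j) {
            float d1 = test[i].phi1 - tgt[j].phi1;   // branch-angle differences
            float d2 = test[i].phi2 - tgt[j].phi2;
            float d3 = test[i].phi3 - tgt[j].phi3;
            float dphi = sqrtf(d1 * d1 + d2 * d2 + d3 * d3);
            float dx = test[i].x - tgt[j].x, dy = test[i].y - tgt[j].y;
            float dxy = sqrtf(dx * dx + dy * dy);    // branch-center distance
            if (dphi < tPhi && dxy < tXY) { ++matches; dist += dxy; }
        }

    nMatched[t] = matches;      // fused into the final score (2) on the host
    sumDist[t]  = dist;
}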

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because: when acquiring an eye image at different gaze angles, the vessel structure will appear to shrink or extend nonlinearly, since the eyeball is spherical in shape; and the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), whose movements differ slightly. Considering these factors, our registration employed both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte, Tta is the target template and staj is the jth WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.

Δsk is the shift value between two descriptors, i.e., the offset that would move stek onto its matched neighbor staj.
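The defining equation is not reproduced in the text; a reconstruction from context is

\Delta s_k = \big(x_{sta_j} - x_{ste_k},\ \ y_{sta_j} - y_{ste_k}\big)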

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, the candidate offset with the smallest standard deviation.

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale transform parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterated N times to generate these parameters; in our experiment we set the iteration count to 512.
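The transform matrices of (7) are not reproduced in the text; assuming standard homogeneous 2D forms, they would read:

R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{bmatrix},\qquad
T(t_x, t_y) = \begin{bmatrix} 1 & 0 & t_x\\ 0 & 1 & t_y\\ 0 & 0 & 1 \end{bmatrix},\qquad
S(s) = \begin{bmatrix} s & 0 & 0\\ 0 & s & 0\\ 0 & 0 & 1 \end{bmatrix}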

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)) T(tr(optm)shift) S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU A CUDA consists of many streaming

multiprocessors (SM) where the parallel part of the program should be

partitioned into threads by the programmer and mapped into those threads

There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip, and accessing them takes little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming.

If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be used preferentially rather than global memory, and when global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall into the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled so as to minimize bank conflicts.
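As a concrete illustration of these guidelines (a minimal sketch, not code from the report), the kernel below, assumed to be launched with 256 threads per block, performs coalesced reads from global memory into shared memory, one element per thread, so that consecutive threads touch consecutive words and no bank conflicts occur:

    // Illustrative sketch: coalesced global loads into shared memory.
    // Thread k of each block reads element k of its tile, so a warp touches
    // 32 consecutive words and the accesses coalesce into few transactions.
    __global__ void scaleTile(const float* __restrict__ in, float* out, int n, float a)
    {
        __shared__ float tile[256];              // one element per thread, no bank conflict
        int gid = blockIdx.x * blockDim.x + threadIdx.x;
        if (gid < n)
            tile[threadIdx.x] = in[gid];         // coalesced read
        __syncthreads();
        if (gid < n)
            out[gid] = a * tile[threadIdx.x];    // coalesced write
    }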

2.5.1 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, so no data exchange is required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new result. The final result is saved in the first address, which has the same variable name as the first intermediate result.
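A minimal CUDA sketch of this block-level summation is given below (an illustration under the assumption of a 256-thread block, not the report's exact kernel); it performs the same kind of pairwise combination, and the final sum ends up at the first address of the shared buffer:

    // Illustrative sketch of summing per-thread intermediate results inside a block,
    // in the spirit of the pairwise scheme described above.
    #define THREADS 256

    __global__ void sumIntermediate(const float* partial, float* blockSum)
    {
        __shared__ float buf[THREADS];
        int tid = threadIdx.x;
        buf[tid] = partial[blockIdx.x * THREADS + tid];   // this thread's intermediate result
        __syncthreads();

        // Pairwise tree: stride 1, 2, 4, ...; after log2(THREADS) steps buf[0] holds
        // the sum, i.e. the final result lands at the "first address", as in the text.
        for (int stride = 1; stride < THREADS; stride <<= 1) {
            if ((tid & (2 * stride - 1)) == 0)
                buf[tid] += buf[tid + stride];
            __syncthreads();
        }
        if (tid == 0) blockSum[blockIdx.x] = buf[0];
    }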

2.5.2 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose from to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to each thread, so that all threads compute independently, except that the final results must be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize shift searching. In the affine matrix generator we mapped an entire parameter-set search to a thread: every thread randomly generated a set of parameters and tried them independently, and the generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach a flag variable denoting whether the line has been matched is stored in shared memory.

FIG

FIG

To share the flags, all the threads in a block would have to synchronize at every query step. Our solution is to use a single thread in a block to process the matching.
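For illustration, recent CUDA toolkits offer cuRAND as a common alternative to the offline DCMT tool used here: each thread owns its own generator state initialized on a distinct subsequence, which keeps the per-thread streams statistically independent. The sketch below is an assumption-based example, not the implementation described above:

    // Illustrative alternative to DCMT: cuRAND per-thread generator state.
    #include <curand_kernel.h>

    __global__ void initRng(curandState* states, unsigned long long seed)
    {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        // Same seed, different subsequence per thread -> independent streams.
        curand_init(seed, id, 0, &states[id]);
    }

    __global__ void drawParams(curandState* states, float* shiftX, float* shiftY)
    {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        curandState local = states[id];            // copy state to registers for speed
        shiftX[id] = (curand_uniform(&local) - 0.5f) * 40.0f;   // example range (assumption)
        shiftY[id] = (curand_uniform(&local) - 0.5f) * 40.0f;
        states[id] = local;                        // save the advanced state back
    }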

2.5.3 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.

FIG
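The sketch below illustrates the idea of storing the descriptor components separately (a structure-of-arrays layout); the struct and kernel names are assumptions, not the project's code. Because thread j touches element j of each component array, a warp's accesses fall on consecutive addresses and coalesce:

    // Illustrative structure-of-arrays layout for the WPL descriptors s(x, y, r, θ, ɸ, w).
    struct WplTemplateSoA {
        float *x, *y, *r, *theta, *phi, *w;   // one device array per component
        int    count;                         // number of descriptors
    };

    __global__ void shiftDescriptors(WplTemplateSoA t, float dx, float dy)
    {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (j < t.count) {        // coalesced reads and writes, component by component
            t.x[j] += dx;
            t.y[j] += dy;
        }
    }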

2.6 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied to object detection; in this work it is applied as a feature for human recognition. In the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then forms the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
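Written out explicitly (these are the standard HOG relations, stated here for completeness rather than quoted from the report), the magnitude and orientation are

    m(x, y) = \sqrt{ d_x(x, y)^2 + d_y(x, y)^2 }, \qquad
    \theta(x, y) = \arctan\left( \frac{d_y(x, y)}{d_x(x, y)} \right)

where d_x and d_y are typically obtained with simple one-dimensional derivative masks such as [-1, 0, 1] and its transpose.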

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

3.1 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

3.2 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code, files and data

Interactive tools for iterative exploration, design and problem solving

Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) and knows how big it is. Moreover, the fundamental operators (e.g. addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

3.2.1 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
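For completeness, a minimal MEX gateway looks like the sketch below (plain C host code, which is also valid inside a CUDA source file built with mexcuda); the file name and the doubling operation are illustrative assumptions, not part of this project:

    /* Minimal illustrative MEX gateway: doubles every element of the input matrix.
     * Build with:  mex times_two.c                                             */
    #include "mex.h"

    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
    {
        if (nrhs != 1 || !mxIsDouble(prhs[0]))
            mexErrMsgTxt("Expected one double matrix as input.");

        mwSize m = mxGetM(prhs[0]), n = mxGetN(prhs[0]);
        plhs[0] = mxCreateDoubleMatrix(m, n, mxREAL);

        double *in  = mxGetPr(prhs[0]);
        double *out = mxGetPr(plhs[0]);
        for (mwSize i = 0; i < m * n; ++i)
            out[i] = 2.0 * in[i];        /* the C (or CUDA) computation goes here */
    }

After building, it is called from MATLAB like any function, e.g. y = times_two(x).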

Libraries written in Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB libraries (for example XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language you can program and develop algorithms faster than with traditional languages, because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

The interactive tool GUIDE (Graphical User Interface Development Environment) is used to lay out, design and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including:

Interpolating and decimating

Extracting sections of data, scaling and averaging

Thresholding and smoothing

Correlation, Fourier analysis and filtering

1-D peak, valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases and external devices. You can read data from popular file formats, such as Microsoft Excel, ASCII text or binary files, image, sound and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations and legends; and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

4.2 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g. PINs and passwords), government applications have used token-based systems (e.g. ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step to achieve parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even among the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


when there is a large number of templates in the database for matching. GPUs (General-Purpose Graphics Processing Units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition and face recognition. In iris recognition, GPUs have been used to extract the features, construct descriptors and match templates.

GPUs are also used for object retrieval and image search. Park et al. designed a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore they may not be efficient for sclera vein recognition. The Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform which outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; to develop an efficient parallel computing scheme, different strategies are needed for different sclera vein recognition methods.

Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing of the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements; there is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance, and new designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4), introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing and mitigates the mask size issue (Section 5), and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA; we therefore developed implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition, in Section 8 we present experiments using the proposed system, and in Section 9 we draw conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation; they used a clustering algorithm to classify color eye images into three clusters: sclera, iris and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's-thresholding-based method for grayscale images.

After sclera segmentation it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches, including Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection: A glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images, so a color image must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera Area Estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. When the region of interest has been selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be located in the right and center positions, and the correct right sclera area should be located in the left and center. In this way, non-sclera areas are eliminated.

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid and iris boundaries are refined; these are all unwanted portions for recognition. In order to eliminate their effects, refinement is done following the detection of the sclera area. Fig. shows the result after the Otsu's thresholding process and the iris and eyelid refinement to detect the right sclera area; in the same way, the left sclera area is detected using this method.

FIG

In the segmentation process not all images are perfectly segmented; hence feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after the segmentation process, so vein pattern enhancement is performed to make the vein patterns more visible.

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), palm (Lin and Fan, 2004) and retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999); due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear, so the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis does not easily occur; thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portion of the eye, the sclera vein patterns will be further exposed; this feature makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g. due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering: since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed, as sketched after the figures. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG
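As an illustration of this idea (an assumption-based sketch, not the project's filter code), the CUDA kernel below assigns one output pixel to each thread, so different portions of the image are filtered simultaneously; the filter size, constant-memory storage and clamped borders are choices made only for the example:

    // Illustrative sketch: each thread computes one output pixel of a directional
    // (e.g. Gabor-like) filter, so different image regions are processed in parallel.
    #define K 9                            // filter size (assumption)

    __constant__ float d_filter[K * K];    // one directional filter in constant memory

    __global__ void directionalFilter(const float* img, float* out, int w, int h)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;

        float acc = 0.0f;
        for (int ky = 0; ky < K; ++ky)
            for (int kx = 0; kx < K; ++kx) {
                int ix = min(max(x + kx - K / 2, 0), w - 1);   // clamp at the borders
                int iy = min(max(y + ky - K / 2, 0), h - 1);
                acc += d_filter[ky * K + kx] * img[iy * w + ix];
            }
        out[y * w + x] = acc;
    }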

2.2.5 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, the vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG

Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptor weights sets the maximum score that can be attained.
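To make the scoring rule concrete, the host-side sketch below (written in CUDA-compatible C; the names, the weight handling and the normalization by the test template's weights are assumptions where the text is not explicit) accumulates the weighted matches described above:

    // Illustrative host-side sketch of the descriptor matching score described above.
    #include <math.h>

    struct Seg { float x, y, phi, w; };   // center, orientation, edge weight (0, 0.5, 1)

    static float pairScore(Seg a, Seg b, float Dmatch, float PhiMatch)
    {
        float d = hypotf(a.x - b.x, a.y - b.y);
        if (d <= Dmatch && fabsf(a.phi - b.phi) <= PhiMatch)
            return fminf(a.w, b.w);       // a matched pair scores its (smaller) weight
        return 0.0f;
    }

    float templateScore(const Seg* test, int nTest, const Seg* target, int nTarget,
                        float Dmatch, float PhiMatch)
    {
        float sum = 0.0f, maxScore = 0.0f;
        for (int i = 0; i < nTest; ++i) {
            float best = 0.0f;
            for (int j = 0; j < nTarget; ++j)
                best = fmaxf(best, pairScore(test[i], target[j], Dmatch, PhiMatch));
            sum += best;
            maxScore += test[i].w;        // assumes the test template is the smaller set
        }
        return (maxScore > 0.0f) ? sum / maxScore : 0.0f;
    }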

FIG

FIG

FIG

FIG

Under movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y shape vessels: one is to use the angles of every branch with respect to the x axis, the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template; in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors in the pupil center calculation from the segmentation step, we also record the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line

descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To be tolerant of such errors, the mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not.

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of the line segments of the vessel patterns.

However, in a GPU application, using the mask is challenging,

since the mask files are large in size and would occupy GPU memory and slow down the data transfer. In matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weight of the transformed descriptors; this results in too many convolutions in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting weight values according to position: the weights of descriptors that lie beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, and only once.

The calculated result is saved as a component of the descriptor, and the sclera descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points: if we use the center of the iris as the reference point, then when two templates are compared the correspondence is automatically aligned, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in the CPU preprocessing.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses; this meets the requirement for coalesced memory access on the GPU.

FIG

FIG

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast, because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-Purpose Computing on the GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand the actual operations are the same and are easy to follow, on the other hand the terminology is different between graphics and general-purpose use. Harris provides an excellent description of this mapping process.

We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline.

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as texture on future passes through the graphics pipeline.

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments described in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.
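A minimal, complete CUDA program in this style is shown below (a generic SAXPY example for illustration, unrelated to the sclera pipeline): every thread computes one element of the domain, reading and writing global memory directly:

    // Minimal example of the thread-grid programming model described above.
    #include <cstdio>

    __global__ void saxpy(int n, float a, const float* x, float* y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // this thread's grid position
        if (i < n) y[i] = a * x[i] + y[i];               // read-modify-write on one buffer
    }

    int main()
    {
        const int n = 1 << 20;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));        // unified memory for brevity
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);  // grid of about one million threads
        cudaDeviceSynchronize();

        printf("y[0] = %f\n", y[0]);                     // expect 5.0
        cudaFree(x); cudaFree(y);
        return 0;
    }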

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity, although some false positive matches are still possible after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance; this stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

2.4.1 STAGE I: MATCHING WITH THE Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here y_te^i and y_ta^j are the Y shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of the two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs ni and the distance between the matched Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
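A compact sketch of this coarse comparison is given below (CUDA-compatible host code; the exact fusion formula (2) is not reproduced in the text, so the final score expression here is an assumption):

    // Illustrative sketch of the coarse Y-shape comparison (Algorithm 1 in the text).
    #include <math.h>

    struct YDesc { float phi1, phi2, phi3, x, y; };

    float yShapeScore(const YDesc* te, int Ni, const YDesc* ta, int Nj,
                      float tPhi /*=30*/, float tXy /*=675*/, float alpha /*=30*/)
    {
        int   n = 0;        // number of matched Y-branch pairs
        float dSum = 0.0f;  // accumulated center distance of matched pairs
        for (int i = 0; i < Ni; ++i)
            for (int j = 0; j < Nj; ++j) {
                float dPhi = sqrtf(powf(te[i].phi1 - ta[j].phi1, 2) +
                                   powf(te[i].phi2 - ta[j].phi2, 2) +
                                   powf(te[i].phi3 - ta[j].phi3, 2));
                float dXy  = hypotf(te[i].x - ta[j].x, te[i].y - ta[j].y);
                if (dPhi < tPhi && dXy < tXy) { ++n; dSum += dXy; }
            }
        if (n == 0) return 0.0f;
        // Fusion of match count and mean distance; the exact form of (2) is assumed here.
        float meanD = dSum / n;
        return alpha * n / ((Ni < Nj ? Ni : Nj) * (1.0f + meanD));
    }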

2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because, when an eye image is acquired at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, since the eyeball is spherical in shape; moreover, the sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte; Tta is the target template and staj is the jth WPL descriptor of Tta; d(stek, staj) is the Euclidean distance of descriptors stek and staj; and Δsk is the shift value of two descriptors, defined as the offset between their centers.

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quadrant and find their nearest neighbors staj in the target template Tta. Their shift offset is recorded as a candidate registration shift factor Δsk. The final registration offset is Δsoptim, which has the smallest standard deviation among these candidate offsets.
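A hedged sketch of the one-offset-per-thread scheme (the mapping chosen later in Section 2.5.2) is given below. Note that it scores each candidate offset by a summed nearest-neighbour distance, which is a simplification of the standard-deviation criterion described above; all names and the descriptor layout are illustrative.

// Sketch: each thread evaluates one candidate shift offset independently.
struct WPLDesc { float x, y, r, theta, phi, w; };

__global__ void evalShiftCandidates(const WPLDesc *test, int nTest,
                                    const WPLDesc *target, int nTarget,
                                    const float2 *candidates, float *cost,
                                    int nCandidates)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;   // one shift candidate per thread
    if (c >= nCandidates) return;
    float2 s = candidates[c];

    float sum = 0.0f;
    for (int i = 0; i < nTest; ++i) {
        float best = 1e30f;
        for (int j = 0; j < nTarget; ++j) {          // nearest neighbour after shifting
            float dx = test[i].x + s.x - target[j].x;
            float dy = test[i].y + s.y - target[j].y;
            best = fminf(best, dx * dx + dy * dy);
        }
        sum += best;
    }
    cost[c] = sum;   // the host or a reduction kernel then picks the best candidate
}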

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here, stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment, we set the iteration count to 512.
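The small helper below sketches how a per-iteration transform of the form R(θ) T(shift) S(scale) could be composed and applied to a descriptor center. The composition order follows the notation above; everything else is an assumption for illustration.

// Sketch: compose rotation * translation * scale as a 3x3 homogeneous matrix.
#include <cmath>

struct Mat3 { float m[9]; };

__host__ __device__ inline Mat3 composeRTS(float theta, float tx, float ty, float scale)
{
    float c = cosf(theta), s = sinf(theta);
    Mat3 A;
    // R * T * S expanded by hand
    A.m[0] = c * scale; A.m[1] = -s * scale; A.m[2] = c * tx - s * ty;
    A.m[3] = s * scale; A.m[4] =  c * scale; A.m[5] = s * tx + c * ty;
    A.m[6] = 0.0f;      A.m[7] = 0.0f;       A.m[8] = 1.0f;
    return A;
}

__host__ __device__ inline void applyTo(const Mat3 &A, float &x, float &y)
{
    float nx = A.m[0] * x + A.m[1] * y + A.m[2];
    float ny = A.m[3] * x + A.m[4] * y + A.m[5];
    x = nx; y = ny;
}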

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here, stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)) T(tr(optm)shift) S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether or not the descriptor lies at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.

2.5 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program must be partitioned into threads by the programmer and mapped onto those SMs. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers and shared memory are on-chip and fast to access; only shared memory can be accessed by other threads within the same block, but its capacity is limited. Global memory, constant memory and texture memory are off-chip and accessible by all threads, but accessing them is much more time consuming. Constant memory and texture memory are read-only and cacheable.

Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be used preferentially; when global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into equally sized banks. If two memory requests from different threads within a warp fall into the same bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
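The two toy kernels below only illustrate the coalescing rule just mentioned and are not taken from the report: consecutive threads touching consecutive words coalesce into few memory transactions, whereas a strided pattern does not.

__global__ void copyCoalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];              // thread k reads word k: coalesced
}

__global__ void copyStrided(const float *in, float *out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = (i * stride) % n;               // neighbouring threads land far apart: poor coalescing
    if (i < n) out[i] = in[j];
}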

2.5.1 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread compares one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the thread and block counts are set to 1024; that means we can match our test template with up to 1024 x 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
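A minimal sketch of such a block-wide summation is shown below, assuming a power-of-two block size; it follows the pairwise tree pattern described above and leaves the total in the first shared-memory element.

// Sketch: shared-memory tree reduction of per-thread intermediate results.
__global__ void blockSum(const float *partial, float *blockTotal, int n)
{
    extern __shared__ float buf[];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    buf[tid] = (i < n) ? partial[i] : 0.0f;   // each thread loads its intermediate result
    __syncthreads();

    for (int stride = 1; stride < blockDim.x; stride *= 2) {
        if (tid % (2 * stride) == 0)          // threads 0, 2*stride, 4*stride, ... survive
            buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockTotal[blockIdx.x] = buf[0];   // the "first address" holds the sum
}
// Launch (illustrative): blockSum<<<grid, block, block * sizeof(float)>>>(d_partial, d_totals, n);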

2.5.2 MAPPING INSIDE A BLOCK

In the shift-parameter search there are two schemes we could use to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all threads compute independently and only the final results need to be compared across offsets.

Because of the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.

FIG

FIG
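The report relies on dynamically created Mersenne Twister parameters; as a hedged stand-in, the sketch below shows the same per-thread-independence idea using cuRAND's device API, where each thread is seeded with its own subsequence. The drawn parameter ranges are purely illustrative.

// Sketch: independent per-thread random streams for the parameter search.
#include <curand_kernel.h>

__global__ void initRng(curandState *states, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // same seed, different subsequence per thread -> statistically independent streams
    curand_init(seed, tid, 0, &states[tid]);
}

__global__ void drawAffineCandidates(curandState *states, float *theta,
                                     float *scale, int nCandidates)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= nCandidates) return;
    curandState local = states[tid];
    theta[tid] = (curand_uniform(&local) - 0.5f) * 0.2f;   // small rotation in radians (assumed range)
    scale[tid] = 0.9f + 0.2f * curand_uniform(&local);     // scale in [0.9, 1.1] (assumed range)
    states[tid] = local;
}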

2.5.3 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
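The following sketch illustrates the shared-memory staging described above, assuming a structure-of-arrays layout and a hypothetical upper bound MAX_TEST on the test-template size; it is not the report's kernel.

// Sketch: stage the small test template in shared memory once per block.
#define MAX_TEST 512

__global__ void matchWithStagedTest(const float *testX, const float *testY, int nTest,
                                    const float *targX, const float *targY, int nTarget,
                                    float *partialOut)
{
    __shared__ float sx[MAX_TEST], sy[MAX_TEST];

    // cooperative, coalesced load: consecutive threads read consecutive words
    for (int i = threadIdx.x; i < nTest; i += blockDim.x) {
        sx[i] = testX[i];
        sy[i] = testY[i];
    }
    __syncthreads();

    // each thread handles its own slice of the target descriptors,
    // reading the test template from fast on-chip memory
    float partial = 0.0f;
    for (int j = threadIdx.x; j < nTarget; j += blockDim.x) {
        float best = 1e30f;
        for (int i = 0; i < nTest; ++i) {
            float dx = sx[i] - targX[j], dy = sy[i] - targY[j];
            best = fminf(best, dx * dx + dy * dy);
        }
        partial += best;
    }
    // per-thread partials; these would be combined with the block-wide reduction shown earlier
    partialOut[blockIdx.x * blockDim.x + threadIdx.x] = partial;
}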

FIG

2.6 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied to target detection; in this work it is applied as a feature for human recognition. In the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y) as m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).

Orientation binning is the second step of HOG; it is used to create the cell histograms. Each pixel within a cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG
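For illustration, the per-pixel quantities defined above can be computed as in the following CUDA sketch; the nine unsigned bins of 20 degrees are an assumed configuration, and cell/block aggregation is omitted.

// Sketch: per-pixel gradient magnitude and unsigned orientation bin (0-180 degrees).
__global__ void gradAndBin(const float *img, int width, int height,
                           float *mag, int *bin)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;

    int idx = y * width + x;
    float dx = img[idx + 1] - img[idx - 1];          // x-direction gradient
    float dy = img[idx + width] - img[idx - width];  // y-direction gradient

    mag[idx] = sqrtf(dx * dx + dy * dy);             // m(x, y)
    float theta = atan2f(dy, dx) * 57.29578f;        // orientation in degrees
    if (theta < 0.0f) theta += 180.0f;               // fold opposite directions together
    bin[idx] = min(8, (int)(theta / 20.0f));         // nine 20-degree bins (assumed)
}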

CHAPTER 3

SOFTWARE SPECIFICATION

3.1 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications and distribute your MATLAB algorithms and applications.

3.2 FEATURES OF MATLAB

High-level language for technical computing.

Development environment for managing code, files and data.

Interactive tools for iterative exploration, design and problem solving.

Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization and numerical integration.

2-D and 3-D graphics functions for visualizing data.

Tools for building custom graphical user interfaces.

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM and Microsoft Excel.

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is; moreover, the fundamental operators (e.g. addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

3.2.1 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations.

Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data.

Readability and navigation improvements to warning and error messages in the MATLAB command window.

Automatic variable and function renaming in the MATLAB Editor.

Developing Algorithms and Applications

MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP) and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries; for general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.

Development Tools

MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:

MATLAB Editor: provides standard editing and debugging features, such as setting breakpoints and single stepping.

Code Analyzer: checks your code for problems and recommends modifications to maximize performance and maintainability.

MATLAB Profiler: records the time spent executing each line of code.

Directory Reports: scan all the files in a directory and report on code efficiency, file differences, file dependencies and code coverage.

Designing Graphical User Interfaces

GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing and editing user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including:

Interpolating and decimating

Extracting sections of data, scaling and averaging

Thresholding and smoothing

Correlation, Fourier analysis and filtering

1-D peak, valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format, and additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations and legends; and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create:

Line, area, bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effect, light source locations and transparency.

3-D plotting functions include:

Surface, contour and mesh

Image plots

Cone, slice, stream and isosurface

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.

MATLAB provides the following types of functions for performing mathematical operations and analyzing data:

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles and integers.

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

4.2 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREYSCALE IMAGE

FIG

GREYSCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g. PINs and passwords), government applications have used token-based systems (e.g. ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which dramatically reduces data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even across the threads in the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:

1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, preoccupy GPU memory, and slow down data transfer. In addition, some of the processing on the mask files involves convolution, which is difficult to accelerate on the scalar processing units of CUDA.

2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements; there is no uniform mapping scheme applicable to all these stages.

3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance, and new designs are necessary to help narrow down the search range.

In summary, a naive implementation of the algorithms in parallel would not work efficiently.

Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.

In this research, we first discuss why the naive parallel approach would not work (Section 3). We then propose a new sclera descriptor, the Y-shape sclera feature, for efficient registration to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor", which is better suited to parallel computing and mitigates the mask-size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then developed implementation schemes to map our algorithms into CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition, in Section 8 we report experiments using the proposed system, and in Section 9 we draw conclusions.

2.2 BACKGROUND OF SCLERA VEIN RECOGNITION

2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION

A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction and feature matching (Figure 1).

FIG

Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation; they used a clustering algorithm to classify color eye images into three clusters: sclera, iris and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation and fine sclera segmentation. Zhou et al. developed a skin-tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's-thresholding-based method for grayscale images.

After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image and a multi-scale region-growing approach to identify the sclera veins against the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded-Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly used for circular or quasi-circular object shapes. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, the human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.

2.2.2 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.

FIG

Glare area detection: the glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images, so a color image is first converted to grayscale and then passed to the Sobel filter to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG
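As an illustrative sketch only, a 3x3 Sobel operator with a magnitude threshold could flag candidate glare pixels as below; the threshold value and all names are assumptions rather than the report's implementation.

// Sketch: Sobel gradient magnitude thresholded to mark candidate glare pixels.
__global__ void sobelGlare(const unsigned char *gray, unsigned char *glareMask,
                           int width, int height, float threshold)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;

    int i = y * width + x;
    // 3x3 Sobel responses in x and y
    float gx = -gray[i - width - 1] + gray[i - width + 1]
               - 2.0f * gray[i - 1] + 2.0f * gray[i + 1]
               - gray[i + width - 1] + gray[i + width + 1];
    float gy = -gray[i - width - 1] - 2.0f * gray[i - width] - gray[i - width + 1]
               + gray[i + width - 1] + 2.0f * gray[i + width] + gray[i + width + 1];

    float mag = sqrtf(gx * gx + gy * gy);
    glareMask[i] = (mag > threshold) ? 255 : 0;   // candidate glare boundary pixel
}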

Sclera area estimation: to estimate the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should lie in the right and center positions, and the correct right sclera area should lie in the left and center; in this way, non-sclera areas are wiped out.

2.2.3 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid and iris boundaries are then refined; these are all unwanted portions for recognition. In order to eliminate their effects, refinement is done after the detection of the sclera area. The figure shows the result after Otsu's thresholding and the iris and eyelid refinement used to detect the right sclera area; the left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented; hence feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

2.2.4 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004) and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999); due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear, so the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis; its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur; thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.

Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portion of the eye, the sclera vein patterns will be further exposed; this makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g. due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering: since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by "Elements" below). In addition, each element of the filtering can itself be parallelized. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

2.2.5 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as

FIG

Here, fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database; using these values, it calculates a fitness value for the registration with these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created a weighting image (Figure 3) from the sclera mask by setting interior pixels of the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
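A hedged sketch of the pairwise score just described is given below; it encodes the distance and angle thresholds and the mask-derived weights, but it does not reproduce the exact published formula, and all names are illustrative.

// Sketch: weighted match test between two line-segment descriptors.
#include <cmath>

struct LineDesc { float x, y, r, theta, phi, w; };

__host__ __device__ inline float segmentScore(const LineDesc &si, const LineDesc &sj,
                                              float dMatch, float phiMatch)
{
    float dx = si.x - sj.x, dy = si.y - sj.y;
    float d    = sqrtf(dx * dx + dy * dy);            // d(Si, Sj): center distance
    float dPhi = fabsf(si.phi - sj.phi);              // orientation difference
    if (d <= dMatch && dPhi <= phiMatch)
        return si.w * sj.w;                           // weighted match (weights are 0, 0.5 or 1)
    return 0.0f;                                      // no match
}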

FIG

FIG

FIG

FIG

Y-shape branches are observed to be a stable feature under eye movement and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angles of every branch with respect to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors in the pupil center calculation from the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center, and it is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments, and for each segment a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors on the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To tolerate such errors, a mask file is used.

FIG

The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel patterns in the sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line segments of the vessel patterns.

The mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large, occupy GPU memory and slow down data transfer. During matching, a RANSAC-type registration algorithm randomly selects corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor; this results in too many convolutions in the processing unit.

To reduce this heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extract the geometric relationships of the descriptors and store them as a new descriptor, using a weighted image created by setting weight values according to position: the weights of descriptors beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.

The calculated weight is saved as a component of the descriptor, so the sclera descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points: if we use the center of the iris as the reference point, then when two templates are compared the correspondence is automatically aligned, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in the CPU preprocessing.

The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns may be compressed while the right-part sclera patterns are stretched. In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps; we therefore reorganize the descriptors from the same side and save

FIG

FIG

them at contiguous addresses, which meets the requirement of coalesced memory access on the GPU. After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, no computation on the mask file is needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. With the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.
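The structure-of-arrays layout implied above can be sketched as follows; the grouping of left- and right-half descriptors and all field names are assumptions for illustration, not the report's data structures.

// Sketch: one contiguous array per descriptor component, grouped by sclera half.
struct WPLTemplateSoA {
    float *x, *y, *r, *theta, *phi, *w;  // device pointers, one array per component
    int    nLeft;                        // descriptors [0, nLeft) belong to the left half
    int    nTotal;                       // descriptors [nLeft, nTotal) belong to the right half
};

__global__ void shiftCenters(WPLTemplateSoA t, float dx, float dy)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= t.nTotal) return;
    // consecutive threads read consecutive elements of the same array: coalesced access
    t.x[i] += dx;
    t.y[i] += dy;
}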

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65 k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-Purpose Computing on the GPU: Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as a texture on future passes through the graphics pipeline.

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, to the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.

An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity; after this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transformation, affine matrix generation and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here, ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4); ni and di are the number of matched descriptor pairs and their center distance, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar ray from the pupil center in descriptor i.

The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here, α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed on to the next, more precise matching process.
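To make the comparison step concrete, the sketch below is an illustration rather than the report's code (the exact forms of (1)-(4) are not reproduced here); it counts matched Y-shape branch pairs between a test and a target descriptor set using Euclidean distances and the thresholds tϕ and txy mentioned above.

#include <math.h>

// Hypothetical Y-shape descriptor layout: three branch angles plus the
// branch-center position, as described in the text.
struct YShapeDesc {
    float phi1, phi2, phi3;   // branch angles relative to the pupil radius
    float x, y;               // center of the Y-shape branch
};

// Count matched descriptor pairs for one template pair (illustrative).
__host__ __device__ int countYMatches(const YShapeDesc *te, int nTe,
                                       const YShapeDesc *ta, int nTa,
                                       float tPhi, float tXY)
{
    int matched = 0;
    for (int i = 0; i < nTe; ++i) {
        for (int j = 0; j < nTa; ++j) {
            // Assumed Euclidean distance over the three branch angles.
            float dPhi = sqrtf((te[i].phi1 - ta[j].phi1) * (te[i].phi1 - ta[j].phi1) +
                               (te[i].phi2 - ta[j].phi2) * (te[i].phi2 - ta[j].phi2) +
                               (te[i].phi3 - ta[j].phi3) * (te[i].phi3 - ta[j].phi3));
            // Euclidean distance between the branch centers.
            float dXY  = sqrtf((te[i].x - ta[j].x) * (te[i].x - ta[j].x) +
                               (te[i].y - ta[j].y) * (te[i].y - ta[j].y));
            if (dPhi < tPhi && dXY < tXY) {   // e.g. tPhi = 30, tXY = 675
                ++matched;
                break;                        // each test branch counted at most once
            }
        }
    }
    return matched;
}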

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because: when acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, since the eyeball is spherical in shape; and the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), whose movements differ slightly. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte; Tta is the target template and stai is the i-th WPL descriptor of Tta; d(stek, staj) is the Euclidean distance of descriptors stek and staj; and Δsk is the shift value of the two descriptors.

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here, stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale transform parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment, we set the number of iterations to 512.
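The transform matrices referenced as (7) are not reproduced in this text. Assuming the standard two-dimensional forms, with θ(it) the rotation angle, tr(it)scale an isotropic scale factor, and (Δx(it), Δy(it)) the shift offset, they would read:

R(\theta^{(it)}) = \begin{pmatrix} \cos\theta^{(it)} & -\sin\theta^{(it)} \\ \sin\theta^{(it)} & \cos\theta^{(it)} \end{pmatrix}, \qquad
S\big(tr^{(it)}_{scale}\big) = \begin{pmatrix} tr^{(it)}_{scale} & 0 \\ 0 & tr^{(it)}_{scale} \end{pmatrix}, \qquad
T\big(tr^{(it)}_{shift}\big) = \begin{pmatrix} \Delta x^{(it)} \\ \Delta y^{(it)} \end{pmatrix}

A descriptor center would then be rotated and scaled by R and S and offset by T, with the exact composition order as defined in (7).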

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here, stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those multiprocessors.

There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip, so accessing them costs very little time (local memory, despite its name, resides in off-chip device memory). Only shared memory can be accessed by other threads within the same block, and it is available only in a limited amount. Global memory, constant memory, and texture memory are off-chip and accessible by all threads, but accessing them is very time-consuming.

Constant memory and texture memory are read-only, cached memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming.

If threads in a warp take different control paths, all the branches are executed serially; to improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be used preferentially rather than global memory, and when global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into equally sized banks. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized, so to get maximum performance, memory requests should be scheduled to minimize bank conflicts.
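As a generic illustration of these guidelines (not the report's kernel), the snippet below has each warp read consecutive words from global memory, which coalesces the accesses, and stages them in shared memory with conflict-free, one-element-per-thread indexing.

// Assumes a launch with blockDim.x == 256.
__global__ void scaleViaSharedMem(const float *in, float *out, int n, float a)
{
    __shared__ float tile[256];
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Coalesced load: thread k of a warp reads word k of a contiguous run.
    if (idx < n) tile[threadIdx.x] = in[idx];
    __syncthreads();

    // Conflict-free shared-memory access: each thread touches its own bank.
    if (idx < n) out[idx] = a * tile[threadIdx.x];
}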

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.
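A sketch of this coarse-grained mapping is shown below, reusing the YShapeDesc type and the countYMatches helper from the earlier sketch; the kernel body and the per-template descriptor limit are illustrative assumptions rather than the report's code. Each thread compares the single test template against one target template, and the launch uses 1024 blocks of 1024 threads as described above.

#define MAX_Y_DESCS 64   // assumed per-template descriptor cap, for illustration only

// One thread = one test/target template pair (Stage I, Algorithm 1).
__global__ void yShapeMatchKernel(const YShapeDesc *test, int nTest,
                                  const YShapeDesc *targets, const int *nTarget,
                                  float *scores, int numTemplates)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;   // target template index
    if (t >= numTemplates) return;

    const YShapeDesc *target = targets + t * MAX_Y_DESCS;
    int matched = countYMatches(test, nTest, target, nTarget[t], 30.0f, 675.0f);

    // Placeholder fusion of the match count with the template sizes; the
    // report's score (2) also folds in the average center distance.
    float denom = fminf((float)nTest, (float)nTarget[t]);
    scores[t] = (denom > 0.0f) ? (float)matched / denom : 0.0f;
}

// Illustrative launch configuration matching the 1024 x 1024 figure above:
// dim3 grid(1024); dim3 block(1024);
// yShapeMatchKernel<<<grid, block>>>(dTest, nTest, dTargets, dNTarget, dScores, numTemplates);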

Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread corresponds to a set of descriptors in that template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
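The sketch below shows a standard block-level tree reduction in shared memory, a close cousin of the pairwise summation scheme described above, with the final sum left at the first address; it is an illustration, not the report's code.

// Launch as: blockSum<<<grid, block, block.x * sizeof(float)>>>(partial, blockTotals, n);
// Assumes blockDim.x is a power of two.
__global__ void blockSum(const float *partial, float *blockTotals, int n)
{
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;
    s[tid] = (idx < n) ? partial[idx] : 0.0f;
    __syncthreads();

    // Halve the number of active threads each step; the block's sum ends up
    // in s[0], i.e. at the "first address" as in the scheme above.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockTotals[blockIdx.x] = s[0];
}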

252 MAPPING INSIDE BLOCK

In the shift parameter search, there are two schemes we can choose to map the task: mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads; or assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.

FIG

FIG
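The report relies on dynamically created Mersenne Twister parameters. Purely as an illustration of per-thread random parameter generation, the sketch below uses cuRAND's default device generator instead, giving each thread its own subsequence so the streams are uncorrelated; the parameter ranges are assumptions.

#include <curand_kernel.h>

__global__ void initStates(curandState *states, unsigned long long seed, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        curand_init(seed, tid, 0, &states[tid]);   // one subsequence per thread
}

// Each thread draws its own candidate rotation and scale parameters.
__global__ void drawAffineCandidates(curandState *states, float *angleDeg,
                                     float *scale, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    curandState local = states[tid];
    angleDeg[tid] = -10.0f + 20.0f * curand_uniform(&local);   // assumed range, degrees
    scale[tid]    =  0.9f  +  0.2f * curand_uniform(&local);   // assumed range
    states[tid] = local;                                       // save state for reuse
}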

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
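A minimal sketch of this layout is given below, assuming the descriptor components are split into separate arrays (structure-of-arrays) and copied to the device in one batch before matching begins; the struct and function names are illustrative, not the report's code.

#include <cuda_runtime.h>

// Structure-of-arrays layout for the WPL descriptors s(x, y, r, theta, phi, w):
// storing each component contiguously lets consecutive threads read
// consecutive addresses (coalesced access).
struct WplSoA {
    float *x, *y, *r, *theta, *phi, *w;   // device arrays, one per component
};

void uploadTargetsOnce(WplSoA &d, const float *h[6], size_t count)
{
    float **dst[6] = { &d.x, &d.y, &d.r, &d.theta, &d.phi, &d.w };
    size_t bytes = count * sizeof(float);
    for (int c = 0; c < 6; ++c) {
        cudaMalloc((void **)dst[c], bytes);
        cudaMemcpy(*dst[c], h[c], bytes, cudaMemcpyHostToDevice);  // single upfront transfer
    }
}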

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
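The gradient equations themselves are not reproduced in this text; under the standard HOG definitions, and assuming dx and dy denote the horizontal and vertical derivative responses, they are:

m(x, y) = \sqrt{ d_x(x, y)^2 + d_y(x, y)^2 }, \qquad
\theta(x, y) = \tan^{-1}\!\left( \frac{d_y(x, y)}{d_x(x, y)} \right)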

Orientation binning is the second step of HOG; it creates the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin corresponding to the orientation found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination or contrast changes, then the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational applications. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(e.g., addition, multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter and bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effect, light source locations, and transparency.

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global

Btexture[ memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and Bgather[ accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

23 2 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both Bgather[ (read) accesses from and Bscatter[ (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation
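For illustration only (this is a minimal sketch, not code from the report), a CUDA kernel over a structured grid of threads that gathers from one global buffer and both reads and writes another buffer in place:

    #include <cuda_runtime.h>

    // Each thread handles one grid point: it gathers the point and its neighbours
    // from the previous state and accumulates the result into `accum`. Because a
    // thread writes only its own element of `accum`, reading and writing the same
    // buffer is race-free, illustrating the in-place flexibility described above.
    __global__ void updateGrid(const float *prevState, float *accum, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float left   = prevState[i > 0 ? i - 1 : i];        // gather (read)
        float centre = prevState[i];
        float right  = prevState[i < n - 1 ? i + 1 : i];
        accum[i] += 0.25f * (left + right - 2.0f * centre); // scatter (write) into a buffer that is also read
    }

    // Launch example: updateGrid<<<(n + 255) / 256, 256>>>(dPrev, dAccum, n);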

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process. In the first stage, we matched two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities. After this step, some false positive matches may still remain. In the second stage, we used the WPL descriptor

to register the two images for more detailed descriptor matching including

scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of the two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distance, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the searching area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we searched the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the searching range and time. The distance of two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs ni and the distance between the Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score will be passed to the next, more precise matching process.
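For illustration only (array names and the simplified score are our assumptions, not the report's code), a CUDA kernel in which each thread coarsely scores one target template against the test template using the Y-shape branch angles:

    // One thread per target template. testPhi holds nTest Y-shape descriptors
    // (three branch angles each); targetPhi holds the descriptors of all target
    // templates back to back, with targetOffset[t] / targetCount[t] giving the
    // start index and length of template t. The full method also fuses the
    // average center distance of matched branches (factor alpha); that term is
    // omitted here for brevity.
    __global__ void yShapeCoarseMatch(const float *testPhi, int nTest,
                                      const float *targetPhi, const int *targetOffset,
                                      const int *targetCount, int nTemplates,
                                      float tPhi, float *score)
    {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t >= nTemplates) return;

        const float *tgt = targetPhi + 3 * targetOffset[t];
        int nTgt = targetCount[t];
        int matched = 0;

        for (int i = 0; i < nTest; ++i) {
            for (int j = 0; j < nTgt; ++j) {
                float d0 = testPhi[3 * i]     - tgt[3 * j];
                float d1 = testPhi[3 * i + 1] - tgt[3 * j + 1];
                float d2 = testPhi[3 * i + 2] - tgt[3 * j + 2];
                float dPhi = sqrtf(d0 * d0 + d1 * d1 + d2 * d2);  // Euclidean distance of angle elements
                if (dPhi < tPhi) { ++matched; break; }
            }
        }
        score[t] = (float)matched / fminf((float)nTest, (float)nTgt);
    }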

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because: when acquiring an eye image at a different gaze angle, the vessel structure will appear to nonlinearly shrink or extend, since the eyeball is spherical in shape; and the sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation and scale together.

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate. As a result, the detected iris center might not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte; Tta is the target template and stai is the i-th WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek and staj.

Δsk is the shift value of the two descriptors, defined as
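(The equation itself is not reproduced in this transcript; a plausible form, in our own notation rather than the report's, is

    Δsk = (x_ta_j - x_te_k, y_ta_j - y_te_k),

i.e. the offset between the centers of the two matched descriptors.)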

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. The shift offset between them is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)_shift, θ(it) and tr(it)_scale are the shift, rotation and scale transform parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)_shift) and S(tr(it)_scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterated N times to generate these parameters; in our experiment we set the number of iterations to 512.
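(The transform matrices of (7) appear only as a figure in this transcript; a plausible form, assuming 2-D points in homogeneous coordinates and our own notation, is

    p' = S(tr_scale) * R(θ) * T(tr_shift) * p,

where T(tr_shift) translates by the shift offset, R(θ) is the usual 2-D rotation matrix [cos θ, -sin θ; sin θ, cos θ], and S(tr_scale) scales uniformly by tr_scale.)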

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)) T(tr(optm)_shift) S(tr(optm)_scale) is the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA-capable GPU consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned by the programmer into threads, which are mapped onto those multiprocessors.

There are multiple memory spaces in the CUDA memory hierarchy

registers, local memory, shared memory, global memory, constant memory and texture memory. Registers, local memory and shared memory are on-chip, and accessing these memories takes little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory and texture memory are off-chip memories that are accessible by all threads, and accessing them is very time consuming.

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp take different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access time. To hide this latency as much as possible, we should use on-chip memory preferentially rather than global memory. When global memory accesses occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces. But shared memory is organized into equally sized banks. If two memory requests from different threads within a warp fall in the same memory bank, the accesses will be serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread. One thread performs one template-pair comparison. In our work we use an NVIDIA C2070 as our GPU. The numbers of threads and blocks are both set to 1024; that means we can match our test template with up to 1024×1024 target templates at the same time.
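For illustration (the kernel call is only indicated in a comment, and the kernel name is the hypothetical one sketched for Stage I above), the launch configuration implied by this mapping, assuming 1024 threads per block as supported by the C2070 (compute capability 2.0):

    // One thread per target template; with 1024 blocks of 1024 threads a single
    // launch can score up to 1024 x 1024 target templates against the test template.
    const int threadsPerBlock = 1024;
    const int numTargets      = 1024 * 1024;   // upper bound handled in one launch
    const int numBlocks       = (numTargets + threadsPerBlock - 1) / threadsPerBlock;
    // yShapeCoarseMatch<<<numBlocks, threadsPerBlock>>>(...);  // hypothetical Stage I kernel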

Algorithms 2-4 will be partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column in Figure 11 shows, we assigned a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there are no data exchange requirements between different blocks. When all

threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the sum of the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
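As a sketch of this reduction step (a standard shared-memory tree reduction in the spirit of the description above; variable names are ours, not the report's, and a power-of-two block size is assumed):

    // Sums the per-thread intermediate results of one block; the total ends up in
    // element 0 of the shared buffer ("the first address"), then one value per
    // block is written out. Launch with sharedBytes = blockDim.x * sizeof(float).
    __global__ void blockReduceSum(const float *partial, float *blockSums, int n)
    {
        extern __shared__ float buf[];
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + tid;
        buf[tid] = (i < n) ? partial[i] : 0.0f;
        __syncthreads();
        // stride 1: odd/even pairs; then strides 2, 4, 8, ... as in the description
        for (int stride = 1; stride < blockDim.x; stride *= 2) {
            if (tid % (2 * stride) == 0)
                buf[tid] += buf[tid + stride];
            __syncthreads();
        }
        if (tid == 0) blockSums[blockIdx.x] = buf[0];
    }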

252 MAPPING INSIDE BLOCK

In the shift parameter search, there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads;

Assigning a single possible shift offset to each thread, so that all the threads compute independently, except that the final results must be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread; every thread randomly generated a set of parameters and tried them independently. The generation iterations were assigned to all the threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to be processed in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
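For illustration only (this is not the report's offline tool): NVIDIA's cuRAND library ships an MTGP32 generator whose per-generator parameter sets come from the same dynamic-creation approach, which is one off-the-shelf way to obtain uncorrelated parallel Mersenne Twister streams for the random transform parameters:

    #include <curand.h>
    #include <cuda_runtime.h>

    int main()
    {
        const int N = 512 * 3;            // e.g. shift, rotation, scale samples for 512 iterations
        float *dParams = nullptr;
        cudaMalloc(&dParams, N * sizeof(float));

        curandGenerator_t gen;
        // MTGP32: a Mersenne Twister variant whose parameter sets come from the
        // dynamic-creation method, keeping parallel sequences uncorrelated.
        curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_MTGP32);
        curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);
        curandGenerateUniform(gen, dParams, N);   // fills device memory with U(0,1) samples

        curandDestroyGenerator(gen);
        cudaFree(dParams);
        return 0;
    }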

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait for a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.

FIG

FIG

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target template set from the database without considering when it will be processed. Therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data. In our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
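As an illustrative sketch only (names are hypothetical; this uses the modern texture-object API, whereas a Fermi-era card such as the C2070 would use the older texture-reference API), binding a linear device array of target-descriptor components to a texture so kernels read it through the read-only texture cache:

    #include <cuda_runtime.h>

    // Create a texture object over a linear device array (e.g. one component of
    // the target descriptors stored in structure-of-arrays form).
    cudaTextureObject_t makeLinearTexture(const float *devArray, size_t count)
    {
        cudaResourceDesc resDesc = {};
        resDesc.resType = cudaResourceTypeLinear;
        resDesc.res.linear.devPtr = const_cast<float *>(devArray);
        resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
        resDesc.res.linear.sizeInBytes = count * sizeof(float);

        cudaTextureDesc texDesc = {};
        texDesc.readMode = cudaReadModeElementType;

        cudaTextureObject_t tex = 0;
        cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
        return tex;
    }

    __global__ void readTargets(cudaTextureObject_t texX, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = tex1Dfetch<float>(texX, i);   // cached read of a descriptor component
    }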

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It is primarily applied to target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating an intensity measure over a block and then using this value to normalize all cells within the block; this normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
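For reference, the standard forms of these two quantities (assuming dx and dy are the horizontal and vertical gradient images) are:

    m(x, y) = sqrt( dx(x, y)^2 + dy(x, y)^2 )
    θ(x, y) = arctan( dy(x, y) / dx(x, y) )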

Orientation binning is the second step of HOG; it is used to create the cell histograms. Each pixel within a cell casts a weighted vote for an orientation bin based on the orientation found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strengths must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations,

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphicalmulti-domainsimulationandModel-Based

Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages, such as C, C++, FORTRAN, Java, COM,

and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java, ActiveX or .NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with the MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD based Symbolic Math Toolbox

available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for 'for' loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

The interactive tool GUIDE (Graphical User Interface Development Environment) is used to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers, adding annotations, LaTeX equations, and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large, often complex, multidimensional data. You can specify plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employs a two-stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods. We designed the Y shape descriptor, a new feature extraction method that takes advantage of the GPU structure, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even among the threads in the GPU. The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


of the RGB image and a multi-scale region growing approach to identify

the sclera veins from the image background Crihalmeanu and Ross applied

a selective enhancement filter for blood vessels to extract features from the

green component in a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded-Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.

The proposed sclera recognition consists of five steps which include

sclera segmentation vein pattern enhancement feature extraction feature

matching and matching decision Fig 2 shows the block diagram of sclera

recognition Two types of feature extraction are used in the proposed

method to achieve good accuracy for the identification The characteristics

that are elicited from the blood vessel structure seen in the sclera region are

Histogram of Oriented Gradients (HOG) and interpolation of Cartesian to

Polar conversion HOG is used to determine the gradient orientation and

edge orientations of vein pattern in the sclera region of an eye image To

become more computationally efficient the data of the image are converted

to polar form, which is mainly used for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, the human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.

222 SCLERA SEGMENTATION

Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images, so if the image is in color it must first be converted to grayscale and then passed to the Sobel filter to detect the glare area. Fig. 4 shows the result of the glare area detection.

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. When the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way non-sclera areas are eliminated.

223 IRIS AND EYELID REFINEMENT

The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid and iris boundaries are refined; all of these are unwanted portions for recognition. In order to eliminate these effects, refinement is done following the detection of the sclera area. Fig. shows the result after the Otsu's thresholding process and the iris and eyelid refinement used to detect the right sclera area. In the same way, the left sclera area is detected using this method.

FIG

In the segmentation process, not all images are perfectly segmented. Hence feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after the segmentation process, so vein pattern enhancement is performed to make them more visible.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999) In the case of retinal biometrics an

especial optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY Conjunctival vasculature can be captured with commercial off

the shelf digital cameras under normal lighting conditions making this

modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi surface structure of the ocular

veins makes them hard to reproduce as a physical artifact Besides being a

stand-alone biometric modality we anticipate that the addition of

conjunctival biometrics will enhance the performance of current iris-based

biometric system in the following ways

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the

availability for simultaneous computation Below Figure demonstrates the

possibility for parallel directional filtering Since the filter is computed over

different portions of the input image the computation can be computed in

parallel (denoted by Elements below) In addition individual parallelization

of each element of Filtering can also be performed A detailed discussion of

our proposed parallelization is outside the scope of this paper

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor based method is a

bottleneck with regard to matching speed In this section we briefly

describe the Line Descriptor-based sclera vein recognition method After

segmentation vein patterns were enhanced by a bank of directional Gabor

filters Binary morphological operations are used to thin the detected vein

structure down to a single pixel wide skeleton and remove the branch

points The line descriptor is used to describe the segments in the vein

structure Figure 2 shows a visual description of the line descriptor Each

segment is described by three quantities: the segment's angle relative to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor

are calculated as

FIG
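(The equations themselves appear only as a figure in this transcript; a plausible reconstruction, in our own notation and consistent with the definitions below, is

    θ = arctan( (y_l - y_i) / (x_l - x_i) ),
    r = sqrt( (x_l - x_i)^2 + (y_l - y_i)^2 ),
    ɸ = arctan( f'_line(x) ) evaluated at x = x_l.)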

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the

vascular patterns a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns For

the registration algorithm, it randomly chooses two points: one from the test template and one from the target template. It also randomly chooses a

scaling factor and a rotation value based on a priori knowledge of the

database Using these values it calculates a fitness value for the registration

using these parameters

After sclera template registration each line segment in the test

template is compared to the line segments in the target template for

matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained.
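(The scoring equations appear only as figures here; a hedged sketch consistent with the description, in our own notation, is

    m(Si, Sj) = wi * wj   if d(Si, Sj) <= Dmatch and |ɸi - ɸj| <= ɸmatch,
    m(Si, Sj) = 0         otherwise,
    M = sum of m(Si, Sj) over matched pairs / min( sum of wi over the test template, sum of wj over the target template ),

where wi and wj are the descriptor weights taken from the weighting image.)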

FIG

FIG

FIG

FIG

movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y shape vessels: one is to use the angle of every branch to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach we employed the second method. As Figure 6

shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when head tilt, eye movement or camera zoom occurs at the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated

with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 22, the line

descriptor is extracted from the skeleton of vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment a line descriptor is created to record the center and orientation of

the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To be tolerant of such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy GPU memory and slow down the data transfer. When matching, a registration RANSAC-type algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weights of the transformed descriptors. This results in too many convolutions in the processing unit.

To reduce heavy data transfer and computation we designed the

weighted polar line (WPL) descriptor structure which includes the

information of the mask and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, descriptor weights were calculated on their own mask by the CPU only once.

The calculated result was saved as a component of the descriptor. The descriptor of the sclera then becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template will be transformed. It would be faster if the two templates had similar reference points. If we use the center of the iris as the reference point, when two templates are compared the correspondences will automatically be aligned to each other, since they have a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched. In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses. This meets the requirement for coalesced memory access on the GPU.
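A minimal sketch (hypothetical names, not the report's data structures) of storing the WPL descriptors in structure-of-arrays form, with left-side and right-side descriptors kept in separate contiguous runs so that a warp reads consecutive addresses:

    // Structure-of-arrays layout for WPL descriptors s(x, y, r, theta, phi, w).
    // Descriptors from the left half of the sclera occupy indices [0, nLeft) and
    // right-half descriptors occupy [nLeft, nLeft + nRight), so each warp touches
    // a contiguous, coalesced range.
    struct WplTemplateSoA {
        float *x, *y;        // rectangular center coordinates
        float *r, *theta;    // polar center coordinates
        float *phi;          // dominant segment orientation
        float *w;            // weight: 0, 0.5 or 1 (mask information)
        int nLeft, nRight;
    };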

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 40 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

GENERAL-PURPOSE COMPUTING ON THE GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that todayrsquos GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global

Btexture[ memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and Bgather[ accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

23 2 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both Bgather[ (read) accesses from and Bscatter[ (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process. In the first stage, we matched two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarity. After this step, some false positive matches may still remain. In the second stage, we used the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here ytei and ytaj are the Y-shape descriptors of test template Tte and target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distance, respectively; tϕ is a distance threshold and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we searched the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs ni and the distance between the Y-shape branches' centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branches' centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score will be passed to the next, more precise matching process.
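A minimal host-side sketch of this coarse stage is given below. The exact fusion rule of Eqs. (2)-(4) is not reproduced in this copy, so the code only mirrors the textual description: count the Y-shape branch pairs whose angle distance is below tϕ and whose centers are closer than txy, then fuse the pair count and the mean center distance. The struct layout and the fusion formula used here are assumptions.

#include <algorithm>
#include <cmath>
#include <vector>

struct YDescriptor { float phi1, phi2, phi3, x, y; };   // y(phi1, phi2, phi3, x, y)

static float angleDistance(const YDescriptor& a, const YDescriptor& b)
{
    // Euclidean distance of the three branch angles (the role of d_phi in the text).
    float d1 = a.phi1 - b.phi1, d2 = a.phi2 - b.phi2, d3 = a.phi3 - b.phi3;
    return std::sqrt(d1 * d1 + d2 * d2 + d3 * d3);
}

float coarseMatchScore(const std::vector<YDescriptor>& te,
                       const std::vector<YDescriptor>& ta,
                       float tPhi = 30.0f, float tXy = 675.0f, float alpha = 30.0f)
{
    int   matched = 0;
    float sumDist = 0.0f;
    for (const auto& yte : te) {
        for (const auto& yta : ta) {
            float dx = yte.x - yta.x, dy = yte.y - yta.y;
            float dXy = std::sqrt(dx * dx + dy * dy);
            if (dXy < tXy && angleDistance(yte, yta) < tPhi) {
                ++matched;
                sumDist += dXy;
            }
        }
    }
    if (matched == 0) return 0.0f;
    float meanDist = sumDist / matched;
    // Hypothetical fusion: more matched pairs and a smaller mean distance give a
    // higher score; alpha balances the two terms, normalised by the smaller template.
    float n = static_cast<float>(std::min(te.size(), ta.size()));
    return (matched + alpha / (1.0f + meanDist)) / n;
}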

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because: (1) when acquiring an eye image at a different gaze angle, the vessel structure will appear nonlinearly shrunk or stretched, because the eyeball is spherical in shape; and (2) the sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate. As a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template, stei is the i-th WPL descriptor of Tte, Tta is the target template, stai is the i-th WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.

Δsk is the shift value of the two descriptors, defined as the offset between their centers. We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. The shift offset between them is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
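A host-side sketch of this shift-parameter search is shown below. It is not Algorithm 2 itself: the per-quad sampling is omitted, and the selection rule is interpreted as keeping the candidate offset that deviates least from the consensus of all candidates. All names are illustrative.

#include <cfloat>
#include <cmath>
#include <vector>

struct WplDescriptor { float x, y, r, theta, phi, w; };  // s(x, y, r, theta, phi, w)
struct Offset { float dx, dy; };

Offset searchShift(const std::vector<WplDescriptor>& te,
                   const std::vector<WplDescriptor>& ta)
{
    std::vector<Offset> candidates;
    for (const auto& ste : te) {                     // (sampling from each quad omitted)
        float best = FLT_MAX; Offset off{0.0f, 0.0f};
        for (const auto& sta : ta) {                 // nearest neighbour in the target
            float dx = sta.x - ste.x, dy = sta.y - ste.y;
            float d  = dx * dx + dy * dy;
            if (d < best) { best = d; off = {dx, dy}; }
        }
        candidates.push_back(off);                   // candidate registration shift
    }
    if (candidates.empty()) return {0.0f, 0.0f};

    // Keep the candidate closest to the centroid of all candidates, i.e. the one
    // with the smallest deviation from the consensus shift.
    float cx = 0.0f, cy = 0.0f;
    for (const auto& c : candidates) { cx += c.dx; cy += c.dy; }
    cx /= candidates.size(); cy /= candidates.size();

    Offset bestOff{0.0f, 0.0f}; float bestDev = FLT_MAX;
    for (const auto& c : candidates) {
        float dev = std::hypot(c.dx - cx, c.dy - cy);
        if (dev < bestDev) { bestDev = dev; bestOff = c; }
    }
    return bestOff;
}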

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to be 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined the same as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterated N times to generate these parameters. In our experiment, we set the number of iterations to 512.
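The following host-side sketch illustrates the idea of this random parameter search. The random parameter ranges and the way the shift candidate is paired are assumptions (the report only states that shift, rotation and scale are generated randomly for N = 512 iterations and scored by the matched-pair count within β = 20 pixels).

#include <cmath>
#include <cstdlib>
#include <vector>

struct Pt { float x, y; };

static Pt transformPoint(Pt p, float theta, float scale, Pt shift)
{
    float c = std::cos(theta), s = std::sin(theta);
    return { scale * (c * p.x - s * p.y) + shift.x,
             scale * (s * p.x + c * p.y) + shift.y };
}

void searchAffine(const std::vector<Pt>& te, const std::vector<Pt>& ta,
                  float& bestTheta, float& bestScale, Pt& bestShift,
                  int N = 512, float beta = 20.0f)
{
    int bestMatches = -1;
    for (int it = 0; it < N; ++it) {
        // Randomly generated parameter set for this iteration (ranges assumed).
        float theta = ((std::rand() / (float)RAND_MAX) - 0.5f) * 0.2f;   // about +/-0.1 rad
        float scale = 0.9f + 0.2f * (std::rand() / (float)RAND_MAX);     // 0.9 .. 1.1
        Pt    shift = (te.empty() || ta.empty()) ? Pt{0.0f, 0.0f}
                    : Pt{ ta[it % ta.size()].x - te[it % te.size()].x,
                          ta[it % ta.size()].y - te[it % te.size()].y };

        // Score this parameter set by the number of matched descriptor pairs.
        int matches = 0;
        for (const Pt& p : te) {
            Pt q = transformPoint(p, theta, scale, shift);
            for (const Pt& t : ta) {
                if (std::hypot(q.x - t.x, q.y - t.y) < beta) { ++matches; break; }
            }
        }
        if (matches > bestMatches) {
            bestMatches = matches; bestTheta = theta; bestScale = scale; bestShift = shift;
        }
    }
}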

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined by Algorithms 2 and 3, the test template will be registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ɸ values. In our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.
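A small sketch of the per-pair test described above follows: two registered WPL descriptors count as a match when their centers are close enough and the absolute difference of their orientations ɸ is below α, and the matched weight w contributes to the score. The thresholds mirror the text (α = 5, β = 20 pixels); the exact scoring rule is otherwise an assumption.

#include <cmath>
#include <cuda_runtime.h>

struct Wpl { float x, y, phi, w; };

__host__ __device__ inline
float pairScore(const Wpl& a, const Wpl& b, float beta = 20.0f, float alpha = 5.0f)
{
    float dx = a.x - b.x, dy = a.y - b.y;
    bool close   = (dx * dx + dy * dy) < beta * beta;   // center distance test
    bool aligned = fabsf(a.phi - b.phi) < alpha;        // orientation test
    return (close && aligned) ? fminf(a.w, b.w) : 0.0f; // weighted contribution
}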

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction, multiple data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those multiprocessors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers and shared memory are on-chip and fast to access (per-thread local memory, despite its name, resides in device memory). Only shared memory can be accessed by other threads within the same block; however, shared memory capacity is limited. Global memory, constant memory and texture memory are off-chip memories, accessible by all threads, and are comparatively time consuming to access.

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access. To completely hide the latency of the small instruction set, we should preferentially use on-chip memory rather than global memory. When global memory access occurs, threads in the same warp should access consecutive words to achieve coalescing.

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses will be serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
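The two global-memory access patterns below illustrate the coalescing rule stated above; they are purely illustrative and not part of the matching kernels. In the first, consecutive threads of a warp read consecutive words (coalesced); in the second, each thread strides through memory, splitting the warp's request into many transactions.

__global__ void accessPatterns(const float* in, float* out, int n, int stride)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Coalesced: thread k of a warp touches word k of a contiguous segment.
    if (tid < n) out[tid] = in[tid];

    // Strided (avoid): neighbouring threads touch words far apart.
    int j = tid * stride;
    if (j < n) out[j] = in[j];
}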

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column in Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU. The number of threads per block and the number of blocks are both set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column in Figure 11 shows, we assigned a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there are no data exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, which is shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new result. The final result is saved in the first address, which has the same variable name as the first intermediate result.
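A minimal block-level reduction sketch matching this description is given below (a generic reduction, not the report's exact kernel): partial results live in shared memory, consecutive pairs are summed, and the stride doubles until thread 0 holds the total in the first slot.

__global__ void blockSum(const float* partial, float* blockTotals, int n)
{
    // Launched with blockDim.x * sizeof(float) bytes of dynamic shared memory.
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;
    s[tid] = (gid < n) ? partial[gid] : 0.0f;
    __syncthreads();

    for (int stride = 1; stride < blockDim.x; stride *= 2) {
        if (tid % (2 * stride) == 0)        // "every first of i" threads
            s[tid] += s[tid + stride];      // sum of a consecutive pair
        __syncthreads();
    }
    if (tid == 0) blockTotals[blockIdx.x] = s[0];  // result saved in the first address
}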

252 MAPPING INSIDE BLOCK

In the shift argument search there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter set search to a thread, and every thread randomly generated a set of parameters and tried them independently. The generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.
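For illustration, the sketch below shows per-thread random number generation on the GPU. It is a stand-in, not the dynamically created Mersenne Twister scheme used in this work: it uses cuRAND's per-thread states, where giving every thread its own subsequence likewise avoids correlated sequences across the launch grid. Parameter ranges are assumed.

#include <curand_kernel.h>

__global__ void initStates(curandState* states, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Same seed, different subsequence per thread -> independent streams.
    curand_init(seed, tid, 0, &states[tid]);
}

__global__ void randomParams(curandState* states, float* theta, float* scale, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    curandState local = states[tid];
    theta[tid] = (curand_uniform(&local) - 0.5f) * 0.2f;  // rotation (range assumed)
    scale[tid] = 0.9f + 0.2f * curand_uniform(&local);    // scale    (range assumed)
    states[tid] = local;                                   // save the advanced state
}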

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed. Therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slower way to get data. In our kernels, we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
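The following sketch illustrates this memory layout in a generic form (it is not the report's kernel): the small test template is staged into shared memory once per block, while the target descriptors are read from global memory, which on the report's hardware were additionally bound to texture memory. Field-per-array storage, the 20-pixel radius and MAX_TEST are assumptions.

#define MAX_TEST 256   // assumed upper bound on test-template descriptors

__global__ void matchKernel(const float* teX, const float* teY, int nTe,
                            const float* taX, const float* taY, int nTa,
                            float* blockScores)
{
    __shared__ float sx[MAX_TEST], sy[MAX_TEST];

    // Cooperative, coalesced load of the test template into shared memory.
    for (int i = threadIdx.x; i < nTe && i < MAX_TEST; i += blockDim.x) {
        sx[i] = teX[i];
        sy[i] = teY[i];
    }
    __syncthreads();

    // Each thread handles a slice of this block's target template.
    float score = 0.0f;
    for (int j = threadIdx.x; j < nTa; j += blockDim.x) {
        float tx = taX[j], ty = taY[j];
        for (int i = 0; i < nTe && i < MAX_TEST; ++i) {
            float dx = sx[i] - tx, dy = sy[i] - ty;
            if (dx * dx + dy * dy < 400.0f) { score += 1.0f; break; }  // 20-pixel radius
        }
    }
    atomicAdd(&blockScores[blockIdx.x], score);  // combine the threads' partial scores
}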

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied to object detection. In this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
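For reference, these quantities are commonly computed from the directional gradients as follows (the standard HOG definitions, not values specific to this report):

m(x,y) = \sqrt{d_x(x,y)^2 + d_y(x,y)^2}, \qquad
\theta(x,y) = \tan^{-1}\!\left(\frac{d_y(x,y)}{d_x(x,y)}\right)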

Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strengths must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
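A short sketch of the orientation-binning step described above is given below: each pixel votes into one of 9 unsigned-orientation bins (0-180 degrees) of its 8x8 cell, weighted by its gradient magnitude. The cell size and bin count are the usual HOG defaults, not values taken from this report.

__global__ void cellHistograms(const float* mag, const float* ang /* degrees, 0-180 */,
                               float* hist, int width, int height, int cellsX)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx  = y * width + x;
    int cell = (y / 8) * cellsX + (x / 8);          // which 8x8 cell this pixel is in
    int bin  = min(int(ang[idx] / 20.0f), 8);       // 9 bins of 20 degrees each

    // Magnitude-weighted vote into the cell's histogram.
    atomicAdd(&hist[cell * 9 + bin], mag[idx]);
}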

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran.

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(e.g., addition, multiplication) are programmed to deal with matrices when required. And the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB libraries (for example XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

The interactive tool GUIDE (Graphical User Interface Development Environment) is used to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep big simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.

FIG

Glare Area Detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It works only on grayscale images, so if the image is color it must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.
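For illustration, a minimal Sobel-gradient sketch for this glare-detection step is given below; bright, high-gradient spots near the pupil and iris can then be thresholded as glare. The 3x3 border handling and the output format are illustrative choices, not the report's implementation.

__global__ void sobelMagnitude(const unsigned char* gray, float* mag,
                               int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    // Interior pixels only; the one-pixel border is left untouched.
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;

    int i = y * width + x;
    // Horizontal and vertical Sobel responses.
    float gx = -gray[i - width - 1] - 2.0f * gray[i - 1] - gray[i + width - 1]
             +  gray[i - width + 1] + 2.0f * gray[i + 1] + gray[i + width + 1];
    float gy = -gray[i - width - 1] - 2.0f * gray[i - width] - gray[i - width + 1]
             +  gray[i + width - 1] + 2.0f * gray[i + width] + gray[i + width + 1];

    mag[i] = sqrtf(gx * gx + gy * gy);   // gradient magnitude for thresholding
}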

FIG

Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of the sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. When the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.

223 IRIS AND EYELID REFINEMENT

The top and underside of the sclera region are the limits of the sclera area. Then the upper eyelid, lower eyelid and iris boundaries are refined; these are all unwanted portions for recognition. In order to eliminate these effects, refinement is done following the detection of the sclera area. Fig. shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area. The left sclera area is detected in the same way.

FIG

In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has

been studied in the context of fingers (Miura et al 2004) palm (Lin and

Fan 2004) and retina (Hill 1999) In the case of retinal biometrics an

especial optical device for imaging the back of the eyeball is needed (Hill

1999) Due to its perceived invasiveness and the required degree of subject

cooperation the use of retinal biometrics may not be acceptable to some

individuals The conjunctiva is a thin transparent and moist tissue that

covers the outer surface of the eye The part of the conjunctiva that covers

the inner lining of the eyelids is called palpebral conjunctiva and the part

that covers the outer surface of the eye is called ocular (or the bulbar)

conjunctiva which is the focus of this study The ocular conjunctiva is very

thin and clear thus the vasculature (including those of the episclera) is

easily visible through it The visible microcirculation of conjunctiva offers a

rich and complex network of veins and fine microcirculation (Fig 1) The

apparent complexity and specificity of these vascular patterns motivated us

to utilize them for personal identification (Derakhshani and Ross 2006)

FIG

We have found conjunctival vasculature to be a suitable biometric as it

conforms to the following criteria (Jain et al 2004)

UNIVERSALITY All normal living tissues including that of the

conjunctiva and episclera have vascular structure

UNIQUENESS Vasculature is created during embryonic vasculogenesis

Its detailed final structure is mostly stochastic and thus unique Even

though no comprehensive study on the uniqueness of vascular structures

has been conducted study of some targeted areas such as those of the eye

fundus confirm the uniqueness of such vascular patterns even between

identical twins (Simon and Goldstein 1935 Tower 1955)

PERMANENCE Other than cases of significant trauma pathology or

chemical intervention spontaneous adult ocular vasculogenesis and

angiogenesis does not easily occur Thus the conjunctival vascular

structure is expected to have reasonable permanence (Joussen 2001)

PRACTICALITY Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical

ACCEPTABILITY Since the subject is not required to stare directly into

the camera lens and given the possibility of capturing the conjunctival

vasculature from several feet away this modality is non-intrusive and thus

more acceptable

SPOOF-PROOFNESS The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features

Facilitating recognition using off-angle iris images For instance if the iris

information is relegated to the left or right portions of the eye the sclera

vein patterns will be further exposed This feature makes sclera vasculature

a natural complement to the iris biometric

Addressing the failure-to-enroll issue when iris patterns are not usable (eg

due to surgical procedures)

Reducing vulnerability to spoof attacks For instance when implemented

alongside iris systems an attacker needs to reproduce not only the iris but

also different surfaces of the sclera along with the associated

microcirculation and make them available on commensurate eye surfaces

The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns were enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG
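(The defining equations are not reproduced in this copy; the placeholder above stands in for them. A reconstruction that is consistent with the variable definitions given in the next paragraph, and is therefore only indicative, is:)

\theta = \tan^{-1}\!\left(\frac{y_l - y_i}{x_l - x_i}\right), \qquad
r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
\phi = \tan^{-1}\!\left(\left.\frac{\mathrm{d} f_{line}(x)}{\mathrm{d}x}\right|_{x = x_l}\right)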

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points – one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, whichever of the test or target templates has fewer points, the sum of its descriptors' weights sets the maximum score that can be attained.

FIG

FIG

FIG

FIG

Even with movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 22, the line descriptor is extracted from the skeleton of the vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To be tolerant of such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large, occupy GPU memory, and slow down the data transfer. When matching, a RANSAC-type registration algorithm was used to randomly select the corresponding descriptors, and the transform parameters between them were used to generate the template transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too much extra computation in the processing unit.

To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, and only once.

The calculated result was saved as a component of the descriptor. The descriptor of the sclera thus changes to s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. It would be faster if the two templates had a similar reference point. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they have the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses. This meets the requirement for coalesced memory access on the GPU.
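The sketch below illustrates this reorganisation in a generic structure-of-arrays form (the struct name and field layout are assumptions, not the report's data structures): each field of s(x, y, r, θ, ɸ, w) is stored in its own contiguous array, and left-half and right-half descriptors occupy separate contiguous ranges, so consecutive threads of a warp read consecutive words of the same field.

struct WplTemplateSoA {
    float *x, *y, *r, *theta, *phi, *w;  // one device array per descriptor field
    int    nLeft;                        // descriptors 0 .. nLeft-1      : left half
    int    nTotal;                       // descriptors nLeft .. nTotal-1 : right half
};

__global__ void readDescriptors(WplTemplateSoA t, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= t.nTotal) return;
    // Coalesced loads: thread i reads element i of each field array.
    out[i] = t.x[i] + t.y[i] + t.w[i];   // placeholder use of the fields
}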

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 4.0 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline, today's GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that today's GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global

"texture" memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasks' having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities After this step it is still possible for

some false positive matches In the second stage we used WPL descriptor

to register the two images for more detailed descriptor matching including

scale- and translation invariance This stage includes shift transform affine

matrix generation and final WPL descriptor matching Overall we

partitioned the registration and matching processing into four kernels in

CUDA (Figure 10) matching on the Y shape descriptor shift

transformation affine matrix generation and final WPL descriptor

matching Combining these two stages the matching program can run faster

and

achieve more accurate score

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and yta j are the Y shape descriptors of test template Tte

and target template Tta respectively dϕ is the Euclidian distance of angle

element of descriptors vector defined as (3) dxy is the Euclidian distance of

two descriptor centers defined as (4) ni and di are the matched descriptor

pairsrsquo number and their centers distance respectively tϕ is a distance

threshold and txy is the threshold to restrict the searching area We set tϕ to

30 and txy to 675 in our experiment Here

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3) where ϕi j is the angle between

the j-th branch and the polar axis from the pupil center in descriptor i

The number of matched pairs ni and the distance between Y shape

branches centers di are stored as the matching result We fuse the number of

matched branches and the average distance between matched branches

centers as (2) Here α is a factor to fuse the matching score which was set

to 30 in our study Ni and Nj is the total numbers of feature vectors in

template i and j separately The decision is regulated by the threshold t if

the sclera's matching score is lower than t the sclera will be discarded The

sclera with high matching score will be passed to the next more precisely

matching process

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of

sclera than the Y shape descriptor The variation of sclera vessel pattern is

nonlinear because

When acquiring an eye image in different gaze angle the vessel structure

will appear nonlinear shrink or extend because eyeball is spherical in shape

sclera is made up of four layers episclera stroma lamina fusca and

endothelium There are slightly differences among movement of these

layers Considering these factors our registration employed both single

shift transform and multi-parameter transform which combines shift

rotation and scale together

1) SHIFT PARAMETER SEARCH

As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te,i is the i-th WPL descriptor of T_te, T_ta is the target template and s_ta,i is the i-th WPL descriptor of T_ta, and d(s_te,k, s_ta,j) is the Euclidean distance between descriptors s_te,k and s_ta,j. Δs_k is the shift value between the two descriptors, i.e., the offset between their centers.

We first randomly select an equal number of segment descriptors s_te,k of the test template T_te from each quad and find their nearest neighbors s_ta,j* in the target template T_ta. Their shift offsets are recorded as candidate registration shift factors Δs_k. The final registration offset is Δs_optim, the candidate with the smallest standard deviation among these candidate offsets.
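A minimal host-side sketch of this shift search is given below, assuming the WPL descriptors are available as a simple array of structs; the per-quad sampling and the exact selection rule of Algorithm 2 are simplified to "keep the candidate offset closest to the mean of the candidates."

#include <math.h>

// Hypothetical WPL descriptor fields (see the descriptor definition in the text).
struct WPL { float x, y, r, theta, phi, w; };

// Shift-parameter search (simplified sketch of Algorithm 2): candidate
// offsets come from nearest-neighbor pairs of a random sample of test
// descriptors; the candidate with the smallest deviation from the mean of
// all candidates is kept as the optimized shift.
void shiftSearch(const WPL* te, const WPL* ta, int nTa,
                 const int* sampleIdx, int nSample,
                 float* sxOpt, float* syOpt)
{
    float cx[256], cy[256];              // candidate offsets (nSample <= 256 assumed)
    float mx = 0.0f, my = 0.0f;
    for (int s = 0; s < nSample; ++s) {
        const WPL q = te[sampleIdx[s]];
        int best = 0; float bestD = 1e30f;
        for (int j = 0; j < nTa; ++j) {  // nearest neighbor in the target template
            float dx = q.x - ta[j].x, dy = q.y - ta[j].y;
            float d = dx * dx + dy * dy;
            if (d < bestD) { bestD = d; best = j; }
        }
        cx[s] = ta[best].x - q.x;  cy[s] = ta[best].y - q.y;
        mx += cx[s];  my += cy[s];
    }
    mx /= nSample;  my /= nSample;
    int pick = 0; float bestDev = 1e30f; // candidate closest to the mean offset
    for (int s = 0; s < nSample; ++s) {
        float dev = (cx[s] - mx) * (cx[s] - mx) + (cy[s] - my) * (cy[s] - my);
        if (dev < bestDev) { bestDev = dev; pick = s; }
    }
    *sxOpt = cx[pick];  *syOpt = cy[pick];
}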

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance to its nearest neighbor s_ta,j* in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is chosen by selecting the maximum number of matches m(it). Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithm 2, and tr_shift^(it), θ^(it), and tr_scale^(it) are the shift, rotation, and scale parameters generated in the it-th iteration; R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment we set the number of iterations to 512.
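The sketch below shows how one candidate parameter set could be scored, reusing the hypothetical WPL struct from the previous sketch; the composition order scale-then-rotate-then-shift is an assumption, since the matrix of Eq. (7) is not reproduced here.

#include <math.h>

struct WPL { float x, y, r, theta, phi, w; };   // as in the previous sketch

// Score one candidate affine parameter set (sketch of one iteration of
// Algorithm 3): transform every test descriptor and count those that land
// within beta pixels of some target descriptor. Over N = 512 random
// parameter sets, the one with the largest count would be kept.
__host__ __device__
int countAffineMatches(const WPL* te, int nTe, const WPL* ta, int nTa,
                       float theta, float shiftX, float shiftY,
                       float scale, float beta)
{
    int m = 0;
    float c = cosf(theta), s = sinf(theta);
    for (int i = 0; i < nTe; ++i) {
        float x = te[i].x * scale, y = te[i].y * scale;   // S(scale)
        float xr = c * x - s * y + shiftX;                // R(theta), then T(shift)
        float yr = s * x + c * y + shiftY;
        for (int j = 0; j < nTa; ++j) {
            float dx = xr - ta[j].x, dy = yr - ta[j].y;
            if (dx * dx + dy * dy < beta * beta) { ++m; break; }
        }
    }
    return m;
}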

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm), and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)) T(tr_shift(optm)) S(tr_scale(optm)) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether or not the descriptor lies at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimum score of the two transformed results divided by the minimal matching score of the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program must be partitioned into threads by the programmer and mapped onto those threads. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and it takes little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in a limited amount. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.

Constant memory and texture memory are read-only, cached memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If the threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access time. To hide this latency, on-chip memory should be used in preference to global memory, and when global memory accesses do occur, the threads in a warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall into the same bank, the accesses are serialized. To obtain maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted into a separate kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and both the number of threads per block and the number of blocks are set to 1024. That means we can match our test template against up to 1024×1024 target templates at the same time.
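A sketch of this coarse-grained launch is shown below; the kernel, its gallery layout (one offset/count pair per target template), and its reuse of a pair-matching routine like the yShapeScore() sketch earlier are assumptions made for illustration.

struct YShape { float phi1, phi2, phi3, x, y; };  // as in the earlier sketch

// Declaration of the pair-matching routine sketched in Stage I.
__host__ __device__
float yShapeScore(const YShape* te, int Ni, const YShape* ta, int Nj,
                  float t_phi, float t_xy, float alpha);

// One thread per target template: thread t compares the test template
// against gallery template t and writes one coarse score.
__global__ void coarseMatchKernel(const YShape* gallery, const int* offset,
                                  const int* count, int nTemplates,
                                  const YShape* test, int nTest, float* score)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= nTemplates) return;
    score[t] = yShapeScore(test, nTest, gallery + offset[t], count[t],
                           30.0f, 675.0f, 30.0f);   // thresholds as reported in the text
}

// Host-side launch (sketch): 1024 blocks of 1024 threads covers up to
// 1024 x 1024 gallery templates in a single launch.
//   coarseMatchKernel<<<1024, 1024>>>(dGallery, dOffset, dCount, n, dTest, nTest, dScore);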

Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread handles a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads have finished their corresponding descriptor fractions, the intermediate results need to be summed or compared. A parallel prefix-sum algorithm is used to accumulate the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum over the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
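The accumulation itself can be written as the usual shared-memory tree reduction, sketched below; the thread indexing differs slightly from the odd-thread description above, but the effect is the same: after log2(blockDim.x) steps the block's total sits at the first address.

// Block-wide reduction of per-thread partial results (sketch). blockDim.x
// is assumed to be a power of two, and the kernel is launched with
// blockDim.x * sizeof(float) bytes of dynamic shared memory.
__global__ void reducePartials(const float* partial, float* blockSum)
{
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    sdata[tid] = partial[blockIdx.x * blockDim.x + tid];
    __syncthreads();
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) sdata[tid] += sdata[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSum[blockIdx.x] = sdata[0];   // result lands in the "first address"
}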

252 MAPPING INSIDE BLOCK

In the shift parameter search there are two schemes we can choose from to map the task:

Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assign a single candidate shift offset to each thread, so that all threads compute independently and only the final results need to be compared across the candidate offsets.

Because of the large number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. For generating the rotation and scale registration parameters we used the Mersenne Twister pseudorandom number generator, because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many Mersenne Twisters need to run simultaneously in parallel with different initial states. However, even "very different" (by any definition) initial state values do not prevent generators that share identical parameters from emitting correlated sequences. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share these flags, all the threads in a block would have to synchronize at every query step; our solution is therefore to use a single thread in a block to process the matching.

FIG

FIG
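For illustration only, the sketch below replaces the dynamically created Mersenne Twisters described above with cuRAND's per-thread device generator, which serves the same purpose of giving every thread its own uncorrelated stream; the parameter ranges drawn here are placeholders, not the values used in the report.

#include <curand_kernel.h>

// Each thread seeds its own generator with a distinct subsequence and then
// draws independent rotation / scale / shift candidates for the affine search.
__global__ void drawAffineCandidates(unsigned long long seed, int iters,
                                     float* thetaOut, float* scaleOut, float* shiftOut)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state;
    curand_init(seed, id, 0, &state);        // distinct subsequence per thread
    float bestTheta = 0.0f, bestScale = 1.0f, bestShift = 0.0f;
    for (int it = 0; it < iters; ++it) {
        float theta = (curand_uniform(&state) - 0.5f) * 0.2f;   // placeholder range
        float scale = 0.9f + 0.2f * curand_uniform(&state);     // placeholder range
        float shift = (curand_uniform(&state) - 0.5f) * 20.0f;  // placeholder range
        // ... apply R(theta), S(scale), T(shift) and keep the best-scoring set
        bestTheta = theta; bestScale = scale; bestShift = shift;
    }
    thetaOut[id] = bestTheta;  scaleOut[id] = bestScale;  shiftOut[id] = bestShift;
}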

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, so data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire set of target templates from the database without considering when they will be processed; therefore, no data transfer from host to device is needed during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that the successive kernels of Algorithms 2 to 4 can access their data at consecutive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to its lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.

FIG
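A condensed sketch of this memory placement is given below: the test template components are staged in shared memory, the target components are read through texture objects, and the cache configuration favors shared memory. The kernel body and the per-component arrays are assumptions for illustration.

// Kernel sketch: stage the test template in shared memory, read the
// (read-only) target descriptors through texture objects.
__global__ void fineMatchSketch(cudaTextureObject_t taX, cudaTextureObject_t taY,
                                const float* teX, const float* teY,
                                int nTe, int nTa, float* out)
{
    extern __shared__ float sTe[];                 // [0..nTe) = x, [nTe..2*nTe) = y
    for (int i = threadIdx.x; i < nTe; i += blockDim.x) {
        sTe[i]       = teX[i];
        sTe[nTe + i] = teY[i];
    }
    __syncthreads();
    float acc = 0.0f;
    for (int j = threadIdx.x; j < nTa; j += blockDim.x) {
        float tx = tex1Dfetch<float>(taX, j);      // cached, read-only fetch
        float ty = tex1Dfetch<float>(taY, j);
        acc += tx + ty;                            // matching logic omitted
    }
    if (threadIdx.x == 0) out[blockIdx.x] = acc;   // placeholder result
}

// Host-side setup (sketch): prefer shared memory over L1 and bind one
// component array to a texture object.
//   cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);
//   cudaResourceDesc res = {};
//   res.resType = cudaResourceTypeLinear;
//   res.res.linear.devPtr      = dTargetX;
//   res.res.linear.desc        = cudaCreateChannelDesc<float>();
//   res.res.linear.sizeInBytes = nTa * sizeof(float);
//   cudaTextureDesc td = {};
//   td.readMode = cudaReadModeElementType;
//   cudaTextureObject_t taX = 0;
//   cudaCreateTextureObject(&taX, &res, &td, nullptr);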

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, the image is first divided into small connected regions called cells. For each cell, the histogram of gradient directions or edge orientations of the pixels is computed; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y), as m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).
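A per-pixel sketch of this computation (central differences, magnitude, and orientation folded into [0, 180)) is shown below; the image is assumed to be a single-channel float array in row-major order.

#include <math.h>

// Gradient magnitude and orientation for HOG, one thread per interior pixel.
__global__ void hogGradient(const float* img, int w, int h,
                            float* mag, float* ori)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;
    float dx = img[y * w + (x + 1)] - img[y * w + (x - 1)];
    float dy = img[(y + 1) * w + x] - img[(y - 1) * w + x];
    mag[y * w + x] = sqrtf(dx * dx + dy * dy);             // m(x, y)
    float deg = atan2f(dy, dx) * 57.295779513f;            // radians -> degrees
    if (deg < 0.0f)    deg += 180.0f;                      // fold opposite directions
    if (deg >= 180.0f) deg -= 180.0f;
    ori[y * w + x] = deg;                                  // theta(x, y) in [0, 180)
}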

Orientation binning is the second step of HOG; it creates the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of the gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination or contrast changes, then the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
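The cell-histogram step can then be sketched as below, with nine 20-degree bins over [0, 180); real HOG implementations usually interpolate each vote between neighboring bins and then normalize per block, refinements omitted here.

// Histogram of one HOG cell: every pixel votes into one of nine bins,
// weighted by its gradient magnitude. cellSize, cx, cy select the cell.
__host__ __device__
void cellHistogram(const float* mag, const float* ori, int imgWidth,
                   int cx, int cy, int cellSize, float hist[9])
{
    for (int b = 0; b < 9; ++b) hist[b] = 0.0f;
    for (int y = cy; y < cy + cellSize; ++y) {
        for (int x = cx; x < cx + cellSize; ++x) {
            int bin = (int)(ori[y * imgWidth + x] / 20.0f);  // 20-degree bins
            if (bin > 8) bin = 8;                            // guard the 180-degree edge
            hist[bin] += mag[y * imgWidth + x];
        }
    }
}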

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises.

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code, files, and data

Interactive tools for iterative exploration, design, and problem solving

Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with the MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

You can use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that may cause some problems at first.

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATMs, credit cards, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system, and even across the threads within the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


The top and underside of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid, and iris boundaries are then refined, since these are the unwanted portions for recognition. To eliminate their effect, refinement follows the detection of the sclera area. The figure shows the result of Otsu's thresholding together with iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way.

FIG

Not all images are perfectly segmented in the segmentation process; hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.

224 OCULAR SURFACE VASCULATURE

Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999); due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).

FIG

We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):

UNIVERSALITY: All normal living tissue, including that of the conjunctiva and episclera, has vascular structure.

UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).

PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).

PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.

ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.

SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:

Improving accuracy by the addition of vascular features.

Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portion of the eye, the sclera vein patterns will be further exposed; this makes sclera vasculature a natural complement to the iris biometric.

Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).

Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.

The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering: since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.

FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD

The matching stage of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments of the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as

FIG

Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels of the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows, where S_i and S_j are two segment descriptors, m(S_i, S_j) is the matching score between segments S_i and S_j, d(S_i, S_j) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), D_match is the matching distance threshold, and ɸ_match is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
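The per-pair test described above can be sketched as follows; since the exact form of m(S_i, S_j) is not reproduced in the text, a matched pair here simply contributes the weight of the test segment, and the normalization assumes the test template is the smaller (minimal) set.

#include <math.h>

// Simplified line descriptor: segment center, dominant orientation, and the
// 0 / 0.5 / 1 edge weight from the weighting image.
struct LineDesc { float x, y, phi, w; };

float lineDescriptorScore(const LineDesc* te, int nTe,
                          const LineDesc* ta, int nTa,
                          float dMatch, float phiMatch)
{
    float score = 0.0f, maxScore = 0.0f;
    for (int i = 0; i < nTe; ++i) {
        maxScore += te[i].w;                       // best attainable contribution
        for (int j = 0; j < nTa; ++j) {
            float dx = te[i].x - ta[j].x, dy = te[i].y - ta[j].y;
            float dphi = fabsf(te[i].phi - ta[j].phi);
            if (dx * dx + dy * dy < dMatch * dMatch && dphi < phiMatch) {
                score += te[i].w;                  // count each test segment once
                break;
            }
        }
    }
    return (maxScore > 0.0f) ? score / maxScore : 0.0f;
}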

FIG

FIG

FIG

FIG

movement of the eye. Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angle of every branch to the x axis, and the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template; in our approach, we employ the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 remain quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. Our rotation-, shift-, and scale-invariant feature vector is therefore defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris centers, and it is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images

(Figure 7). The skeleton is then broken into smaller segments, and for each segment a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, a mask file is used.

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.

The mask file is designed to indicate whether or not a line segment belongs to the edge of the sclera. However, in a GPU application, using the mask is challenging, since the mask files are large, occupy GPU memory, and slow down data transfer. During matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptors; this results in too much computation in the processing unit.

To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighting image created by setting different weight values according to position: the weights of descriptors that lie beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.

The result of this calculation is saved as a component of the descriptor, so the sclera descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when one template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points: if we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the sclera patterns in the left part of the eye may be compressed while the right part's patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses, which meets the requirement for coalesced memory access on the GPU.

FIG

FIG

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, no computation on the mask file is needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator of Figure 4 is simplified as shown in Figure 9.
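A structure-of-arrays sketch of this layout is shown below: each descriptor component sits in its own contiguous array, and left-half descriptors precede right-half descriptors, so the threads of a warp read consecutive words; the struct and kernel here are illustrative assumptions, not the report's exact data structures.

// WPL template in structure-of-arrays form: one contiguous array per
// component, left-half descriptors stored before right-half descriptors.
struct WplTemplateSoA {
    float *x, *y, *r, *theta, *phi, *w;
    int nLeft;    // descriptors [0, nLeft)       belong to the left half
    int nTotal;   // descriptors [nLeft, nTotal)  belong to the right half
};

// Consecutive threads read consecutive elements of the same component
// array, so the global-memory accesses coalesce.
__global__ void touchDescriptors(WplTemplateSoA t, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < t.nTotal) out[i] = t.x[i] + t.y[i];    // coalesced reads
}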

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

GENERAL-PURPOSE COMPUTING ON THE GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.

We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as texture on future passes through the graphics pipeline.

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs.

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).

The resulting buffer in global memory can then be used as an input in future computation.
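As a minimal illustration of this model, the toy kernel below defines the computation domain as a 1-D grid of threads, gathers neighbor values from global memory, and scatters one result back; it is not part of the sclera matching pipeline.

// One relaxation step over a 1-D grid: each thread gathers its two
// neighbors and writes one updated value.
__global__ void relaxStep(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= 0 || i >= n - 1) return;
    out[i] = 0.5f * in[i] + 0.25f * (in[i - 1] + in[i + 1]);
}

// Host-side launch (sketch):
//   int threads = 256, blocks = (n + threads - 1) / threads;
//   relaxStep<<<blocks, threads>>>(dIn, dOut, n);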

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities After this step it is still possible for

some false positive matches In the second stage we used WPL descriptor

to register the two images for more detailed descriptor matching including

scale- and translation invariance This stage includes shift transform affine

matrix generation and final WPL descriptor matching Overall we

partitioned the registration and matching processing into four kernels2 in

CUDA (Figure 10) matching on the Y shape descriptor shift

transformation affine matrix generation and final WSL descriptor

matching Combining these two stages the matching program can run faster

and

achieve more accurate score

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and yta j are the Y shape descriptors of test template Tte

and target template Tta respectively dϕ is the Euclidian distance of angle

element of descriptors vector defined as (3) dxy is the Euclidian distance of

two descriptor centers defined as (4) ni and di are the matched descriptor

pairsrsquo number and their centers distance respectively tϕ is a distance

threshold and txy is the threshold to restrict the searching area We set tϕ to

30 and txy to 675 in our experiment Here

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3) where ϕi j is the angle between

the j th branch and the polar from pupil center in desctiptor i

The number of matched pairs ni and the distance between Y shape

branches centers di are stored as the matching result We fuse the number of

matched branches and the average distance between matched branches

centers as (2) Here α is a factor to fuse the matching score which was set

to 30 in our study Ni and Nj is the total numbers of feature vectors in

template i and j separately The decision is regulated by the threshold t if

the sclerarsquos matching score is lower than t the sclera will be discarded The

sclera with high matching score will be passed to the next more precisely

matching process

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WSL descriptor reveals more vessel structure detail of

sclera than the Y shape descriptor The variation of sclera vessel pattern is

nonlinear because

When acquiring an eye image in different gaze angle the vessel structure

will appear nonlinear shrink or extend because eyeball is spherical in shape

sclera is made up of four layers episclera stroma lamina fusca and

endothelium There are slightly differences among movement of these

layers Considering these factors our registration employed both single

shift transform and multi-parameter transform which combines shift

rotation and scale together

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate As a result the detected iris center could

not be very accurate Shift transform is designed to tolerant possible errors

in pupil center detection in the segmentation step If there is no deformation

or only very minor deformation registration with shift transform together

would be adequate to achieve an accurate result We designed Algorithm 2

to get optimized shift parameter Where Tte is the test template and ssei is

the i th WPL descriptor of Tte Tta is the target template and ssai is the i th

WPL descriptor of Ttad(stek staj ) is Euclidean distance of escriptors stek

and staj

Δsk is the shift value of two descriptors defines as

We first randomly select an equal number of segment descriptors

stek in test template Tte from each quad and find its nearest neighbors staj _

in target template Tta The shift offset of them is recorded as the possible

registration shift factor _sk The final offset registration factor is _soptim

which has the smallest standard deviation among these candidate offsets

2) AFFINE TRANSFORM PARAMETER SEARCH

Affine transform is designed to tolerant some deformation of sclera

patterns in the matching step The affine transform algorithm is shown in

Algorithm 3 The shift value in the parameter set is obtained by randomly

selecting descriptor s(it )te and calculating the distance from its nearest

neighbor staj_ in Tta We transform the test template by the matrix in (7)

At end of the iteration we count the numbers of matched descriptor pairs

from the transformed template and the target template The factor β is

involved to determine if the pair of descriptor is matched and we set it to

be 20 pixels in our experiment After N iterations the optimized transform

parameter set is determined via selecting the maximum matching numbers

m(it) Here stei Tte staj and Tta is defined same as algorithm 2 tr (it )

shi f t θ(it )tr (it ) scale is the parameters of shift rotation and scale

transform generated in i tth iteration R(θ (it )) T (tr (it ) shi f t ) and S(tr (it

) scale) are the transform matrix defined as (7) To search optimize

transform parameter we iterated N times to generate these parameters In

our experiment we set iteration time to 512

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4 Here stei Tte

staj and Tta are defined same as Algorithms 2 and 3 θ(optm) tr (optm) shi f

t tr (optm) scale _soptim are the registration parameters attained from

Algorithms 2 and 3 R_θ(optm)_T _tr (optm) shi f t _S_tr (optm) scale

_ is the descriptor transform matrix defined in Algorithm 3 empty is the angle

between the segment descriptor and radius direction w is the weight of the

descriptor which indicates whether the descriptor is at the edge of sclera or

not To ensure that the nearest descriptors have a similar orientation we

used a constant factor α to check the abstract difference of two ɸ In our

experiment we set α to 5 The total matching score is minimal score of two

transformed result divided by the minimal matching score for test template

and target template

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU A CUDA consists of many streaming

multiprocessors (SM) where the parallel part of the program should be

partitioned into threads by the programmer and mapped into those threads

There are multiple memory spaces in the CUDA memory hierarchy

register local memory shared memory global memory constant memory

and texture memory Register local memory and shared memory are on-

chip and could be a little time consuming to access these memories Only

shared memory can be accessed by other threads within the same block

However there is only limited availability of shared memory Global

memory constant memory and texture memory are off-chip memory and

accessible by all threads which would be very time consuming to access

these memories

Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA so as to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming.

If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide the latency of the small instruction set as much as possible, on-chip memory should be used in preference to global memory. When global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted into a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies in order to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as the GPU, and the thread and block counts are both set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate this sum, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved in the first address, which has the same variable name as the first intermediate result.
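As a serial illustration of this reduction scheme, the MATLAB sketch below mimics the pairwise tree summation that the GPU threads perform in parallel; the vector of partial results is synthetic and merely stands in for the per-thread intermediate values.

% Serial illustration of the pairwise (tree) reduction used to sum the
% per-thread partial matching results. On the GPU, each level of the outer
% loop is executed by different threads in parallel.
partial = rand(1, 1024);           % stand-in for per-thread intermediate results
expected = sum(partial);
stride = 1;
while stride < numel(partial)
    for k = 1:2*stride:numel(partial)          % "first thread" of each pair
        if k + stride <= numel(partial)
            partial(k) = partial(k) + partial(k + stride);
        end
    end
    stride = stride * 2;
end
% The final sum ends up in the first element, as described for Figure 11.
assert(abs(partial(1) - expected) < 1e-9);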

252 MAPPING INSIDE BLOCK

In shift-parameter searching there are two schemes we could use to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single candidate shift offset to each thread, so that all threads compute independently except that the final results must be compared across the candidate offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator we map an entire parameter-set search to one thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration step we used the Mersenne Twister pseudorandom number generator, because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.

FIG

FIG

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, so data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire set of target templates from the database without considering when they will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to its lowest value and bound the target descriptors to texture memory; using this cacheable memory, data access was accelerated further.

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this work it is applied as a feature for human recognition. In the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
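For reference, the standard computation is m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)). A minimal MATLAB sketch is shown below; the input file name and the simple [-1 0 1] derivative filters are assumptions for the sketch.

% Sketch of the gradient magnitude and orientation computation for HOG.
I = im2double(imread('sclera.jpg'));           % hypothetical input image
if size(I, 3) == 3, I = rgb2gray(I); end
dx = imfilter(I, [-1 0 1],  'replicate');      % horizontal gradient dx(x, y)
dy = imfilter(I, [-1 0 1]', 'replicate');      % vertical gradient dy(x, y)
m     = sqrt(dx.^2 + dy.^2);                   % gradient magnitude m(x, y)
theta = atan2d(dy, dx);                        % gradient orientation in degrees
theta = mod(theta, 180);                       % unsigned orientation over 0-180 deg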

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient-orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
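A small MATLAB sketch of the binning step for a single cell follows; the 8 × 8 cell size, the 9-bin layout and the synthetic gradients are assumptions used only to make the sketch self-contained.

% Sketch of orientation binning for one cell (cell size, bin count and the
% synthetic gradients are assumptions, not values prescribed by the report).
m     = rand(8);               % stand-in gradient magnitudes of an 8x8 cell
theta = 180 * rand(8);         % stand-in unsigned orientations in [0, 180)
nbins = 9;
edges = linspace(0, 180, nbins + 1);
hogCell = zeros(1, nbins);
for b = 1:nbins
    inBin = theta >= edges(b) & theta < edges(b + 1);
    hogCell(b) = sum(m(inBin));            % gradient magnitude is the vote weight
end
hogCell = hogCell / (norm(hogCell) + eps); % simple contrast normalization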

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular amongst scientists involved in image processing.

The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, either as a script or encapsulated into a function, extending the commands available.

MATLAB also provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
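As a small example of both points, the following function, saved as enhance_demo.m (a hypothetical name), extends the available commands and can be shared like any other MATLAB file:

% Normalize an image to [0, 1]; callable from the prompt as out = enhance_demo(I)
function out = enhance_demo(img)
    img = double(img);
    out = (img - min(img(:))) / (max(img(:)) - min(img(:)) + eps);
end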

32 FEATURES OF MATLAB

High-level language for technical computing
Development environment for managing code, files and data
Interactive tools for iterative exploration, design and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and other computational domains. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and on powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling, and the toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster.
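A minimal sketch of this kind of offloading with the Parallel Computing Toolbox is shown below; it assumes a supported CUDA-capable GPU is present.

% Offloading a computation to the GPU (requires the Parallel Computing Toolbox).
A = rand(2000, 'single');          % data created on the host
G = gpuArray(A);                   % transfer to device memory
F = abs(fft2(G));                  % fft2 runs on the GPU for gpuArray inputs
result = gather(F);                % copy the result back to host memory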

MATLAB is one of a few languages in which each variable is a matrix (broadly construed) and knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
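A short example of this matrix-oriented style:

% Every MATLAB variable is an array, and operators act on whole matrices.
A = [1 2; 3 4];        % 2x2 matrix
b = [5; 6];            % column vector
x = A \ b;             % solve A*x = b without an explicit loop
C = A * A';            % matrix product; A .* A would be element-wise
sz = size(A);          % the variable "knows how big it is"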

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created allowing MATLAB data types to be passed and returned; the dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX or .NET can be called directly from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java technology that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types and allocating memory. In many cases MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP) and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to iterate quickly to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries; for general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.

Development Tools

MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:

MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.

Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.

MATLAB Profiler
Records the time spent executing each line of code.

Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies and code coverage.

Designing Graphical User Interfaces

GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing and editing user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions, as in the sketch below.
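A minimal sketch of the programmatic route; the window title, control positions and callback are illustrative only.

% Building a simple GUI programmatically (an alternative to GUIDE).
f = figure('Name', 'Sclera Demo', 'NumberTitle', 'off');
uicontrol(f, 'Style', 'text', 'Position', [20 70 220 20], ...
          'String', 'Press Run to start matching');
uicontrol(f, 'Style', 'pushbutton', 'Position', [20 30 80 30], ...
          'String', 'Run', 'Callback', @(src, evt) disp('matching started'));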

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including the following (a short example follows this list):

Interpolating and decimating
Extracting sections of data, scaling and averaging
Thresholding and smoothing
Correlation, Fourier analysis and filtering
1-D peak, valley and zero finding
Basic statistics and curve fitting
Matrix analysis
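A small command-line example of a few of these operations, using synthetic data:

% Smoothing, curve fitting and basic statistics on a noisy signal.
t = linspace(0, 1, 200);
y = sin(2*pi*3*t) + 0.2*randn(size(t));
ySmooth = filter(ones(1, 5)/5, 1, y);     % 5-point moving-average smoothing
p = polyfit(t, y, 3);                     % least-squares cubic fit
yFit = polyval(p, t);
stats = [mean(y) std(y) max(y) min(y)];   % basic statistics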

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound and video files, and scientific formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format, and additional functions let you read data from Web pages and XML.
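A short sketch of such data access; the file names are hypothetical.

% Reading data from a few common sources.
T   = xlsread('measurements.xlsx');       % numeric data from an Excel file
img = imread('eye001.jpg');               % an image file
fid = fopen('template.bin', 'r');         % low-level binary I/O
raw = fread(fid, inf, 'double');
fclose(fid);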

Visualizing Data

All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes, changing line colors and markers, adding annotations, LaTeX equations and legends, and drawing shapes.

2-D Plotting

Vectors of data can be visualized with 2-D plotting functions that create (see the example after this list):

Line, area, bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
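A few of these 2-D plotting functions in use:

% Line plot, bar chart and histogram on synthetic data.
x = 0:0.1:2*pi;
subplot(1, 3, 1); plot(x, sin(x));          title('Line plot');
subplot(1, 3, 2); bar(rand(1, 5));          title('Bar chart');
subplot(1, 3, 3); histogram(randn(1, 1e4)); title('Histogram');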

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations and transparency. 3-D plotting functions include the following (a short example follows this list):

Surface, contour and mesh
Image plots
Cone, slice, stream and isosurface
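A short example using the built-in peaks sample function:

% 3-D surface and contour visualization.
[X, Y, Z] = peaks(40);
subplot(1, 2, 1); surf(X, Y, Z); shading interp; title('Surface');
subplot(1, 2, 2); contour(X, Y, Z, 20);          title('Contour');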

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical and engineering functions that support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles and integers; a short example of these numeric building blocks follows.
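% Linear algebra, Fourier analysis, an ODE and a sparse matrix.
A = pascal(4); b = (1:4)';
x = A \ b;                                     % linear algebra: solve A*x = b
spec = abs(fft(sin(2*pi*50*(0:0.001:1))));     % Fourier analysis of a 50 Hz tone
[tOde, yOde] = ode45(@(t, y) -2*y, [0 5], 1);  % ODE dy/dt = -2*y, y(0) = 1
S = speye(1000);                               % sparse matrix operations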

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that may cause some problems at first. The snapshots below trace the processing chain of our implementation, and a short code sketch of this chain follows them.

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
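The sketch below outlines this processing chain in MATLAB using standard Image Processing Toolbox functions; the file name, ROI rectangle and Gabor parameters are illustrative assumptions, and this is not the project's exact code.

% Sketch of the processing chain shown in the snapshots above.
gray  = im2double(rgb2gray(imread('eye001.jpg')));  % original image -> grayscale
level = graythresh(gray);                           % Otsu's threshold
bw    = imbinarize(gray, level);                    % grayscale -> binary image
edges = edge(gray, 'canny');                        % edge map of the eye image
roi   = imcrop(gray, [50 80 200 120]);              % select the sclera region of interest
enh   = adapthisteq(roi);                           % enhancement of the sclera image
[mag, ~] = imgaborfilt(enh, gabor(4, 0:45:135));    % Gabor-filter feature extraction
features = reshape(mag, [], size(mag, 3));          % one column per filter response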

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase matching efficiency; it is a new feature extraction method that takes advantage of the GPU structure. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even among the threads in the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.

Affine transform is designed to tolerant some deformation of sclera

patterns in the matching step The affine transform algorithm is shown in

Algorithm 3 The shift value in the parameter set is obtained by randomly

selecting descriptor s(it )te and calculating the distance from its nearest

neighbor staj_ in Tta We transform the test template by the matrix in (7)

At end of the iteration we count the numbers of matched descriptor pairs

from the transformed template and the target template The factor β is

involved to determine if the pair of descriptor is matched and we set it to

be 20 pixels in our experiment After N iterations the optimized transform

parameter set is determined via selecting the maximum matching numbers

m(it) Here stei Tte staj and Tta is defined same as algorithm 2 tr (it )

shi f t θ(it )tr (it ) scale is the parameters of shift rotation and scale

transform generated in i tth iteration R(θ (it )) T (tr (it ) shi f t ) and S(tr (it

) scale) are the transform matrix defined as (7) To search optimize

transform parameter we iterated N times to generate these parameters In

our experiment we set iteration time to 512

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4 Here stei Tte

staj and Tta are defined same as Algorithms 2 and 3 θ(optm) tr (optm) shi f

t tr (optm) scale _soptim are the registration parameters attained from

Algorithms 2 and 3 R_θ(optm)_T _tr (optm) shi f t _S_tr (optm) scale

_ is the descriptor transform matrix defined in Algorithm 3 empty is the angle

between the segment descriptor and radius direction w is the weight of the

descriptor which indicates whether the descriptor is at the edge of sclera or

not To ensure that the nearest descriptors have a similar orientation we

used a constant factor α to check the abstract difference of two ɸ In our

experiment we set α to 5 The total matching score is minimal score of two

transformed result divided by the minimal matching score for test template

and target template

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU A CUDA consists of many streaming

multiprocessors (SM) where the parallel part of the program should be

partitioned into threads by the programmer and mapped into those threads

There are multiple memory spaces in the CUDA memory hierarchy

register local memory shared memory global memory constant memory

and texture memory Register local memory and shared memory are on-

chip and could be a little time consuming to access these memories Only

shared memory can be accessed by other threads within the same block

However there is only limited availability of shared memory Global

memory constant memory and texture memory are off-chip memory and

accessible by all threads which would be very time consuming to access

these memories

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in term of access To

completely hide the latency of the small instructions set we should use on-

chip memory preferentially rather than global memory When global

memory access occurs threads in same warp should access the words in

sequence to achieve coalescence

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory request from different thread within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread One thread performs a pair of templates compare In our work we

use NVIDIA C2070 as our GPU Threads and blocks number is set to

1024 That means we can match our test template with up to 1024times1024

target templates at same time

Algorithms 2-4 will be partitioned into fine-grained subtasks which is

processed a section of descriptors in one thread As the lower portion of the

middle column shows in Figure 11 we assigned a target template to one

block Inside a block one thread corresponds a set of descriptors in this

template This partition makes every block execute independently and there

are no data exchange requirements between different blocks When all

threads complete their responding descriptor fractions the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of intermediate results which is

show in right of Figure 11 Firstly all odd number threads compute the sum

of consecutive pairs of the results Then recursively every first of i (= 4 8

16 32 64 ) threads

compute the prefix sum on the new result The final result will be saved in

the first address which has the same variable name as the first intermediate

result

252 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose to

map task

Mapping one pair of templates to all the threads in a block and then every

thread would take charge of a fraction of descriptors and cooperation with

other threads

Assigning a single possible shift offset to a thread and all the threads will

compute independently unless the final result should be compared with

other possible offset

Due to great number of sum and synchronization operations in every

nearest neighbor searching step we choose the second method to parallelize

shift searching In affine matrix generator we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads The challenge of this step is the randomly generated

numbers might be correlated among threads In the step of rotation and

scale registration generating we used the Mersenne Twister pseudorandom

number generator because it can use bitwise arithmetic and have long

period

The Mersenne twister as most of pseudorandom generators is iterative

Therefore itrsquos hard to parallelize a single twister state update step among

several execution threads To make sure that thousands of threads in the

launch grid generate uncorrelated random sequence many simultaneous

Mersenne twisters need to process with different initial states in parallel

But even ldquovery differentrdquo (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

Variable denoting whether the line has been matched is stored in

shared memory To share the flags all the threads in a block should wait

synchronic operation at every query step Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire set of target templates from the database without considering when they will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.

FIG
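To make the layout described above concrete, the sketch below keeps each descriptor component in its own global array (a structure-of-arrays, so threads in a warp read successive addresses) and stages the much smaller test template in shared memory; the type and array names, the fixed capacity TEST_MAX and the distance accumulation are assumptions for illustration.

    // Structure-of-arrays layout: each component of s(x, y, r, theta, phi, w)
    // is stored in its own contiguous global array, giving coalesced reads.
    struct TargetSoA {
        float *x, *y, *r, *theta, *phi, *w;    // nTemplates * segPerTemplate entries each
    };

    #define TEST_MAX 256                       // assumed capacity for the test template

    __global__ void matchWithSharedTest(TargetSoA targets, int segPerTemplate,
                                        const float *testX, const float *testY,
                                        int nTest, float *partialScores)
    {
        // Stage the test template once per block in fast on-chip shared memory.
        __shared__ float sx[TEST_MAX], sy[TEST_MAX];
        for (int i = threadIdx.x; i < nTest && i < TEST_MAX; i += blockDim.x) {
            sx[i] = testX[i];
            sy[i] = testY[i];
        }
        __syncthreads();

        // Each thread works on its slice of the target template owned by this
        // block, reading coalesced global arrays and the shared test copy.
        int base = blockIdx.x * segPerTemplate;
        float acc = 0.0f;
        for (int j = threadIdx.x; j < segPerTemplate; j += blockDim.x) {
            float tx = targets.x[base + j];
            float ty = targets.y[base + j];
            float best = 1e30f;                // distance to nearest test segment
            for (int i = 0; i < nTest && i < TEST_MAX; ++i) {
                float dx = tx - sx[i], dy = ty - sy[i];
                float d = dx * dx + dy * dy;
                if (d < best) best = d;
            }
            acc += best;
        }
        partialScores[blockIdx.x * blockDim.x + threadIdx.x] = acc;
    }

The read-only target arrays could additionally be bound to texture memory (or accessed through the read-only cache) as described above; that step is omitted in this sketch.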

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients (HOG) is a feature descriptor. It is primarily applied in the design of target detection; in this paper it is applied as the feature for human recognition. In the sclera region the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y) as m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).

Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within a cell casts a vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientation of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
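A minimal CUDA sketch of these two steps (gradient computation and orientation binning over 0 to 180 degrees) is given below; the 8 x 8 cell size, the 9 bins and the simple central differences are common HOG defaults assumed here, not values specified in the report.

    #define CELL  8                            // assumed 8 x 8-pixel cells
    #define NBINS 9                            // assumed 9 bins over 0..180 degrees

    // One thread per pixel: compute dx, dy, gradient magnitude and orientation,
    // fold the orientation into 0..180 degrees (opposite directions identical),
    // and add a magnitude-weighted vote into the pixel's cell histogram.
    __global__ void hogCellHistograms(const float *img, int w, int h,
                                      float *cellHist /* (w/CELL)*(h/CELL)*NBINS */)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;

        int cellsX = w / CELL, cellsY = h / CELL;
        if (x / CELL >= cellsX || y / CELL >= cellsY) return;     // edge remainder

        float dx = img[y * w + (x + 1)] - img[y * w + (x - 1)];   // central differences
        float dy = img[(y + 1) * w + x] - img[(y - 1) * w + x];
        float mag = sqrtf(dx * dx + dy * dy);
        float ang = atan2f(dy, dx) * 57.29578f;                   // radians -> degrees
        if (ang < 0.0f) ang += 180.0f;                            // unsigned orientation

        int bin = (int)(ang / (180.0f / NBINS));
        if (bin >= NBINS) bin = NBINS - 1;

        int cell = (y / CELL) * cellsX + (x / CELL);
        atomicAdd(&cellHist[cell * NBINS + bin], mag);            // weighted vote
    }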

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external applications and languages such as C, C++, FORTRAN, Java™, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)
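As a hedged illustration of such a wrapper, the minimal C MEX gateway below returns the sum of a numeric input; the file name and the operation are invented for this sketch, but mexFunction and the mxArray accessors are the standard MEX entry points.

    /* sumvec.c -- minimal MEX gateway: y = sumvec(x) returns the sum of x.   */
    /* Built from the MATLAB prompt with:  mex sumvec.c                       */
    #include "mex.h"

    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
    {
        mwSize i, n;
        const double *x;
        double s = 0.0;

        if (nrhs != 1 || !mxIsDouble(prhs[0]))
            mexErrMsgTxt("sumvec expects one double array.");

        x = mxGetPr(prhs[0]);                   /* MATLAB data passed in      */
        n = mxGetNumberOfElements(prhs[0]);

        for (i = 0; i < n; ++i)
            s += x[i];

        plhs[0] = mxCreateDoubleScalar(s);      /* returned to MATLAB as y    */
    }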

Libraries written in Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB libraries (for example XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing, and editing user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
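The snapshots above follow the MATLAB pipeline from the original colour image through grey-scale conversion, Otsu binarisation, ROI selection, enhancement, Gabor feature extraction and matching. As a hedged sketch of the first two steps only, the CUDA kernel below converts RGB to grey and applies a threshold that a routine such as Otsu's method would supply; the kernel name is invented and the luminance weights are the usual ITU-R BT.601 values (the same ones used by MATLAB's rgb2gray), assumed rather than taken from the report.

    // Sketch of the first two snapshot steps: RGB -> grey, then grey -> binary
    // with a threshold chosen elsewhere (e.g. by Otsu's method on the histogram).
    __global__ void greyAndBinarize(const unsigned char *rgb, unsigned char *grey,
                                    unsigned char *binary, int nPixels,
                                    unsigned char threshold)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nPixels) return;

        // ITU-R BT.601 luminance weights (assumed).
        float g = 0.2989f * rgb[3 * i]
                + 0.5870f * rgb[3 * i + 1]
                + 0.1140f * rgb[3 * i + 2];
        unsigned char gv = (unsigned char)(g + 0.5f);

        grey[i]   = gv;
        binary[i] = (gv > threshold) ? 255 : 0;
    }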

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


compute the prefix sum on the new result The final result will be saved in

the first address which has the same variable name as the first intermediate

result

252 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose to

map task

Mapping one pair of templates to all the threads in a block and then every

thread would take charge of a fraction of descriptors and cooperation with

other threads

Assigning a single possible shift offset to a thread and all the threads will

compute independently unless the final result should be compared with

other possible offset

Due to great number of sum and synchronization operations in every

nearest neighbor searching step we choose the second method to parallelize

shift searching In affine matrix generator we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads The challenge of this step is the randomly generated

numbers might be correlated among threads In the step of rotation and

scale registration generating we used the Mersenne Twister pseudorandom

number generator because it can use bitwise arithmetic and have long

period

The Mersenne twister as most of pseudorandom generators is iterative

Therefore itrsquos hard to parallelize a single twister state update step among

several execution threads To make sure that thousands of threads in the

launch grid generate uncorrelated random sequence many simultaneous

Mersenne twisters need to process with different initial states in parallel

But even ldquovery differentrdquo (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

Variable denoting whether the line has been matched is stored in

shared memory To share the flags all the threads in a block should wait

synchronic operation at every query step Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure In global memory the

components in descriptor y(ϕ1 ϕ2 ϕ3 x y) and s(x y rθ ϕw) were stored

separately This would guarantee contiguous kernels of Algorithm 2 to 4

can access their data in successive addresses Although such coalescing

access reduces the latency frequently global memory access was still a

slower way to get data In our kernel we loaded the test template to shared

memory to accelerate memory access Because the Algorithms 2 to 4

execute different number of iterations on same data the bank conflict does

not happen To maximize our texture memory space we set the system

cache to the lowest value and bonded our target descriptor to texture

memory Using this catchable memory our data access was accelerated

more

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients is the feature descriptors It is primarily

applied to the design of target detection In this paper it is applied as the

feature for human recognition In the sclera region the vein patterns are the

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To follow out this technique first of all divide the image into small

connected regions called cells For each cell compute the histogram of

gradient directions or edge orientations of the pixels Then the combination

of different histogram of different cell represents the descriptor To improve

accuracy histograms can be contrast normalized by calculating the intensity

from the block and then using this value normalizes all cells within the

block This normalization result shows that it is invariant to geometric and

photometric changes The gradient magnitude m(x y) and orientation 1050592(x

y) are calculated using x and y directions gradients dx (x y) and dy (x y)

Orientation binning is the second step of HOG This method utilized

to create cell histograms Each pixel within the cell used to give a weight to

the orientation which is found in the gradient computation Gradient

magnitude is used as the weight The cells are in the rectangular form The

binning of gradient orientation should be spread over 0 to 180 degrees and

opposite direction counts as the same In the Fig 8 depicts the edge

orientation of picture elements If the images have any illumination and

contrast changes then the gradient strength must be locally normalized For

that cells are grouped together into larger blocks These blocks are

overlapping so that each cell contributes more than once to the final

descriptor Here rectangular HOG (R-HOG) blocks are applied which are

mainly in square grids The performance of HOG is improved by putting

on a Gaussian window into each block

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB(matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphicalmulti-domainsimulationandModel-Based

Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works or using an undocumented mechanism called JMI (Java-to-Mat lab

Interface) which should not be confused with the unrelated Java that is also

called JMI

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematical

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
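For instance, a generic sketch (not project code) of how a loop collapses into one vectorized line:

x = 0:0.1:10;                  % sample data
% Loop version, C-style thinking
y = zeros(1, numel(x));
for k = 1:numel(x)
    y(k) = 2*x(k) + 1;
end
% Equivalent single vectorized line
y = 2*x + 1;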

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
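A brief illustration of such data access (the file names here are hypothetical):

T   = xlsread('measurements.xlsx');    % numeric data from a Microsoft Excel sheet
img = imread('eye_image.jpg');         % image file
fid = fopen('raw_template.bin', 'r');  % low-level binary file I/O
raw = fread(fid, inf, 'uint8');
fclose(fid);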

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
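The snapshots above correspond to a processing chain that can be sketched roughly as follows. This is a hedged outline only: it assumes the Image Processing Toolbox, uses illustrative file names and parameter values, and is not the project's actual code.

I     = imread('eye.jpg');                 % input eye image (hypothetical file)
G     = rgb2gray(I);                       % original sclera image -> grey scale
level = graythresh(G);                     % Otsu's threshold
BW    = imbinarize(G, level);              % grey scale -> binary image
E     = edge(G, 'canny');                  % edge map (thresholded as above)
roi   = roipoly(G);                        % interactive selection of the sclera ROI
S     = G;  S(~roi) = 0;                   % keep only the selected ROI part
Senh  = adapthisteq(S);                    % enhancement of the sclera image
gb    = gabor(4, [0 45 90 135]);           % bank of Gabor filters (placeholder settings)
feat  = imgaborfilt(Senh, gb);             % feature extraction with Gabor filters
% matching against the database templates and displaying the result would follow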

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this work we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate the mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, down to the individual threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


FIG

FIG

225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA

VEIN

2251 RECOGNITION METHOD

The matching stage of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, φ. Thus the descriptor is S = (θ, r, φ)ᵀ. The individual components of the line descriptor are calculated as

FIG
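The component formulas are left as a figure in the report; a plausible reconstruction, following the line-descriptor formulation of reference [16] (the exact notation is therefore an assumption), is:

\[
\theta = \tan^{-1}\!\left(\frac{y_l - y_i}{x_l - x_i}\right), \qquad
r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
\phi = \tan^{-1}\!\left(\left.\frac{d f_{line}(x)}{dx}\right|_{x = x_l}\right).
\]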

Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value, based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.
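A minimal MATLAB sketch of this RANSAC-style search is given below. It is an illustration only: it assumes templates are stored as N-by-2 arrays of descriptor centers, and the fitness measure, scale range and rotation range are placeholders rather than values from the report (pdist2 requires the Statistics and Machine Learning Toolbox).

function best = ransacRegister(testPts, targetPts, nIter)
% testPts, targetPts: N-by-2 arrays of segment-center coordinates (assumed layout)
best.fitness = -inf;
for it = 1:nIter
    p  = testPts(randi(size(testPts, 1)), :);     % random point from the test template
    q  = targetPts(randi(size(targetPts, 1)), :); % random point from the target template
    sc = 0.9 + 0.2*rand;                          % placeholder scaling range
    th = deg2rad(-10 + 20*rand);                  % placeholder rotation range
    R  = [cos(th) -sin(th); sin(th) cos(th)];
    t  = q' - sc*R*p';                            % translation implied by the chosen pair
    warped = (sc*R*testPts' + t)';                % candidate registration of the test template
    f = fitness(warped, targetPts);
    if f > best.fitness
        best = struct('scale', sc, 'theta', th, 'shift', t, 'fitness', f);
    end
end
end

function f = fitness(A, B)
% Placeholder fitness: how many warped points have a target neighbor within 5 pixels
d = pdist2(A, B);
f = sum(min(d, [], 2) < 5);
end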

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score for two segment descriptors is calculated as follows, where S_i and S_j are two segment descriptors, m(S_i, S_j) is the matching score between segments S_i and S_j, d(S_i, S_j) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), D_match is the matching distance threshold, and φ_match is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights therefore sets the maximum score that can be attained.
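The corresponding equation is also left as a figure; one formulation consistent with this description (a hedged reconstruction, not a quotation of the report's own equation, with w_i and w_j the weights taken from the weighting image) is:

\[
m(S_i, S_j) =
\begin{cases}
w_i\, w_j, & d(S_i, S_j) \le D_{match} \ \text{and}\ |\phi_i - \phi_j| \le \phi_{match},\\
0, & \text{otherwise,}
\end{cases}
\qquad
M = \frac{\sum_{i,j} m(S_i, S_j)}{\min\left(\sum_i w_i,\ \sum_j w_j\right)}.
\]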

FIG

FIG

FIG

FIG

Y-shape branches are observed to be a stable feature under eye movement and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line-segment set, the set may be inferred to be a Y-shape structure, and the line-segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch with respect to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template; in our approach we employed the second method. As Figure 6 shows, φ1, φ2 and φ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, φ1, φ2 and φ3 are quite stable. To tolerate errors from the pupil-center calculation in the segmentation step, we also recorded the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(φ1, φ2, φ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line

descriptor is extracted from the skeleton of vessel structure in binary images

(Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, φ), where (x, y) is the position of the center and φ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To be tolerant of such errors, the mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not.

FIG

The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.

However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy GPU memory and slow down data transfer. When matching, the RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in a large amount of extra computation in the processing unit.

To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.

The calculated result is saved as a component of the descriptor, and the sclera descriptor becomes s(x, y, φ, w), where w denotes the weight of the point and takes the value 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points: if we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line-segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted φ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.
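A small MATLAB sketch of this CPU-side preprocessing (an illustration under assumptions: the mask is a single-channel logical sclera mask, descriptor centers are given in polar form relative to the iris center, the 10-pixel "near boundary" margin is a placeholder, and bwdist requires the Image Processing Toolbox):

% Illustrative inputs (placeholders, not project data)
mask     = imread('sclera_mask.png') > 0;    % hypothetical single-channel sclera mask
cx = 320; cy = 240;                          % assumed iris/pupil center
thetaDeg = [12 45 98];  r = [80 95 110];     % example descriptor angles and radii

% Weight from the sclera mask: 0 outside, 0.5 near the boundary, 1 inside
distIn = bwdist(~mask);                      % distance to the nearest non-sclera pixel
margin = 10;                                 % placeholder "near boundary" margin
w = zeros(size(mask));
w(mask) = 1;
w(mask & distIn <= margin) = 0.5;

% Polar-to-rectangular conversion, done once on the CPU
[xr, yr] = pol2cart(deg2rad(thetaDeg), r);
x = round(xr + cx);  y = round(yr + cy);
wi = w(sub2ind(size(w), y, x));              % weight component of s(x, y, r, theta, phi, w)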

The descriptor vector becomes s(x, y, r, θ, φ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same sides and saved them at contiguous addresses, which meets the requirement for coalesced memory access on the GPU.

FIG

FIG

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift-parameters generator in Figure 4 is then simplified as in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 40 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

General-Purpose Computing on the GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that today's GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global

"texture" memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that, despite their general-purpose tasks' having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity; after this step, some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here y_te,i and y_ta,j are the Y-shape descriptors of the test template T_te and the target template T_ta, respectively; dφ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance of the two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and the distance between their centers, respectively; tφ is a distance threshold and t_xy is the threshold that restricts the search area. We set tφ to 30 and t_xy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where φ_i,j is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs n_i and the distance between the Y-shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by a threshold t: if the sclera's matching score is lower than t, the sclera is discarded. A sclera with a high matching score is passed to the next, more precise, matching stage.
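A compact MATLAB sketch of this Stage-I comparison follows. It is illustrative only: the struct-array storage of the descriptors, the omission of the left/right-half restriction, and the exact fusion of the match count and mean center distance are assumptions consistent with, but not copied from, equations (1)-(4).

function score = matchYShape(Tte, Tta)
% Tte, Tta: struct arrays with fields phi (1x3 branch angles) and c (1x2 center)
tPhi = 30; tXY = 675; alpha = 30;            % thresholds/factor quoted in the text
n = 0; dSum = 0;
for i = 1:numel(Tte)
    for j = 1:numel(Tta)
        dxy = norm(Tte(i).c - Tta(j).c);     % center distance, stand-in for Eq. (4)
        if dxy > tXY, continue; end          % restrict the search area
        dphi = norm(Tte(i).phi - Tta(j).phi);% angle distance, stand-in for Eq. (3)
        if dphi < tPhi
            n = n + 1; dSum = dSum + dxy;
        end
    end
end
if n == 0, score = 0; return; end
% Assumed fusion of match count and mean center distance (stand-in for Eq. (2))
score = n / min(numel(Tte), numel(Tta)) * exp(-dSum / (n * alpha));
end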

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line-segment WPL descriptor reveals more detail of the sclera vessel structure than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium), and there are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate, and as a result the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil-center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te,i is the i-th WPL descriptor of T_te; T_ta is the target template and s_ta,i is the i-th WPL descriptor of T_ta; and d(s_te,k, s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j.

Δs_k is the shift value of the two descriptors, defined as

We first randomly select an equal number of segment descriptors s_te,k in the test template T_te from each quadrant and find the nearest neighbor s_ta,j of each in the target template T_ta. Their shift offset is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, the candidate with the smallest standard deviation among these candidate offsets.
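A rough MATLAB sketch of this search (an interpretation: descriptor centers are assumed to be N-by-2 arrays, sampling is uniform rather than per-quadrant, "smallest standard deviation" is read as the candidate offset closest to the consensus of all candidates, and vecnorm needs MATLAB R2017b or later):

function dsOpt = searchShift(teC, taC, nSamples)
% teC, taC: N-by-2 arrays of descriptor center coordinates (assumed layout)
cand = zeros(nSamples, 2);
for k = 1:nSamples
    i = randi(size(teC, 1));                    % random test descriptor
    d = sum((taC - teC(i, :)).^2, 2);           % squared distances to all target centers
    [~, j] = min(d);                            % nearest neighbour in the target
    cand(k, :) = taC(j, :) - teC(i, :);         % candidate shift offset
end
dev = vecnorm(cand - median(cand, 1), 2, 2);    % deviation from the consensus offset
[~, best] = min(dev);
dsOpt = cand(best, :);                          % chosen shift parameter
end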

2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance to its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration we count the number of matched descriptor pairs between the transformed template and the target template. A factor β is used to determine whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te,i, T_te, s_ta,j and T_ta are defined as in Algorithm 2; tr_shift^(it), θ^(it) and tr_scale^(it) are the shift, rotation and scale parameters generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)) and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimal transform parameters we iterate N times to generate these parameters; in our experiment we set the number of iterations to 512.
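Equation (7) itself is not reproduced in the report; a plausible form of the composite registration transform implied by the description (an assumption, written in homogeneous 2-D coordinates) is:

\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
= R\!\left(\theta^{(it)}\right)\, S\!\left(tr^{(it)}_{scale}\right)\, T\!\left(tr^{(it)}_{shift}\right)
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\quad
R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},\;
S(c) = \begin{bmatrix} c & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix},\;
T(t) = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}.
\]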

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm) and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)) T(tr_shift(optm)) S(tr_scale(optm)) is the descriptor transform matrix defined in Algorithm 3. φ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two φ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA-capable GPU consists of many streaming multiprocessors (SMs), and the parallel part of the program must be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers and shared memory are on-chip and take little time to access; only shared memory can be accessed by other threads within the same block, but it is available only in limited amounts. Local memory, despite its name, resides in device memory. Global memory, constant memory and texture memory are off-chip and accessible by all threads, and accessing them is very time-consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:

If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access time. To hide this latency as completely as possible, on-chip memory should be used preferentially rather than global memory. When global memory access does occur, threads in the same warp should access consecutive words so that the accesses are coalesced.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU; the numbers of threads and blocks are both set to 1024, which means we can match our test template against up to 1024 × 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no need for data exchange between different blocks. When all threads have completed their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new result. The final result is saved at the first address, which has the same variable name as the first intermediate result.
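The reduction pattern just described can be illustrated serially in MATLAB as follows (each pass of this loop corresponds to one parallel step performed by the GPU threads; the data here are placeholders):

partial = rand(1, 1024);                 % hypothetical per-thread intermediate results
stride  = 1;
while stride < numel(partial)
    idx = 1:2*stride:numel(partial);     % "every first of 2, 4, 8, ... threads"
    partial(idx) = partial(idx) + partial(idx + stride);
    stride = stride * 2;
end
totalScore = partial(1);                 % final sum ends up at the first address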

252 MAPPING INSIDE BLOCK

In the shift-parameter search there are two schemes we can choose from to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to each thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.

FIG

FIG

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(φ1, φ2, φ3, x, y) and s(x, y, r, θ, φ, w) are stored separately. This guarantees that the kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It is primarily used in object detection; in this work it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels. The combination of the different histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating a measure of intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
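For illustration (a generic sketch of this gradient step, not the project's code; it assumes a greyscale image and simple [-1 0 1] derivative filters):

I  = im2double(imread('sclera_roi.png'));        % hypothetical greyscale ROI image
dx = imfilter(I, [-1 0 1],  'replicate');        % horizontal gradient dx(x, y)
dy = imfilter(I, [-1 0 1]', 'replicate');        % vertical gradient dy(x, y)
m     = sqrt(dx.^2 + dy.^2);                     % gradient magnitude m(x, y)
theta = mod(atan2d(dy, dx), 180);                % unsigned orientation in [0, 180)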

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same (Fig. 8 depicts the edge orientations of the picture elements). If the images have illumination and contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG
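Continuing from the gradient sketch above, the binning step can be illustrated as follows (8-by-8-pixel cells and nine 20-degree bins are common HOG defaults, assumed here rather than taken from the report):

cellSz = 8;  edges = 0:20:180;                       % assumed cell size and 9 bins
[rows, cols] = size(m);
nCy = floor(rows/cellSz);  nCx = floor(cols/cellSz);
H = zeros(nCy, nCx, numel(edges)-1);                 % per-cell orientation histograms
for cy = 1:nCy
    for cx = 1:nCx
        rr = (cy-1)*cellSz + (1:cellSz);
        cc = (cx-1)*cellSz + (1:cellSz);
        b  = discretize(theta(rr, cc), edges);       % orientation bin of each pixel
        H(cy, cx, :) = accumarray(b(:), reshape(m(rr, cc), [], 1), [numel(edges)-1, 1]);
    end
end
% Block-wise contrast normalization over groups of cells would follow to form the descriptor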


Page 38: Major project report on

Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as

FIG

Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. For the registration algorithm, two points are randomly chosen – one from the test template and one from the target template. A scaling factor and a rotation value are also chosen randomly, based on a priori knowledge of the database. Using these values, a fitness value is calculated for the registration under these parameters.
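As a concrete illustration of the quantities defined above, the sketch below computes S = (θ, r, ɸ) for one skeleton segment in MATLAB. The variable names (segX, segY, irisCenter) and the use of polyfit for fline(x) are assumptions made for illustration, not the report's actual code.

```matlab
% Minimal sketch: line descriptor S = (theta, r, phi) for one vessel segment.
% segX, segY : pixel coordinates of one skeleton segment (vectors)
% irisCenter : [xi yi], center of the detected iris (assumed given)
function S = lineDescriptor(segX, segY, irisCenter)
    xl = mean(segX);  yl = mean(segY);          % center point of the segment
    xi = irisCenter(1);  yi = irisCenter(2);
    theta = atan2(yl - yi, xl - xi);            % segment angle w.r.t. the iris center
    r     = hypot(xl - xi, yl - yi);            % distance of the center to the iris center
    p     = polyfit(segX, segY, 2);             % f_line(x): polynomial approximation of the segment
    slope = polyval(polyder(p), xl);            % d f_line/dx evaluated at the segment center
    phi   = atan(slope);                        % dominant angular orientation of the segment
    S = [theta; r; phi];
end
```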

After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.

The matching score m(Si, Sj) for two segment descriptors Si and Sj is computed from d(Si, Sj), the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), the matching distance threshold Dmatch, and the matching angle threshold ɸmatch. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
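The sketch below is one plausible MATLAB reading of this weighted matching: a pair of descriptors counts as matched when their centers are within Dmatch and their orientations within ɸmatch, matched pairs contribute the product of their weights, and the total is normalized by the smaller template's weight sum. The descriptor layout [x y phi w] and the exact scoring form are assumptions for illustration.

```matlab
% Minimal sketch of the weighted segment matching score (one plausible reading).
% test, target : N-by-4 arrays of descriptors, each row [x y phi w]
function M = templateMatchScore(test, target, Dmatch, PhiMatch)
    matched = 0;
    for i = 1:size(test, 1)
        d    = hypot(target(:,1) - test(i,1), target(:,2) - test(i,2));
        dphi = abs(target(:,3) - test(i,3));
        dphi = min(dphi, pi - dphi);                 % orientations are 180-degree periodic
        hit  = find(d <= Dmatch & dphi <= PhiMatch, 1);
        if ~isempty(hit)
            matched = matched + test(i,4) * target(hit,4);   % weighted contribution of the pair
        end
    end
    % normalize by the maximum attainable score of the template with fewer points
    maxScore = min(sum(test(:,4)), sum(target(:,4)));
    M = matched / maxScore;
end
```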

FIG

FIG

FIG

FIG

Even with movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.

There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angle between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms at the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center, and it is a rotation- and scale-invariant descriptor.
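The following MATLAB sketch shows how such a feature vector could be formed from a detected branch point; branchDirs, bx, by and pupilCenter are illustrative names, not the report's code.

```matlab
% Minimal sketch: rotation/scale-invariant Y-shape feature y = (phi1, phi2, phi3, x, y).
% branchDirs : 3-by-1 vector of branch orientation angles (radians)
% (bx, by)   : position of the Y-shape branch point
% pupilCenter: [xc yc], assumed given by the segmentation step
function yFeat = yShapeDescriptor(branchDirs, bx, by, pupilCenter)
    radial = atan2(by - pupilCenter(2), bx - pupilCenter(1));  % radial direction from the pupil center
    phi    = branchDirs(:) - radial;                           % angle of each branch to the radius
    phi    = atan2(sin(phi), cos(phi));                        % wrap to (-pi, pi]
    yFeat  = [phi(1); phi(2); phi(3); bx; by];
end
```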

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To be tolerant of such errors, the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

is designed to indicate whether or not a line segment belongs to the edge of the sclera. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy GPU memory and slow down the data transfer. During matching, the registration RANSAC-type algorithm randomly selects corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data should also be transformed, and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.

To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, only once.

The calculated result was saved as a component of the descriptor, and the descriptor of the sclera changes to s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template will be transformed. It would be faster if the two templates had similar reference points: if we use the center of the iris as the reference point, then when two templates are compared, the correspondences will automatically be aligned to each other, since they have a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computing, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.
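A minimal MATLAB sketch of this CPU-side preprocessing is shown below: it attaches the polar components and a mask-derived weight to one segment descriptor. The use of bwdist and the borderWidth parameter are my assumptions about how "near the boundary" could be decided, not the report's implementation.

```matlab
% Minimal sketch of building a WPL descriptor s = (x, y, r, theta, phi, w) for one segment.
% scleraMask  : binary sclera mask image
% borderWidth : assumed pixel distance that counts as "near the boundary"
function s = wplDescriptor(x, y, phi, irisCenter, scleraMask, borderWidth)
    theta = atan2(y - irisCenter(2), x - irisCenter(1));  % angle to the reference line through the iris center
    r     = hypot(x - irisCenter(1), y - irisCenter(2));  % distance from the iris/pupil center
    distToEdge = bwdist(~scleraMask);                     % distance of interior pixels to the mask boundary
    px = round(x);  py = round(y);
    if ~scleraMask(py, px)
        w = 0;                                            % descriptor outside the sclera
    elseif distToEdge(py, px) < borderWidth
        w = 0.5;                                          % descriptor near the sclera boundary
    else
        w = 1;                                            % interior descriptor
    end
    s = [x, y, r, theta, phi, w];                         % polar and rectangular parts precomputed on the CPU
end
```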

The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched. In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same sides and saved

FIG

FIG

them at contiguous addresses. This meets the requirement of coalesced memory access in the GPU. After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameters generator in Figure 4 is then simplified as in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 4.0 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline, today's GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity; after this step, some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transformation, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH THE Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4); ni and di are the number of matched descriptor pairs and their center distances, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar ray from the pupil center in descriptor i. The number of matched pairs ni and the distances between the Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2); here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded, while a sclera with a high matching score is passed to the next, more precise matching process.
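A sequential MATLAB sketch of this coarse stage is given below (the CUDA version assigns one template pair per thread). The descriptor layout [phi1 phi2 phi3 x y] and, in particular, the final fusion expression are my assumptions; the report's Eq. (2) is not reproduced here.

```matlab
% Minimal sequential sketch of Stage-I coarse matching on Y-shape descriptors.
% Yte, Yta : rows of [phi1 phi2 phi3 x y]; tPhi, tXY, alpha follow the text (30, 675, 30).
function score = yShapeMatch(Yte, Yta, tPhi, tXY, alpha)
    n = 0;  dSum = 0;
    for i = 1:size(Yte, 1)
        dPhi = sqrt(sum((Yta(:,1:3) - Yte(i,1:3)).^2, 2));        % distance of the three branch angles
        dXY  = hypot(Yta(:,4) - Yte(i,4), Yta(:,5) - Yte(i,5));   % distance of branch centers
        cand = find(dPhi <= tPhi & dXY <= tXY);                   % restrict the search area
        if ~isempty(cand)
            n    = n + 1;                                         % matched pair count
            dSum = dSum + min(dXY(cand));                         % accumulate center distance
        end
    end
    if n == 0
        score = 0;
    else
        % one plausible fusion of match count and average center distance
        score = (n / min(size(Yte,1), size(Yta,1))) * (alpha / (alpha + dSum / n));
    end
end
```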

242 STAGE II FINE MATCHING USING THE WPL DESCRIPTOR

The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because, when acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, since the eyeball is spherical in shape, and because the sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium) with slight differences among the movements of these layers. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte, Tta is the target template and staj is the j-th WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj. Δsk is the shift value of the two descriptors, defined as the offset between their center coordinates.

We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final registration offset factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
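A sequential MATLAB sketch of this shift search follows; in the GPU version each candidate offset is evaluated by its own thread. The sampling size nSample and the way the "most consistent" offset is chosen (closest to the mean of the candidates) are my interpretations, stated here as assumptions.

```matlab
% Minimal sequential sketch of the shift-parameter search (Algorithm 2 logic).
% Cte, Cta : N-by-2 arrays of descriptor center coordinates
function bestShift = searchShift(Cte, Cta, nSample)
    idx  = randperm(size(Cte,1), min(nSample, size(Cte,1)));   % randomly pick test descriptors
    cand = zeros(numel(idx), 2);
    for k = 1:numel(idx)
        d = hypot(Cta(:,1) - Cte(idx(k),1), Cta(:,2) - Cte(idx(k),2));
        [~, j] = min(d);                                        % nearest neighbour in the target template
        cand(k,:) = Cta(j,:) - Cte(idx(k),:);                   % candidate registration offset delta_s_k
    end
    % keep the offset most consistent with the other candidates
    dev = sum(abs(cand - mean(cand, 1)), 2);
    [~, best] = min(dev);
    bestShift = cand(best, :);                                  % delta_s_optim
end
```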

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters we iterate N times to generate these parameters; in our experiment we set the number of iterations to 512.
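The MATLAB sketch below illustrates this random search: each of N candidate (shift, rotation, scale) sets is composed into a transform matrix and scored by how many test descriptor centers land within β pixels of a target descriptor. The parameter ranges and the R·T·S composition order are assumptions, since Eq. (7) is not reproduced in this text.

```matlab
% Minimal sequential sketch of the affine-parameter search (Algorithm 3 logic);
% in the CUDA version each candidate parameter set is tried by a separate thread.
function [bestT, bestCount] = searchAffine(Cte, Cta, N, beta, maxShift)
    bestCount = -1;  bestT = eye(3);
    for it = 1:N
        th = (rand*2 - 1) * pi/18;                 % random rotation (assumed range)
        sc = 0.9 + 0.2*rand;                       % random scale (assumed range)
        sh = (rand(1,2)*2 - 1) * maxShift;         % random shift
        R = [cos(th) -sin(th) 0; sin(th) cos(th) 0; 0 0 1];
        T = [1 0 sh(1); 0 1 sh(2); 0 0 1];
        S = [sc 0 0; 0 sc 0; 0 0 1];
        A = R * T * S;                             % composed transform (composition order assumed)
        P = A * [Cte, ones(size(Cte,1), 1)]';      % transform the test descriptor centers
        P = P(1:2, :)';
        cnt = 0;
        for i = 1:size(P, 1)                       % count pairs within beta pixels
            if min(hypot(Cta(:,1) - P(i,1), Cta(:,2) - P(i,2))) <= beta
                cnt = cnt + 1;
            end
        end
        if cnt > bestCount, bestCount = cnt; bestT = A; end
    end
end
```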

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto those threads. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers, local memory and shared memory are local to a thread or block, and the on-chip registers and shared memory take little time to access; only shared memory can be accessed by other threads within the same block, but only a limited amount of shared memory is available. Global memory, constant memory and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming. Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming.

If threads in a warp take different control paths, all the branches will be executed serially; to improve performance, branch divergence within a warp should be avoided. Global memory is slower to access than on-chip memory; to hide this latency, on-chip memory should be used in preference to global memory, and when global memory accesses do occur, threads in the same warp should access consecutive words so that the accesses coalesce. Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size: if two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

251 MAPPING THE ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks: we create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU, and the thread and block numbers are set to 1024; that means we can match our test template with up to 1024 × 1024 target templates at the same time.

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11: first, all odd-numbered threads compute the sum of consecutive pairs of the results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum of the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
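The pairwise reduction described above can be illustrated with a short sequential MATLAB sketch (the partialScores values below are made-up sample data; on the GPU each addition in a step is done by a different thread):

```matlab
% Sequential sketch of the pairwise (tree) reduction of per-thread results.
partialScores = [3 1 4 1 5 9 2 6];     % sample intermediate results (length padded to a power of two)
a = partialScores;
n = numel(a);
stride = 1;
while stride < n
    % elements a power-of-two stride apart are summed in place, as in the per-block GPU step
    a(1:2*stride:n) = a(1:2*stride:n) + a(1+stride:2*stride:n);
    stride = stride * 2;
end
totalScore = a(1);                      % same value as sum(partialScores), left at the first address
```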

252 MAPPING INSIDE A BLOCK

In the shift-argument search there are two schemes we can choose to map the task: mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads; or assigning a single possible shift offset to each thread, so that all the threads compute independently, except that the final results must be compared across the possible offsets. Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are assigned across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when it will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this work it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the different histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG; this method is used to create the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Figure 8 depicts the edge orientation of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG
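A minimal MATLAB sketch of the cell-histogram computation described above is given below; the cell size, number of bins, and gradient kernels are assumed parameters, and block normalization is only indicated in a comment.

```matlab
% Minimal sketch of HOG cell histograms: gradients, unsigned orientation over 0-180
% degrees, and magnitude-weighted per-cell histograms. I is a grayscale image (double).
function H = simpleHOG(I, cellSize, nBins)
    dx  = imfilter(I, [-1 0 1],  'replicate');            % x-direction gradient dx(x, y)
    dy  = imfilter(I, [-1 0 1]', 'replicate');            % y-direction gradient dy(x, y)
    mag = hypot(dx, dy);                                   % gradient magnitude m(x, y)
    ang = mod(atan2d(dy, dx), 180);                        % orientation; opposite directions count the same
    bin = min(floor(ang / (180 / nBins)) + 1, nBins);      % orientation bin index
    [rows, cols] = size(I);
    H = zeros(floor(rows/cellSize), floor(cols/cellSize), nBins);
    for r = 1:size(H, 1)
        for c = 1:size(H, 2)
            rr = (r-1)*cellSize + (1:cellSize);
            cc = (c-1)*cellSize + (1:cellSize);
            for b = 1:nBins                                 % magnitude-weighted cell histogram
                H(r, c, b) = sum(sum(mag(rr, cc) .* (bin(rr, cc) == b)));
            end
        end
    end
    % contrast (block) normalization over overlapping 2x2 groups of cells would follow here
end
```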

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C, C++, FORTRAN, Java™, COM,

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation
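As a small, hedged illustration of the Parallel Computing Toolbox mentioned above, a parfor loop can spread independent template comparisons across workers; matchTemplates and database are hypothetical names standing in for the project's matcher and template store, not actual code from this report.

```matlab
% Illustration only: matching one test template against many stored templates in parallel.
scores = zeros(1, numel(database));        % database: cell array of target templates (assumed)
parfor k = 1:numel(database)
    % iterations are independent, so they can run on different workers
    scores(k) = matchTemplates(testTemplate, database{k});
end
[bestScore, bestIndex] = max(scores);
```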

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features
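A small illustration (not from the report) of the vectorization point made above: both snippets below scale every pixel of an image I, but the second is a single vectorized line.

```matlab
% element-by-element loop, C-style
J = zeros(size(I));
for k = 1:numel(I)
    J(k) = 2 * I(k) + 1;
end

% equivalent vectorized MATLAB expression
J = 2 * I + 1;
```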

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
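The snapshots above follow a standard Image Processing Toolbox workflow. The sketch below is a hedged outline of those steps; the file name, the interactive ROI selection, and the Gabor parameters are placeholders, not the project's actual values.

```matlab
% Hedged sketch of the preprocessing steps shown in the snapshots.
I      = imread('sclera_eye.jpg');          % input eye image (placeholder file name)
gray   = rgb2gray(I);                       % original image -> grey scale image
level  = graythresh(gray);                  % Otsu's threshold
bw     = imbinarize(gray, level);           % grey scale -> binary image
edges  = edge(gray, 'canny');               % edge map of the eye region
roi    = roipoly(gray);                     % interactively select the sclera part (ROI)
sclera = gray;  sclera(~roi) = 0;           % keep only the selected ROI
enh    = adapthisteq(sclera);               % enhancement of the sclera image
gab    = imgaborfilt(enh, 4, 90);           % Gabor filtering (wavelength 4 px, 90 deg; assumed values)
% the Gabor responses would then be matched against the templates stored in the database
```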

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate the mask information and make the matching more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Cirean, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.

Page 39: Major project report on

iris and S is the line descriptor In order to register the segments of the

vascular patterns a RANSAC-based algorithm is used to estimate the best-

fit parameters for registration between the two sclera vascular patterns For

the registration algorithm it randomly chooses two points ndash one from the

test template and one from the target template It also randomly chooses a

scaling factor and a rotation value based on a priori knowledge of the

database Using these values it calculates a fitness value for the registration

using these parameters

After sclera template registration each line segment in the test

template is compared to the line segments in the target template for

matches In order to reduce the effect of segmentation errors we created the

weighting image (Figure 3) from the sclera mask by setting interior pixels

in the sclera mask to 1 pixels within some distance of the boundary of the

mask to 05 and pixels outside the mask to 0

The matching score for two segment descriptors is calculated By

where Si and Sj are two segment descriptors m(Si Sj ) is the matching

score between segments Si and Sj d(Si Sj ) is the Euclidean distance

between the segment descriptors center points (from Eq 6-8) Dmatch is

the matching distance threshold and match is the matching angle threshold

The total matching score M is the sum of the individual matching scores

divided by the maximum matching score for the minimal set between the

test and target template That is one of the test or target templates has fewer

points and thus the sum of its descriptors weight sets the maximum score

that can be attained

FIG

FIG

FIG

FIG

movement of eye Y shape branches are observed to be a stable feature and

can be used as sclera feature descriptor To detect the Y shape branches in

the original template we search for the nearest neighbors set of every line

segment in a regular distance classified the angles among these neighbors

If there were two types of angle values in the line segment set this set may

be inferred as a Y shape structure and the line segment angles would be

recorded as a new feature of the sclera

There are two ways to measure both orientation and relationship of

every branch of Y shape vessels one is to use the angles of every branch to

x axle the other is to use the angels between branch and iris radial

direction The first method needs additional rotation operating to align the

template In our approach we employed the second method As Figure 6

shows ϕ1 ϕ2 and ϕ3 denote the angle between each branch and the radius

from pupil center Even when the head tilts the eye moves or the camera

zooms occurs at the image acquisition step ϕ1 ϕ2 and ϕ3 are quite stable

To tolerate errors from the pupil center calculation in the segmentation step

we also recorded the center position (x y) of the Y shape branches as

auxiliary parameters So in our rotation shift and scale invariant feature

vector is defined as y(ϕ1 ϕ2 ϕ3 x y) The Y-shape descriptor is generated

with reference to the iris center Therefore it is automatically aligned to the

iris centers It is a rotational- and scale- invariant descriptor V WPL

SCLERA DESCRIPTOR As we discussed in the Section 22 the line

descriptor is extracted from the skeleton of vessel structure in binary images

(Figure 7) The skeleton is then broken into smaller segments For each

segment a line descriptor is created to record the center and orientation of

the segment This descriptor is expressed as s(x yɸ) where (x y) is the

position of the center and ɸ is its orientation Because of the limitation of

segmentation accuracy the descriptor in the boundary of sclera area might

not be accurate and may contain spur edges resulting from the iris eyelid

andor eyelashes To be tolerant of such error the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

Is designed to indicate whether a line segment belongs to the edge of the

sclera or not However in GPU application using the mask is a challenging

since the mask files are large in size and will occupy the GPU memory and

slow down the data transfer When matching the registration RANSAC

type algorithm was used to randomly select the corresponding descriptors

and the transform parameter between them was used to generate the

template transform affine matrix After every templates transform the mask

data should also be transformed and new boundary should be calculated to

evaluate the weight of the transformed descriptor This results in too many

convolutions in processor unit

To reduce heavy data transfer and computation we designed the

weighted polar line (WPL) descriptor structure which includes the

information of mask and can be automatically aligned We extracted the

relationship of geometric feature of descriptors and store them as a new

descriptor We use a weighted image created via setting various weight

values according to their positions The weight of those descriptors who are

beyond the sclera are set to be 0 and those who are near the sclera

boundary are 05 and interior descriptors are set to be 1 In our work

descriptors weights were calculated on their own mask by the CPU only

once

The calculating result was saved as a component of descriptor The

descriptor of sclera will change to s(x y ɸw) where w denotes the weight

of the point and the value may be 0 05 1 To align two templates when a

template is shifted to another location along the line connecting their

centers all the descriptors of that template will be transformed It would be

faster if two templates have similar reference points If we use the center of

the iris as the reference point when two templates are compared the

correspondence will automatically be aligned to each other since they have

the similar reference point Every feature vector of the template is a set of

line segment descriptors composed of three variable (Figure 8) the

segment angle to the reference line which went through the iris center

denoted as θ the distance between the segments center and pupil center

which is denoted as r the dominant angular orientation of the segment

denoted as ɸ To minimize the GPU computing we also convert the

descriptor value from polar coordinate to rectangular coordinate in CPU

preprocess

The descriptor vector becomes s(x y r θ ɸw) The left and right

parts of sclera in an eye may have different registration parameters For

example as an eyeball moves left left part sclera patterns of the eye may be

compressed while the right part sclera patterns are stretched

In parallel matching these two parts are assigned to threads in

different warps to allow different deformation The multiprocessor in

CUDA manages threads in groups of 32 parallel threads called warps We

reorganized the descriptor from same sides and saved

FIG

FIG

them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, no computation on the mask file is needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as shown in Figure 9.

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units. For general-purpose computing on the GPU, mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it

is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.

We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

233 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks' having nothing to do with graphics, the applications still had to be programmed using graphics APIs.

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both "gather" (read) accesses from and "scatter" (write)

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity. After this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here y_te,i and y_ta,j are the Y-shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance of the two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and the distance of their centers, respectively; tϕ is a distance threshold; and t_xy is the threshold restricting the search area. We set tϕ to 30 and t_xy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕ_i,j is the angle between the j-th branch and the polar line from the pupil center in descriptor i.

The number of matched pairs n_i and the distance between the Y-shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
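As a rough host-side sketch of this coarse stage (structure and names are illustrative assumptions; the exact distance and fusion rules are the report's equations (2)-(4) and are not reproduced inside the code):

    #include <cmath>
    #include <vector>

    // Sketch of Stage-I coarse matching on Y-shape descriptors y(phi1, phi2, phi3, x, y).
    struct YDescriptor { float phi[3]; float x, y; };
    struct CoarseResult { int n; float avgDist; };   // inputs to the fusion rule, Eq. (2)

    CoarseResult coarseMatch(const std::vector<YDescriptor>& Tte,
                             const std::vector<YDescriptor>& Tta,
                             float t_phi, float t_xy)
    {
        int n = 0; float dsum = 0.0f;
        for (const YDescriptor& a : Tte)
            for (const YDescriptor& b : Tta) {
                float dxy = std::hypot(a.x - b.x, a.y - b.y);   // center distance, cf. Eq. (4)
                if (dxy > t_xy) continue;                       // restrict the search area
                float dphi = 0.0f;                              // branch-angle distance, cf. Eq. (3)
                for (int k = 0; k < 3; ++k)
                    dphi += (a.phi[k] - b.phi[k]) * (a.phi[k] - b.phi[k]);
                dphi = std::sqrt(dphi);
                if (dphi < t_phi) { ++n; dsum += dxy; }
            }
        return { n, n > 0 ? dsum / n : 0.0f };
    }

The matched-pair count and the average center distance returned here would then be fused into a matching score as in (2) with the factor α, and compared against the decision threshold t.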

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more of the vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te,i is the i-th WPL descriptor of T_te; T_ta is the target template and s_ta,i is the i-th WPL descriptor of T_ta; and d(s_te,k, s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j.

Δs_k is the shift value of the two descriptors, that is, the offset between the matched descriptor centers.

We first randomly select an equal number of segment descriptors s_te,k in the test template T_te from each quad and find their nearest neighbors s_ta,j in the target template T_ta. Their shift offset is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
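A sequential sketch of this search (reusing the WPLDescriptor record sketched earlier; the "smallest standard deviation" rule is interpreted here as keeping the candidate offset closest to the mean, which is an assumption about the report's exact criterion):

    #include <cmath>
    #include <limits>
    #include <vector>

    struct Offset { float dx, dy; };

    // Sketch of Algorithm 2: each sampled test descriptor votes with the offset to its
    // nearest target descriptor; the most consistent candidate is kept as the shift.
    Offset searchShift(const std::vector<WPLDescriptor>& Tte,
                       const std::vector<WPLDescriptor>& Tta)
    {
        std::vector<Offset> cand;
        for (const WPLDescriptor& s : Tte) {            // (per-quad sampling omitted)
            float best = std::numeric_limits<float>::max();
            Offset o{0.0f, 0.0f};
            for (const WPLDescriptor& t : Tta) {
                float d = std::hypot(s.x - t.x, s.y - t.y);
                if (d < best) { best = d; o = {t.x - s.x, t.y - s.y}; }
            }
            cand.push_back(o);
        }
        if (cand.empty()) return {0.0f, 0.0f};
        Offset mean{0.0f, 0.0f};
        for (const Offset& o : cand) { mean.dx += o.dx; mean.dy += o.dy; }
        mean.dx /= cand.size(); mean.dy /= cand.size();
        Offset bestOff{0.0f, 0.0f};
        float bestDev = std::numeric_limits<float>::max();
        for (const Offset& o : cand) {
            float dev = std::hypot(o.dx - mean.dx, o.dy - mean.dy);
            if (dev < bestDev) { bestDev = dev; bestOff = o; }
        }
        return bestOff;                                 // the optimized shift offset
    }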

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithm 2; tr_shift^(it), θ^(it), and tr_scale^(it) are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment we set the number of iterations to 512.
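As a sketch of this parameter set, the per-iteration transform applied to a descriptor center can be written as the composition of shift, rotation, and scale below (an assumed arrangement for illustration; the report's own matrix (7) may order or parameterize the factors differently):

$$
\begin{bmatrix} x' \\ y' \end{bmatrix}
= tr_{scale}
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\left(
\begin{bmatrix} x \\ y \end{bmatrix}
+
\begin{bmatrix} \Delta x_{shift} \\ \Delta y_{shift} \end{bmatrix}
\right)
$$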

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm), and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr_shift(optm)), and S(tr_scale(optm)) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction, multiple data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program must be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take very little time to access; registers and local memory are private to a thread, while shared memory can be accessed by the other threads within the same block, although its capacity is limited. Local memory, despite its name, resides in off-chip device memory. Global memory, constant memory, and texture memory are also off-chip and are accessible by all threads, and accessing them is much more time consuming.

Constant memory and texture memory are read-only and cached. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:

If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access latency. To hide this latency as much as possible, we should use on-chip memory in preference to global memory. When a global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
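To make the coalescing requirement concrete, a minimal CUDA sketch is given below (illustrative kernels, not from the report):

    // Sketch: coalesced vs. strided global-memory access.
    // With a structure-of-arrays layout, thread i reads x[i], so the 32 threads of a
    // warp touch 32 consecutive words and the accesses coalesce into few transactions.
    __global__ void coalescedRead(const float *x, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = x[i];          // consecutive threads -> consecutive addresses
    }

    // An interleaved (array-of-structures) layout breaks this pattern:
    __global__ void stridedRead(const float *x, float *out, int n, int stride)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = x[i * stride]; // consecutive threads -> scattered addresses
    }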

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU. The numbers of threads per block and of blocks are both set to 1024, which means we can match our test template with up to 1024 × 1024 target templates at the same time.
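A sketch of this one-thread-per-target-template mapping, reusing the YDescriptor record sketched earlier (kernel name, buffer layout, and parameters are assumptions, not the report's code):

    // Sketch of the coarse-grained Stage-I mapping: one thread per target template.
    __global__ void matchYShapeKernel(const YDescriptor* targets, int descriptorsPerTemplate,
                                      const YDescriptor* test, int testCount,
                                      float* scores, int numTemplates)
    {
        int t = blockIdx.x * blockDim.x + threadIdx.x;   // index of the target template
        if (t >= numTemplates) return;
        const YDescriptor* target = targets + t * descriptorsPerTemplate;
        float score = 0.0f;
        // ... the Algorithm-1 comparison of 'test' against 'target' would go here ...
        scores[t] = score;
    }

    // Host-side launch sketch: 1024 threads per block and up to 1024 blocks,
    // i.e. up to 1024 x 1024 target templates matched in a single launch.
    // dim3 block(1024), grid((numTemplates + 1023) / 1024);
    // matchYShapeKernel<<<grid, block>>>(dTargets, K, dTest, testCount, dScores, numTemplates);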

Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the intermediate results need to be summed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sums of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
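The pairwise combination described above can be illustrated with the common shared-memory tree reduction below (a sketch under assumed names; the thread-indexing scheme differs slightly from the odd-thread pairing in the text but computes the same block sum):

    // Sketch: tree-style reduction of per-thread partial results in shared memory.
    // Each thread first writes its partial sum; pairs are then combined with a
    // halving stride until thread 0 holds the block's total.
    __global__ void blockSum(const float *partial, float *blockTotals, int n)
    {
        extern __shared__ float sdata[];
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;
        sdata[tid] = (i < n) ? partial[i] : 0.0f;
        __syncthreads();

        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s) sdata[tid] += sdata[tid + s];
            __syncthreads();
        }
        if (tid == 0) blockTotals[blockIdx.x] = sdata[0];  // result in the first address
    }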

252 MAPPING INSIDE BLOCK

In the shift-parameter search there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all threads compute independently, except that the final result must be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we choose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the iterations are distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag

FIG

FIG

variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
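As an illustration of giving every thread its own decorrelated random stream, the sketch below uses cuRAND's device API; this is an assumed alternative shown only for illustration, not the dynamically parameterized Mersenne Twister generators the report describes, and the parameter ranges are placeholders:

    #include <curand_kernel.h>

    // Sketch: one RNG state per thread, seeded so that sequences are decorrelated.
    __global__ void initRng(curandState *states, unsigned long long seed)
    {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        curand_init(seed, /*subsequence=*/id, /*offset=*/0, &states[id]);
    }

    __global__ void randomAffineParams(curandState *states, float *theta, float *scale)
    {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        curandState local = states[id];
        theta[id] = 2.0f * curand_uniform(&local) - 1.0f;   // placeholder rotation range
        scale[id] = 0.9f + 0.2f * curand_uniform(&local);   // placeholder scale range
        states[id] = local;                                  // save the advanced state
    }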

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can incur long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
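A sketch of the shared-memory staging step, reusing the WPLDescriptor record sketched earlier (kernel shape, names, and the placeholder score are assumptions):

    // Sketch: stage the test template's descriptors into shared memory once per block,
    // so the many comparisons against the target template avoid repeated global loads.
    // Launch with sharedBytes = testCount * sizeof(WPLDescriptor).
    __global__ void fineMatchKernel(const WPLDescriptor *test, int testCount,
                                    const WPLDescriptor *targets, int perTemplate,
                                    float *scores)
    {
        extern __shared__ WPLDescriptor sTest[];          // dynamic shared memory
        for (int i = threadIdx.x; i < testCount; i += blockDim.x)
            sTest[i] = test[i];                           // cooperative copy
        __syncthreads();

        // ... each thread would now match its slice of target descriptors against sTest ...
        if (threadIdx.x == 0) scores[blockIdx.x] = 0.0f;  // placeholder for the real score
    }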

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection. In this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
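The standard formulas for these quantities (a reconstruction, since the report's equation images are not shown here) are:

$$
m(x,y)=\sqrt{d_x(x,y)^2+d_y(x,y)^2},\qquad
\theta(x,y)=\tan^{-1}\!\frac{d_y(x,y)}{d_x(x,y)}
$$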

Orientation binning is the second step of HOG; it creates the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination and contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises.

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular among scientists involved in image processing. The MATLAB application is built around the

MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling, and the toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is, and the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required. The MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As an alternative to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology.

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format, and additional functions let you read data from Web pages and XML.

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter and bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency, a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make the descriptor more suitable for parallel computing, which dramatically reduces data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


position of the center and ɸ is its orientation Because of the limitation of

segmentation accuracy the descriptor in the boundary of sclera area might

not be accurate and may contain spur edges resulting from the iris eyelid

andor eyelashes To be tolerant of such error the mask file

FIG

The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel

patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line

segments of vessel patterns

Is designed to indicate whether a line segment belongs to the edge of the

sclera or not However in GPU application using the mask is a challenging

since the mask files are large in size and will occupy the GPU memory and

slow down the data transfer When matching the registration RANSAC

type algorithm was used to randomly select the corresponding descriptors

and the transform parameter between them was used to generate the

template transform affine matrix After every templates transform the mask

data should also be transformed and new boundary should be calculated to

evaluate the weight of the transformed descriptor This results in too many

convolutions in processor unit

To reduce heavy data transfer and computation we designed the

weighted polar line (WPL) descriptor structure which includes the

information of mask and can be automatically aligned We extracted the

relationship of geometric feature of descriptors and store them as a new

descriptor We use a weighted image created via setting various weight

values according to their positions The weight of those descriptors who are

beyond the sclera are set to be 0 and those who are near the sclera

boundary are 05 and interior descriptors are set to be 1 In our work

descriptors weights were calculated on their own mask by the CPU only

once

The calculating result was saved as a component of descriptor The

descriptor of sclera will change to s(x y ɸw) where w denotes the weight

of the point and the value may be 0 05 1 To align two templates when a

template is shifted to another location along the line connecting their

centers all the descriptors of that template will be transformed It would be

faster if two templates have similar reference points If we use the center of

the iris as the reference point when two templates are compared the

correspondence will automatically be aligned to each other since they have

the similar reference point Every feature vector of the template is a set of

line segment descriptors composed of three variable (Figure 8) the

segment angle to the reference line which went through the iris center

denoted as θ the distance between the segments center and pupil center

which is denoted as r the dominant angular orientation of the segment

denoted as ɸ To minimize the GPU computing we also convert the

descriptor value from polar coordinate to rectangular coordinate in CPU

preprocess

The descriptor vector becomes s(x y r θ ɸw) The left and right

parts of sclera in an eye may have different registration parameters For

example as an eyeball moves left left part sclera patterns of the eye may be

compressed while the right part sclera patterns are stretched

In parallel matching these two parts are assigned to threads in

different warps to allow different deformation The multiprocessor in

CUDA manages threads in groups of 32 parallel threads called warps We

reorganized the descriptor from same sides and saved

FIG

FIG

them in continuous address This would meet requirement of coalesced

memory access in GPU

After reorganizing the structure of descriptors and adding mask information

into the new descriptor the computation on the mask file is not needed on

the GPU It was very fast to match with this feature because it does not

need to reregister the templates every time after shifting Thus the cost of

data transfer and computation on GPU will be reduced Matching on the

new descriptor the shift parameters generator in Figure 4 is then simplified

as Figure 9

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 40 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline todayrsquos GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that todayrsquos GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global

Btexture[ memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and Bgather[ accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

23 2 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both Bgather[ (read) accesses from and Bscatter[ (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities After this step it is still possible for

some false positive matches In the second stage we used WPL descriptor

to register the two images for more detailed descriptor matching including

scale- and translation invariance This stage includes shift transform affine

matrix generation and final WPL descriptor matching Overall we

partitioned the registration and matching processing into four kernels2 in

CUDA (Figure 10) matching on the Y shape descriptor shift

transformation affine matrix generation and final WSL descriptor

matching Combining these two stages the matching program can run faster

and

achieve more accurate score

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and yta j are the Y shape descriptors of test template Tte

and target template Tta respectively dϕ is the Euclidian distance of angle

element of descriptors vector defined as (3) dxy is the Euclidian distance of

two descriptor centers defined as (4) ni and di are the matched descriptor

pairsrsquo number and their centers distance respectively tϕ is a distance

threshold and txy is the threshold to restrict the searching area We set tϕ to

30 and txy to 675 in our experiment Here

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3) where ϕi j is the angle between

the j th branch and the polar from pupil center in desctiptor i

The number of matched pairs ni and the distance between Y shape

branches centers di are stored as the matching result We fuse the number of

matched branches and the average distance between matched branches

centers as (2) Here α is a factor to fuse the matching score which was set

to 30 in our study Ni and Nj is the total numbers of feature vectors in

template i and j separately The decision is regulated by the threshold t if

the sclerarsquos matching score is lower than t the sclera will be discarded The

sclera with high matching score will be passed to the next more precisely

matching process

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WSL descriptor reveals more vessel structure detail of

sclera than the Y shape descriptor The variation of sclera vessel pattern is

nonlinear because

When acquiring an eye image in different gaze angle the vessel structure

will appear nonlinear shrink or extend because eyeball is spherical in shape

sclera is made up of four layers episclera stroma lamina fusca and

endothelium There are slightly differences among movement of these

layers Considering these factors our registration employed both single

shift transform and multi-parameter transform which combines shift

rotation and scale together

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate As a result the detected iris center could

not be very accurate Shift transform is designed to tolerant possible errors

in pupil center detection in the segmentation step If there is no deformation

or only very minor deformation registration with shift transform together

would be adequate to achieve an accurate result We designed Algorithm 2

to get optimized shift parameter Where Tte is the test template and ssei is

the i th WPL descriptor of Tte Tta is the target template and ssai is the i th

WPL descriptor of Ttad(stek staj ) is Euclidean distance of escriptors stek

and staj

Δsk is the shift value of two descriptors defines as

We first randomly select an equal number of segment descriptors

stek in test template Tte from each quad and find its nearest neighbors staj _

in target template Tta The shift offset of them is recorded as the possible

registration shift factor _sk The final offset registration factor is _soptim

which has the smallest standard deviation among these candidate offsets

2) AFFINE TRANSFORM PARAMETER SEARCH

Affine transform is designed to tolerant some deformation of sclera

patterns in the matching step The affine transform algorithm is shown in

Algorithm 3 The shift value in the parameter set is obtained by randomly

selecting descriptor s(it )te and calculating the distance from its nearest

neighbor staj_ in Tta We transform the test template by the matrix in (7)

At end of the iteration we count the numbers of matched descriptor pairs

from the transformed template and the target template The factor β is

involved to determine if the pair of descriptor is matched and we set it to

be 20 pixels in our experiment After N iterations the optimized transform

parameter set is determined via selecting the maximum matching numbers

m(it) Here stei Tte staj and Tta is defined same as algorithm 2 tr (it )

shi f t θ(it )tr (it ) scale is the parameters of shift rotation and scale

transform generated in i tth iteration R(θ (it )) T (tr (it ) shi f t ) and S(tr (it

) scale) are the transform matrix defined as (7) To search optimize

transform parameter we iterated N times to generate these parameters In

our experiment we set iteration time to 512

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter sets determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s(i)te, Tte, s(j)ta, and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.
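A sketch of the weighted fine-matching step under these definitions follows. The scoring rule (each matched pair contributes the product of the two descriptors' weights) and the reuse of β as the center-distance gate are assumptions; Algorithm 4 is the authoritative procedure.

// Sketch of weighted fine matching, assuming a pair matches when the centers
// are within beta and |phi_te - phi_ta| < alpha, and that each matched pair
// contributes the product of the two weights w (0, 0.5, or 1) to the score.
#include <vector>
#include <cmath>

struct WPL { float x, y, r, theta, phi, w; };   // weighted polar line descriptor

float weightedMatchScore(const std::vector<WPL>& te, const std::vector<WPL>& ta,
                         float beta = 20.0f, float alpha = 5.0f) {
    float score = 0.0f;
    std::vector<bool> used(ta.size(), false);   // a target segment is matched at most once
    for (const WPL& a : te) {
        int best = -1; float bestDist = beta;
        for (size_t j = 0; j < ta.size(); ++j) {
            if (used[j]) continue;
            float dist = std::hypot(a.x - ta[j].x, a.y - ta[j].y);
            if (dist < bestDist && std::fabs(a.phi - ta[j].phi) < alpha) {
                bestDist = dist; best = static_cast<int>(j);
            }
        }
        if (best >= 0) { used[best] = true; score += a.w * ta[best].w; }
    }
    return score;
}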

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and accessing these memories costs little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming. Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access latency. To hide this latency, on-chip memory should be used preferentially rather than global memory. When global memory access occurs, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
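The following toy kernel, unrelated to the sclera pipeline itself, illustrates two of the points above: consecutive threads reading consecutive words (coalesced global access) and staging data in fast on-chip shared memory.

// Toy CUDA kernel illustrating coalesced global access and a shared-memory
// staging buffer; the block size of 256 is an assumed example value.
__global__ void scaleValues(const float* __restrict__ in, float* out,
                            int n, float factor) {
    __shared__ float tile[256];                     // on-chip, visible to the whole block
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // consecutive threads -> consecutive i
    if (i < n) tile[threadIdx.x] = in[i];           // coalesced global read
    __syncthreads();                                // every thread in the block reaches this
    if (i < n) out[i] = tile[threadIdx.x] * factor; // coalesced global write
}

// Launch example (assumed sizes): scaleValues<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 2.0f);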

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks. We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU; the thread and block counts are both set to 1024, which means we can match our test template against up to 1024×1024 target templates at the same time.
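A sketch of this coarse-grained mapping follows: one thread computes the matched-pair count and average distance for one target template, and the counts can then be fused on the host as in the earlier Stage I sketch. The flattened template layout (offset and count arrays) and the thresholds are assumptions for illustration.

// Coarse-grained Stage I kernel: one thread per target template.
struct YDesc { float phi1, phi2, phi3, x, y; };

__device__ void countMatches(const YDesc* test, int nTest,
                             const YDesc* target, int nTarget,
                             int* nMatched, float* avgDist) {
    const float tPhi = 30.0f, tXY = 675.0f;          // thresholds (values assumed from the text)
    int n = 0; float dSum = 0.0f;
    for (int i = 0; i < nTest; ++i)
        for (int j = 0; j < nTarget; ++j) {
            float dphi = fabsf(test[i].phi1 - target[j].phi1) +
                         fabsf(test[i].phi2 - target[j].phi2) +
                         fabsf(test[i].phi3 - target[j].phi3);
            float dxy  = hypotf(test[i].x - target[j].x, test[i].y - target[j].y);
            if (dphi < tPhi && dxy < tXY) { ++n; dSum += dxy; break; }   // first hit counts
        }
    *nMatched = n;
    *avgDist  = n ? dSum / n : 0.0f;
}

__global__ void coarseMatchKernel(const YDesc* test, int nTest,
                                  const YDesc* targets,       // all target templates, flattened
                                  const int* offsets, const int* counts,
                                  int* nMatched, float* avgDist, int numTemplates) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;  // one thread handles one target template
    if (t < numTemplates)
        countMatches(test, nTest, targets + offsets[t], counts[t],
                     &nMatched[t], &avgDist[t]);
}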

Algorithms 2-4 are partitioned into fine-grained subtasks, each of which processes a section of descriptors in one thread. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, every other thread computes the sum of a consecutive pair of results; then, recursively, every fourth, eighth, sixteenth, and so on, thread combines the new partial results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
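A standard shared-memory tree reduction captures this idea. The report describes an interleaved variant (odd threads first, then every 4th, 8th, ...); the sketch below uses the equivalent sequential-addressing form and assumes a power-of-two block size of at most 1024.

// Block-level reduction of per-thread partial results; the real kernel may
// compare values instead of summing when a maximum is needed.
__global__ void blockSum(const float* partial, float* blockResults, int n) {
    __shared__ float buf[1024];                        // one slot per thread in the block
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    buf[tid] = (i < n) ? partial[i] : 0.0f;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) buf[tid] += buf[tid + stride];   // pairwise combine
        __syncthreads();
    }
    if (tid == 0) blockResults[blockIdx.x] = buf[0];   // result lands in the first slot
}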

252 MAPPING INSIDE BLOCK

In the shift-parameter search there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

FIG

FIG

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
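The report relies on offline dynamic creation of Mersenne Twister parameters (DCMT). Purely as an illustration of the same goal, giving each thread a statistically independent stream, a sketch using CUDA's cuRAND per-thread states (not what the authors used) could look like the following; the parameter ranges are assumptions.

// Illustrative only: one independent random stream per thread via cuRAND.
#include <curand_kernel.h>

__global__ void initStates(curandState* states, unsigned long long seed) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Same seed, distinct subsequence per thread -> uncorrelated streams.
    curand_init(seed, /*subsequence=*/tid, /*offset=*/0, &states[tid]);
}

__global__ void randomAffineParams(curandState* states, float* theta, float* scale, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        curandState local = states[tid];                     // work on a register copy
        theta[tid] = 0.2f * curand_uniform(&local) - 0.1f;   // rotation in [-0.1, 0.1] rad (assumed)
        scale[tid] = 0.9f + 0.2f * curand_uniform(&local);   // scale in [0.9, 1.1] (assumed)
        states[tid] = local;                                 // save the advanced state back
    }
}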

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
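A sketch of this layout: descriptor components kept as separate arrays (a structure of arrays) in global memory, and the test template staged once per block into shared memory. The array names, the maximum test-template size, and the kernel skeleton are assumptions for illustration.

// Structure-of-arrays layout plus shared-memory staging of the test template.
#define MAX_TEST 256                      // assumed upper bound on test descriptors

struct TemplateSoA {                      // one array per WPL component
    const float *x, *y, *r, *theta, *phi, *w;
    int count;
};

__global__ void fineMatchKernel(TemplateSoA test, TemplateSoA target, float* out) {
    __shared__ float sx[MAX_TEST], sy[MAX_TEST];      // test template staged on-chip
    for (int i = threadIdx.x; i < test.count; i += blockDim.x) {
        sx[i] = test.x[i];                            // coalesced: consecutive threads read
        sy[i] = test.y[i];                            // consecutive global addresses
    }
    __syncthreads();
    // ... each thread would now match its share of target descriptors against sx/sy ...
}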

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It was primarily developed for target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating an intensity measure over a larger block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG. It is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientation of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks; these blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
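A compact host-side sketch of the cell-histogram step described above, assuming central-difference gradients, 9 bins over 0-180 degrees, and 8x8-pixel cells; block grouping and contrast normalization would follow as described in the text.

// HOG cell histograms for a grayscale image (row-major, values in [0,1]).
#include <vector>
#include <cmath>
#include <algorithm>

std::vector<float> hogCellHistograms(const std::vector<float>& img, int W, int H,
                                     int cell = 8, int bins = 9) {
    int cx = W / cell, cy = H / cell;
    std::vector<float> hist(cx * cy * bins, 0.0f);
    for (int y = 1; y < H - 1; ++y)
        for (int x = 1; x < W - 1; ++x) {
            float dx = img[y * W + x + 1] - img[y * W + x - 1];       // central differences
            float dy = img[(y + 1) * W + x] - img[(y - 1) * W + x];
            float mag = std::hypot(dx, dy);
            float ang = std::atan2(dy, dx) * 180.0f / 3.14159265f;
            if (ang < 0.0f) ang += 180.0f;                            // fold opposite directions
            int bin = std::min(bins - 1, static_cast<int>(ang / (180.0f / bins)));
            int cxi = std::min(cx - 1, x / cell), cyi = std::min(cy - 1, y / cell);
            hist[(cyi * cx + cxi) * bins + bin] += mag;               // magnitude-weighted vote
        }
    return hist;   // block grouping and normalization would follow
}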

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C, C++, FORTRAN, Java™, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases, MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveX® controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter and bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large, often complex, multidimensional data. You can specify plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of the GPU structure. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step to achieve parallel computation efficiency on a GPU. A work flow with high arithmetic intensity to hide the memory access latency was designed to partition the computation task across the heterogeneous CPU-GPU system, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.

centers as (2) Here α is a factor to fuse the matching score which was set

to 30 in our study Ni and Nj is the total numbers of feature vectors in

template i and j separately The decision is regulated by the threshold t if

the sclerarsquos matching score is lower than t the sclera will be discarded The

sclera with high matching score will be passed to the next more precisely

matching process

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WSL descriptor reveals more vessel structure detail of

sclera than the Y shape descriptor The variation of sclera vessel pattern is

nonlinear because

When acquiring an eye image in different gaze angle the vessel structure

will appear nonlinear shrink or extend because eyeball is spherical in shape

sclera is made up of four layers episclera stroma lamina fusca and

endothelium There are slightly differences among movement of these

layers Considering these factors our registration employed both single

shift transform and multi-parameter transform which combines shift

rotation and scale together

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate As a result the detected iris center could

not be very accurate Shift transform is designed to tolerant possible errors

in pupil center detection in the segmentation step If there is no deformation

or only very minor deformation registration with shift transform together

would be adequate to achieve an accurate result We designed Algorithm 2

to get optimized shift parameter Where Tte is the test template and ssei is

the i th WPL descriptor of Tte Tta is the target template and ssai is the i th

WPL descriptor of Ttad(stek staj ) is Euclidean distance of escriptors stek

and staj

Δsk is the shift value of two descriptors defines as

We first randomly select an equal number of segment descriptors

stek in test template Tte from each quad and find its nearest neighbors staj _

in target template Tta The shift offset of them is recorded as the possible

registration shift factor _sk The final offset registration factor is _soptim

which has the smallest standard deviation among these candidate offsets

2) AFFINE TRANSFORM PARAMETER SEARCH

Affine transform is designed to tolerant some deformation of sclera

patterns in the matching step The affine transform algorithm is shown in

Algorithm 3 The shift value in the parameter set is obtained by randomly

selecting descriptor s(it )te and calculating the distance from its nearest

neighbor staj_ in Tta We transform the test template by the matrix in (7)

At end of the iteration we count the numbers of matched descriptor pairs

from the transformed template and the target template The factor β is

involved to determine if the pair of descriptor is matched and we set it to

be 20 pixels in our experiment After N iterations the optimized transform

parameter set is determined via selecting the maximum matching numbers

m(it) Here stei Tte staj and Tta is defined same as algorithm 2 tr (it )

shi f t θ(it )tr (it ) scale is the parameters of shift rotation and scale

transform generated in i tth iteration R(θ (it )) T (tr (it ) shi f t ) and S(tr (it

) scale) are the transform matrix defined as (7) To search optimize

transform parameter we iterated N times to generate these parameters In

our experiment we set iteration time to 512

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4 Here stei Tte

staj and Tta are defined same as Algorithms 2 and 3 θ(optm) tr (optm) shi f

t tr (optm) scale _soptim are the registration parameters attained from

Algorithms 2 and 3 R_θ(optm)_T _tr (optm) shi f t _S_tr (optm) scale

_ is the descriptor transform matrix defined in Algorithm 3 empty is the angle

between the segment descriptor and radius direction w is the weight of the

descriptor which indicates whether the descriptor is at the edge of sclera or

not To ensure that the nearest descriptors have a similar orientation we

used a constant factor α to check the abstract difference of two ɸ In our

experiment we set α to 5 The total matching score is minimal score of two

transformed result divided by the minimal matching score for test template

and target template

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU A CUDA consists of many streaming

multiprocessors (SM) where the parallel part of the program should be

partitioned into threads by the programmer and mapped into those threads

There are multiple memory spaces in the CUDA memory hierarchy

register local memory shared memory global memory constant memory

and texture memory Register local memory and shared memory are on-

chip and could be a little time consuming to access these memories Only

shared memory can be accessed by other threads within the same block

However there is only limited availability of shared memory Global

memory constant memory and texture memory are off-chip memory and

accessible by all threads which would be very time consuming to access

these memories

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in term of access To

completely hide the latency of the small instructions set we should use on-

chip memory preferentially rather than global memory When global

memory access occurs threads in same warp should access the words in

sequence to achieve coalescence

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory request from different thread within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread One thread performs a pair of templates compare In our work we

use NVIDIA C2070 as our GPU Threads and blocks number is set to

1024 That means we can match our test template with up to 1024times1024

target templates at same time

Algorithms 2-4 will be partitioned into fine-grained subtasks which is

processed a section of descriptors in one thread As the lower portion of the

middle column shows in Figure 11 we assigned a target template to one

block Inside a block one thread corresponds a set of descriptors in this

template This partition makes every block execute independently and there

are no data exchange requirements between different blocks When all

threads complete their responding descriptor fractions the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of intermediate results which is

show in right of Figure 11 Firstly all odd number threads compute the sum

of consecutive pairs of the results Then recursively every first of i (= 4 8

16 32 64 ) threads

compute the prefix sum on the new result The final result will be saved in

the first address which has the same variable name as the first intermediate

result

252 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose to

map task

Mapping one pair of templates to all the threads in a block and then every

thread would take charge of a fraction of descriptors and cooperation with

other threads

Assigning a single possible shift offset to a thread and all the threads will

compute independently unless the final result should be compared with

other possible offset

Due to great number of sum and synchronization operations in every

nearest neighbor searching step we choose the second method to parallelize

shift searching In affine matrix generator we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads The challenge of this step is the randomly generated

numbers might be correlated among threads In the step of rotation and

scale registration generating we used the Mersenne Twister pseudorandom

number generator because it can use bitwise arithmetic and have long

period

The Mersenne twister as most of pseudorandom generators is iterative

Therefore itrsquos hard to parallelize a single twister state update step among

several execution threads To make sure that thousands of threads in the

launch grid generate uncorrelated random sequence many simultaneous

Mersenne twisters need to process with different initial states in parallel

But even ldquovery differentrdquo (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

Variable denoting whether the line has been matched is stored in

shared memory To share the flags all the threads in a block should wait

synchronic operation at every query step Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure In global memory the

components in descriptor y(ϕ1 ϕ2 ϕ3 x y) and s(x y rθ ϕw) were stored

separately This would guarantee contiguous kernels of Algorithm 2 to 4

can access their data in successive addresses Although such coalescing

access reduces the latency frequently global memory access was still a

slower way to get data In our kernel we loaded the test template to shared

memory to accelerate memory access Because the Algorithms 2 to 4

execute different number of iterations on same data the bank conflict does

not happen To maximize our texture memory space we set the system

cache to the lowest value and bonded our target descriptor to texture

memory Using this catchable memory our data access was accelerated

more

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients is the feature descriptors It is primarily

applied to the design of target detection In this paper it is applied as the

feature for human recognition In the sclera region the vein patterns are the

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To follow out this technique first of all divide the image into small

connected regions called cells For each cell compute the histogram of

gradient directions or edge orientations of the pixels Then the combination

of different histogram of different cell represents the descriptor To improve

accuracy histograms can be contrast normalized by calculating the intensity

from the block and then using this value normalizes all cells within the

block This normalization result shows that it is invariant to geometric and

photometric changes The gradient magnitude m(x y) and orientation 1050592(x

y) are calculated using x and y directions gradients dx (x y) and dy (x y)

Orientation binning is the second step of HOG This method utilized

to create cell histograms Each pixel within the cell used to give a weight to

the orientation which is found in the gradient computation Gradient

magnitude is used as the weight The cells are in the rectangular form The

binning of gradient orientation should be spread over 0 to 180 degrees and

opposite direction counts as the same In the Fig 8 depicts the edge

orientation of picture elements If the images have any illumination and

contrast changes then the gradient strength must be locally normalized For

that cells are grouped together into larger blocks These blocks are

overlapping so that each cell contributes more than once to the final

descriptor Here rectangular HOG (R-HOG) blocks are applied which are

mainly in square grids The performance of HOG is improved by putting

on a Gaussian window into each block

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB(matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphicalmulti-domainsimulationandModel-Based

Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

3.2 FEATURES OF MATLAB

High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java™, COM, and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and other computational fields. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) and knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

3.2.1 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB Command Window
Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
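As a small, assumed example of this vectorized style (the variable names are arbitrary):

% One vectorized line replaces an explicit element-by-element loop
x = 0:0.01:2*pi;          % row vector of sample points
y = sin(x) .* exp(-x/4);  % applied to every element at once, no 'for' loop needed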

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.

Development Tools

MATLAB includes development tools that help you implement your algorithm efficiently. These include the following.

MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.

Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.

MATLAB Profiler
Records the time spent executing each line of code.

Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.

Designing Graphical User Interfaces

Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including those listed below; a brief example follows the list.

Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
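A brief, assumed illustration of the smoothing and curve-fitting operations listed above (the data here is synthetic):

% Smooth noisy samples and fit a quadratic curve
t  = (0:99)';
y  = 0.02*t.^2 - 0.5*t + 3 + randn(100,1);   % synthetic noisy data
ys = conv(y, ones(7,1)/7, 'same');           % smoothing with a 7-point moving average
p  = polyfit(t, ys, 2);                      % least-squares quadratic fit
yf = polyval(p, t);                          % evaluate the fitted curve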

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
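For example (the file names here are placeholders, not files from this project):

T   = xlsread('subjects.xls');     % numeric data from a spreadsheet
img = imread('sclera_001.jpg');    % image file into a matrix
fid = fopen('template.bin', 'r');  % low-level binary I/O
raw = fread(fid, inf, 'uint8');
fclose(fid);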

Visualizing Data

All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualize vectors of data with 2-D plotting functions that create the chart types listed below; a short example follows the list.

Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
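A minimal line-plot example in this spirit (data and labels are illustrative only):

x = linspace(0, 1, 200);
plot(x, x.^2, 'r-', x, sqrt(x), 'b--');   % two curves with different line styles
xlabel('Normalized radius'); ylabel('Response');
legend('quadratic', 'square root'); grid on;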

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include:

Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.

MATLAB provides the following types of functions for performing mathematical operations and analyzing data:

Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
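As a small sketch of the linear algebra and Fourier categories above (the matrices are random example data):

A = rand(4);  b = rand(4,1);
x = A \ b;             % solve the linear system A*x = b (LAPACK routines under the hood)
F = fft(rand(1,256));  % discrete Fourier transform computed via FFTW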

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

4.2 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
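The snapshot sequence above corresponds to a processing pipeline that could be sketched in MATLAB roughly as follows; the file names, ROI coordinates, Gabor parameters, and decision threshold are illustrative assumptions and not the project's actual implementation.

% Rough sketch of the pipeline shown in the snapshots above (assumed details)
rgb  = imread('eye.jpg');                      % input eye image (placeholder name)
gray = rgb2gray(rgb);                          % original image -> grey scale
bw   = im2bw(gray, graythresh(gray));          % binary image via Otsu's threshold
roi  = imcrop(gray, [50 120 300 150]);         % sclera region of interest (assumed rectangle)
enh  = adapthisteq(roi);                       % enhancement of the sclera vessels
[xg, yg] = meshgrid(-7:7);                     % simple Gabor kernel with assumed parameters
gb   = exp(-(xg.^2 + yg.^2)/18) .* cos(2*pi*xg/4);
feat = imfilter(double(enh), gb, 'symmetric'); % Gabor-filter feature response
tmplRoi = adapthisteq(imcrop(rgb2gray(imread('enrolled.jpg')), [50 120 300 150]));
tmpl    = imfilter(double(tmplRoi), gb, 'symmetric');  % same steps on an enrolled image
score   = corr2(feat, tmpl);                   % simplified similarity; the report uses descriptor matching
if score > 0.8                                 % assumed decision threshold
    disp('MATCHED');
else
    disp('NOT MATCHED');
end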

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups.

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate the mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity to hide the memory access latency was designed to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.





Page 43: Major project report on

since the mask files are large and would occupy GPU memory and slow down the data transfer. When matching, a registration RANSAC-type algorithm was used to randomly select corresponding descriptors, and the transform parameter between them was used to generate the template transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weight of each transformed descriptor. This would result in a large amount of extra computation in the processing unit.

To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be aligned automatically. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting different weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, the descriptor weights were calculated on their own mask by the CPU, and only once.

The calculated weight was saved as a component of the descriptor. The sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. Alignment is faster if the two templates share a similar reference point. If we use the center of the iris as the reference point, the correspondences are aligned to each other automatically when two templates are compared, since they have the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the angle of the segment to the reference line through the iris center, denoted θ; the distance between the segment center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.

The descriptor vector then becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the sclera patterns in the left part of the eye may be compressed while the sclera patterns in the right part are stretched. In parallel matching, these two parts are therefore assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved

FIG

FIG

them at contiguous addresses. This meets the requirement for coalesced memory access on the GPU.

After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, no computation on the mask file is needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered after every shift, so the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator of Figure 4 is then simplified as shown in Figure 9.
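As an illustration of this layout, the following C++ sketch shows one possible way to hold a WPL descriptor and to precompute its rectangular coordinates on the CPU. The struct name and fields are our own illustrative choices for this report, not code taken from the original implementation.

    #include <cmath>

    // Hypothetical WPL descriptor: polar location (r, theta) of the segment
    // center, its Cartesian equivalent (x, y), dominant orientation phi, and
    // the mask-derived weight w (0, 0.5 or 1, as described above).
    struct WplDescriptor {
        float x, y;      // rectangular coordinates, precomputed on the CPU
        float r, theta;  // polar coordinates relative to the pupil center
        float phi;       // dominant angular orientation of the segment
        float w;         // 0 = outside sclera, 0.5 = near boundary, 1 = interior
    };

    // CPU preprocessing step: convert the polar location to rectangular
    // coordinates once, so the GPU kernels never repeat this work.
    inline void precomputeXY(WplDescriptor &d, float pupilX, float pupilY) {
        d.x = pupilX + d.r * std::cos(d.theta);
        d.y = pupilY + d.r * std::sin(d.theta);
    }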

2.3 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 4.0 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline, today's GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that today's GPU computing applications are written

2.3.1 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global 'texture' memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)

Co-opting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and 'gather' accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both 'gather' (read) accesses from and 'scatter' (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation
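The structure just described can be illustrated with a minimal CUDA kernel: the grid of threads is the computation domain, and each thread gathers from and scatters to global memory. The kernel below is a generic sketch with illustrative array names, not part of the sclera matching code itself.

    // Each thread computes one output element from its neighbours in the
    // input buffer (a gather) and writes the result back (a scatter).
    __global__ void stepKernel(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // thread's position in the grid
        if (i >= n) return;
        float left  = (i > 0)     ? in[i - 1] : in[i];   // gather from neighbours
        float right = (i < n - 1) ? in[i + 1] : in[i];
        out[i] = 0.5f * (left + right);                  // scatter the new value
    }

    // Host-side launch: define the structured grid of threads.
    // stepKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);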

2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity; after this step some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transformation, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.

2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR

Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.

FIG

Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.

The number of matched pairs ni and the distance di between the centers of the Y shape branches are stored as the matching result. We fuse the number of matched branches and the average distance between the centers of matched branches as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded, while a sclera with a high matching score is passed to the next, more precise matching process.
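A sequential sketch of this coarse comparison is given below. It counts only the branch pairs that fall within the angular threshold tphi and the spatial threshold txy and accumulates their center distances; since the exact fusion formula (2) is not reproduced in this text, the final scoring line is an illustrative stand-in, and the struct and function names are our own.

    #include <cmath>
    #include <vector>

    struct YDescriptor { float phi1, phi2, phi3, x, y; };  // three branch angles and the center

    // Coarse match of the test template against one target template.
    // tphi and txy are the thresholds described above (30 and 675 in the text).
    float coarseMatch(const std::vector<YDescriptor> &te,
                      const std::vector<YDescriptor> &ta,
                      float tphi, float txy, float alpha) {
        int   ni = 0;      // number of matched branch pairs
        float di = 0.0f;   // accumulated center distance of matched pairs
        for (const auto &a : te) {
            for (const auto &b : ta) {
                float dphi = std::sqrt((a.phi1 - b.phi1) * (a.phi1 - b.phi1) +
                                       (a.phi2 - b.phi2) * (a.phi2 - b.phi2) +
                                       (a.phi3 - b.phi3) * (a.phi3 - b.phi3));
                float dxy  = std::hypot(a.x - b.x, a.y - b.y);
                if (dphi < tphi && dxy < txy) { ++ni; di += dxy; }
            }
        }
        if (ni == 0) return 0.0f;
        // Illustrative fusion of match count and average distance; the actual
        // score of (2), with fusion factor alpha, is defined in the referenced work.
        return ni / (alpha + di / ni);
    }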

2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR

The line segment WPL descriptor reveals more of the vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or stretch nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences in the movement of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.

1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate, so the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte, Tta is the target template and stai is the i-th WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.

Δsk is the shift value of the two descriptors.

We first randomly select an equal number of segment descriptors stek of the test template Tte from each quadrant and find their nearest neighbors staj in the target template Tta. Their shift offsets are recorded as the possible registration shift factors Δsk. The final offset registration factor is Δsoptim, the candidate offset with the smallest standard deviation.
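The sketch below outlines the candidate-offset collection step under our reading of Algorithm 2: randomly sampled test descriptors are paired with their nearest neighbours in the target template and the offsets are recorded. The final selection by the smallest-deviation criterion is left as a comment, since the exact criterion in the algorithm listing is not reproduced here, and the WplDescriptor struct is repeated from the earlier sketch so this fragment is self-contained.

    #include <cstdlib>
    #include <vector>

    struct WplDescriptor { float x, y, r, theta, phi, w; };  // as sketched earlier

    struct Offset { float dx, dy; };  // candidate shift delta-s_k

    std::vector<Offset> collectShiftCandidates(const std::vector<WplDescriptor> &te,
                                               const std::vector<WplDescriptor> &ta,
                                               int samples) {
        std::vector<Offset> candidates;
        for (int s = 0; s < samples; ++s) {
            const WplDescriptor &t = te[std::rand() % te.size()];  // random test descriptor
            const WplDescriptor *best = nullptr;
            float bestDist = 1e30f;
            for (const auto &c : ta) {                             // nearest neighbour in the target
                float d = (t.x - c.x) * (t.x - c.x) + (t.y - c.y) * (t.y - c.y);
                if (d < bestDist) { bestDist = d; best = &c; }
            }
            candidates.push_back({best->x - t.x, best->y - t.y});
        }
        // Delta-s_optim would be chosen from these candidates using the
        // smallest-deviation rule of Algorithm 2 (selection omitted here).
        return candidates;
    }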

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr_shift(it), θ(it), and tr_scale(it) are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr_shift(it)), and S(tr_scale(it)) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment we set the number of iterations to 512.
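Because the matrix in (7) is not reproduced in this text, the sketch below simply assumes the usual composition of scale, rotation, and shift for a 2-D point; it is meant only to make the roles of R(θ), T(tr_shift), and S(tr_scale) concrete, and the parameter names are illustrative.

    #include <cmath>

    // Apply a candidate transform scale * R(theta) * p + shift to a
    // descriptor center (x, y), as one reading of the matrix in (7).
    inline void applyTransform(float &x, float &y,
                               float theta, float scale,
                               float shiftX, float shiftY) {
        float c = std::cos(theta), s = std::sin(theta);
        float xr = scale * (c * x - s * y);   // rotate and scale
        float yr = scale * (s * x + c * y);
        x = xr + shiftX;                      // then shift
        y = yr + shiftY;
    }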

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm), and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)) T(tr_shift(optm)) S(tr_scale(optm)) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.

2.5 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program must be partitioned into threads by the programmer and mapped onto them.

There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take little time to access; only shared memory can be accessed by other threads within the same block, and its capacity is limited. Local memory, global memory, constant memory, and texture memory reside in off-chip memory and are accessible by all threads, but accessing them is very time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming.

If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be used in preference to global memory, and when global memory accesses do occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into equally sized banks. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.

2.5.1 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computational density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU. The numbers of threads and blocks are both set to 1024, which means we can match our test template with up to 1024 x 1024 target templates at the same time.
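A sketch of this coarse-grained mapping is shown below: each thread compares the test template with one target template, and the launch configuration mirrors the 1024 x 1024 figure quoted above. The kernel, array names, and flattened descriptor layout are illustrative assumptions, not the project's actual kernel.

    // One thread = one test/target template comparison (coarse Y-shape stage).
    // Target descriptors are assumed to be packed into one flat array, with
    // targetOffsets[t] giving where template t starts.
    __global__ void coarseMatchKernel(const float *testTpl, int testLen,
                                      const float *targets, const int *targetOffsets,
                                      int numTargets, float *scores) {
        int t = blockIdx.x * blockDim.x + threadIdx.x;  // index of the target template
        if (t >= numTargets) return;
        // ... compare testTpl against target template t and compute its score ...
        scores[t] = 0.0f;  // placeholder for the per-pair matching score
    }

    // Host side: 1024 blocks of 1024 threads, i.e. up to 1024 x 1024 targets at once.
    // coarseMatchKernel<<<1024, 1024>>>(dTest, testLen, dTargets, dOffsets, numTargets, dScores);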

Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block, and inside a block one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, and no data exchange is required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm, shown on the right of Figure 11, is used to calculate this sum: first, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
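The combination of per-thread partial results can be illustrated with a standard shared-memory tree reduction; this is a generic CUDA pattern shown here for clarity, not the exact kernel used in the project.

    // Tree-style reduction of per-thread partial sums within one block.
    // Launch with blockDim.x <= 1024 so the shared buffer is large enough.
    __global__ void blockSum(const float *partial, float *blockResult, int n) {
        __shared__ float buf[1024];
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;
        buf[tid] = (i < n) ? partial[i] : 0.0f;
        __syncthreads();
        // Halve the number of active threads each step until buf[0] holds the sum.
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride) buf[tid] += buf[tid + stride];
            __syncthreads();
        }
        if (tid == 0) blockResult[blockIdx.x] = buf[0];  // one value per block
    }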

2.5.2 MAPPING INSIDE BLOCK

In the shift argument search there are two schemes for mapping the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final results must be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we choose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step that generates the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative, so it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory.

FIG

FIG

To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.
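The project uses dynamically created Mersenne Twister parameters; as a simpler stand-in, the sketch below seeds one cuRAND state per thread so that each thread of the affine-parameter search can draw its own shift, rotation, and scale values. cuRAND's default per-thread generator is used here instead of the dynamically created twisters, so treat it only as an illustration of the per-thread setup, with illustrative parameter ranges.

    #include <curand_kernel.h>

    // Give every thread its own generator state, seeded by its global id,
    // so the randomly generated parameter sets are decorrelated across threads.
    __global__ void initRng(curandState *states, unsigned long long seed) {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        curand_init(seed, id, 0, &states[id]);
    }

    __global__ void drawAffineParams(curandState *states,
                                     float *theta, float *scale, float *shift) {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        curandState local = states[id];
        theta[id] = (curand_uniform(&local) - 0.5f) * 0.2f;   // illustrative rotation range
        scale[id] = 0.9f + 0.2f * curand_uniform(&local);     // illustrative scale range
        shift[id] = (curand_uniform(&local) - 0.5f) * 10.0f;  // illustrative shift range
        states[id] = local;                                   // save state for later draws
    }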

2.5.3 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire set of target templates from the database without considering when they will be processed, so there is no host-to-device data transfer during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.

FIG
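The component-wise (structure-of-arrays) storage described above can be sketched as follows: descriptor components are kept in separate device arrays, and all target templates are copied to the GPU once, before matching starts. The struct, array names, and helper are illustrative, not the project's actual data layer.

    #include <cuda_runtime.h>

    // Separate device arrays for each component of s(x, y, r, theta, phi, w),
    // so that consecutive threads read consecutive addresses (coalesced access).
    struct DeviceTemplates {
        float *x, *y, *r, *theta, *phi, *w;
        int    count;   // total number of descriptors across all target templates
    };

    void uploadTargets(DeviceTemplates &d, const float *hx, const float *hy,
                       const float *hr, const float *htheta,
                       const float *hphi, const float *hw, int count) {
        size_t bytes = count * sizeof(float);
        cudaMalloc(&d.x, bytes);      cudaMemcpy(d.x, hx, bytes, cudaMemcpyHostToDevice);
        cudaMalloc(&d.y, bytes);      cudaMemcpy(d.y, hy, bytes, cudaMemcpyHostToDevice);
        cudaMalloc(&d.r, bytes);      cudaMemcpy(d.r, hr, bytes, cudaMemcpyHostToDevice);
        cudaMalloc(&d.theta, bytes);  cudaMemcpy(d.theta, htheta, bytes, cudaMemcpyHostToDevice);
        cudaMalloc(&d.phi, bytes);    cudaMemcpy(d.phi, hphi, bytes, cudaMemcpyHostToDevice);
        cudaMalloc(&d.w, bytes);      cudaMemcpy(d.w, hw, bytes, cudaMemcpyHostToDevice);
        d.count = count;   // uploaded once, before any matching kernel runs
    }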

2.6 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It was primarily developed for object detection; in this paper it is applied as the feature for human recognition. In the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y) as m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).

Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin determined in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG
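A minimal host-side sketch of the cell-histogram step is given below: gradients are taken with central differences, the magnitude and orientation follow the formulas above, and each pixel votes its magnitude into one of nine 20-degree bins over 0 to 180 degrees. It is a generic illustration of HOG binning, not the project's exact implementation.

    #include <cmath>
    #include <vector>

    // 9-bin orientation histogram (0..180 degrees) for one cell of a grayscale image.
    std::vector<float> cellHistogram(const std::vector<float> &img, int width, int height,
                                     int cx, int cy, int cellSize) {
        std::vector<float> hist(9, 0.0f);
        for (int y = cy; y < cy + cellSize && y < height - 1; ++y) {
            for (int x = cx; x < cx + cellSize && x < width - 1; ++x) {
                if (x < 1 || y < 1) continue;
                float dx = img[y * width + (x + 1)] - img[y * width + (x - 1)];  // central differences
                float dy = img[(y + 1) * width + x] - img[(y - 1) * width + x];
                float mag = std::sqrt(dx * dx + dy * dy);                        // m(x, y)
                float ang = std::atan2(dy, dx) * 180.0f / 3.14159265f;           // theta(x, y)
                if (ang < 0) ang += 180.0f;          // opposite directions count as the same
                int bin = static_cast<int>(ang / 20.0f) % 9;
                hist[bin] += mag;                    // magnitude-weighted vote
            }
        }
        return hist;
    }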

CHAPTER 3

SOFTWARE SPECIFICATION

3.1 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

3.2 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel.

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

3.2.1 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

The interactive tool GUIDE (Graphical User Interface Development Environment) is used to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter and bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

4.2 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
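The snapshot sequence above corresponds to a simple image-processing pipeline. As a rough host-side illustration, the sketch below walks the same steps with OpenCV (grayscale conversion, Otsu thresholding, ROI selection, enhancement, and Gabor filtering); it is a stand-in for the MATLAB implementation whose outputs are shown in the figures, and the file name, ROI rectangle, and filter parameters are illustrative.

    #include <opencv2/opencv.hpp>

    int main() {
        cv::Mat eye = cv::imread("eye.jpg");                         // original sclera image (assumed path)
        cv::Mat gray, binary, enhanced, veins;
        cv::cvtColor(eye, gray, cv::COLOR_BGR2GRAY);                 // grey scale image
        cv::threshold(gray, binary, 0, 255,
                      cv::THRESH_BINARY | cv::THRESH_OTSU);          // Otsu's thresholding
        cv::Rect roi(100, 50, 200, 150);                             // illustrative sclera ROI
        cv::Mat sclera = gray(roi);                                  // selected ROI part
        cv::equalizeHist(sclera, enhanced);                          // enhancement of the sclera image
        cv::Mat gabor = cv::getGaborKernel(cv::Size(21, 21), 4.0, 0.0,
                                           10.0, 0.5, 0.0, CV_32F);  // Gabor feature extraction
        cv::filter2D(enhanced, veins, CV_32F, gabor);
        // ... the resulting feature image would then be matched against the database ...
        return 0;
    }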

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate the mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.

Page 44: Major project report on

descriptor value from polar coordinate to rectangular coordinate in CPU

preprocess

The descriptor vector becomes s(x y r θ ɸw) The left and right

parts of sclera in an eye may have different registration parameters For

example as an eyeball moves left left part sclera patterns of the eye may be

compressed while the right part sclera patterns are stretched

In parallel matching these two parts are assigned to threads in

different warps to allow different deformation The multiprocessor in

CUDA manages threads in groups of 32 parallel threads called warps We

reorganized the descriptor from same sides and saved

FIG

FIG

them in continuous address This would meet requirement of coalesced

memory access in GPU

After reorganizing the structure of descriptors and adding mask information

into the new descriptor the computation on the mask file is not needed on

the GPU It was very fast to match with this feature because it does not

need to reregister the templates every time after shifting Thus the cost of

data transfer and computation on GPU will be reduced Matching on the

new descriptor the shift parameters generator in Figure 4 is then simplified

as Figure 9

23 EVOLUTION OF GPU ARCHITECTURE

The fixed-function pipeline lacked the generality to efficiently express

more complicated shading and lighting operations that are essential for

complex effects The key step was replacing the fixed-function per-vertex

and per-fragment operations with user-specified programs run on each

vertex and fragment Over the past six years these vertex programs and

fragment programs have become increasingly more capable with larger

limits on their size and resource consumption with more fully featured

instruction sets and with more flexible control-flow operations After many

years of separate instruction sets for vertex and fragment operations current

GPUs support the unified Shader Model 40 on both vertex and fragment

shaders

The hardware must support shader programs of at least 65 k static

instructions and unlimited dynamic instructions

The instruction set for the first time supports both 32-bit integers and 32-

bit floating-point numbers

The hardware must allow an arbitrary number of both direct and indirect

reads from global memory (texture)

Finally dynamic flow control in the form of loops and branches must be

supported

As the shader model has evolved and become more powerful and GPU

applications of all types have increased vertex and fragment program

complexity GPU architectures have increasingly focused on the

programmable parts of the graphics pipeline Indeed while previous

generations of GPUs could best be described as additions of

programmability to a fixed-function pipeline todayrsquos GPUs are better

characterized as a programmable engine surrounded by supporting fixed-

function units General-Purpose Computing on the GPU Mapping general-

purpose computation onto the GPU uses the graphics hardware in much the

same way as any standard graphics application Because of this similarity it

is both easier and more difficult to explain the process On one hand the

actual operations are the same and are easy to follow on the other hand the

terminology is different between graphics and general-purpose use Harris

provides an excellent description of this mapping process

We begin by describing GPU programming using graphics terminology

then show how the same steps are used in a general-purpose way to author

GPGPU applications and finally use the same steps to show the more

simple and direct way that todayrsquos GPU computing applications are written

231 PROGRAMMING A GPU FOR GRAPHICS

We begin with the same GPU pipeline that we described in Section II

concentrating on the programmable aspects of this pipeline

The programmer specifies geometry that covers a region on the screen

The rasterizer generates a fragment at each pixel location covered by that

geometry

Each fragment is shaded by the fragment program

The fragment program computes the value of the fragment by a

combination of math operations and global memory reads from a global

Btexture[ memory

The resulting image can then be used as texture on future passes through

the graphics pipeline

232 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (OLD)

Coopting this pipeline to perform general-purpose computation

involves the exact same steps but different terminology A motivating

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and Bgather[ accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

23 2 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both Bgather[ (read) accesses from and Bscatter[ (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities After this step it is still possible for

some false positive matches In the second stage we used WPL descriptor

to register the two images for more detailed descriptor matching including

scale- and translation invariance This stage includes shift transform affine

matrix generation and final WPL descriptor matching Overall we

partitioned the registration and matching processing into four kernels2 in

CUDA (Figure 10) matching on the Y shape descriptor shift

transformation affine matrix generation and final WSL descriptor

matching Combining these two stages the matching program can run faster

and

achieve more accurate score

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and yta j are the Y shape descriptors of test template Tte

and target template Tta respectively dϕ is the Euclidian distance of angle

element of descriptors vector defined as (3) dxy is the Euclidian distance of

two descriptor centers defined as (4) ni and di are the matched descriptor

pairsrsquo number and their centers distance respectively tϕ is a distance

threshold and txy is the threshold to restrict the searching area We set tϕ to

30 and txy to 675 in our experiment Here

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3) where ϕi j is the angle between

the j th branch and the polar from pupil center in desctiptor i

The number of matched pairs ni and the distance between Y shape

branches centers di are stored as the matching result We fuse the number of

matched branches and the average distance between matched branches

centers as (2) Here α is a factor to fuse the matching score which was set

to 30 in our study Ni and Nj is the total numbers of feature vectors in

template i and j separately The decision is regulated by the threshold t if

the sclerarsquos matching score is lower than t the sclera will be discarded The

sclera with high matching score will be passed to the next more precisely

matching process

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WSL descriptor reveals more vessel structure detail of

sclera than the Y shape descriptor The variation of sclera vessel pattern is

nonlinear because

When acquiring an eye image in different gaze angle the vessel structure

will appear nonlinear shrink or extend because eyeball is spherical in shape

sclera is made up of four layers episclera stroma lamina fusca and

endothelium There are slightly differences among movement of these

layers Considering these factors our registration employed both single

shift transform and multi-parameter transform which combines shift

rotation and scale together

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate As a result the detected iris center could

not be very accurate Shift transform is designed to tolerant possible errors

in pupil center detection in the segmentation step If there is no deformation

or only very minor deformation registration with shift transform together

would be adequate to achieve an accurate result We designed Algorithm 2

to get optimized shift parameter Where Tte is the test template and ssei is

the i th WPL descriptor of Tte Tta is the target template and ssai is the i th

WPL descriptor of Ttad(stek staj ) is Euclidean distance of escriptors stek

and staj

Δsk is the shift value of two descriptors defines as

We first randomly select an equal number of segment descriptors

stek in test template Tte from each quad and find its nearest neighbors staj _

in target template Tta The shift offset of them is recorded as the possible

registration shift factor _sk The final offset registration factor is _soptim

which has the smallest standard deviation among these candidate offsets

2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance to its nearest neighbor staj* in Tta. We then transform the test template by the matrix in (7). At the end of each iteration we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to decide whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined in the same way as in Algorithm 2; trshift(it), θ(it) and trscale(it) are the shift, rotation and scale parameters generated in the it-th iteration; and R(θ(it)), T(trshift(it)) and S(trscale(it)) are the transform matrices defined in (7). To search for the optimal transform parameters we iterate N times to generate these parameters; in our experiment we set the number of iterations to 512.
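
A hedged sketch of this random search is shown below as a CUDA kernel in which each thread tries one candidate (shift, rotation, scale) set and counts the descriptor pairs that land within β pixels; the sampling ranges and all names are illustrative assumptions, and the host would simply pick the candidate with the largest count.

    #include <cuda_runtime.h>
    #include <curand_kernel.h>
    #include <math.h>

    struct Desc { float x, y; };

    // One thread = one candidate parameter set (the it-th iteration of Algorithm 3).
    __global__ void affineSearch(const Desc* test, int nTest,
                                 const Desc* target, int nTarget,
                                 float dxShift, float dyShift,   // Δs_optim from Algorithm 2
                                 float* outScore, float4* outParam,
                                 unsigned long long seed, float beta)
    {
        int it = blockIdx.x * blockDim.x + threadIdx.x;
        curandState st;
        curand_init(seed, it, 0, &st);

        // Randomly generated rotation (radians), scale and shift perturbation.
        // The ranges below are illustrative, not values from the report.
        float theta = (curand_uniform(&st) - 0.5f) * 0.2f;
        float scale = 0.9f + 0.2f * curand_uniform(&st);
        float tx = dxShift + (curand_uniform(&st) - 0.5f) * 10.0f;
        float ty = dyShift + (curand_uniform(&st) - 0.5f) * 10.0f;

        int matched = 0;
        for (int i = 0; i < nTest; ++i) {
            // Apply S, R and T to the test descriptor center (role of Eq. (7)).
            float x = scale * (cosf(theta) * test[i].x - sinf(theta) * test[i].y) + tx;
            float y = scale * (sinf(theta) * test[i].x + cosf(theta) * test[i].y) + ty;
            // Count the descriptor as matched if some target descriptor lies
            // within beta pixels of the transformed position.
            for (int j = 0; j < nTarget; ++j) {
                float dx = x - target[j].x, dy = y - target[j].y;
                if (dx * dx + dy * dy < beta * beta) { ++matched; break; }
            }
        }
        outScore[it] = (float)matched;                 // m(it), reduced on the host
        outParam[it] = make_float4(theta, scale, tx, ty);
    }

    // Host side (sketch): launch 512 candidate sets, as in the experiments.
    //   affineSearch<<<1, 512>>>(dTest, nTest, dTarget, nTarget,
    //                            shift.dx, shift.dy, dScore, dParam, 1234ULL, 20.0f);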

3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), trshift(optm), trscale(optm) and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(trshift(optm)) and S(trscale(optm)) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.
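
As a rough sketch of the per-descriptor test inside Algorithm 4 (the structure and function names are assumed, and the update of ɸ under rotation is omitted for brevity):

    #include <math.h>

    struct WplSeg {
        float x, y;     // segment center
        float phi;      // angle between the segment and the radius direction
        float w;        // weight: 1 inside the sclera, smaller at the sclera edge
    };

    // Apply the optimized registration (output of Algorithms 2 and 3) to one test
    // descriptor and decide whether it matches a given target descriptor.
    __device__ bool matchDescriptor(WplSeg s, const WplSeg& t,
                                    float theta, float scale,
                                    float tx, float ty,
                                    float alpha /* = 5 */, float beta /* = 20 */)
    {
        // S(scale) and R(theta) applied to the center, then the shift T(tx, ty).
        float x = scale * (cosf(theta) * s.x - sinf(theta) * s.y) + tx;
        float y = scale * (sinf(theta) * s.x + cosf(theta) * s.y) + ty;

        float dx = x - t.x, dy = y - t.y;
        bool closeEnough   = (dx * dx + dy * dy) < beta * beta;
        bool sameDirection = fabsf(s.phi - t.phi) < alpha;   // orientation check
        return closeEnough && sameDirection;
    }

    // The per-pair matching score then accumulates w for every matched descriptor
    // and is normalized by the smaller of the two self-scores, as described above.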

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction, multiple data (SIMD) system that works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program has to be partitioned by the programmer and mapped onto threads. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers, local memory and shared memory are on-chip, and it takes comparatively little time to access them. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory and texture memory are off-chip memories accessible by all threads, and accessing them is much more time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If the threads in a warp follow different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency as much as possible, on-chip memory should be used in preference to global memory. When global memory accesses do occur, the threads in the same warp should access consecutive words so that the accesses can be coalesced.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall into the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
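
To make the last two points concrete, the following generic CUDA kernel (not from this project) reads and writes global memory in a coalesced way and pads its shared-memory tile by one column so that the transposed accesses do not fall into the same bank. It assumes a launch with 32 × 32 thread blocks.

    #include <cuda_runtime.h>

    #define TILE 32

    __global__ void transposeTile(const float* in, float* out, int width, int height)
    {
        // +1 padding: without it, reading tile[threadIdx.x][threadIdx.y] below
        // would make all 32 threads of a warp hit the same shared-memory bank.
        __shared__ float tile[TILE][TILE + 1];

        int x = blockIdx.x * TILE + threadIdx.x;
        int y = blockIdx.y * TILE + threadIdx.y;

        // Coalesced read: threads with consecutive threadIdx.x read consecutive words.
        if (x < width && y < height)
            tile[threadIdx.y][threadIdx.x] = in[y * width + x];
        __syncthreads();

        int tx = blockIdx.y * TILE + threadIdx.x;
        int ty = blockIdx.x * TILE + threadIdx.y;

        // Coalesced write of the transposed tile, conflict-free thanks to the padding.
        if (tx < height && ty < width)
            out[ty * height + tx] = tile[threadIdx.x][threadIdx.y];
    }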

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, each module is converted into a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and both the number of threads per block and the number of blocks are set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.
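
A minimal sketch of this coarse-grained mapping is given below; the flattened template layout (a fixed maximum of MAX_Y descriptors per template), the helper names and the exact score formula are our own assumptions rather than the project's code.

    #include <cuda_runtime.h>

    struct YDesc { float phi[3], x, y; };

    #define MAX_Y 64   // assumed upper bound of Y-shape descriptors per template

    // Coarse matching kernel: one thread = one target template (Algorithm 1).
    __global__ void coarseMatchKernel(const YDesc* test, int nTest,
                                      const YDesc* targets,    // nTemplates * MAX_Y
                                      const int*   targetLen,  // descriptors per template
                                      float* scores, int nTemplates,
                                      float tPhi, float tXY, float alpha)
    {
        int t = blockIdx.x * blockDim.x + threadIdx.x;   // template index
        if (t >= nTemplates) return;

        const YDesc* target = targets + t * MAX_Y;
        int matched = 0; float sumDist = 0.0f;

        for (int i = 0; i < nTest; ++i) {
            for (int j = 0; j < targetLen[t]; ++j) {
                float dx = test[i].x - target[j].x;
                float dy = test[i].y - target[j].y;
                float dxy = sqrtf(dx * dx + dy * dy);
                if (dxy > tXY) continue;                 // outside the search range
                float dphi = 0.0f;
                for (int k = 0; k < 3; ++k) {
                    float d = test[i].phi[k] - target[j].phi[k];
                    dphi += d * d;
                }
                if (sqrtf(dphi) < tPhi) { ++matched; sumDist += dxy; break; }
            }
        }
        // Assumed fusion of match count and average center distance (cf. Eq. (2)).
        scores[t] = (matched > 0) ? matched * alpha / (alpha + sumDist / matched) : 0.0f;
    }

    // Launch (sketch): with 1024 threads per block and up to 1024 blocks, one call
    // can score the test template against as many as 1024 x 1024 target templates.
    //   coarseMatchKernel<<<1024, 1024>>>(dTest, nTest, dTargets, dLen,
    //                                     dScores, nTemplates, 30.f, 675.f, 30.f);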

Algorithms 2-4 are partitioned into fine-grained subtasks in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there is no need for data exchange between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every i-th (i = 4, 8, 16, 32, 64, ...) thread computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
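
The report describes this combination step as a parallel prefix sum; the closely related shared-memory tree reduction sketched below (generic CUDA code, not the project's) produces the block total in the first slot, matching the description that the final value ends up at the first address.

    #include <cuda_runtime.h>

    // Tree-style reduction of per-thread partial results inside one block.
    // Assumes blockDim.x <= 1024. After the loop, partial[0] holds the block sum.
    __device__ float blockSum(float myValue)
    {
        __shared__ float partial[1024];            // one slot per thread
        unsigned int tid = threadIdx.x;
        partial[tid] = myValue;
        __syncthreads();

        // Round 1 adds consecutive pairs, round 2 combines every 4th entry, then
        // every 8th, 16th, ... as sketched on the right of Figure 11.
        for (unsigned int stride = 1; stride < blockDim.x; stride *= 2) {
            if (tid % (2 * stride) == 0 && tid + stride < blockDim.x)
                partial[tid] += partial[tid + stride];
            __syncthreads();
        }
        return partial[0];
    }

    __global__ void sumDescriptorScores(const float* perDescriptor, float* perTemplate)
    {
        // Each block handles one target template; each thread one descriptor fraction.
        float mine  = perDescriptor[blockIdx.x * blockDim.x + threadIdx.x];
        float total = blockSum(mine);
        if (threadIdx.x == 0) perTemplate[blockIdx.x] = total;
    }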

252 MAPPING INSIDE BLOCK

In the shift parameter search there are two schemes we could choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all threads compute independently and only the final results need to be compared across the possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator we mapped the search of an entire parameter set to one thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. For generating the rotation and scale registration parameters we used the Mersenne Twister pseudorandom number generator, because it can be implemented with bitwise arithmetic and has a long period.
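
Whatever generator is chosen, each thread needs its own uncorrelated stream. As a generic illustration (not the approach used in this work, which is described next), cuRAND reaches the same goal by giving every thread its own subsequence of a single seed:

    #include <curand_kernel.h>

    // Each thread owns one generator state; using the thread index as the
    // subsequence number gives statistically independent streams from one seed.
    __global__ void initStates(curandState* states, unsigned long long seed)
    {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        curand_init(seed, /*subsequence=*/id, /*offset=*/0, &states[id]);
    }

    __global__ void drawAffineCandidates(curandState* states, float4* params, int n)
    {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        if (id >= n) return;
        curandState local = states[id];            // copy to registers, write back later
        float theta = (curand_uniform(&local) - 0.5f) * 0.2f;   // illustrative ranges
        float scale = 0.9f + 0.2f * curand_uniform(&local);
        float tx    = (curand_uniform(&local) - 0.5f) * 10.0f;
        float ty    = (curand_uniform(&local) - 0.5f) * 10.0f;
        params[id]  = make_float4(theta, scale, tx, ty);
        states[id]  = local;
    }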

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again.

FIG

FIG

In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait for a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.

253 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
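
The sketch below illustrates this memory strategy under assumed names and sizes: descriptor components are kept in structure-of-arrays form so that consecutive threads read successive addresses, all target data is copied to the device once before matching, and the small test template is staged in shared memory at the start of the kernel; routing the read-only target arrays through the texture or read-only cache would be layered on top of this.

    #include <cuda_runtime.h>

    // Structure-of-arrays layout for the WPL descriptors: each component of
    // s(x, y, r, theta, phi, w) lives in its own contiguous array, so the kernels
    // of Algorithms 2-4 read successive addresses (coalesced accesses).
    struct WplSoA {
        float *x, *y, *r, *theta, *phi, *w;
        int    count;
    };

    // Copy one component of all target templates to the device once, before
    // matching starts, so no host-to-device transfer is needed while matching.
    void uploadComponent(float** dPtr, const float* hPtr, int n)
    {
        cudaMalloc(reinterpret_cast<void**>(dPtr), n * sizeof(float));
        cudaMemcpy(*dPtr, hPtr, n * sizeof(float), cudaMemcpyHostToDevice);
    }

    #define TEST_MAX 256   // assumed upper bound on test-template descriptors

    __global__ void fineMatchKernel(WplSoA test, WplSoA target, float* score)
    {
        // Stage the (small) test template in shared memory: it is read repeatedly
        // by every thread, so this removes repeated global-memory traffic.
        __shared__ float sx[TEST_MAX], sy[TEST_MAX];
        for (int i = threadIdx.x; i < test.count; i += blockDim.x) {
            sx[i] = test.x[i];
            sy[i] = test.y[i];
        }
        __syncthreads();

        // ... per-thread descriptor matching against `target` is omitted here;
        // the matched weights would be reduced into score[blockIdx.x].
    }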

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor. It is primarily applied in the design of target detection; in this paper it is applied as the feature for human recognition. In the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating an intensity measure over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
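
Concretely, with the x- and y-direction gradients dx and dy, the usual definitions are m(x, y) = sqrt(dx^2 + dy^2) and θ(x, y) = arctan(dy/dx). A small CUDA kernel along these lines (the names are ours) is:

    #include <cuda_runtime.h>
    #include <math.h>

    // Per-pixel gradient magnitude and orientation for HOG.
    // Orientation is folded into [0, 180) degrees ("unsigned" gradients), since
    // opposite directions count as the same bin.
    __global__ void gradientMagOrient(const float* img, float* mag, float* ang,
                                      int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;

        // Simple [-1, 0, 1] central differences in x and y.
        float dx = img[y * width + (x + 1)] - img[y * width + (x - 1)];
        float dy = img[(y + 1) * width + x] - img[(y - 1) * width + x];

        float m = sqrtf(dx * dx + dy * dy);                  // m(x, y)
        float a = atan2f(dy, dx) * 180.0f / 3.14159265f;     // theta(x, y) in degrees
        if (a < 0.0f)    a += 180.0f;                        // fold into [0, 180)
        if (a >= 180.0f) a -= 180.0f;

        mag[y * width + x] = m;
        ang[y * width + x] = a;
    }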

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of the gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientation of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is further improved by applying a Gaussian window to each block.
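
As a minimal sketch, assuming the common R-HOG choices of 8 × 8-pixel cells, 9 unsigned orientation bins and L2 block normalization (these particular numbers are typical defaults, not values specified in this report):

    #include <cmath>
    #include <vector>

    // Build one cell's 9-bin histogram from per-pixel magnitude and orientation.
    void cellHistogram(const float* mag, const float* ang, int width,
                       int cellX, int cellY, int cellSize, float hist[9])
    {
        for (int b = 0; b < 9; ++b) hist[b] = 0.0f;
        for (int dy = 0; dy < cellSize; ++dy)
            for (int dx = 0; dx < cellSize; ++dx) {
                int x = cellX * cellSize + dx, y = cellY * cellSize + dy;
                int bin = static_cast<int>(ang[y * width + x] / 20.0f) % 9;  // 180/9 = 20 deg
                hist[bin] += mag[y * width + x];     // gradient magnitude as the weight
            }
    }

    // L2-normalize a 2x2-cell block (4 * 9 = 36 values) so the descriptor becomes
    // robust to illumination and contrast changes.
    void normalizeBlock(std::vector<float>& block)
    {
        float norm = 1e-6f;                          // small epsilon avoids divide by zero
        for (float v : block) norm += v * v;
        norm = std::sqrt(norm);
        for (float& v : block) v /= norm;
    }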

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran.

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C, C++, FORTRAN, Java™, COM,

and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)
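
For illustration, a MEX file is an ordinary C/C++ source with the fixed entry point mexFunction; the toy gateway below simply doubles a real matrix (compiled with "mex timesTwo.cpp") and is of course not part of this project's code.

    #include "mex.h"

    // Minimal MEX gateway: y = timesTwo(x) doubles every element of a real matrix.
    // MATLAB always calls this fixed entry point; the wrapper converts between
    // mxArray objects and plain C arrays.
    void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
    {
        if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]))
            mexErrMsgTxt("timesTwo: expected one real double matrix.");

        mwSize m = mxGetM(prhs[0]);
        mwSize n = mxGetN(prhs[0]);
        plhs[0] = mxCreateDoubleMatrix(m, n, mxREAL);

        const double* in  = mxGetPr(prhs[0]);
        double*       out = mxGetPr(plhs[0]);
        for (mwSize i = 0; i < m * n; ++i)
            out[i] = 2.0 * in[i];
    }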

Libraries written in Java, ActiveX or .NET can be called directly from MATLAB, and many MATLAB libraries (for example XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing and editing user interfaces. It lets you include list boxes, pull-down menus, push buttons, radio buttons and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter and bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3) where ϕi j is the angle between

the j th branch and the polar from pupil center in desctiptor i

The number of matched pairs ni and the distance between Y shape

branches centers di are stored as the matching result We fuse the number of

matched branches and the average distance between matched branches

centers as (2) Here α is a factor to fuse the matching score which was set

to 30 in our study Ni and Nj is the total numbers of feature vectors in

template i and j separately The decision is regulated by the threshold t if

the sclerarsquos matching score is lower than t the sclera will be discarded The

sclera with high matching score will be passed to the next more precisely

matching process

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WSL descriptor reveals more vessel structure detail of

sclera than the Y shape descriptor The variation of sclera vessel pattern is

nonlinear because

When acquiring an eye image in different gaze angle the vessel structure

will appear nonlinear shrink or extend because eyeball is spherical in shape

sclera is made up of four layers episclera stroma lamina fusca and

endothelium There are slightly differences among movement of these

layers Considering these factors our registration employed both single

shift transform and multi-parameter transform which combines shift

rotation and scale together

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate As a result the detected iris center could

not be very accurate Shift transform is designed to tolerant possible errors

in pupil center detection in the segmentation step If there is no deformation

or only very minor deformation registration with shift transform together

would be adequate to achieve an accurate result We designed Algorithm 2

to get optimized shift parameter Where Tte is the test template and ssei is

the i th WPL descriptor of Tte Tta is the target template and ssai is the i th

WPL descriptor of Ttad(stek staj ) is Euclidean distance of escriptors stek

and staj

Δsk is the shift value of two descriptors defines as

We first randomly select an equal number of segment descriptors

stek in test template Tte from each quad and find its nearest neighbors staj _

in target template Tta The shift offset of them is recorded as the possible

registration shift factor _sk The final offset registration factor is _soptim

which has the smallest standard deviation among these candidate offsets

2) AFFINE TRANSFORM PARAMETER SEARCH

Affine transform is designed to tolerant some deformation of sclera

patterns in the matching step The affine transform algorithm is shown in

Algorithm 3 The shift value in the parameter set is obtained by randomly

selecting descriptor s(it )te and calculating the distance from its nearest

neighbor staj_ in Tta We transform the test template by the matrix in (7)

At end of the iteration we count the numbers of matched descriptor pairs

from the transformed template and the target template The factor β is

involved to determine if the pair of descriptor is matched and we set it to

be 20 pixels in our experiment After N iterations the optimized transform

parameter set is determined via selecting the maximum matching numbers

m(it) Here stei Tte staj and Tta is defined same as algorithm 2 tr (it )

shi f t θ(it )tr (it ) scale is the parameters of shift rotation and scale

transform generated in i tth iteration R(θ (it )) T (tr (it ) shi f t ) and S(tr (it

) scale) are the transform matrix defined as (7) To search optimize

transform parameter we iterated N times to generate these parameters In

our experiment we set iteration time to 512

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4 Here stei Tte

staj and Tta are defined same as Algorithms 2 and 3 θ(optm) tr (optm) shi f

t tr (optm) scale _soptim are the registration parameters attained from

Algorithms 2 and 3 R_θ(optm)_T _tr (optm) shi f t _S_tr (optm) scale

_ is the descriptor transform matrix defined in Algorithm 3 empty is the angle

between the segment descriptor and radius direction w is the weight of the

descriptor which indicates whether the descriptor is at the edge of sclera or

not To ensure that the nearest descriptors have a similar orientation we

used a constant factor α to check the abstract difference of two ɸ In our

experiment we set α to 5 The total matching score is minimal score of two

transformed result divided by the minimal matching score for test template

and target template

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU A CUDA consists of many streaming

multiprocessors (SM) where the parallel part of the program should be

partitioned into threads by the programmer and mapped into those threads

There are multiple memory spaces in the CUDA memory hierarchy

register local memory shared memory global memory constant memory

and texture memory Register local memory and shared memory are on-

chip and could be a little time consuming to access these memories Only

shared memory can be accessed by other threads within the same block

However there is only limited availability of shared memory Global

memory constant memory and texture memory are off-chip memory and

accessible by all threads which would be very time consuming to access

these memories

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in term of access To

completely hide the latency of the small instructions set we should use on-

chip memory preferentially rather than global memory When global

memory access occurs threads in same warp should access the words in

sequence to achieve coalescence

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory request from different thread within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread One thread performs a pair of templates compare In our work we

use NVIDIA C2070 as our GPU Threads and blocks number is set to

1024 That means we can match our test template with up to 1024times1024

target templates at same time

Algorithms 2-4 will be partitioned into fine-grained subtasks which is

processed a section of descriptors in one thread As the lower portion of the

middle column shows in Figure 11 we assigned a target template to one

block Inside a block one thread corresponds a set of descriptors in this

template This partition makes every block execute independently and there

are no data exchange requirements between different blocks When all

threads complete their responding descriptor fractions the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of intermediate results which is

show in right of Figure 11 Firstly all odd number threads compute the sum

of consecutive pairs of the results Then recursively every first of i (= 4 8

16 32 64 ) threads

compute the prefix sum on the new result The final result will be saved in

the first address which has the same variable name as the first intermediate

result

252 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose to

map task

Mapping one pair of templates to all the threads in a block and then every

thread would take charge of a fraction of descriptors and cooperation with

other threads

Assigning a single possible shift offset to a thread and all the threads will

compute independently unless the final result should be compared with

other possible offset

Due to great number of sum and synchronization operations in every

nearest neighbor searching step we choose the second method to parallelize

shift searching In affine matrix generator we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads The challenge of this step is the randomly generated

numbers might be correlated among threads In the step of rotation and

scale registration generating we used the Mersenne Twister pseudorandom

number generator because it can use bitwise arithmetic and have long

period

The Mersenne twister as most of pseudorandom generators is iterative

Therefore itrsquos hard to parallelize a single twister state update step among

several execution threads To make sure that thousands of threads in the

launch grid generate uncorrelated random sequence many simultaneous

Mersenne twisters need to process with different initial states in parallel

But even ldquovery differentrdquo (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

Variable denoting whether the line has been matched is stored in

shared memory To share the flags all the threads in a block should wait

synchronic operation at every query step Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure In global memory the

components in descriptor y(ϕ1 ϕ2 ϕ3 x y) and s(x y rθ ϕw) were stored

separately This would guarantee contiguous kernels of Algorithm 2 to 4

can access their data in successive addresses Although such coalescing

access reduces the latency frequently global memory access was still a

slower way to get data In our kernel we loaded the test template to shared

memory to accelerate memory access Because the Algorithms 2 to 4

execute different number of iterations on same data the bank conflict does

not happen To maximize our texture memory space we set the system

cache to the lowest value and bonded our target descriptor to texture

memory Using this catchable memory our data access was accelerated

more

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients is the feature descriptors It is primarily

applied to the design of target detection In this paper it is applied as the

feature for human recognition In the sclera region the vein patterns are the

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To follow out this technique first of all divide the image into small

connected regions called cells For each cell compute the histogram of

gradient directions or edge orientations of the pixels Then the combination

of different histogram of different cell represents the descriptor To improve

accuracy histograms can be contrast normalized by calculating the intensity

from the block and then using this value normalizes all cells within the

block This normalization result shows that it is invariant to geometric and

photometric changes The gradient magnitude m(x y) and orientation 1050592(x

y) are calculated using x and y directions gradients dx (x y) and dy (x y)

Orientation binning is the second step of HOG This method utilized

to create cell histograms Each pixel within the cell used to give a weight to

the orientation which is found in the gradient computation Gradient

magnitude is used as the weight The cells are in the rectangular form The

binning of gradient orientation should be spread over 0 to 180 degrees and

opposite direction counts as the same In the Fig 8 depicts the edge

orientation of picture elements If the images have any illumination and

contrast changes then the gradient strength must be locally normalized For

that cells are grouped together into larger blocks These blocks are

overlapping so that each cell contributes more than once to the final

descriptor Here rectangular HOG (R-HOG) blocks are applied which are

mainly in square grids The performance of HOG is improved by putting

on a Gaussian window into each block

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB(matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphicalmulti-domainsimulationandModel-Based

Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works or using an undocumented mechanism called JMI (Java-to-Mat lab

Interface) which should not be confused with the unrelated Java that is also

called JMI

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematical

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveXreg controls Alternatively you can create GUIs programmatically

using MATLAB functions
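As a small sketch of the programmatic route (illustrative only; the window title and button label are made up), a figure window and a push button with a callback can be created in a few lines:

    % Create a window and one push button whose callback prints a message.
    f = figure('Name','Sclera demo','NumberTitle','off');
    uicontrol(f, 'Style','pushbutton', ...
                 'String','Run matching', ...
                 'Position',[20 20 120 30], ...
                 'Callback', @(src,evt) disp('Matching started'));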

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process: from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including the following (a short sketch appears after the list):
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
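For instance, a noisy signal can be smoothed and fitted in a few lines; this is only an illustrative sketch, not code from the project:

    % Smooth a noisy sine wave with a moving average, then fit a cubic.
    t = linspace(0, 1, 200).';
    y = sin(2*pi*5*t) + 0.3*randn(size(t));     % noisy 5 Hz sine
    ysmooth = conv(y, ones(9,1)/9, 'same');     % 9-point moving average
    p = polyfit(t, y, 3);                       % least-squares cubic fit
    yfit = polyval(p, t);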

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel spreadsheets, ASCII text or binary files, image, sound, and video files, and scientific formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format, and additional functions let you read data from Web pages and XML.
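A brief sketch of typical data-access calls follows; the file names are hypothetical and are used only to illustrate the functions involved:

    % Hypothetical file names, shown only to illustrate the I/O functions.
    T   = readtable('scleradata.xlsx');     % spreadsheet import
    img = imread('eye01.png');              % image file
    fid = fopen('template.bin', 'rb');      % low-level binary I/O
    raw = fread(fid, Inf, 'uint8');
    fclose(fid);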

Visualizing Data

All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions (see the sketch after this list) that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter and bubble plots

Animations
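The sketch below (illustrative only) produces a line plot and a histogram side by side; the histogram function assumes a reasonably recent MATLAB release:

    % Two 2-D plots in one figure: a line plot and a histogram.
    x = 0:0.1:10;
    subplot(1,2,1); plot(x, sin(x), 'LineWidth', 1.5); title('Line plot');
    subplot(1,2,2); histogram(randn(1000,1));          title('Histogram');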

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data sets, and you can specify plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency. A brief example follows the list below.

3-D plotting functions include

Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
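A minimal surface-plot sketch (illustrative only):

    % Surface plot of a simple 2-D function.
    [X, Y] = meshgrid(-2:0.1:2, -2:0.1:2);
    Z = X .* exp(-X.^2 - Y.^2);
    surf(X, Y, Z); shading interp; colorbar;
    xlabel('x'); ylabel('y'); zlabel('z');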

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code. A short example follows the list of function types below.

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
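The sketch below (illustrative only) touches two of these categories: a LAPACK-backed linear solve and an ODE integration.

    % Solve a small linear system and an ordinary differential equation.
    A = [4 1; 2 3];  b = [1; 2];
    x = A \ b;                         % backslash uses LAPACK routines

    f = @(t, y) -2*y + sin(t);         % dy/dt = -2y + sin(t)
    [t, y] = ode45(f, [0 5], 1);       % Runge-Kutta (4,5) solver, y(0) = 1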

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

42 SNAPSHOTS

The following snapshots illustrate the stages of the recognition pipeline; an illustrative MATLAB sketch of these steps follows the list.
FIG: ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG: GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG: EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG: SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG: SELECTED ROI PART
FIG: ENHANCEMENT OF SCLERA IMAGE
FIG: FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR FILTERS
FIG: MATCHING WITH IMAGES IN DATABASE
FIG: DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
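The listed steps can be sketched in MATLAB roughly as follows. This is only an illustration of the kind of calls involved (the file name, the interactive ROI selection, and the hand-built Gabor kernel are assumptions, not the project's actual code), and it requires the Image Processing Toolbox:

    % Illustrative preprocessing sketch, not the project's source code.
    rgb   = imread('eye01.jpg');                 % original sclera image
    gray  = rgb2gray(rgb);                       % grey-scale conversion
    level = graythresh(gray);                    % Otsu's threshold
    bw    = im2bw(gray, level);                  % binary image from the threshold
    roi   = roipoly(gray);                       % interactive ROI (sclera part)
    enh   = adapthisteq(gray);                   % contrast enhancement (CLAHE)

    % Hand-built even Gabor kernel (wavelength 8 px, sigma 3 px) applied to
    % the enhanced image inside the ROI, as a stand-in for the project's
    % Gabor-filter feature extraction.
    [xg, yg] = meshgrid(-7:7, -7:7);
    gab  = exp(-(xg.^2 + yg.^2) / (2*3^2)) .* cos(2*pi*xg/8);
    feat = imfilter(double(enh) .* double(roi), gab, 'symmetric');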

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this work we proposed a new parallel sclera vein recognition method that employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed here can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of GPU structures, to narrow the search range and increase matching efficiency. We developed the WPL descriptor to incorporate mask information and to make the matching more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU and GPU system, and even across the threads within the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.



Page 47: Major project report on

example is a fluid simulation computed over a grid at each time step we

compute the next state of the fluid for each grid point from the current state

at its grid point and at the grid points of its neighbors

The programmer specifies a geometric primitive that covers a

computation domain of interest The rasterizer generates a fragment at each

pixel location covered by that geometry (In our example our primitive

must cover a grid of fragments equal to the domain size of our fluid

simulation)

Each fragment is shaded by an SPMD general-purpose fragment

program (Each grid point runs the same program to update the state of its

fluid)

The fragment program computes the value of the fragment by a

combination of math operations and Bgather[ accesses from global

memory (Each grid point can access the state of its neighbors from the

previous time step in computing its current value)

The resulting buffer in global memory can then be used as an input on

future passes (The current state of the fluid will be used on the next time

step)

23 2 PROGRAMMING A GPU FOR GENERAL-PURPOSE

PROGRAMS (NEW)

One of the historical difficulties in programming GPGPU applications

has been that despite their general-purpose tasksrsquo having nothing to do with

graphics the applications still had to be programmed using graphics APIs

In addition the program had to be structured in terms of the graphics

pipeline with the programmable units only accessible as an intermediate

step in that pipeline when the programmer would almost certainly prefer to

access the programmable units directly The programming environments we

describe in detail in Section IV are solving this difficulty by providing a

more natural direct non-graphics interface to the hardware and

specifically the programmable units Today GPU computing applications

are structured in the following way

The programmer directly defines the computation domain of interest as a

structured grid of threads

An SPMD general-purpose program computes the value of each thread

The value for each thread is computed by a combination of math

operations and both Bgather[ (read) accesses from and Bscatter[ (write)

accesses to global memory Unlike in the previous two

methods the same buffer can be used for both reading and writing

allowing more flexible algorithms (for example in-place algorithms that

use less memory)

The resulting buffer in global memory can then be used as an input in

future computation

24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS

To further improve the matching process we propose the coarse-to-fine

two-stage matching process In the first stage we matched two images

coarsely using the Y-shape descriptors which is very fast to match because

no registration was needed The matching result in this stage can help filter

out image pairs with low similarities After this step it is still possible for

some false positive matches In the second stage we used WPL descriptor

to register the two images for more detailed descriptor matching including

scale- and translation invariance This stage includes shift transform affine

matrix generation and final WPL descriptor matching Overall we

partitioned the registration and matching processing into four kernels2 in

CUDA (Figure 10) matching on the Y shape descriptor shift

transformation affine matrix generation and final WSL descriptor

matching Combining these two stages the matching program can run faster

and

achieve more accurate score

241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR

Due to scale- and rotation- invariance of the Y-shape features

registration is unnecessary before matching on Y shape descriptor The

whole matching algorithm is listed as algorithm 1

FIG

Here ytei and yta j are the Y shape descriptors of test template Tte

and target template Tta respectively dϕ is the Euclidian distance of angle

element of descriptors vector defined as (3) dxy is the Euclidian distance of

two descriptor centers defined as (4) ni and di are the matched descriptor

pairsrsquo number and their centers distance respectively tϕ is a distance

threshold and txy is the threshold to restrict the searching area We set tϕ to

30 and txy to 675 in our experiment Here

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3) where ϕi j is the angle between

the j th branch and the polar from pupil center in desctiptor i

The number of matched pairs ni and the distance between Y shape

branches centers di are stored as the matching result We fuse the number of

matched branches and the average distance between matched branches

centers as (2) Here α is a factor to fuse the matching score which was set

to 30 in our study Ni and Nj is the total numbers of feature vectors in

template i and j separately The decision is regulated by the threshold t if

the sclerarsquos matching score is lower than t the sclera will be discarded The

sclera with high matching score will be passed to the next more precisely

matching process

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WSL descriptor reveals more vessel structure detail of

sclera than the Y shape descriptor The variation of sclera vessel pattern is

nonlinear because

When acquiring an eye image in different gaze angle the vessel structure

will appear nonlinear shrink or extend because eyeball is spherical in shape

sclera is made up of four layers episclera stroma lamina fusca and

endothelium There are slightly differences among movement of these

layers Considering these factors our registration employed both single

shift transform and multi-parameter transform which combines shift

rotation and scale together

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate As a result the detected iris center could

not be very accurate Shift transform is designed to tolerant possible errors

in pupil center detection in the segmentation step If there is no deformation

or only very minor deformation registration with shift transform together

would be adequate to achieve an accurate result We designed Algorithm 2

to get optimized shift parameter Where Tte is the test template and ssei is

the i th WPL descriptor of Tte Tta is the target template and ssai is the i th

WPL descriptor of Ttad(stek staj ) is Euclidean distance of escriptors stek

and staj

Δsk is the shift value of two descriptors defines as

We first randomly select an equal number of segment descriptors

stek in test template Tte from each quad and find its nearest neighbors staj _

in target template Tta The shift offset of them is recorded as the possible

registration shift factor _sk The final offset registration factor is _soptim

which has the smallest standard deviation among these candidate offsets

2) AFFINE TRANSFORM PARAMETER SEARCH

Affine transform is designed to tolerant some deformation of sclera

patterns in the matching step The affine transform algorithm is shown in

Algorithm 3 The shift value in the parameter set is obtained by randomly

selecting descriptor s(it )te and calculating the distance from its nearest

neighbor staj_ in Tta We transform the test template by the matrix in (7)

At end of the iteration we count the numbers of matched descriptor pairs

from the transformed template and the target template The factor β is

involved to determine if the pair of descriptor is matched and we set it to

be 20 pixels in our experiment After N iterations the optimized transform

parameter set is determined via selecting the maximum matching numbers

m(it) Here stei Tte staj and Tta is defined same as algorithm 2 tr (it )

shi f t θ(it )tr (it ) scale is the parameters of shift rotation and scale

transform generated in i tth iteration R(θ (it )) T (tr (it ) shi f t ) and S(tr (it

) scale) are the transform matrix defined as (7) To search optimize

transform parameter we iterated N times to generate these parameters In

our experiment we set iteration time to 512

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters attained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.

2.5 MAPPING THE SUBTASKS TO CUDA

CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto those SMs. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers, local memory and shared memory are private to a thread or a block; registers and shared memory are on-chip and take only a little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory and texture memory are off-chip memories accessible by all threads, and accessing them is much more time consuming.

Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower than on-chip memory in terms of access latency. To hide this latency, on-chip memory should be used preferentially rather than global memory. When global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
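A minimal sketch of the coalescing point (our own illustration, not code from the report): with consecutive threads touching consecutive words, a warp's loads collapse into few memory transactions, whereas strided access or divergent branches serialize the work.

    // Structure-of-arrays access: thread i touches word i, so a warp's loads coalesce.
    __global__ void coalescedAdd(const float* x, const float* y, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = x[i] + y[i];        // contiguous, coalesced global memory access
    }
    // By contrast, reading x[i * 32] scatters a warp's requests over many transactions,
    // and branching on (i % 2) inside a warp serialises the two paths (branch divergence).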

2.5.1 MAPPING ALGORITHMS TO BLOCKS

Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU. The numbers of threads and blocks are both set to 1024, which means we can match our test template with up to 1024 × 1024 target templates at the same time.
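The corresponding launch configuration might look like the sketch below; matchYShape, the device buffers and their sizes are placeholders of ours, since the report does not list the actual code, but the one-thread-per-template grid of 1024 blocks of 1024 threads follows the description above.

    #include <cuda_runtime.h>

    __global__ void matchYShape(const float* test, const float* targets,
                                float* scores, int nTargets)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per target template
        if (i < nTargets)
            scores[i] = 0.0f;   // placeholder for the Y-shape comparison of one template pair
    }

    int main()
    {
        int nTargets = 1024 * 1024;                  // up to 1024 x 1024 templates in one launch
        float *dTest, *dTargets, *dScores;
        cudaMalloc(&dTest, 64 * sizeof(float));      // buffer sizes here are placeholders
        cudaMalloc(&dTargets, 64 * sizeof(float));
        cudaMalloc(&dScores, nTargets * sizeof(float));
        dim3 block(1024);
        dim3 grid((nTargets + block.x - 1) / block.x);
        matchYShape<<<grid, block>>>(dTest, dTargets, dScores, nTargets);
        cudaDeviceSynchronize();                     // wait for all template comparisons
        return 0;
    }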

Algorithms 2-4 are partitioned into fine-grained subtasks, each of which processes a section of descriptors in one thread. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
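The block-wide sum can be written as a standard shared-memory tree reduction in the same spirit; the sketch below is ours (illustrative names, power-of-two block size assumed) and folds pairwise partial results until thread 0 holds the total in the first slot.

    // Assumes a power-of-two block size of up to 1024 threads, one block per template.
    __global__ void blockSum(const float* partial, float* total, int n)
    {
        __shared__ float buf[1024];
        int t = threadIdx.x;
        buf[t] = (t < n) ? partial[t] : 0.0f;          // each thread loads its partial result
        __syncthreads();
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (t < stride)
                buf[t] += buf[t + stride];             // fold the upper half onto the lower half
            __syncthreads();
        }
        if (t == 0)
            total[0] = buf[0];                         // the sum ends up in the first slot
    }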

2.5.2 MAPPING INSIDE A BLOCK

In the shift parameter search there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all threads compute independently, except that the final result must be compared with the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread; every thread randomly generated a set of parameters and tried them independently, and the generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.

The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.

FIG

FIG
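The report's dynamically created Mersenne Twisters are generated offline; as a rough stand-in that shows the per-thread state idea, the sketch below uses cuRAND, where one seed with a distinct subsequence per thread also yields independent streams (curand_init and curand_uniform are standard cuRAND device calls; the kernel and helper names are ours).

    #include <curand_kernel.h>

    // One cuRAND state per thread: same seed, different subsequence, so the
    // streams are statistically independent across the whole launch grid.
    __global__ void initRng(curandState* states, unsigned long long seed)
    {
        int id = blockIdx.x * blockDim.x + threadIdx.x;
        curand_init(seed, id, 0, &states[id]);
    }

    // Example draw: a small random rotation angle (here in [-0.1, 0.1) radians) for one trial.
    __device__ float randomAngle(curandState* st)
    {
        return (curand_uniform(st) - 0.5f) * 0.2f;
    }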

2.5.3 MEMORY MANAGEMENT

The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire set of target templates from the database without considering when they will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.

FIG
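The structure-of-arrays layout plus the per-block shared-memory copy of the test template might look like the sketch below (our illustration; array and kernel names are placeholders), where consecutive threads read consecutive addresses and the copy is done once per block.

    // Structure-of-arrays inputs (sx, sy, srad) and a shared-memory copy of the test
    // template; launch with 3 * nTest * sizeof(float) bytes of dynamic shared memory.
    __global__ void loadTestToShared(const float* sx, const float* sy, const float* srad,
                                     int nTest, float* out)
    {
        extern __shared__ float sh[];
        float* tx = sh;                                // x components
        float* ty = sh + nTest;                        // y components
        float* tr = sh + 2 * nTest;                    // r components
        for (int i = threadIdx.x; i < nTest; i += blockDim.x) {
            tx[i] = sx[i];                             // consecutive threads read consecutive
            ty[i] = sy[i];                             // addresses, so the loads coalesce
            tr[i] = srad[i];
        }
        __syncthreads();
        // ... per-thread descriptor matching against the target template would go here ...
        if (threadIdx.x == 0)
            out[blockIdx.x] = tx[0] + ty[0] + tr[0];   // placeholder so the copy is not optimised away
    }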

2.6 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor primarily applied to object detection; in this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells, and for each cell compute the histogram of gradient directions or edge orientations of its pixels. The combination of the histograms of the different cells then forms the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating an intensity measure over a block and then using this value to normalize all cells within the block; this normalization makes the descriptor largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG, and it creates the cell histograms. Each pixel within a cell casts a weighted vote for an orientation bin based on the orientation found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks, and these blocks overlap so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.

FIG
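A compact sketch of the per-pixel gradient and cell-histogram step described above (our illustration, using central differences, 8 x 8 cells and 9 bins over 0-180 degrees as common defaults; image dimensions are assumed to be multiples of 8, and block normalization and the Gaussian window are omitted):

    // Per-pixel gradients by central differences, voted into 9 bins per 8x8 cell.
    // hist must hold (w/8) * (h/8) * 9 floats and be zero-initialised beforehand.
    __global__ void hogCells(const float* img, int w, int h, float* hist)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;

        float dx = img[y * w + (x + 1)] - img[y * w + (x - 1)];
        float dy = img[(y + 1) * w + x] - img[(y - 1) * w + x];
        float mag = sqrtf(dx * dx + dy * dy);          // gradient magnitude m(x, y)
        float ang = atan2f(dy, dx) * 57.29578f;        // orientation in degrees
        if (ang < 0.0f) ang += 180.0f;                 // fold: opposite directions share a bin

        int bin  = min(8, (int)(ang / 20.0f));         // 9 bins of 20 degrees over 0-180
        int cell = (y / 8) * (w / 8) + (x / 8);
        atomicAdd(&hist[cell * 9 + bin], mag);         // magnitude-weighted vote
    }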

CHAPTER 3

SOFTWARE SPECIFICATION

3.1 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran.

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering, Little's specialty, but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

3.2 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C, C++, FORTRAN, Java™, COM,

and Microsoft Excel

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

3.2.1 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD based Symbolic Math Toolbox

available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for 'for' loops. As a result, one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveX® controls. Alternatively, you can create GUIs programmatically

using MATLAB functions

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5. Low-level binary file I/O functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

4.2 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.


The line segment WSL descriptor reveals more vessel structure detail of

sclera than the Y shape descriptor The variation of sclera vessel pattern is

nonlinear because

When acquiring an eye image in different gaze angle the vessel structure

will appear nonlinear shrink or extend because eyeball is spherical in shape

sclera is made up of four layers episclera stroma lamina fusca and

endothelium There are slightly differences among movement of these

layers Considering these factors our registration employed both single

shift transform and multi-parameter transform which combines shift

rotation and scale together

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate As a result the detected iris center could

not be very accurate Shift transform is designed to tolerant possible errors

in pupil center detection in the segmentation step If there is no deformation

or only very minor deformation registration with shift transform together

would be adequate to achieve an accurate result We designed Algorithm 2

to get optimized shift parameter Where Tte is the test template and ssei is

the i th WPL descriptor of Tte Tta is the target template and ssai is the i th

WPL descriptor of Ttad(stek staj ) is Euclidean distance of escriptors stek

and staj

Δsk is the shift value of two descriptors defines as

We first randomly select an equal number of segment descriptors

stek in test template Tte from each quad and find its nearest neighbors staj _

in target template Tta The shift offset of them is recorded as the possible

registration shift factor _sk The final offset registration factor is _soptim

which has the smallest standard deviation among these candidate offsets

2) AFFINE TRANSFORM PARAMETER SEARCH

Affine transform is designed to tolerant some deformation of sclera

patterns in the matching step The affine transform algorithm is shown in

Algorithm 3 The shift value in the parameter set is obtained by randomly

selecting descriptor s(it )te and calculating the distance from its nearest

neighbor staj_ in Tta We transform the test template by the matrix in (7)

At end of the iteration we count the numbers of matched descriptor pairs

from the transformed template and the target template The factor β is

involved to determine if the pair of descriptor is matched and we set it to

be 20 pixels in our experiment After N iterations the optimized transform

parameter set is determined via selecting the maximum matching numbers

m(it) Here stei Tte staj and Tta is defined same as algorithm 2 tr (it )

shi f t θ(it )tr (it ) scale is the parameters of shift rotation and scale

transform generated in i tth iteration R(θ (it )) T (tr (it ) shi f t ) and S(tr (it

) scale) are the transform matrix defined as (7) To search optimize

transform parameter we iterated N times to generate these parameters In

our experiment we set iteration time to 512

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4 Here stei Tte

staj and Tta are defined same as Algorithms 2 and 3 θ(optm) tr (optm) shi f

t tr (optm) scale _soptim are the registration parameters attained from

Algorithms 2 and 3 R_θ(optm)_T _tr (optm) shi f t _S_tr (optm) scale

_ is the descriptor transform matrix defined in Algorithm 3 empty is the angle

between the segment descriptor and radius direction w is the weight of the

descriptor which indicates whether the descriptor is at the edge of sclera or

not To ensure that the nearest descriptors have a similar orientation we

used a constant factor α to check the abstract difference of two ɸ In our

experiment we set α to 5 The total matching score is minimal score of two

transformed result divided by the minimal matching score for test template

and target template

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU A CUDA consists of many streaming

multiprocessors (SM) where the parallel part of the program should be

partitioned into threads by the programmer and mapped into those threads

There are multiple memory spaces in the CUDA memory hierarchy

register local memory shared memory global memory constant memory

and texture memory Register local memory and shared memory are on-

chip and could be a little time consuming to access these memories Only

shared memory can be accessed by other threads within the same block

However there is only limited availability of shared memory Global

memory constant memory and texture memory are off-chip memory and

accessible by all threads which would be very time consuming to access

these memories

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in term of access To

completely hide the latency of the small instructions set we should use on-

chip memory preferentially rather than global memory When global

memory access occurs threads in same warp should access the words in

sequence to achieve coalescence

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory request from different thread within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread One thread performs a pair of templates compare In our work we

use NVIDIA C2070 as our GPU Threads and blocks number is set to

1024 That means we can match our test template with up to 1024times1024

target templates at same time

Algorithms 2-4 will be partitioned into fine-grained subtasks which is

processed a section of descriptors in one thread As the lower portion of the

middle column shows in Figure 11 we assigned a target template to one

block Inside a block one thread corresponds a set of descriptors in this

template This partition makes every block execute independently and there

are no data exchange requirements between different blocks When all

threads complete their responding descriptor fractions the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of intermediate results which is

show in right of Figure 11 Firstly all odd number threads compute the sum

of consecutive pairs of the results Then recursively every first of i (= 4 8

16 32 64 ) threads

compute the prefix sum on the new result The final result will be saved in

the first address which has the same variable name as the first intermediate

result

252 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose to

map task

Mapping one pair of templates to all the threads in a block and then every

thread would take charge of a fraction of descriptors and cooperation with

other threads

Assigning a single possible shift offset to a thread and all the threads will

compute independently unless the final result should be compared with

other possible offset

Due to great number of sum and synchronization operations in every

nearest neighbor searching step we choose the second method to parallelize

shift searching In affine matrix generator we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads The challenge of this step is the randomly generated

numbers might be correlated among threads In the step of rotation and

scale registration generating we used the Mersenne Twister pseudorandom

number generator because it can use bitwise arithmetic and have long

period

The Mersenne twister as most of pseudorandom generators is iterative

Therefore itrsquos hard to parallelize a single twister state update step among

several execution threads To make sure that thousands of threads in the

launch grid generate uncorrelated random sequence many simultaneous

Mersenne twisters need to process with different initial states in parallel

But even ldquovery differentrdquo (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

Variable denoting whether the line has been matched is stored in

shared memory To share the flags all the threads in a block should wait

synchronic operation at every query step Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure In global memory the

components in descriptor y(ϕ1 ϕ2 ϕ3 x y) and s(x y rθ ϕw) were stored

separately This would guarantee contiguous kernels of Algorithm 2 to 4

can access their data in successive addresses Although such coalescing

access reduces the latency frequently global memory access was still a

slower way to get data In our kernel we loaded the test template to shared

memory to accelerate memory access Because the Algorithms 2 to 4

execute different number of iterations on same data the bank conflict does

not happen To maximize our texture memory space we set the system

cache to the lowest value and bonded our target descriptor to texture

memory Using this catchable memory our data access was accelerated

more

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients is the feature descriptors It is primarily

applied to the design of target detection In this paper it is applied as the

feature for human recognition In the sclera region the vein patterns are the

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To follow out this technique first of all divide the image into small

connected regions called cells For each cell compute the histogram of

gradient directions or edge orientations of the pixels Then the combination

of different histogram of different cell represents the descriptor To improve

accuracy histograms can be contrast normalized by calculating the intensity

from the block and then using this value normalizes all cells within the

block This normalization result shows that it is invariant to geometric and

photometric changes The gradient magnitude m(x y) and orientation 1050592(x

y) are calculated using x and y directions gradients dx (x y) and dy (x y)

Orientation binning is the second step of HOG This method utilized

to create cell histograms Each pixel within the cell used to give a weight to

the orientation which is found in the gradient computation Gradient

magnitude is used as the weight The cells are in the rectangular form The

binning of gradient orientation should be spread over 0 to 180 degrees and

opposite direction counts as the same In the Fig 8 depicts the edge

orientation of picture elements If the images have any illumination and

contrast changes then the gradient strength must be locally normalized For

that cells are grouped together into larger blocks These blocks are

overlapping so that each cell contributes more than once to the final

descriptor Here rectangular HOG (R-HOG) blocks are applied which are

mainly in square grids The performance of HOG is improved by putting

on a Gaussian window into each block

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB(matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphicalmulti-domainsimulationandModel-Based

Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works or using an undocumented mechanism called JMI (Java-to-Mat lab

Interface) which should not be confused with the unrelated Java that is also

called JMI

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematical

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveXreg controls Alternatively you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatterbubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this work we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of the GPU structure, to narrow the search range and increase matching efficiency. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity was designed to hide memory access latency and to partition the computation task across the heterogeneous CPU and GPU system, down to the individual threads in the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.

To match two sclera templates we searched the areas nearby to all

the Y shape branches The search area is limited to the corresponding left or

right half of the sclera in order to reduce the searching range and time The

distance of two branches is defined in (3) where ϕi j is the angle between

the j th branch and the polar from pupil center in desctiptor i

The number of matched pairs ni and the distance between Y shape

branches centers di are stored as the matching result We fuse the number of

matched branches and the average distance between matched branches

centers as (2) Here α is a factor to fuse the matching score which was set

to 30 in our study Ni and Nj is the total numbers of feature vectors in

template i and j separately The decision is regulated by the threshold t if

the sclerarsquos matching score is lower than t the sclera will be discarded The

sclera with high matching score will be passed to the next more precisely

matching process

242 STAGE II FINE MATCHING USING WPL DESCRIPTOR

The line segment WSL descriptor reveals more vessel structure detail of

sclera than the Y shape descriptor The variation of sclera vessel pattern is

nonlinear because

When acquiring an eye image in different gaze angle the vessel structure

will appear nonlinear shrink or extend because eyeball is spherical in shape

sclera is made up of four layers episclera stroma lamina fusca and

endothelium There are slightly differences among movement of these

layers Considering these factors our registration employed both single

shift transform and multi-parameter transform which combines shift

rotation and scale together

1) SHIFT PARAMETER SEARCH As we discussed before

segmentation may not be accurate As a result the detected iris center could

not be very accurate Shift transform is designed to tolerant possible errors

in pupil center detection in the segmentation step If there is no deformation

or only very minor deformation registration with shift transform together

would be adequate to achieve an accurate result We designed Algorithm 2

to get optimized shift parameter Where Tte is the test template and ssei is

the i th WPL descriptor of Tte Tta is the target template and ssai is the i th

WPL descriptor of Ttad(stek staj ) is Euclidean distance of escriptors stek

and staj

Δsk is the shift value of two descriptors defines as

We first randomly select an equal number of segment descriptors

stek in test template Tte from each quad and find its nearest neighbors staj _

in target template Tta The shift offset of them is recorded as the possible

registration shift factor _sk The final offset registration factor is _soptim

which has the smallest standard deviation among these candidate offsets

2) AFFINE TRANSFORM PARAMETER SEARCH

Affine transform is designed to tolerant some deformation of sclera

patterns in the matching step The affine transform algorithm is shown in

Algorithm 3 The shift value in the parameter set is obtained by randomly

selecting descriptor s(it )te and calculating the distance from its nearest

neighbor staj_ in Tta We transform the test template by the matrix in (7)

At end of the iteration we count the numbers of matched descriptor pairs

from the transformed template and the target template The factor β is

involved to determine if the pair of descriptor is matched and we set it to

be 20 pixels in our experiment After N iterations the optimized transform

parameter set is determined via selecting the maximum matching numbers

m(it) Here stei Tte staj and Tta is defined same as algorithm 2 tr (it )

shi f t θ(it )tr (it ) scale is the parameters of shift rotation and scale

transform generated in i tth iteration R(θ (it )) T (tr (it ) shi f t ) and S(tr (it

) scale) are the transform matrix defined as (7) To search optimize

transform parameter we iterated N times to generate these parameters In

our experiment we set iteration time to 512

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4 Here stei Tte

staj and Tta are defined same as Algorithms 2 and 3 θ(optm) tr (optm) shi f

t tr (optm) scale _soptim are the registration parameters attained from

Algorithms 2 and 3 R_θ(optm)_T _tr (optm) shi f t _S_tr (optm) scale

_ is the descriptor transform matrix defined in Algorithm 3 empty is the angle

between the segment descriptor and radius direction w is the weight of the

descriptor which indicates whether the descriptor is at the edge of sclera or

not To ensure that the nearest descriptors have a similar orientation we

used a constant factor α to check the abstract difference of two ɸ In our

experiment we set α to 5 The total matching score is minimal score of two

transformed result divided by the minimal matching score for test template

and target template

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU A CUDA consists of many streaming

multiprocessors (SM) where the parallel part of the program should be

partitioned into threads by the programmer and mapped into those threads

There are multiple memory spaces in the CUDA memory hierarchy

register local memory shared memory global memory constant memory

and texture memory Register local memory and shared memory are on-

chip and could be a little time consuming to access these memories Only

shared memory can be accessed by other threads within the same block

However there is only limited availability of shared memory Global

memory constant memory and texture memory are off-chip memory and

accessible by all threads which would be very time consuming to access

these memories

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in term of access To

completely hide the latency of the small instructions set we should use on-

chip memory preferentially rather than global memory When global

memory access occurs threads in same warp should access the words in

sequence to achieve coalescence

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory request from different thread within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread One thread performs a pair of templates compare In our work we

use NVIDIA C2070 as our GPU Threads and blocks number is set to

1024 That means we can match our test template with up to 1024times1024

target templates at same time

Algorithms 2-4 will be partitioned into fine-grained subtasks which is

processed a section of descriptors in one thread As the lower portion of the

middle column shows in Figure 11 we assigned a target template to one

block Inside a block one thread corresponds a set of descriptors in this

template This partition makes every block execute independently and there

are no data exchange requirements between different blocks When all

threads complete their responding descriptor fractions the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of intermediate results which is

show in right of Figure 11 Firstly all odd number threads compute the sum

of consecutive pairs of the results Then recursively every first of i (= 4 8

16 32 64 ) threads

compute the prefix sum on the new result The final result will be saved in

the first address which has the same variable name as the first intermediate

result

252 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose to

map task

Mapping one pair of templates to all the threads in a block and then every

thread would take charge of a fraction of descriptors and cooperation with

other threads

Assigning a single possible shift offset to a thread and all the threads will

compute independently unless the final result should be compared with

other possible offset

Due to great number of sum and synchronization operations in every

nearest neighbor searching step we choose the second method to parallelize

shift searching In affine matrix generator we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads The challenge of this step is the randomly generated

numbers might be correlated among threads In the step of rotation and

scale registration generating we used the Mersenne Twister pseudorandom

number generator because it can use bitwise arithmetic and have long

period

The Mersenne twister as most of pseudorandom generators is iterative

Therefore itrsquos hard to parallelize a single twister state update step among

several execution threads To make sure that thousands of threads in the

launch grid generate uncorrelated random sequence many simultaneous

Mersenne twisters need to process with different initial states in parallel

But even ldquovery differentrdquo (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

Variable denoting whether the line has been matched is stored in

shared memory To share the flags all the threads in a block should wait

synchronic operation at every query step Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure In global memory the

components in descriptor y(ϕ1 ϕ2 ϕ3 x y) and s(x y rθ ϕw) were stored

separately This would guarantee contiguous kernels of Algorithm 2 to 4

can access their data in successive addresses Although such coalescing

access reduces the latency frequently global memory access was still a

slower way to get data In our kernel we loaded the test template to shared

memory to accelerate memory access Because the Algorithms 2 to 4

execute different number of iterations on same data the bank conflict does

not happen To maximize our texture memory space we set the system

cache to the lowest value and bonded our target descriptor to texture

memory Using this catchable memory our data access was accelerated

more

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients is the feature descriptors It is primarily

applied to the design of target detection In this paper it is applied as the

feature for human recognition In the sclera region the vein patterns are the

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To follow out this technique first of all divide the image into small

connected regions called cells For each cell compute the histogram of

gradient directions or edge orientations of the pixels Then the combination

of different histogram of different cell represents the descriptor To improve

accuracy histograms can be contrast normalized by calculating the intensity

from the block and then using this value normalizes all cells within the

block This normalization result shows that it is invariant to geometric and

photometric changes The gradient magnitude m(x y) and orientation 1050592(x

y) are calculated using x and y directions gradients dx (x y) and dy (x y)

Orientation binning is the second step of HOG This method utilized

to create cell histograms Each pixel within the cell used to give a weight to

the orientation which is found in the gradient computation Gradient

magnitude is used as the weight The cells are in the rectangular form The

binning of gradient orientation should be spread over 0 to 180 degrees and

opposite direction counts as the same In the Fig 8 depicts the edge

orientation of picture elements If the images have any illumination and

contrast changes then the gradient strength must be locally normalized For

that cells are grouped together into larger blocks These blocks are

overlapping so that each cell contributes more than once to the final

descriptor Here rectangular HOG (R-HOG) blocks are applied which are

mainly in square grids The performance of HOG is improved by putting

on a Gaussian window into each block

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB(matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphicalmulti-domainsimulationandModel-Based

Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works or using an undocumented mechanism called JMI (Java-to-Mat lab

Interface) which should not be confused with the unrelated Java that is also

called JMI

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematical

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:

MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.

Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.

MATLAB Profiler
Records the time spent executing each line of code.

Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.

Designing Graphical User Interfaces

Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
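A minimal sketch of the programmatic route (the control names and layout values are arbitrary assumptions):

f = figure('Name', 'Sclera Demo', 'NumberTitle', 'off');
uicontrol(f, 'Style', 'pushbutton', 'String', 'Load Image', ...
    'Position', [20 20 100 30], 'Callback', @(src, evt) disp('load clicked'));
uicontrol(f, 'Style', 'slider', 'Min', 0, 'Max', 1, 'Value', 0.5, ...
    'Position', [140 25 150 20]);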

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including the following (a brief sketch follows the list):

Interpolating and decimating

Extracting sections of data, scaling, and averaging

Thresholding and smoothing

Correlation, Fourier analysis, and filtering

1-D peak, valley, and zero finding

Basic statistics and curve fitting

Matrix analysis
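A minimal sketch of a few of these operations on an assumed test signal:

t = 0:255;
x = sin(2*pi*t/32) + 0.2*randn(size(t));   % assumed noisy sinusoid
xi = interp1(t, x, 0:0.5:255);             % interpolation
X = fft(x);                                % Fourier analysis
m = mean(x);  s = std(x);                  % basic statistics
p = polyfit(t, x, 3);                      % curve fitting with a cubic polynomial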

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
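A minimal sketch of typical data-access calls (the file names are placeholders, not files from this project):

I = imread('eye.jpg');              % image file
M = xlsread('measurements.xlsx');   % Microsoft Excel worksheet
fid = fopen('raw.bin', 'r');        % low-level binary file I/O
v = fread(fid, 100, 'uint8');
fclose(fid);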

Visualizing Data

All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualize vectors of data with 2-D plotting functions that create the following (a brief sketch follows the list):

Line, area, bar, and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations
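A minimal 2-D plotting sketch:

t = linspace(0, 2*pi, 100);
plot(t, sin(t), 'r-', t, cos(t), 'b--');    % two line plots in one set of axes
xlabel('t'); ylabel('amplitude');
legend('sin', 'cos'); title('2-D line plot');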

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include the following (a brief sketch follows the list):

Surface, contour, and mesh

Image plots

Cone, slice, stream, and isosurface
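A minimal 3-D plotting sketch:

[X, Y] = meshgrid(-2:0.1:2);
Z = X .* exp(-X.^2 - Y.^2);
surf(X, Y, Z);                   % surface plot of a 2-D matrix
shading interp; camlight; view(45, 30);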

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.

MATLAB provides the following types of functions for performing mathematical operations and analyzing data:

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers. A brief sketch touching a few of these areas follows.
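The following minimal sketch uses arbitrary values and standard built-in functions:

A = [4 1; 1 3];  b = [1; 2];
x = A \ b;                               % linear system solved via the LAPACK-backed backslash operator
[t, y] = ode45(@(t, y) -2*y, [0 5], 1);  % ordinary differential equation dy/dt = -2y
F = fft(randn(1, 1024));                 % discrete Fourier transform computed with FFTW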

CHAPTER 4

IMPLEMENTATION

41 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

42 SNAPSHOTS

FIG: ORIGINAL SCLERA IMAGE CONVERTED INTO A GREY-SCALE IMAGE

FIG: GREY-SCALE IMAGE CONVERTED INTO A BINARY IMAGE

FIG: EDGE DETECTION DONE BY OTSU'S THRESHOLDING

FIG: SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG: SELECTED ROI PART

FIG: ENHANCEMENT OF THE SCLERA IMAGE

FIG: FEATURE EXTRACTION OF THE SCLERA IMAGE USING GABOR FILTERS

FIG: MATCHING WITH IMAGES IN THE DATABASE

FIG: DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
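The snapshots correspond to a processing chain that can be sketched with standard Image Processing Toolbox calls, as below. This is a hedged illustration only: the file name and parameter values are assumptions, not the project's actual code.

I   = imread('eye.jpg');        % placeholder input image
G   = rgb2gray(I);              % original sclera image to grey scale
lvl = graythresh(G);            % Otsu's threshold
BW  = im2bw(G, lvl);            % grey-scale image to binary image
roi = roipoly(G);               % interactively select the sclera region of interest
S   = G;  S(~roi) = 0;          % keep only the selected ROI
S   = adapthisteq(S);           % enhancement of the sclera image
gb  = imgaborfilt(S, 4, 90);    % Gabor filtering (wavelength 4 px, orientation 90 deg) for vein features
% matching would then compare such feature maps against the templates stored in the database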

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, was used to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads within the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

62 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.



We first randomly select an equal number of segment descriptors

stek in test template Tte from each quad and find its nearest neighbors staj _

in target template Tta The shift offset of them is recorded as the possible

registration shift factor _sk The final offset registration factor is _soptim

which has the smallest standard deviation among these candidate offsets

2) AFFINE TRANSFORM PARAMETER SEARCH

Affine transform is designed to tolerant some deformation of sclera

patterns in the matching step The affine transform algorithm is shown in

Algorithm 3 The shift value in the parameter set is obtained by randomly

selecting descriptor s(it )te and calculating the distance from its nearest

neighbor staj_ in Tta We transform the test template by the matrix in (7)

At end of the iteration we count the numbers of matched descriptor pairs

from the transformed template and the target template The factor β is

involved to determine if the pair of descriptor is matched and we set it to

be 20 pixels in our experiment After N iterations the optimized transform

parameter set is determined via selecting the maximum matching numbers

m(it) Here stei Tte staj and Tta is defined same as algorithm 2 tr (it )

shi f t θ(it )tr (it ) scale is the parameters of shift rotation and scale

transform generated in i tth iteration R(θ (it )) T (tr (it ) shi f t ) and S(tr (it

) scale) are the transform matrix defined as (7) To search optimize

transform parameter we iterated N times to generate these parameters In

our experiment we set iteration time to 512

3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined from Algorithms 2

and 3 the test template will be registered and matched simultaneously The

registration and matching algorithm is listed in Algorithm 4 Here stei Tte

staj and Tta are defined same as Algorithms 2 and 3 θ(optm) tr (optm) shi f

t tr (optm) scale _soptim are the registration parameters attained from

Algorithms 2 and 3 R_θ(optm)_T _tr (optm) shi f t _S_tr (optm) scale

_ is the descriptor transform matrix defined in Algorithm 3 empty is the angle

between the segment descriptor and radius direction w is the weight of the

descriptor which indicates whether the descriptor is at the edge of sclera or

not To ensure that the nearest descriptors have a similar orientation we

used a constant factor α to check the abstract difference of two ɸ In our

experiment we set α to 5 The total matching score is minimal score of two

transformed result divided by the minimal matching score for test template

and target template

25 MAPPING THE SUBTASKS TO CUDA

CUDA is a single instruction multiple data (SIMD) system and

works as a coprocessor with a CPU A CUDA consists of many streaming

multiprocessors (SM) where the parallel part of the program should be

partitioned into threads by the programmer and mapped into those threads

There are multiple memory spaces in the CUDA memory hierarchy

register local memory shared memory global memory constant memory

and texture memory Register local memory and shared memory are on-

chip and could be a little time consuming to access these memories Only

shared memory can be accessed by other threads within the same block

However there is only limited availability of shared memory Global

memory constant memory and texture memory are off-chip memory and

accessible by all threads which would be very time consuming to access

these memories

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in term of access To

completely hide the latency of the small instructions set we should use on-

chip memory preferentially rather than global memory When global

memory access occurs threads in same warp should access the words in

sequence to achieve coalescence

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory request from different thread within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread One thread performs a pair of templates compare In our work we

use NVIDIA C2070 as our GPU Threads and blocks number is set to

1024 That means we can match our test template with up to 1024times1024

target templates at same time

Algorithms 2-4 will be partitioned into fine-grained subtasks which is

processed a section of descriptors in one thread As the lower portion of the

middle column shows in Figure 11 we assigned a target template to one

block Inside a block one thread corresponds a set of descriptors in this

template This partition makes every block execute independently and there

are no data exchange requirements between different blocks When all

threads complete their responding descriptor fractions the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of intermediate results which is

show in right of Figure 11 Firstly all odd number threads compute the sum

of consecutive pairs of the results Then recursively every first of i (= 4 8

16 32 64 ) threads

compute the prefix sum on the new result The final result will be saved in

the first address which has the same variable name as the first intermediate

result

252 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose to

map task

Mapping one pair of templates to all the threads in a block and then every

thread would take charge of a fraction of descriptors and cooperation with

other threads

Assigning a single possible shift offset to a thread and all the threads will

compute independently unless the final result should be compared with

other possible offset

Due to great number of sum and synchronization operations in every

nearest neighbor searching step we choose the second method to parallelize

shift searching In affine matrix generator we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads The challenge of this step is the randomly generated

numbers might be correlated among threads In the step of rotation and

scale registration generating we used the Mersenne Twister pseudorandom

number generator because it can use bitwise arithmetic and have long

period

The Mersenne twister as most of pseudorandom generators is iterative

Therefore itrsquos hard to parallelize a single twister state update step among

several execution threads To make sure that thousands of threads in the

launch grid generate uncorrelated random sequence many simultaneous

Mersenne twisters need to process with different initial states in parallel

But even ldquovery differentrdquo (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

Variable denoting whether the line has been matched is stored in

shared memory To share the flags all the threads in a block should wait

synchronic operation at every query step Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure In global memory the

components in descriptor y(ϕ1 ϕ2 ϕ3 x y) and s(x y rθ ϕw) were stored

separately This would guarantee contiguous kernels of Algorithm 2 to 4

can access their data in successive addresses Although such coalescing

access reduces the latency frequently global memory access was still a

slower way to get data In our kernel we loaded the test template to shared

memory to accelerate memory access Because the Algorithms 2 to 4

execute different number of iterations on same data the bank conflict does

not happen To maximize our texture memory space we set the system

cache to the lowest value and bonded our target descriptor to texture

memory Using this catchable memory our data access was accelerated

more

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients is the feature descriptors It is primarily

applied to the design of target detection In this paper it is applied as the

feature for human recognition In the sclera region the vein patterns are the

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To follow out this technique first of all divide the image into small

connected regions called cells For each cell compute the histogram of

gradient directions or edge orientations of the pixels Then the combination

of different histogram of different cell represents the descriptor To improve

accuracy histograms can be contrast normalized by calculating the intensity

from the block and then using this value normalizes all cells within the

block This normalization result shows that it is invariant to geometric and

photometric changes The gradient magnitude m(x y) and orientation 1050592(x

y) are calculated using x and y directions gradients dx (x y) and dy (x y)

Orientation binning is the second step of HOG This method utilized

to create cell histograms Each pixel within the cell used to give a weight to

the orientation which is found in the gradient computation Gradient

magnitude is used as the weight The cells are in the rectangular form The

binning of gradient orientation should be spread over 0 to 180 degrees and

opposite direction counts as the same In the Fig 8 depicts the edge

orientation of picture elements If the images have any illumination and

contrast changes then the gradient strength must be locally normalized For

that cells are grouped together into larger blocks These blocks are

overlapping so that each cell contributes more than once to the final

descriptor Here rectangular HOG (R-HOG) blocks are applied which are

mainly in square grids The performance of HOG is improved by putting

on a Gaussian window into each block

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB(matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphicalmulti-domainsimulationandModel-Based

Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works or using an undocumented mechanism called JMI (Java-to-Mat lab

Interface) which should not be confused with the unrelated Java that is also

called JMI

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
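A small sketch (not project code) of the kind of replacement meant here: the explicit loop and the single vectorized line compute the same values.

x = linspace(0, 2*pi, 1000);
y = zeros(size(x));
for k = 1:numel(x)                 % loop version, closer to C/C++ style
    y(k) = sin(x(k)) * exp(-x(k)/4);
end
y2 = sin(x) .* exp(-x/4);          % vectorized: one line replaces the loop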

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.

Development Tools

MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage

Designing Graphical User Interfaces

GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing, and editing user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
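A minimal programmatic GUI, shown only as an illustrative sketch (the labels are placeholders), is a figure with one push button and a callback:

f = figure('Name', 'Demo', 'NumberTitle', 'off');
uicontrol(f, 'Style', 'pushbutton', 'String', 'Match', ...
          'Position', [20 20 100 30], ...
          'Callback', @(src, evt) disp('Button pressed'));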

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including the following (a short illustrative sketch appears after this list):
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
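A short illustrative sketch of a few of these operations on a synthetic signal (none of the values relate to the project data):

t  = 0:0.5:10;
y  = sin(t) + 0.1*randn(size(t));  % noisy samples
ti = 0:0.05:10;
yi = interp1(t, y, ti, 'spline');  % interpolation onto a finer grid
ys = filter(ones(1,5)/5, 1, y);    % 5-point moving-average smoothing
p  = polyfit(t, y, 3);             % cubic curve fit
yf = polyval(p, ti);               % evaluate the fitted polynomial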

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
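Typical read operations look like the following sketch; the file names are placeholders, not files from this project:

num = xlsread('measurements.xlsx');   % numeric block from an Excel sheet
tab = csvread('samples.csv');         % plain ASCII numeric data
img = imread('eye01.jpg');            % image file
fid = fopen('template.bin', 'r');     % low-level binary I/O
raw = fread(fid, inf, 'uint8');
fclose(fid);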

Visualizing Data

All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
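The customization described above can be sketched as follows (synthetic data, illustrative only):

x = 0:0.1:10;
plot(x, sin(x), 'b-', x, cos(x), 'r--o');       % two curves with custom styles
xlabel('x'); ylabel('amplitude');
title('Example 2-D plot');
legend('sin(x)', 'cos(x)', 'Location', 'best');
grid on;
print('-dpng', 'example_plot.png');             % export to a common format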

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effect, light source locations, and transparency.

3-D plotting functions include:
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
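A brief sketch of 3-D plotting with the view and lighting characteristics mentioned above, using the built-in peaks test surface:

[X, Y, Z] = peaks(40);
surf(X, Y, Z, 'EdgeColor', 'none');
view(35, 30);        % camera azimuth and elevation
camlight headlight;  % add a light source
lighting gouraud;
alpha(0.9);          % slight transparency
colorbar;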

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.

MATLAB provides the following types of functions for performing mathematical operations and analyzing data:

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
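A brief sketch combining a few of the function groups listed above (linear algebra, Fourier analysis, and an ODE solve); the values are arbitrary:

A = rand(4); b = rand(4, 1);
x = A \ b;                               % solve A*x = b via LAPACK/BLAS routines
lambda = eig(A);                         % eigenvalues
F = fft(sin(2*pi*5*(0:0.01:1)));         % discrete Fourier transform (FFTW)
[t, y] = ode45(@(t, y) -2*y, [0 5], 1);  % solve dy/dt = -2*y on [0, 5]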

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
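For example, the backslash operator follows standard linear-algebra notation for solving A x = b, which is one of the places where MATLAB differs from most other languages (the values below are arbitrary):

A = [4 1; 2 3];
b = [1; 2];
x = A \ b;      % solves A*x = b; this is not element-wise division
r = A*x - b;    % residual, close to zero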

4.2 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG
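The snapshots above follow the stages of the processing chain. The outline below is only a rough MATLAB sketch of those stages, not the project's actual implementation: the file name and parameter values are placeholders, and the Gabor filtering is approximated with a hand-built kernel and conv2.

I    = imread('eye_image.jpg');         % input eye image (placeholder name)
G    = rgb2gray(I);                     % grey scale image
lvl  = graythresh(G);                   % Otsu's threshold
BW   = im2bw(G, lvl);                   % binary image
roi  = roipoly(G);                      % interactively select the sclera ROI
S    = G;  S(~roi) = 0;                 % keep only the selected ROI
Senh = adapthisteq(S);                  % enhancement of the sclera region
% Gabor-like feature extraction with one hand-built kernel
theta = pi/4;  lam = 8;  sigma = 3;
[xk, yk] = meshgrid(-10:10);
xr =  xk*cos(theta) + yk*sin(theta);
yr = -xk*sin(theta) + yk*cos(theta);
gb = exp(-(xr.^2 + yr.^2)/(2*sigma^2)) .* cos(2*pi*xr/lam);
F  = conv2(double(Senh), gb, 'same');   % filter response used as features
% Matching would compare such feature maps against the database templates.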

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step to achieve parallel computation efficiency using a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, was used to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.

accessible by all threads which would be very time consuming to access

these memories

Constant memory and texture memory are read-only and cacheable

memory Mapping algorithms to CUDA to achieve efficient processing is

not a trivial task There are several challenges in CUDA programming

If threads in a warp have different control path all the branches will be

executed serially To improve performance branch divergence within a

warp should be avoided

Global memory is slower than on-chip memory in term of access To

completely hide the latency of the small instructions set we should use on-

chip memory preferentially rather than global memory When global

memory access occurs threads in same warp should access the words in

sequence to achieve coalescence

Shared memory is much faster than the local and global memory space

But shared memory is organized into banks which are equal in size If two

addresses of memory request from different thread within a warp fall in the

same memory bank the access will be serialized To get maximum

performance memory requests should be scheduled to minimize bank

conflicts

251 MAPPING ALGORITHM TO BLOCKS

Because the proposed registration and matching algorithm has four

independent modules all the modules will be converted to different kernels

on the GPU These kernels are different in computation density thus we

map them to the GPU by various map strategies to fully utilize the

computing power of CUDA Figure 11 shows our scheme of CPU-GPU

task distribution and the partition among blocks and threads Algorithm 1 is

partitioned into coarse-grained parallel subtasks

We create a number of threads in this kernel The number of threads

is the same as the number of templates in the database As the upper middle

column shows in Figure 11 each target template will be assigned to one

thread One thread performs a pair of templates compare In our work we

use NVIDIA C2070 as our GPU Threads and blocks number is set to

1024 That means we can match our test template with up to 1024times1024

target templates at same time

Algorithms 2-4 will be partitioned into fine-grained subtasks which is

processed a section of descriptors in one thread As the lower portion of the

middle column shows in Figure 11 we assigned a target template to one

block Inside a block one thread corresponds a set of descriptors in this

template This partition makes every block execute independently and there

are no data exchange requirements between different blocks When all

threads complete their responding descriptor fractions the sum of the

intermediate results needs to be computed or compared A parallel prefix

sum algorithm is used to calculate the sum of intermediate results which is

show in right of Figure 11 Firstly all odd number threads compute the sum

of consecutive pairs of the results Then recursively every first of i (= 4 8

16 32 64 ) threads

compute the prefix sum on the new result The final result will be saved in

the first address which has the same variable name as the first intermediate

result

252 MAPPING INSIDE BLOCK

In shift argument searching there are two schemes we can choose to

map task

Mapping one pair of templates to all the threads in a block and then every

thread would take charge of a fraction of descriptors and cooperation with

other threads

Assigning a single possible shift offset to a thread and all the threads will

compute independently unless the final result should be compared with

other possible offset

Due to great number of sum and synchronization operations in every

nearest neighbor searching step we choose the second method to parallelize

shift searching In affine matrix generator we mapped an entire parameter

set searching to a thread and every thread randomly generated a set of

parameters and tried them independently The generated iterations were

assigned to all threads The challenge of this step is the randomly generated

numbers might be correlated among threads In the step of rotation and

scale registration generating we used the Mersenne Twister pseudorandom

number generator because it can use bitwise arithmetic and have long

period

The Mersenne twister as most of pseudorandom generators is iterative

Therefore itrsquos hard to parallelize a single twister state update step among

several execution threads To make sure that thousands of threads in the

launch grid generate uncorrelated random sequence many simultaneous

Mersenne twisters need to process with different initial states in parallel

But even ldquovery differentrdquo (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

Variable denoting whether the line has been matched is stored in

shared memory To share the flags all the threads in a block should wait

synchronic operation at every query step Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure In global memory the

components in descriptor y(ϕ1 ϕ2 ϕ3 x y) and s(x y rθ ϕw) were stored

separately This would guarantee contiguous kernels of Algorithm 2 to 4

can access their data in successive addresses Although such coalescing

access reduces the latency frequently global memory access was still a

slower way to get data In our kernel we loaded the test template to shared

memory to accelerate memory access Because the Algorithms 2 to 4

execute different number of iterations on same data the bank conflict does

not happen To maximize our texture memory space we set the system

cache to the lowest value and bonded our target descriptor to texture

memory Using this catchable memory our data access was accelerated

more

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients is the feature descriptors It is primarily

applied to the design of target detection In this paper it is applied as the

feature for human recognition In the sclera region the vein patterns are the

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To follow out this technique first of all divide the image into small

connected regions called cells For each cell compute the histogram of

gradient directions or edge orientations of the pixels Then the combination

of different histogram of different cell represents the descriptor To improve

accuracy histograms can be contrast normalized by calculating the intensity

from the block and then using this value normalizes all cells within the

block This normalization result shows that it is invariant to geometric and

photometric changes The gradient magnitude m(x y) and orientation 1050592(x

y) are calculated using x and y directions gradients dx (x y) and dy (x y)

Orientation binning is the second step of HOG This method utilized

to create cell histograms Each pixel within the cell used to give a weight to

the orientation which is found in the gradient computation Gradient

magnitude is used as the weight The cells are in the rectangular form The

binning of gradient orientation should be spread over 0 to 180 degrees and

opposite direction counts as the same In the Fig 8 depicts the edge

orientation of picture elements If the images have any illumination and

contrast changes then the gradient strength must be locally normalized For

that cells are grouped together into larger blocks These blocks are

overlapping so that each cell contributes more than once to the final

descriptor Here rectangular HOG (R-HOG) blocks are applied which are

mainly in square grids The performance of HOG is improved by putting

on a Gaussian window into each block

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB(matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphicalmulti-domainsimulationandModel-Based

Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works or using an undocumented mechanism called JMI (Java-to-Mat lab

Interface) which should not be confused with the unrelated Java that is also

called JMI

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematical

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveXreg controls Alternatively you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data, scaling, and averaging

Thresholding and smoothing

Correlation, Fourier analysis, and filtering

1-D peak, valley, and zero finding

Basic statistics and curve fitting

Matrix analysis
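
A short illustrative sketch of a few of these operations on a made-up 1-D signal (not project data) is given below:

    t = 0:0.01:1;
    s = sin(2*pi*5*t) + 0.2*randn(size(t));   % noisy test signal (made-up data)

    si = interp1(t, s, 0:0.005:1, 'linear');  % interpolating onto a finer grid
    sm = filter(ones(1,9)/9, 1, s);           % smoothing with a 9-point moving average
    S  = fft(s);                              % Fourier analysis
    p  = polyfit(t, s, 3);                    % basic curve fitting (cubic)
    st = [mean(s) std(s) max(s) min(s)];      % basic statistics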

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
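
Typical data-access calls look like the following sketch; the file names are placeholders, not files from this project:

    T   = readtable('measurements.xlsx');     % spreadsheet data (placeholder file name)
    img = imread('eye_image.jpg');            % image file
    fid = fopen('raw_template.bin', 'r');     % low-level binary I/O
    raw = fread(fid, inf, 'uint8');
    fclose(fid);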

Visualizing Data

All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualize vectors of data with 2-D plotting functions that create:

Line, area, bar, and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatter/bubble plots

Animations
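
A small illustrative example (made-up data) that combines a customized line plot with a histogram:

    x = linspace(0, 2*pi, 100);

    subplot(2, 1, 1);
    plot(x, sin(x), 'r-', x, cos(x), 'b--');  % line plot with custom colors and styles
    legend('sin', 'cos');
    title('Line plot');

    subplot(2, 1, 2);
    hist(randn(1, 1000), 20);                 % histogram of random data
    xlabel('value');
    ylabel('count');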

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data. You can specify plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include:

Surface, contour, and mesh

Image plots

Cone, slice, stream, and isosurface
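
For instance, a surface plot with overlaid contours of a simple test function (illustrative only):

    [X, Y] = meshgrid(-2:0.1:2, -2:0.1:2);
    Z = X .* exp(-X.^2 - Y.^2);

    surf(X, Y, Z);            % 3-D surface plot
    shading interp;
    view(45, 30);             % camera viewing angle
    hold on;
    contour3(X, Y, Z, 20);    % overlaid contour lines
    hold off;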

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW Discrete Fourier Transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.

MATLAB provides the following types of functions for performing mathematical operations and analyzing data:

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
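
A brief illustration of a few of these capabilities, using generic textbook examples rather than computations from this project:

    A = magic(4);
    [L, U, P] = lu(A);                          % linear algebra: LU factorization

    F = fft(sin(2*pi*50*(0:0.001:1)));          % Fourier analysis of a 50 Hz tone

    [t, y] = ode45(@(t, y) -2*y, [0 5], 1);     % ODE dy/dt = -2y with y(0) = 1

    q = integral(@(x) exp(-x.^2), 0, Inf);      % numerical integration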

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

4.2 SNAPSHOTS

FIG: ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG: GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG: EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING

FIG: SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG: SELECTED ROI PART

FIG: ENHANCEMENT OF SCLERA IMAGE

FIG: FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR FILTERS

FIG: MATCHING WITH IMAGES IN DATABASE

FIG: DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
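
The processing pipeline shown in these snapshots can be sketched roughly as follows. This is a simplified illustration built from standard MATLAB Image Processing Toolbox functions; the file name and parameter values are placeholders, and the actual project code may differ in its details:

    I   = imread('sclera_sample.jpg');   % input eye image (placeholder file name)
    G   = rgb2gray(I);                   % grey scale conversion
    lvl = graythresh(G);                 % Otsu's threshold
    BW  = im2bw(G, lvl);                 % binary image
    E   = edge(G, 'canny');              % edge detection

    roi = roipoly(G);                    % interactively select the sclera region
    S   = G .* uint8(roi);               % keep only the region of interest

    Se  = adapthisteq(S);                % enhancement by adaptive histogram equalization

    % Gabor feature extraction (a single orientation and wavelength shown)
    [x, y] = meshgrid(-15:15, -15:15);
    theta  = pi/4;  lambda = 8;  sigma = 4;
    xr =  x*cos(theta) + y*sin(theta);
    yr = -x*sin(theta) + y*cos(theta);
    gb = exp(-(xr.^2 + yr.^2) / (2*sigma^2)) .* cos(2*pi*xr/lambda);
    F  = imfilter(double(Se), gb, 'symmetric');   % Gabor feature map

Matching against the database then compares the extracted Gabor feature maps with the stored templates and reports whether the images are matched or not matched.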

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.

Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, was used to partition the computation task across the heterogeneous CPU-GPU system, and even among the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
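
The matching kernels described above are implemented directly in CUDA. Purely as a loose illustration of the underlying idea, namely keeping all target templates resident in GPU memory and scoring a test template against many targets in one batched operation so that little data moves between host and device during matching, a MATLAB Parallel Computing Toolbox sketch might look as follows. The template dimensions, the similarity measure, and all variable names here are hypothetical and do not represent the descriptors used in this project:

    targets = gpuArray(rand(256, 1024, 'single'));   % 1024 hypothetical 256-D target templates
    test    = gpuArray(rand(256, 1, 'single'));      % one test template

    % Cosine-style similarity of the test template against every target, computed on the GPU
    num    = sum(bsxfun(@times, targets, test), 1);
    den    = sqrt(sum(targets.^2, 1)) * norm(test);
    scores = num ./ den;

    [best, idx] = max(gather(scores));               % copy only the scores back to the host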

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.



Page 56: Major project report on

numbers might be correlated among threads In the step of rotation and

scale registration generating we used the Mersenne Twister pseudorandom

number generator because it can use bitwise arithmetic and have long

period

The Mersenne twister as most of pseudorandom generators is iterative

Therefore itrsquos hard to parallelize a single twister state update step among

several execution threads To make sure that thousands of threads in the

launch grid generate uncorrelated random sequence many simultaneous

Mersenne twisters need to process with different initial states in parallel

But even ldquovery differentrdquo (by any definition) initial state values do not

prevent the emission of correlated sequences by each generator sharing

identical parameters To solve this problem and to enable efficient

implementation of Mersenne Twister on parallel architectures we used a

special offline tool for the dynamic creation of Mersenne Twisters

parameters modified from the algorithm developed by Makoto Matsumoto

and Takuji Nishimura In the registration and matching step when

searching the nearest neighbor a line segment that has already matched

with others should not be used again In our approach a flag

FIG

FIG

Variable denoting whether the line has been matched is stored in

shared memory To share the flags all the threads in a block should wait

synchronic operation at every query step Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure In global memory the

components in descriptor y(ϕ1 ϕ2 ϕ3 x y) and s(x y rθ ϕw) were stored

separately This would guarantee contiguous kernels of Algorithm 2 to 4

can access their data in successive addresses Although such coalescing

access reduces the latency frequently global memory access was still a

slower way to get data In our kernel we loaded the test template to shared

memory to accelerate memory access Because the Algorithms 2 to 4

execute different number of iterations on same data the bank conflict does

not happen To maximize our texture memory space we set the system

cache to the lowest value and bonded our target descriptor to texture

memory Using this catchable memory our data access was accelerated

more

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients is the feature descriptors It is primarily

applied to the design of target detection In this paper it is applied as the

feature for human recognition In the sclera region the vein patterns are the

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To follow out this technique first of all divide the image into small

connected regions called cells For each cell compute the histogram of

gradient directions or edge orientations of the pixels Then the combination

of different histogram of different cell represents the descriptor To improve

accuracy histograms can be contrast normalized by calculating the intensity

from the block and then using this value normalizes all cells within the

block This normalization result shows that it is invariant to geometric and

photometric changes The gradient magnitude m(x y) and orientation 1050592(x

y) are calculated using x and y directions gradients dx (x y) and dy (x y)

Orientation binning is the second step of HOG This method utilized

to create cell histograms Each pixel within the cell used to give a weight to

the orientation which is found in the gradient computation Gradient

magnitude is used as the weight The cells are in the rectangular form The

binning of gradient orientation should be spread over 0 to 180 degrees and

opposite direction counts as the same In the Fig 8 depicts the edge

orientation of picture elements If the images have any illumination and

contrast changes then the gradient strength must be locally normalized For

that cells are grouped together into larger blocks These blocks are

overlapping so that each cell contributes more than once to the final

descriptor Here rectangular HOG (R-HOG) blocks are applied which are

mainly in square grids The performance of HOG is improved by putting

on a Gaussian window into each block

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB(matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphicalmulti-domainsimulationandModel-Based

Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works or using an undocumented mechanism called JMI (Java-to-Mat lab

Interface) which should not be confused with the unrelated Java that is also

called JMI

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematical

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveXreg controls Alternatively you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatterbubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012

Page 57: Major project report on

FIG

Variable denoting whether the line has been matched is stored in

shared memory To share the flags all the threads in a block should wait

synchronic operation at every query step Our solution is to use a single

thread in a block to process the matching

253 MEMORY MANAGEMENT

The bandwidth inside GPU board is much higher than the

bandwidth between host memory and device memory The data transfer

between host and device can lead to long latency As shown in Figure 11

we load the entire target templates set from database without considering

when they would be processed Therefore there was no data transfer from

host to device during the matching procedure In global memory the

components in descriptor y(ϕ1 ϕ2 ϕ3 x y) and s(x y rθ ϕw) were stored

separately This would guarantee contiguous kernels of Algorithm 2 to 4

can access their data in successive addresses Although such coalescing

access reduces the latency frequently global memory access was still a

slower way to get data In our kernel we loaded the test template to shared

memory to accelerate memory access Because the Algorithms 2 to 4

execute different number of iterations on same data the bank conflict does

not happen To maximize our texture memory space we set the system

cache to the lowest value and bonded our target descriptor to texture

memory Using this catchable memory our data access was accelerated

more

FIG

2.6 HISTOGRAM OF ORIENTED GRADIENTS

The histogram of oriented gradients (HOG) is a feature descriptor that was primarily designed for target detection. In this work it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, the image is first divided into small connected regions called cells. For each cell, the histogram of the gradient directions or edge orientations of its pixels is computed. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating an intensity measure over a larger block and then using this value to normalize all the cells within the block. This normalization makes the descriptor largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y), with m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).

Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell casts a vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is further improved by applying a Gaussian window to each block.
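The steps above can be sketched in MATLAB as follows. This is only an illustrative sketch, not the project's actual feature code: I is assumed to be a grey-scale sclera image, the cell size and bin count are typical HOG defaults, and the normalization is shown for a single cell rather than a full R-HOG block.

    % Gradient computation: dx, dy, magnitude m(x,y) and unsigned orientation in [0,180)
    dx = imfilter(double(I), [-1 0 1],  'replicate');    % x-direction gradient
    dy = imfilter(double(I), [-1 0 1]', 'replicate');    % y-direction gradient
    m     = sqrt(dx.^2 + dy.^2);                         % gradient magnitude
    theta = mod(atan2d(dy, dx), 180);                    % opposite directions share a bin

    % Orientation binning for one cell, with the magnitude used as the vote weight
    cellSize = 8;  nBins = 9;  edges = linspace(0, 180, nBins + 1);
    cellM = m(1:cellSize, 1:cellSize);
    cellT = theta(1:cellSize, 1:cellSize);
    h = zeros(1, nBins);
    for b = 1:nBins
        inBin = cellT >= edges(b) & cellT < edges(b + 1);
        h(b)  = sum(cellM(inBin));
    end
    h = h / (norm(h) + eps);                             % simple contrast normalization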

CHAPTER 3

SOFTWARE SPECIFICATION

3.1 GENERAL

MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and it is popular amongst scientists involved in image processing.

The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

3.2 FEATURES OF MATLAB

High-level language for technical computing.
Development environment for managing code, files, and data.
Interactive tools for iterative exploration, design, and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel.

MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.

3.2.1 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension that is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
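For example, a Java class can be used directly from the MATLAB prompt (a small illustration only, not tied to this project):

    list = java.util.ArrayList;    % construct a Java object from MATLAB
    list.add('sclera');            % call its methods like ordinary functions
    n = list.size();               % n is 1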

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.

Development Environment

Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.

Developing Algorithms and Applications

MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.

The MATLAB Language

The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
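As a small illustration of this point (the image I and the threshold value are placeholders), one vectorized statement replaces an explicit double loop:

    B = I > 0.5;                   % vectorized thresholding, no 'for' loop

    % equivalent loop-based version for comparison
    B2 = false(size(I));
    for r = 1:size(I, 1)
        for c = 1:size(I, 2)
            B2(r, c) = I(r, c) > 0.5;
        end
    end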

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.

Development Tools

MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:

MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.

Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.

MATLAB Profiler
Records the time spent executing each line of code.

Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.

Designing Graphical User Interfaces

You can use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.

3.2.2 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.

Data Analysis

MATLAB provides interactive tools and command-line functions for data analysis operations, including:

Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
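A few representative calls are shown below; the file names are placeholders, not files used in this project.

    T   = xlsread('results.xls');        % numeric data from a Microsoft Excel sheet
    A   = imread('eye_image.jpg');       % image file
    fid = fopen('template.bin', 'r');    % low-level binary file I/O
    raw = fread(fid, inf, 'uint8');
    fclose(fid);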

Visualizing Data

All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create:

Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.

3-D plotting functions include:

Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots

3.2.3 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.

MATLAB provides the following types of functions for performing mathematical operations and analyzing data:

Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
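A few of these operations in use (random data, for illustration only):

    A = rand(4);  b = rand(4, 1);
    x = A \ b;                               % linear system solve (LAPACK-backed)
    e = eig(A);                              % eigenvalues
    Y = fft(b);                              % discrete Fourier transform (FFTW-backed)
    q = integral(@(t) exp(-t.^2), 0, 1);     % numerical integration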

CHAPTER 4

IMPLEMENTATION

4.1 GENERAL

MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.

4.2 SNAPSHOTS

The snapshots illustrate the following stages of the implementation:

ORIGINAL SCLERA IMAGE CONVERTED INTO A GREY-SCALE IMAGE
GREY-SCALE IMAGE CONVERTED INTO A BINARY IMAGE
EDGE DETECTION DONE BY OTSU'S THRESHOLDING
SELECTING THE REGION OF INTEREST (SCLERA PART)
SELECTED ROI PART
ENHANCEMENT OF THE SCLERA IMAGE
FEATURE EXTRACTION OF THE SCLERA IMAGE USING GABOR FILTERS
MATCHING WITH IMAGES IN THE DATABASE
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
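The processing chain shown in these snapshots can be sketched in MATLAB roughly as follows. This is only an outline under stated assumptions: the input file name, the interactive ROI selection via roipoly, the CLAHE-based enhancement, and the hand-built Gabor kernel are stand-ins for the project's actual parameter choices, and the final matching step is only indicated.

    rgb  = imread('eye.jpg');                         % placeholder input image
    grey = rgb2gray(rgb);                             % grey-scale conversion
    bw   = im2bw(grey, graythresh(grey));             % Otsu threshold -> binary image
    roi  = roipoly(grey);                             % interactive selection of the sclera region
    scl  = grey;  scl(~roi) = 0;                      % keep only the sclera part
    enh  = adapthisteq(scl);                          % vein enhancement (CLAHE)

    % Gabor filtering with one hand-built kernel (orientation/scale are placeholders)
    theta = 0;  lambda = 8;  sigma = 4;
    [xg, yg] = meshgrid(-15:15);
    xr =  xg * cosd(theta) + yg * sind(theta);
    yr = -xg * sind(theta) + yg * cosd(theta);
    gk   = exp(-(xr.^2 + yr.^2) / (2 * sigma^2)) .* cos(2 * pi * xr / lambda);
    feat = imfilter(double(enh), gk, 'replicate');    % Gabor response used as the vein feature

    % Matching: compare 'feat' (or a descriptor built from it) against the stored
    % database templates and report matched / not matched (not shown here).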

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups:

Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.

Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.

Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Privium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

In this work we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.

Page 58: Major project report on

FIG

26 HISTOGRAM OF ORIENTED GRADIENTS

Histogram of oriented gradients is the feature descriptors It is primarily

applied to the design of target detection In this paper it is applied as the

feature for human recognition In the sclera region the vein patterns are the

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To follow out this technique first of all divide the image into small

connected regions called cells For each cell compute the histogram of

gradient directions or edge orientations of the pixels Then the combination

of different histogram of different cell represents the descriptor To improve

accuracy histograms can be contrast normalized by calculating the intensity

from the block and then using this value normalizes all cells within the

block This normalization result shows that it is invariant to geometric and

photometric changes The gradient magnitude m(x y) and orientation 1050592(x

y) are calculated using x and y directions gradients dx (x y) and dy (x y)

Orientation binning is the second step of HOG This method utilized

to create cell histograms Each pixel within the cell used to give a weight to

the orientation which is found in the gradient computation Gradient

magnitude is used as the weight The cells are in the rectangular form The

binning of gradient orientation should be spread over 0 to 180 degrees and

opposite direction counts as the same In the Fig 8 depicts the edge

orientation of picture elements If the images have any illumination and

contrast changes then the gradient strength must be locally normalized For

that cells are grouped together into larger blocks These blocks are

overlapping so that each cell contributes more than once to the final

descriptor Here rectangular HOG (R-HOG) blocks are applied which are

mainly in square grids The performance of HOG is improved by putting

on a Gaussian window into each block

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB(matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphicalmulti-domainsimulationandModel-Based

Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works or using an undocumented mechanism called JMI (Java-to-Mat lab

Interface) which should not be confused with the unrelated Java that is also

called JMI

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematical

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveXreg controls Alternatively you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatterbubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012

Page 59: Major project report on

edges of an image So HOG is used to determine the gradient orientation

and edge orientations of vein pattern in the sclera region of an eye image

To follow out this technique first of all divide the image into small

connected regions called cells For each cell compute the histogram of

gradient directions or edge orientations of the pixels Then the combination

of different histogram of different cell represents the descriptor To improve

accuracy histograms can be contrast normalized by calculating the intensity

from the block and then using this value normalizes all cells within the

block This normalization result shows that it is invariant to geometric and

photometric changes The gradient magnitude m(x y) and orientation 1050592(x

y) are calculated using x and y directions gradients dx (x y) and dy (x y)

Orientation binning is the second step of HOG This method utilized

to create cell histograms Each pixel within the cell used to give a weight to

the orientation which is found in the gradient computation Gradient

magnitude is used as the weight The cells are in the rectangular form The

binning of gradient orientation should be spread over 0 to 180 degrees and

opposite direction counts as the same In the Fig 8 depicts the edge

orientation of picture elements If the images have any illumination and

contrast changes then the gradient strength must be locally normalized For

that cells are grouped together into larger blocks These blocks are

overlapping so that each cell contributes more than once to the final

descriptor Here rectangular HOG (R-HOG) blocks are applied which are

mainly in square grids The performance of HOG is improved by putting

on a Gaussian window into each block

FIG

CHAPTER 3

SOFTWARE SPECIFICATION

31 GENERAL

MATLAB(matrix laboratory) is a numerical

computing environment and fourth-generation programming language

Developed by Math Works MATLAB allows matrix manipulations

plotting of functions and data implementation of algorithms creation

of user interfaces and interfacing with programs written in other languages

including C C++ Java and Fortran

Although MATLAB is intended primarily for numerical computing an

optional toolbox uses the MuPAD symbolic engine allowing access

to symbolic computing capabilities An additional package Simulink adds

graphicalmulti-domainsimulationandModel-Based

Design for dynamic and embedded systems

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works or using an undocumented mechanism called JMI (Java-to-Mat lab

Interface) which should not be confused with the unrelated Java that is also

called JMI

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematical

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveXreg controls Alternatively you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatterbubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012


Page 61: Major project report on

In 2004 MATLAB had around one million users across industry

and academia MATLAB users come from various backgrounds

of engineering science and economics MATLAB is widely used in

academic and research institutions as well as industrial enterprises

MATLAB was first adopted by researchers and practitioners

in control engineering Littles specialty but quickly spread to many other

domains It is now also used in education in particular the teaching

of linear algebra and numerical analysis and is popular amongst scientists

involved in image processing The MATLAB application is built around the

MATLAB language The simplest way to execute MATLAB code is to type

it in the Command Window which is one of the elements of the MATLAB

Desktop When code is entered in the Command Window MATLAB can

be used as an interactive mathematical shell Sequences of commands can

be saved in a text file typically using the MATLAB Editor as a script or

encapsulated into a function extending the commands available

MATLAB provides a number of features for documenting and

sharing your work You can integrate your MATLAB code with other

languages and applications and distribute your MATLAB algorithms and

applications

32 FEATURES OF MATLAB

High-level language for technical computing

Development environment for managing code files and data

Interactive tools for iterative exploration design and problem solving

Mathematical functions for linear algebra statistics Fourier analysis

filtering optimization and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external

applications and languages such as C C++ FORTRAN Javatrade COM

and Microsoft Excel

MATLAB is used in vast area including signal and image

processing communications control design test and measurement

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works or using an undocumented mechanism called JMI (Java-to-Mat lab

Interface) which should not be confused with the unrelated Java that is also

called JMI

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematical

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveXreg controls Alternatively you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatterbubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012

Page 62: Major project report on

financial modeling and analysis and computational Add-on toolboxes

(collections of special-purpose MATLAB functions) extend the MATLAB

environment to solve particular classes of problems in these application

areas

MATLAB can be used on personal computers and powerful

server systems including the Cheaha compute cluster With the addition of

the Parallel Computing Toolbox the language can be extended with parallel

implementations for common computational functions including for-loop

unrolling Additionally this toolbox supports offloading computationally

intensive workloads to Cheaha the campus compute cluster MATLAB is

one of a few languages in which each variable is a matrix (broadly

construed) and knows how big it is Moreover the fundamental operators

(eg addition multiplication) are programmed to deal with matrices when

required And the MATLAB environment handles much of the bothersome

housekeeping that makes all this possible Since so many of the procedures

required for Macro-Investment Analysis involves matrices MATLAB

proves to be an extremely efficient language for both communication and

implementation

321 INTERFACING WITH OTHER LANGUAGES

MATLAB can call functions and subroutines written in the C

programming language or FORTRAN A wrapper function is created

allowing MATLAB data types to be passed and returned The dynamically

loadable object files created by compiling such functions are termed MEX-

files (for MATLAB executable)

Libraries written in Java ActiveX or NET can be directly called

from MATLAB and many MATLAB libraries (for

example XML or SQL support) are implemented as wrappers around Java

or ActiveX libraries Calling MATLAB from Java is more complicated but

can be done with MATLAB extension which is sold separately by Math

Works or using an undocumented mechanism called JMI (Java-to-Mat lab

Interface) which should not be confused with the unrelated Java that is also

called JMI

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematical

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveXreg controls Alternatively you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatterbubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012

Page 63: Major project report on

As alternatives to the MuPAD based Symbolic Math Toolbox

available from Math Works MATLAB can be connected

to Maple or Mathematical

Libraries also exist to import and export MathML

Development Environment

Startup Accelerator for faster MATLAB startup on Windows especially on

Windows XP and for network installations

Spreadsheet Import Tool that provides more options for selecting and

loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in

the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor

Developing Algorithms and Applications

MATLAB provides a high-level language and development

tools that let you quickly develop and analyze your algorithms and

applications

The MATLAB Language

The MATLAB language supports the vector and matrix operations

that are fundamental to engineering and scientific problems It enables fast

development and execution With the MATLAB language you can

program and develop algorithms faster than with traditional languages

because you do not need to perform low-level administrative tasks such as

declaring variables specifying data types and allocating memory In many

cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of

MATLAB code can often replace several lines of C or C++ code

At the same time MATLAB provides all the features of a traditional

programming language including arithmetic operators flow control data

structures data types object-oriented programming (OOP) and debugging

features

MATLAB lets you execute commands or groups of commands one

at a time without compiling and linking enabling you to quickly iterate to

the optimal solution For fast execution of heavy matrix and vector

computations MATLAB uses processor-optimized libraries For general-

purpose scalar computations MATLAB generates machine-code

instructions using its JIT (Just-In-Time) compilation technology

This technology which is available on most platforms provides

execution speeds that rival those of traditional programming languages

Development Tools

MATLAB includes development tools that help you implement

your algorithm efficiently These include the following

MATLAB Editor

Provides standard editing and debugging features such as setting

breakpoints and single stepping

Code Analyzer

Checks your code for problems and recommends modifications to

maximize performance and maintainability

MATLAB Profiler

Records the time spent executing each line of code

Directory Reports

Scan all the files in a directory and report on code efficiency file

differences file dependencies and code coverage

Designing Graphical User Interfaces

By using the interactive tool GUIDE (Graphical User Interface

Development Environment) to layout design and edit user interfaces

GUIDE lets you include list boxes pull-down menus push buttons radio

buttons and sliders as well as MATLAB plots and Microsoft

ActiveXreg controls Alternatively you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatterbubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.


42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012

Page 65: Major project report on

buttons and sliders as well as MATLAB plots and Microsoft

ActiveXreg controls Alternatively you can create GUIs programmatically

using MATLAB functions

322 ANALYZING AND ACCESSING DATA

MATLAB supports the entire data analysis process from acquiring

data from external devices and databases through preprocessing

visualization and numerical analysis to producing presentation-quality

output

Data Analysis

MATLAB provides interactive tools and command-line functions for data

analysis operations including

Interpolating and decimating

Extracting sections of data scaling and averaging

Thresholding and smoothing

Correlation Fourier analysis and filtering

1-D peak valley and zero finding

Basic statistics and curve fitting

Matrix analysis

Data Access

MATLAB is an efficient platform for accessing data from

files other applications databases and external devices You can read data

from popular file formats such as Microsoft Excel ASCII text or binary

files image sound and video files and scientific files such as HDF and

HDF5 Low-level binary file IO functions let you work with data files in

any format Additional functions let you read data from Web pages and

XML

Visualizing Data

All the graphics features that are required to visualize engineering

and scientific data are available in MATLAB These include 2-D and 3-D

plotting functions 3-D volume visualization functions tools for

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatterbubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012

Page 66: Major project report on

interactively creating plots and the ability to export results to all popular

graphics formats You can customize plots by adding multiple axes

changing line colors and markers adding annotation Latex equations and

legends and drawing shapes

2-D Plotting

Visualizing vectors of data with 2-D plotting functions that create

Line area bar and pie charts

Direction and velocity plots

Histograms

Polygons and surfaces

Scatterbubble plots

Animations

3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices 3-

D scalar and 3-D vector data You can use these functions to visualize and

understand large often complex multidimensional data Specifying plot

characteristics such as camera viewing angle perspective lighting effect

light source locations and transparency

3-D plotting functions include

Surface contour and mesh

Image plots

Cone slice stream and isosurface

323 PERFORMING NUMERIC COMPUTATION

MATLAB contains mathematical statistical and engineering

functions to support all common engineering and science operations These

functions developed by experts in mathematics are the foundation of the

MATLAB language The core math functions use the LAPACK and BLAS

linear algebra subroutine libraries and the FFTW Discrete Fourier

Transform library Because these processor-dependent libraries are

optimized to the different platforms that MATLAB supports they execute

faster than the equivalent C or C++ code

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012

Page 67: Major project report on

MATLAB provides the following types of functions for performing

mathematical operations and analyzing data

Matrix manipulation and linear algebra

Polynomials and interpolation

Fourier analysis and filtering

Data analysis and statistics

Optimization and numerical integration

Ordinary differential equations (ODEs)

Partial differential equations (PDEs)

Sparse matrix operations

MATLAB can perform arithmetic on a wide range of data types

including doubles singles and integers

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012

Page 68: Major project report on

CHAPTER 4

IMPLEMENTATION

41 GENERAL

Matlab is a program that was originally designed to simplify the

implementation of numerical linear algebra routines It has since grown into

something much bigger and it is used to implement numerical algorithms

for a wide range of applications The basic language used is very similar to

standard linear algebra notation but there are a few extensions that will

likely cause you some problems at first

42 SNAPSHOTS

ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE

FIG

GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012

Page 69: Major project report on

FIG

EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING

FIG

SELECTING THE REGION OF INTEREST (SCLERA PART)

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012

Page 70: Major project report on

FIG

SELECTED ROI PART

FIG

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this paper we proposed a new parallel sclera vein recognition

method which employees a two stage parallel approach for registration and

matching Even though the research focused on developing a parallel sclera

matching solution for the sequential line-descriptor method using CUDA

GPU architecture the parallel strategies developed in this research can be

applied to design parallel solutions to other sclera vein recognition methods

and general pattern recognition methods We designed the Y shape

descriptor to narrow the search range to increase the matching efficiency

which is a new feature extraction method to take advantage of the GPU

structures We developed the WPL descriptor to incorporate mask

information and make it more suitable for parallel computing which can

dramatically reduce data transferring and computation We then carefully

mapped our algorithms to GPU threads and blocks which is an important

step to achieve parallel computation efficiency using a GPU A work flow

which has high arithmetic intensity to hide the memory access latency was

designed to partition the computation task to the heterogeneous system of

CPU and GPU even to the threads in GPU The proposed method

dramatically improves the matching efficiency without compromising

recognition accuracy

62 REFERENCES

[1] C W Oyster The Human Eye Structure and Function Sunderland

Sinauer Associates 1999

[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object

detection for real-time augmented reality applications in a GPGPUrdquo IEEE

Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012

[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep

big simple neural nets for handwritten digit recognitionrdquo Neural Comput

vol 22 no 12 pp 3207ndash3220 2010

[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris

recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242

[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing

neocognitron of face recognition on high performance environment based

on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput

Archit High Perform Comput 2008 pp 81ndash88

[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-

Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE

Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011

[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo

Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004

[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors

for the fast computation of acoustic likelihoods in speech recognitionrdquo

Comput Speech Lang vol 23 no 4 pp 510ndash526 2009

[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of

the Eye 2003

[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner

ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4

no 4 pp 812ndash823 Dec 2009

[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular

biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash

1869 Oct 2012

[12] W Wenying Z Dongming Z Yongdong L Jintao and G

Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel

implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp

1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004

pp 4ndash15

[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for

real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control

vol 58 no 12 pp 2631ndash2645 Dec 2011

[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4

pp 619ndash631 Jul 2013

[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive

approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2

pp 181ndash198 2013

[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human

identification method Sclera recognitionrdquo IEEE Trans Syst Man

Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012

Page 71: Major project report on

FIG

ENHANCEMENT OF SCLERA IMAGE

FIG

FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR

FILTERS

FIG

MATCHING WITH IMAGES IN DATABASE

FIG

DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)

FIG

CHAPTER 5

APPLICATIONS

The applications of biometrics can be divided into the following three main groups

Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc

Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc

Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

61 CONCLUSION

In this project, we proposed a new parallel sclera vein recognition method that employs a two-stage parallel approach for registration and matching. Although the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed here can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of the GPU structure, to narrow the search range and increase matching efficiency. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system and even across the threads within the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
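The thread-and-block mapping described above can be pictured with a short CUDA sketch. This is a minimal, hypothetical example under assumed data layouts (fixed-length descriptors stored in flat arrays, one gallery template per block, test descriptors strided across the threads of that block, and a sum-of-minimum-distances score); the kernel name matchKernel, the descriptor length, and the sizes in main are illustrative assumptions and do not reproduce the project's actual implementation.

// Illustrative sketch only: one gallery template per block, test descriptors
// strided across threads, shared-memory reduction to one score per template.
#include <cstdio>
#include <cuda_runtime.h>

#define DESC_LEN 4      // assumed length of one vein-segment descriptor
#define THREADS  256    // threads per block (power of two for the reduction)

__global__ void matchKernel(const float* test, int nTest,
                            const float* gallery, int nPerTemplate,
                            float* templateScore)
{
    __shared__ float partial[THREADS];
    // Each block works on one gallery template.
    const float* tmpl = gallery + (size_t)blockIdx.x * nPerTemplate * DESC_LEN;

    float sum = 0.0f;
    for (int i = threadIdx.x; i < nTest; i += blockDim.x) {
        float best = 1e30f;                       // smallest distance so far
        for (int j = 0; j < nPerTemplate; ++j) {  // compare with every template descriptor
            float d = 0.0f;
            for (int k = 0; k < DESC_LEN; ++k) {
                float diff = test[i * DESC_LEN + k] - tmpl[j * DESC_LEN + k];
                d += diff * diff;
            }
            best = fminf(best, d);
        }
        sum += best;                              // accumulate this thread's matches
    }
    partial[threadIdx.x] = sum;
    __syncthreads();

    // Shared-memory reduction to a single score per block (template).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) templateScore[blockIdx.x] = partial[0];
}

int main()
{
    const int nTemplates = 8, nPerTemplate = 128, nTest = 128;  // dummy sizes
    const size_t tBytes = (size_t)nTest * DESC_LEN * sizeof(float);
    const size_t gBytes = (size_t)nTemplates * nPerTemplate * DESC_LEN * sizeof(float);

    float *dTest, *dGallery, *dScore;
    cudaMalloc(&dTest, tBytes);
    cudaMalloc(&dGallery, gBytes);
    cudaMalloc(&dScore, nTemplates * sizeof(float));
    cudaMemset(dTest, 0, tBytes);                 // zeroed dummy data for the sketch
    cudaMemset(dGallery, 0, gBytes);

    matchKernel<<<nTemplates, THREADS>>>(dTest, nTest, dGallery, nPerTemplate, dScore);
    cudaDeviceSynchronize();

    float score[nTemplates];
    cudaMemcpy(score, dScore, sizeof(score), cudaMemcpyDeviceToHost);
    printf("Score of template 0: %f\n", score[0]);

    cudaFree(dTest); cudaFree(dGallery); cudaFree(dScore);
    return 0;
}

Assigning one template per block lets each block reduce its partial scores in shared memory, while striding the test descriptors across threads keeps arithmetic intensity high enough to help hide global memory latency, which is the kind of partitioning the conclusion refers to.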

6.2 REFERENCES

[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.

[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.

[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.

[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.

[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.

[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.

[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.

[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.

[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.

[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.

[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.

[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.

[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.

[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.

[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.

[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
