
WEB ALGORITHM SEARCH ENGINE BASED

NETWORK MODELING OF MALARIA TRANSMISSION

EZE, MONDAY OKPOTO

A thesis Submitted

In fulfillment of the requirement for the degree of Doctor of Philosophy (PhD)

in Computer Science.

Faculty of Computer Science and Information Technology

UNIVERSITI MALAYSIA, SARAWAK

2013


ACKNOWLEDGEMENTS

I thank God, who granted me the strength I needed throughout this doctoral research. My supervisors, Assoc. Prof. Dr Jane Labadin and Terrin Lim, deserve my appreciation for their support and guidance in making this work a success. I also acknowledge the support of the Dean of the Faculty of Computer Science and Information Technology, UNIMAS, Prof. Dr. Narayanan Kulathuramaiyer, for creating an atmosphere conducive to research and learning in the faculty. My appreciation further goes to the Center for Graduate Studies and the leadership of UNIMAS for granting me financial sponsorship through the Zamalah Postgraduate Scholarship program. Similarly, I acknowledge with gratitude the Ministry of Higher Education Malaysia for supporting this work through the Fundamental Research Grant Scheme FRGS/2/10/SG/UNIMAS/02/04. My international conferences were financed by this research grant. Finally, I thank my wife, children, parents and in-laws for their patience, understanding and moral support throughout these three years away from home for my postgraduate studies.


DEDICATION

To my dear wife, Ifeoma Faith Eze (Mrs), and all my children, for their patience, prayers and support through these years away from my home country in pursuit of my doctorate degree. To my parents, Chief and Mrs Patrick Eze Okpoto, for training me. To my in-laws, Chief and Chief (Mrs) Obikpe, for all their care.


ABSTRACT

Malaria has been described as one of the most dangerous and most widespread tropical diseases, with an estimated 247 million cases worldwide in 2006 alone. This calls for urgent scientific intervention. Since malaria is a vector-borne disease, this research tackles malaria transmission from the angle of vector detection through a search engine. Vector control is often attempted on a trial-and-error basis, with no scientific way of determining the locations of critical vector densities. Such a practice wastes resources on the wrong places while ignoring the areas of critical vector existence. This research formalizes a contact network using a number of attributes of the malaria vectors, the public places, and the human beings that affect malaria transmission. The resulting structure is a heterogeneous bipartite contact network with two node types: public places and human beings. The human nodes represent people who have suffered from malaria even though their residential homes were under reliable vector control. This exclusion principle implies that these people most probably contracted the disease outside their residential homes. The Hypertext Induced Topical Search (HITS) web search algorithm was adapted to implement a search engine that takes the bipartite contact network as its input. The model system was implemented in MATLAB. The output shows the public places that harbour infected malaria vectors, together with their corresponding vector densities. The model output was validated with UCINET 6.0 as the benchmark system. A root mean square error (RMSE) of 0.0023 was obtained when the output of the benchmark system was compared with that of the search engine model. This result indicates a high and acceptable level of accuracy.


ABSTRAK

Malaria is one of the most dangerous and widespread tropical diseases, with an estimated 247 million cases worldwide in 2006 alone. This situation requires urgent scientific intervention. Since malaria is a vector-borne disease, this study attempts to address the spread of malaria through vector detection using a search engine. There are cases in which vector control is attempted without a scientific method for determining the areas of critical vector density. Such practice wastes resources on the wrong areas while neglecting the areas where critical vector populations exist. This study builds a contact network using several attributes of the malaria vectors, the public places, and the human beings that influence malaria transmission. The resulting structure is a heterogeneous bipartite contact network made up of two node types: public places and human beings. The human beings are people who still fell victim to malaria infection even though their homes were protected by good vector control. Based on this exclusion principle, it is clear that the victims were most probably infected outside their residential areas. The Hypertext Induced Topical Search (HITS) web search algorithm was adopted to implement a search engine that uses the bipartite contact network as input. MATLAB was used to implement the model system. As a result, the model shows the public places that hold infected malaria vectors, together with their vector densities. The model output was validated using UCINET 6.0 as the benchmark system. A Root Mean Square Error (RMSE) value of 0.0023 was obtained when the output of the benchmark system was compared with that of the search engine model. This result indicates a high and acceptable level of accuracy.


LIST OF PUBLICATIONS/ RESEARCH PRESENTATIONS

Eze, M., Labadin, J., Lim, T. (2010, May 12-13). Role of Computational Science In Malaria Research. In the Proceedings/Book of Abstracts of Young ICT Researchers Colloquium 2010, FCSIT UNIMAS, p. 40.

Eze, M., Labadin, J., Lim, T. (2011a, Mar 20-22). Emerging Computational Strategy for Eradication of Malaria. In Proceedings of 2011 IEEE Symposium on Computers & Informatics (IEEE/ISCI 2011), Kuala Lumpur, pp. 715-720.

Eze, M., Labadin, J., Lim, T. (2011b, June 17-19). Mosquito Flight Model and Applications in Malaria Control. In Proceedings of 3rd International Conference on Computer Engineering and Technology (ICCET 2011), Kuala Lumpur, pp. 59-64.

Eze, M., Labadin, J., Lim, T. (2011c, July 18-22). The Binary Tree-Based Heterogeneous Network Link Model for Malaria Research. In Proceedings of 7th International Congress for Industrial and Applied Mathematics Conf. (ICIAM 2011), Vancouver, Canada, pp. 546-547.

Eze, M., Labadin, J., Lim, T. (2011d, July 11-14). Contact Strength Generating Algorithm for Application in Malaria Transmission Network. In Proceedings of 7th International Conference on IT in Asia (CITA 2011), Kuching, Sarawak, Malaysia, pp. 21-26.

Eze, M., Labadin, J., Lim, T. (2012, January). Structural Convergence of WebGraph, Social Network & Malaria Network: An Analytical Framework for Emerging Web-Hybrid Search Engine. Accepted for 2nd Review by the International Journal of Web Eng. & Tech. (IJWET).


Short Publications/Research Summaries Presented

Eze, M., Labadin, J., Lim, T. (2011). Network Modeling of Malaria Transmission. Research summary published in the FCSIT 2011 Research Bulletin.

Eze, M., Labadin, J., Lim, T. (2012, Feb 29). Network Modeling of Malaria Transmission. Research summary presented at the FCSIT UNIMAS Open Day 2012.

Eze, M., Labadin, J., Lim, T. (2012, March 21-22). Network-based Modeling of the Transmission of Mosquito-Borne Disease. Research poster presented at the 5th UNIMAS Research EXPO 2012.

Eze, M., Labadin, J., Lim, T. (2012, May 22). Network Modeling of Vector-Borne Diseases. Research summary presented at a research forum between Sarawak Health Department and Computational Sciences Department, FCSIT UNIMAS.

Labadin, J., Lim, T. & Eze, M. (2012, June). Network Modelling of Malaria Transmission. UNIMAS Research Update 2012, Vol. 8, No. 1, p. 11.


TABLE OF CONTENTS

TITLE PAGE ...................................................................................................... i

ACKNOWLEDGEMENTS …………………………………………………………… ii

DEDICATION …………………………………………………………………… iii

ABSTRACT ………….……..…………………………………………………………. iv

ABSTRAK ………………..………………………………………………………….. v

LIST OF PUBLICATIONS/RESEARCH PRESENTATIONS ……………………. vi

TABLE OF CONTENTS …………………………………………………………… viii

LIST OF APPENDICES ……………………………………………………………. xiii

LIST OF TABLES …………………………………………………………………… xiii

LIST OF FIGURES ……………………………………………………………………. xiv

LIST OF EQUATIONS …………………………………………………………… xvi

LIST OF ABBREVIATIONS ……………………………………………………. xvii

CHAPTER 1: INTRODUCTION ………………………………………………… 1

1.0 OPENING ……………………………………………………………….. 1

1.1 BACKGROUND OF STUDY ………………………………………… 2

1.2 RESEARCH PROBLEMS ………………………………………………… 5

1.3 RESEARCH QUESTIONS ………………………………………………… 7

1.4 OBJECTIVES OF STUDY ……………………………………………….… 8

1.5 SCOPE OF STUDY ………………………………………………………… 9

1.6 SIGNIFICANCE OF STUDY ……………………………………….... 10

1.7 RESEARCH METHODOLOGY ……………………………………….… 11

1.8 THESIS OUTLINE ……………………………………………………….… 14


CHAPTER 2: LITERATURE SURVEY ………………………………………….. 15

2.0 INTRODUCTION …………………………………………………………. 15

2.1 WHAT IS MALARIA? …………………………………………………. 15

2.1.1 Malaria Lifecycle …………………………………………………. 17

2.1.2 Malaria Lifecycle and Contact Networks …………………………. 18

2.2 COMPUTATIONAL EPIDEMIOLOGY OF MALARIA ………………… 19

2.2.1 Malaria Transmission Factors ………………………………… 20

2.2.1a Demographic Factors ………………………………… 21

2.2.1b Human and Socioeconomic Factors ………………………… 21

2.2.1c Biological and Clinical Factors ………………………… 22

2.2.1d Topological and Environmental Factors ………………… 22

2.3 PUBLIC PLACES IN DISEASE TRANSMISSION ………………… 23

2.4 FROM GRAPH THEORY TO NETWORKS ………………………….. 25

2.5 BIPARTITE NETWORKS ………………………………………………… 28

2.6 CONTACT NETWORKS …………………………………………………. 29

2.7 MOSQUITO BEHAVIOUR IN CONTACT NETWORKS …………. 32

2.8 CONTACT STRENGTH DETERMINING FACTORS ………………… 35

2.8.1 Important Deductions and the way forward ………………… 37

2.9 STRUCTURAL SIMILARITY RESEARCH ………………………….. 39

2.10 BACKGROUND STUDY OF HITS ALGORITHM ………………….. 43

2.11 WEB SEARCH ENGINE APPLICATIONS IN NON-WEB FIELDS ….. 46

2.12 CRITICAL APPRAISAL OF EXISTING METHODOLOGIES …………… 47

2.13 CHAPTER SUMMARY ………………………………………………….. 49


CHAPTER 3: CONTACT NETWORK MODEL FORMALIZATION ………… 51

3.0 INTRODUCTION …………………………………………………………… 51

3.1 CONTACT NETWORK STRUCTURAL REPRESENTATION …………… 51

3.1.1 Contact Network Structural Definitions …………………………… 52

3.1.2 Malaria Contact Network Structural Problem …………………… 55

3.1.3 Contact Network Construction in Real Life …………………… 56

3.2 MALARIA VECTOR ACTIVITY MODELS …………………………. 59

3.2.1 Malaria Life Cycle Duration Model ………………………….. 60

3.2.2 Malaria Vector Biting Model …………………………………… 61

3.2.3 Malaria Vector Abundance Model …………………………………… 62

3.2.4 Malaria Vector Survival Model …………………………………… 62

3.2.5 Larval Count Estimation Model ……………………………………. 63

3.3 PUBLIC PLACES MODELS …………………………………………… 64

3.3.1 Expected Number of Annual Working Days Model ……………………. 65

3.3.2 Actual Number of Annual Working Days Model ……………………. 66

3.4 HUMAN BEINGS PARAMETERS ……………………………………. 66

3.5 CONTACT NETWORK PARAMETER ASSIGNMENTS ……………. 67

3.6 THE CONTACT STRENGTH MODEL CALCULATIONS ……………. 74

3.7 THE CONTACT STRENGTH NORMALIZATION …………………….. 76

3.8 CHAPTER SUMMARY …………………………………………………… 77

CHAPTER 4: SYSTEM DESIGN, IMPLEMENTATION AND RESULTS …… 79

4.0 INTRODUCTION …………………………………………………………… 79

4.1 SEARCH ALGORITHM FEATURES …………………………………… 80


4.1.1 Weight Matrix Generation by HITS Algorithm in the Web – PS1 …… 81

4.1.2 Weight Matrix Generation by HITS in Malaria Network- PS2 …… 83

4.2 SYSTEM DESIGN …………………………………………………………… 83

4.2.1 SEARCH ENGINE WORKFLOW DESIGN …………………… 84

4.2.1a Inputs Section …………………………………………… 85

4.2.1b Transformation Section …………………………………… 86

4.2.1c Search and Indexing Section …………………………………… 88

4.2.1d Interpretation of Search Result …………………………… 91

4.2.2 EXTENDED CONTRIBUTIONS SECTION …………………… 92

4.2.2a Contact Network Crowd Analysis ………………………….. 92

4.2.2b Contact Network Evaluation Engine ………………………….. 94

4.2.2c Malaria Indirect Transfer Analysis ………………………….. 96

4.2.2d Pyramidal Visualization System …………………………. 99

4.2.3 System Output Section …………………………………………. 102

4.3 SYSTEM IMPLEMENTATION …………………………………………. 102

4.3.1 SYSTEM IMPLEMENTATION ENVIRONMENT ………………….. 103

4.3.2 SYSTEM OPTIMIZATION STRATEGIES …………………. 103

4.3.2a Storage Space Saving through Sparse Matrix Application …... 104

4.3.2b Speed Improvement Benefit …………………………………… 106

4.3.2c Fault Avoidance Strategy …………………………………… 106

4.3.3 IMPLEMENTATION LIMITATIONS AND CHALLENGES …… 107

4.4 CHAPTER SUMMARY …………………………………………………… 107


CHAPTER 5: MODEL VALIDATION …………………………………………… 109

5.0 INTRODUCTION …………………………………………………………… 109

5.1 MODEL VALIDATION FRAMEWORK …………………………………… 110

5.2 BENCHMARK VALIDATION …………………………………………… 113

5.2.1 Benchmark Validation Platform …………………………………… 113

5.2.2 Benchmark Validation Workflow …………………………………… 115

5.2.2a Benchmark Validation Tasks 1 (Loading Data into System) ……. 116

5.2.2b Benchmark Validation Tasks 2 (System Runs and Output) ……. 116

5.2.2c Benchmark Validation Tasks 3 (Error Analysis) ……………. 120

5.2.2d Benchmark Validation Task 4 (Interpretation of Result) ……. 122

5.3 ANALYTICAL VALIDATION ……………………………………………. 122

5.3.1 Opening Time analysis of MCPP …………………………………… 122

5.3.2 Network Crowding and Vector Density Correlation Analysis …… 124

5.3.3 Contact Strength and Vector Density Correlation Analysis …………… 126

5.4 VALIDATION RESULT DISCUSSION …………………………………… 128

5.4.1 Discussion on Benchmark Validation …………………………… 128

5.4.2 Discussion on Analytical Validations …………………………… 128

CHAPTER 6: SUMMARY AND CONCLUSION …………………………. 131

6.0 INTRODUCTION ………………………………………………………….. 131

6.1 SUMMARY OF CURRENT RESEARCH …………………………………. 131

6.2 MAIN CONTRIBUTIONS …………………………………………………. 133

6.3 FUTURE RESEARCH ………………………………………………….. 136

6.3.1 Wide Area Malaria Vector Density Mapping Project …………… 136


6.3.2 Vector-borne Disease Flow Path Modeling …………………………… 136

6.3.3 The Wind, Flood and Malaria Vectors in Contact Networks …………… 137

6.3.4 Partitioned Version of the Contact Network …………………………… 137

6.4 DEPLOYMENT INFORMATION …………………………………………. 137

6.5 CONCLUSION …………………………………………………………. 138

REFERENCES …………………………………………………………………. 139

LIST OF APPENDICES

APPENDIX MP: Extended Implementation Section …………………………….. 165

APPENDIX TT: Tables of Implementation Related-Data ................................... 175

APPENDIX FF: Implementation Flow Charts ……………………………………... 189

APPENDIX IO: Implementation Outputs …………………………………….. 215

APPENDIX TR: Implementation Test Run Messages (Minimal Listing) …….. 226

APPENDIX SC: Implementation Source Code …………………………….. 229

LIST OF TABLES

(These are the tables in the main text. The other tables in the appendices are not listed here)

Table 3.01A: Feasibility Research Summary Table …………………………………… 58

Table 3.01B: Link Matrix (Columns 1-20) …………………………………………… 68

Table 3.01C: Link Matrix (Column 21-40) …………………………………………… 69

Table 3.01D: Bin2dec and Dec2bin Conversions …………………………………… 71

Table 4.03T: Crowd Matrix …………………………………………………………… 93

Table 4.04T: Sample Link Matrix …………………………………………………… 97

Table 4.05T: Sparse Matrix ……………………………………………………………. 97


Table 4.09T: Summary of results ……………………………………………………. 108

Table 5.02T: RMSE Analysis Detailed Calculation Table ……………………………. 121

Table 5.03T: Vector Density vs Crowd Correlation Calculation Table ……………. 125

Table 5.04T: Vector Density vs Contact Strength Correlation Table …………….. 127

LIST OF FIGURES

(These are the figures in the main text. The other figures in the appendices are not listed here)

Fig. 1.01F: Minimal Flowchart of the Methodology …………………………… 12

Fig. 2.01F: Literature Survey Domains …………………………………………… 16

Fig. 2.02F: Developmental Life Cycle of Malaria …………………………………… 18

Fig. 2.03F: Contact Building Block Diagram …………………………………… 19

Fig. 2.04F: Epidemiological Triangle …………………………………………… 20

Fig. 2.05F: Average Global Household Size …………………………………………… 24

Fig. 2.06F: A Sample Graph …………………………………………………… 25

Fig. 2.07F: Graph vis-à-vis Network Structures …………………………………… 26

Fig. 2.08F: A Sample Network Modeled from Simple Graph Structures …………… 27

Fig. 2.09F: Minimal Algorithm on progression from Graph to Network Model …… 28

Fig. 2.10F: A Sample 3P by 5H Contact Network …………………………………… 30

Fig. 2.11F: Single Node Network …………………………………………………… 32

Fig. 2.12F: Mosquito Behavioural Model Facts …………………………………… 34

Fig. 2.13F: Measure of Level of Contacts in Disease Related Researches …………… 35

Fig. 2.14F: Measure of Level of Contacts in Non-Disease Related Researches ……. 36

Fig. 2.15F: A Sample contact network with arbitrary edge weights ……………. 38

Fig. 2.16F: Demonstration of Link and Weight Matrices ……………………………. 38


Fig. 2.17F: Illustrations of Web Graph and Social Network ……………………………. 40

Fig. 2.18F: Transformation into Adjacency Matrix ……………………………………. 42

Fig. 2.19F: Illustration of Dynamic and Static Network ……………………………. 44

Fig. 2.20F: Non-web fields with web algorithm search engines ………………….. 46

Fig. 2.21F: Classification of Malaria vector species ………………………………….. 48

Fig. 3.01F: Model Formalization Coverage Areas ………………………………….. 52

Fig. 3.02F: A Sample Contact Network Diagram ………………………………….. 54

Fig. 3.03F: Public Places used for Vector Existence Feasibility Research ………….. 57

Fig. 3.04F: Contact Strength Model Block Diagram ………………………….. 67

Fig. 3.05F: Partial Sketch of the Model Contact Network …………………………… 77

Fig. 4.01F: System Design and Implementation Coverage Areas …………………… 80

Fig. 4.02F: Search Engine Comparative Features Diagram …………………………… 82

Fig. 4.03F: System Design Framework …………………………………………… 84

Fig. 4.06F: System Workflow …………………………………………………… 85

Fig. 4.07F: Structural Attributes of the Hub and Authority Matrices …………… 87

Fig. 4.08F: Sketch of the Implementation Iteration Steps …………………… 90

Fig. 4.09F: Result derived from Indexing Operation …………………………… 91

Fig. 4.10F: Crowd Analysis Workflow ……………………………………………. 92

Fig. 4.11F: Public Places Crowd Graph ……………………………………………. 93

Fig. 4.12F: Indirect Transfer Analysis Workflow Design ……………………………. 97

Fig. 4.13F: Similarity measures related to the most critical public place P2 ……………. 98

Fig. 4.14F: Pyramidal Visualization Workflow ……………………………………. 99

Fig. 4.15F: Sparse Matrix Transformation into three Linear Vectors ……………. 100

Fig. 4.16F: Sparse Matrix Transformation into three Linear Vectors ……………. 101


Fig. 4.23F: Space saving through use of sparse matrices ……………………………. 104

Fig. 5.01F: Model Validation Coverage Areas …………………………………… 110

Fig. 5.02F: System Validation Framework …………………………………………… 112

Fig. 5.03F: Benchmark Validation Broken into 4 Specific Tasks …………………… 115

Fig. 5.10F: Benchmark Ranking Result (B-Result) …………………………………… 117

Fig. 5.11F: Result of Sorting the M-Result and B-Result Datasets …………………… 119

Fig. 5.12F: Details of the Calibration Operations ……………………………………. 119

Fig. 5.13F: Error analysis datasets (B-Result and M-Result) …………………………….. 120

Fig. 5.14F: MCPP Open Time Analysis Result …………………………………….. 123

LIST OF EQUATIONS

Equation (2.1) Definition of Link Matrix …………………………………… 41

Equation (2.2) Definition of Adjacency Matrix …………………………… 41

Equation (2.3) Hub Matrix Transformation Equation …………………… 45

Equation (2.4) Authority Matrix Transformation Equation …………………… 45

Equation (3.1) Contact Network Components Set Equation …………………… 53

Equation (3.2) Polynomial Fit Temperature Normalization Equation …… 61

Equation (3.3) Malaria Life Cycle Duration Model Equation ……………. 61

Equation (3.4) Malaria Vector Biting Model Equation …………………… 61

Equation (3.5) Malaria Vector Abundance Model Equation …………………… 62

Equation (3.6) Polynomial Fit Elevation Normalization Equation …………… 62

Equation (3.7) Malaria Vector Survival Model Equation …………………… 63

Equation (3.8) Larval Count Estimation Model Equation …………………… 64

Equation (3.9) Expected Number of Annual Working Days Model …………… 65


Equation (3.10) Actual Number of Annual Working Days Model …………… 66

Equation (3.11) Contact Strength Calculation Model …………………………… 74

Equation (3.12) Expanded Contact Strength Calculation Model …………… 75

Equation (3.13) Contact Strength Normalization Equation …………………… 76

Equation (4.1) Contact Strength to Hub Derivation Equation …………… 86

Equation (4.2) Contact Strength to Authority Hub Derivation Equation …… 86

Equation (4.3) Power Method Eigen Equation ……………………………. 88

Equation (4.4) EigenVector Estimation Iteration Equation …………………… 89

Equation (4.5) Contact Strength Max2Max Ratio Evaluation Equation …… 94

Equation (4.6) Crowd Max2Max Ratio Evaluation …………………………… 95

Equation (4.7) Jaccard Similarity Coefficient …………………………… 96

Equation (4.8-13) Space Savings Benefit Calculation Equations …………… 105

Equation (5.1) RMSE Definition Equation …………………………………… 121

Equation (5.2) Network Crowding Vs Vector Density Correlation Equation …... 124

Equation (5.3) Contact Strength and Vector Density Correlation Equation …… 126

LIST OF ABBREVIATIONS

ABBREVIATION MEANING

#EWD Expected Annual Number of Working Days

#YEARMIN Actual Annual Working Days (designated by #YEARMIN)

AUTH Authority Matrix

CnetSimVer1.0 Contact Network Simulation System Version 1.0

CO2 Carbon dioxide

FlowSNxxx Stands for ‘Flow Chart Code xxx’


HITS Hypertext Induced Topical Search.

MATLAB Matrix Laboratory

SCodeSNxxx Source Code Serial number

OUTPUT.LOGx Log generated by UCINET during system runs.

Pidx Public Place Index Column

PPCTx Public Place Close Time for a given public place Px.

PPDTx Public Place Time Duration for a given public place Px.

PPOTx Public Place Open Time for any public place Px.

PPSim Public Places Similarity

Pscore Public Place Rank Score

RMSE Root Mean Square Error

SIM Similarity Measure Function

UNIMAS Universiti Malaysia, Sarawak

xP by wH Contact network of x public places and w human nodes (x,w:integers).


CHAPTER ONE

INTRODUCTION

1.0 OPENING

This research belongs to the domain of computational modeling. Its specific area is the modeling of vector-borne disease transmission using contact network models, with a view to detecting locations that are likely to hold a high density of infected mosquitoes. To do this, the research problem must be cast as a network model in which the nodes and edges are clearly defined for the purposes of that detection. The problem-solving steps are developed, designed, and then implemented on an appropriate programming platform. The network structure is transformed into a format suitable for use as the model input. The model is then run and validated, and a number of analytical results are generated.

The motivation for this study is as follows. The conventional approach to disease modeling, commonly known as compartmental modeling, constructs systems of differential equations. Unfortunately, such models offer no means of detecting locations that are likely to hold a high density of infected mosquitoes. Locating the public places that harbour malaria vectors is very important for the eradication of malaria, because without it vector control efforts will be wasted on areas of less importance. Hence, the main contribution of this research is to demonstrate a new approach to modeling vector-borne disease transmission.

1.1 BACKGROUND OF STUDY

Malaria is a vector-borne disease that results from blood infection by protozoan parasites of the genus Plasmodium, which are transmitted from one human being to another by female Anopheles mosquitoes (Richard & Kamini, 2002). The four species of malaria parasites that infect humans are Plasmodium falciparum, Plasmodium vivax, Plasmodium malariae and Plasmodium ovale. Malaria is one of the most dangerous and most widespread tropical diseases, according to the Global Risk Forum (2009). As reported by WHO (2008), there were an estimated 247 million malaria cases worldwide in 2006, causing nearly a million deaths, mostly of children under 5 years. It has been stated that about 3.3 billion people (about half of the world's population) are at risk of malaria. Every year, this leads to about 250 million malaria cases and nearly one million deaths. People living in the poorest countries are the most vulnerable (WHO, 2010). Malaria has also been identified as one of the root causes of poverty (Malaria Consortium, 2010). It has been estimated that malaria cuts economic growth rates by as much as 1.3% in countries with high disease rates (UNDP/World Bank/WHO, 2003). A child dies of malaria every 30 seconds (National Institute of Allergy and Infectious Diseases, NIAID, 2010). Historical surveys show that malaria has existed for centuries, and several eradication efforts have failed to make the desired impact. Cox (2010) recounted that for about 2500 years there was an erroneous belief that malaria resulted from polluted air rising from swamps. Kelly-Hope & McKenzie (2009) cited malaria as the most serious mosquito-borne disease. The discoveries of the malaria parasite (Pettersson, 2005) and the malaria vector (Feachem et al., 2009) in the 19th century are important milestones in malaria research, promoting “scientific precision” over “trial and error”.

The Bill & Melinda Gates Foundation (2009) pointed out that research towards the eradication of malaria is an unavoidable venture. The goal of this research is to tackle malaria transmission through vector detection. It does so by applying a search engine to a contact network, with the aim of detecting the public places that harbour malaria vectors and ranking those places by their vector densities. Public places (for example markets and schools) are chosen because they accommodate more people than an average residential home. Since disease spread increases with population size, transmission is expected to increase and to affect more people through public places. Vector control itself may not succeed without reliable scientific tools that identify the locations needing urgent attention.

A conventional approach to studying disease transmission is compartmental modeling, which employs a system of differential equations (Ladeau et al., 2011). The SIR model (Dimitrov & Meyers, 2010), which divides the population into three compartments (susceptible, infected and recovered), is in this category. Unfortunately, the compartmental approach rests on some unrealistic assumptions, one of which is homogeneous mixing: the assumption that all individuals in a transmission environment have equal chances of mixing with others, and hence a uniform probability of contracting the disease. Network modeling improves on the compartmental approach in that it can depict the complexity of the real world (Craft & Caillaud, 2011) by capturing the interactions that lead to disease transmission. Rather than assuming that all individuals have equal chances of contracting the disease, the network approach recognizes that contacts vary, and that the probability of disease transmission is proportional to the level of such contacts. Contact network modeling is rooted in graph theory.
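
To make the homogeneous-mixing assumption concrete, the following is a minimal sketch of an SIR model integrated with a forward Euler step. The function name, the parameter values (beta, gamma, the population sizes) and the step size are illustrative assumptions, not values from this research. The single product term beta * S * I / N applies the same contact rate to every individual, which is the assumption that network modeling relaxes.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the classic SIR equations.

    beta  : transmission rate per day (illustrative assumption)
    gamma : recovery rate per day (illustrative assumption)
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        # Homogeneous mixing: every susceptible person meets every infected
        # person at the same rate, so one product term covers all contacts.
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


trajectory = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, days=60)
t, s, i, r = trajectory[-1]
print(f"day {t:.0f}: S={s:.1f} I={i:.1f} R={r:.1f}")
```

Nothing in this model records who met whom or where, so it cannot point to a location. That limitation is what motivates the move to a contact network.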

Before defining contact networks, it is important to define graphs and networks and to point out the relationship between the two concepts. A graph is a mathematical structure made up of a set of points called nodes that are connected by lines called edges. A network is a graph whose nodes and edges have been assigned meaningful values, where meaningful implies that the resulting structure becomes associated with a particular field of study. For instance, a road network is a graph in which the nodes represent cities, the edges represent the roads connecting them, and the edge labels represent the lengths of those roads. Hence, a graph is a mathematical model of a network. In object-oriented terms, a network is simply an instantiation of a graph object.
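
The object-oriented analogy can be sketched in a few lines. The class below and the city names and distances are hypothetical, chosen only to illustrate how a road network arises when a generic graph is given domain-specific node and edge values.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A bare mathematical graph: nodes plus undirected, labeled edges."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # frozenset({u, v}) -> label

    def add_edge(self, u, v, label=None):
        self.nodes.update((u, v))
        self.edges[frozenset((u, v))] = label

    def label(self, u, v):
        return self.edges[frozenset((u, v))]


# A road network is the same structure with meaningful values:
# nodes are cities and edge labels are road lengths in km (made-up figures).
road_network = Graph()
road_network.add_edge("City A", "City B", 120)
road_network.add_edge("City B", "City C", 85)

# Edges are undirected, so the lookup works in either direction.
print(road_network.label("City B", "City A"))
```

The same Graph class could equally be instantiated as a contact network by letting the nodes stand for people or places and the labels for levels of contact.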

A contact network is a graph structure in which each node represents a person (or location) and the edges represent contacts among the people (or locations) in the network (Meyers, 2007). In infectious disease epidemiology, a contact network depicts the interactions that can lead to disease transmission. Human infectious diseases are transmitted through human contact with infected human beings, locations, or non-human infectious agents, depending on the disease in question. For instance, lung infections are contracted by being in locations with particulate air pollution (Fullerton et al., 2008). Malaria transmission takes place when a human is bitten by an infected vector, and this contact occurs within a location where infected vectors thrive. A contact network is therefore a structure that models a disease transmission environment as a set of nodes and edges, such that the disease passes from one node to another along the edges (Salathe & Jones, 2010). In contact networks, the higher the edge weight (a measure of the level of possible contact), the higher the probability of transmission between adjacent nodes (Schumm et al., 2007).

A contact network can be categorized as either homogeneous (single node) or heterogeneous. A single node contact network is one in which all the nodes are of the same type, while a heterogeneous network is one in which the nodes are of different types and hence have different behavioural attributes. A contagious skin disease such as smallpox can be modeled using a single node network, since transfer is directly from person to person, unlike malaria, where vectors are involved. The complexity of modeling such a disease is minimal, since only a single node type is involved in the transmission environment. Malaria transmission is different: it requires a heterogeneous contact network in which the two node types, ‘public places’ and ‘human beings’, have a number of dissimilar attributes. For instance, human beings usually move about, while public places are stationary. Furthermore, since malaria vectors are mobile (they can fly), their attributes have to be factored into the model, which makes heterogeneous contact network modeling more complex.
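
A heterogeneous bipartite contact network can be held as a rectangular weight matrix, with one row per public place and one column per human node. The sketch below builds a hypothetical 3P by 5H network. The edge weights are arbitrary placeholders rather than contact strengths from this research, and the link matrix is assumed here to be a binary record of which edges exist.

```python
# Rows: public places P1..P3. Columns: human nodes H1..H5.
# weights[p][h] > 0 means person h had contact with place p.
PLACES = ["P1", "P2", "P3"]
HUMANS = ["H1", "H2", "H3", "H4", "H5"]

weights = [
    [0.4, 0.0, 0.7, 0.0, 0.0],  # P1
    [0.9, 0.6, 0.0, 0.8, 0.5],  # P2
    [0.0, 0.0, 0.3, 0.0, 0.2],  # P3
]


def link_matrix(w):
    """Binary matrix recording only whether an edge exists."""
    return [[1 if x > 0 else 0 for x in row] for row in w]


def visitors(w, place_index):
    """Human nodes adjacent to one public place."""
    return [HUMANS[h] for h, x in enumerate(w[place_index]) if x > 0]


# Edges only ever join a place to a person, never two nodes of the same
# type, which is what makes the network bipartite.
links = link_matrix(weights)
print(visitors(weights, 1))
```

Storing only place-to-person entries keeps the two node types separate by construction, so their differing attributes can be attached to rows and columns independently.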

The research problem is therefore the need to detect and rank the public places that account for the infection of human beings. These places are the reservoirs of infected malaria vectors.
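
The detect-and-rank step can be illustrated with a minimal HITS-style power iteration over a bipartite weight matrix W, in the spirit of the Hypertext Induced Topical Search (HITS) algorithm named in the abstract. This is a sketch under stated assumptions, not the thesis implementation, which was written in MATLAB. Public places are scored by repeatedly passing scores to human nodes and back, which for a matrix like the one below approaches the principal eigenvector of W W^T. The matrix values, the function name and the iteration count are hypothetical.

```python
import math


def rank_places(w, iterations=100):
    """HITS-style mutual reinforcement on a places-by-humans weight matrix.

    Returns one unit-length score per public place (row of w).
    """
    n_places, n_humans = len(w), len(w[0])
    place_scores = [1.0] * n_places
    for _ in range(iterations):
        # A person's score sums the weighted scores of the places visited.
        human_scores = [
            sum(w[p][h] * place_scores[p] for p in range(n_places))
            for h in range(n_humans)
        ]
        # A place's score sums the weighted scores of its visitors.
        place_scores = [
            sum(w[p][h] * human_scores[h] for h in range(n_humans))
            for p in range(n_places)
        ]
        # Normalize so the scores stay bounded across iterations.
        norm = math.sqrt(sum(x * x for x in place_scores))
        place_scores = [x / norm for x in place_scores]
    return place_scores


# Hypothetical 3P by 5H contact strengths (not data from this research).
weights = [
    [0.4, 0.0, 0.7, 0.0, 0.0],  # P1
    [0.9, 0.6, 0.0, 0.8, 0.5],  # P2
    [0.0, 0.0, 0.3, 0.0, 0.2],  # P3
]
scores = rank_places(weights)
ranking = sorted(range(len(scores)), key=lambda p: scores[p], reverse=True)
print([f"P{p + 1}" for p in ranking])
```

In this toy example the place with the most and strongest contacts ranks first. Interpreting such scores as vector densities depends on how the contact strengths are constructed.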

1.2 RESEARCH PROBLEMS

Malaria transmission in public places is a problem that needs scientific intervention. Rogers (2009) reported that a number of public places, such as bars and restaurants, had closed their outdoor terraces or shut down completely because of what was described as a “100 billion mosquito invasion”.

Unfortunately, there is a research gap in the detection of public places that harbour these vectors. A practical instance was observed in 2010, when a team of vector control experts visited UNIMAS to spray the institution against malaria and dengue vectors. In an interview, they mentioned that they lacked vector detection tools, so the team may have sprayed in the wrong places.

A number of existing vector detection tools have associated disadvantages. One such technology is the laser-wielding robot that detects mosquitoes in the air and shoots them dead (Robert, 2009). Two serious concerns raised in public opinion are that the lasers could harm human beings, and that the technology could mistakenly kill other insects that are useful in the ecosystem. Vector detection therefore remains an issue that calls for research.

Disease modeling methods that assume population homogeneity have been described as faulty and unrealistic (Tom & Gerardo, 2009). Population homogeneity is the assumption that every person within the disease transmission environment has an equal probability of mixing with others and hence of being infected. An improvement over this strategy is to build models that emphasize the variation of contacts leading to disease transmission. Network modeling takes this variation into account, and it is the method proposed in this research to address the observed deficiency.

The difficulty in modeling malaria transmission arises from its complex life cycle. Fortunately, every malaria transmission involves contacts (blood-sucking bites) between human beings and vectors. While this scenario could be used to build contact networks for malaria transmission studies, some important issues must be dealt with, one of which is that public places and human beings have different attributes. A heterogeneous network, rather than a single-node network, is therefore more appropriate for studying malaria transmission in public places. However, Christakis & Fowler (2009) stated that the complexity of generating heterogeneous networks has impeded much network research, an issue that needs to be addressed.

Given that a contact network model is involved, a search engine is an appropriate means of detecting the public places of interest, which is another research gap to be filled. For this purpose, we propose applying a web search engine algorithm to the contact network for vector reservoir detection. To the best of our knowledge, no previous research has applied