WEB ALGORITHM SEARCH ENGINE BASED
NETWORK MODELING OF MALARIA TRANSMISSION
EZE, MONDAY OKPOTO
A thesis Submitted
In fulfillment of the requirement for the degree of Doctor of Philosophy (PhD)
in Computer Science.
Faculty of Computer Science and Information Technology
UNIVERSITI MALAYSIA, SARAWAK
2013
ACKNOWLEDGEMENTS
I thank God who granted me the needed strength throughout the period of this doctorate
research. My supervisors, Assoc. Prof. Dr Jane Labadin and Terrin Lim, deserve my
appreciation for their support and guidance in making this work a huge success. I also
acknowledge the support of the Dean of the Faculty of Computer Science and Information
Technology, UNIMAS, Prof. Dr. Narayanan Kulathuramaiyer, for creating the needed
atmosphere for research and learning in the faculty. My appreciation further goes to the
Center for Graduate Studies and the leadership of UNIMAS for granting me financial
sponsorship through the Zamalah Postgraduate Scholarship program. Similarly, I
acknowledge with gratitude the Ministry of Higher Education Malaysia for supporting this
work through the Fundamental Research Grant scheme FRGS/2/10/SG/UNIMAS/02/04. My
international conferences were financed by this research grant. Finally, I thank my wife,
children, parents and in-laws for their patience, understanding and moral support throughout
these three years of being away from home as a result of my postgraduate studies.
DEDICATION
To my dear wife Ifeoma Faith Eze (Mrs), and all my children for their patience, prayers and
support all through these years of being away from my home country to pursue my doctorate
degree. To my parents, Chief and Mrs Patrick Eze Okpoto, for training me. To my in-laws,
Chief and Chief (Mrs) Obikpe, for all their care.
ABSTRACT
Malaria has been described as one of the most dangerous and most widespread tropical diseases,
with an estimated 247 million cases around the globe in the year 2006 alone. This calls for
urgent scientific interventions. Since malaria is a vector-borne disease, this research tackled
the issue of malaria transmission from the angle of vector detection through a search engine.
There are observed cases of vector control being attempted on a trial and error basis, with no
scientific way of determining the locations of critical vector densities. Unfortunately, such a
practice wastes resources on the wrong places, while ignoring the areas of critical
vector existence. This research formalizes a contact network using a number of attributes of
the malaria vectors, the public places, and the human beings that affect malaria transmission.
The resulting structure is a heterogeneous bipartite contact network of two node types - the
public places and the human beings nodes. The human beings are those who have suffered
from malaria, even when their residential homes were under reliable vector control. Such an
exclusion principle makes it likely that these people contracted the disease
outside their residential homes. The Hypertext Induced Topical Search (HITS) web
search algorithm was adapted to implement a search engine, which uses the bipartite contact
network as the input. MATLAB was used to implement the model system. The output shows
the public places which harbour the infected malaria vectors, and their corresponding vector
densities. The model output was validated with UCINET 6.0 as the benchmark system. A root
mean square error (RMSE) value of 0.0023 was obtained when the output of the benchmark
system was compared with that of the search engine model. This result indicates a high and
acceptable level of accuracy.
ABSTRAK
Malaria is one of the most dangerous and widespread tropical diseases, with an estimated
247 million cases worldwide in 2006 alone. This situation calls for urgent scientific
intervention. Since malaria is a vector-borne disease, this study addresses the problem of
malaria transmission through vector detection using a search engine. There are cases in
which vector control is attempted without a scientific method for determining the areas of
critical vector density. Such a practice wastes resources on the wrong areas while neglecting
the areas where critical vector populations exist. This study builds a contact network using
several attributes of the malaria vectors, the public places, and the human beings that
influence malaria transmission. The resulting structure is a heterogeneous bipartite contact
network consisting of two node types - public places and human beings. The human beings
are those who still fell victim to malaria infection even though their homes were protected
by good vector control. Based on this exclusion principle, it is clear that the victims were
most probably infected outside their residential areas. The Hypertext Induced Topical Search
(HITS) web search algorithm was adapted to implement a search engine that uses the
bipartite contact network as input. MATLAB was used to implement the model system. As
output, the model shows the public places that harbour infected malaria vectors, together
with their vector densities. The model output was validated using UCINET 6.0 as the
benchmark system. A Root Mean Square Error (RMSE) value of 0.0023 was obtained when the
output of the benchmark system was compared with that of the search engine model. This
result indicates a high and acceptable level of accuracy.
LIST OF PUBLICATIONS/ RESEARCH PRESENTATIONS
Eze, M., Labadin, J., Lim, T. (2010, May 12-13). Role of Computational Science In Malaria
Research. In the Proceedings/ Book of Abstracts of Young ICT Researchers Colloquium
2010, FCSIT UNIMAS, p40
Eze, M., Labadin, J., Lim, T. (2011a, Mar 20-22). Emerging Computational Strategy for
Eradication of Malaria. The Proceedings of 2011 IEEE Symposium on Computers &
Informatics (IEEE /ISCI 2011), Kuala Lumpur, p715-720.
Eze, M., Labadin, J., Lim, T. (2011b, June 17-19). Mosquito Flight Model and Applications
in Malaria Control. In Proc. of 3rd International Conference on Computer Engineering and
Technology (ICCET 2011), Kuala Lumpur, pg 59-64.
Eze, M., Labadin, J., Lim, T. (2011c, July 18-22). The Binary Tree-Based Heterogeneous
Network Link Model for Malaria Research. In Proceedings of 7th International Congress for
Industrial and Applied Mathematics Conf. (ICIAM 2011), Vancouver Canada, p546-547.
Eze, M., Labadin, J., Lim, T. (2011d, July 11-14). Contact Strength Generating Algorithm for
Application in Malaria Transmission Network. In Proceedings of 7th International Conference
on IT in Asia (CITA 2011), Kuching, Sarawak, Malaysia, p21-26.
Eze, M., Labadin, J., Lim, T. (2012, January). Structural Convergence of WebGraph, Social
Network & Malaria Network: An Analytical Framework for Emerging Web-Hybrid Search
Engine. Accepted for 2nd Review by the International Journal of Web Eng. & Tech. (IJWET).
Short Publications/Research Summaries Presented
Eze, M., Labadin, J., Lim, T. (2011). Network Modeling of Malaria Transmission. Being a
Research Summary Published in the FCSIT 2011 Research Bulletin.
Eze, M., Labadin, J., Lim, T. (2012, Feb 29). Network Modeling of Malaria Transmission.
Being a Research Summary presented in the FCSIT UNIMAS Open Day 2012.
Eze, M., Labadin, J., Lim, T. (2012, March 21-22). Network-based Modeling of the
Transmission of Mosquito-Borne Disease. Being a Research Poster presented in the 5th
UNIMAS Research EXPO 2012.
Eze, M., Labadin, J., Lim, T. (2012). Network Modeling of Vector-Borne Diseases. Being a
Research Summary presented in a Research Forum between Sarawak Health Department and
Computational Sciences Department, FCSIT UNIMAS on May 22, 2012.
Labadin, J., Lim, T. & Eze, M. (2012, June). Network Modelling of Malaria Transmission,
UNIMAS Research Update 2012, Vol. 8, No. 1, pg. 11
TABLE OF CONTENTS
TITLE PAGE ...................................................................................................... i
ACKNOWLEDGEMENTS …………………………………………………………… ii
DEDICATION …………………………………………………………………… iii
ABSTRACT ………….……..…………………………………………………………. iv
ABSTRAK ………………..………………………………………………………….. v
LIST OF PUBLICATIONS/RESEARCH PRESENTATIONS ……………………. vi
TABLE OF CONTENTS …………………………………………………………… viii
LIST OF APPENDICES ……………………………………………………………. xiii
LIST OF TABLES …………………………………………………………………… xiii
LIST OF FIGURES ……………………………………………………………………. xiv
LIST OF EQUATIONS …………………………………………………………… xvi
LIST OF ABBREVIATIONS ……………………………………………………. xvii
CHAPTER 1: INTRODUCTION ………………………………………………… 1
1.0 OPENING ……………………………………………………………….. 1
1.1 BACKGROUND OF STUDY ………………………………………… 2
1.2 RESEARCH PROBLEMS ………………………………………………… 5
1.3 RESEARCH QUESTIONS ………………………………………………… 7
1.4 OBJECTIVES OF STUDY ……………………………………………….… 8
1.5 SCOPE OF STUDY ………………………………………………………… 9
1.6 SIGNIFICANCE OF STUDY ……………………………………….... 10
1.7 RESEARCH METHODOLOGY ……………………………………….… 11
1.8 THESIS OUTLINE ……………………………………………………….… 14
CHAPTER 2: LITERATURE SURVEY ………………………………………….. 15
2.0 INTRODUCTION …………………………………………………………. 15
2.1 WHAT IS MALARIA? …………………………………………………. 15
2.1.1 Malaria Lifecycle …………………………………………………. 17
2.1.2 Malaria Lifecycle and Contact Networks …………………………. 18
2.2 COMPUTATIONAL EPIDEMIOLOGY OF MALARIA ………………… 19
2.2.1 Malaria Transmission Factors ………………………………… 20
2.2.1a Demographic Factors ………………………………… 21
2.2.1b Human and Socioeconomic Factors ………………………… 21
2.2.1c Biological and Clinical Factors ………………………… 22
2.2.1d Topological and Environmental Factors ………………… 22
2.3 PUBLIC PLACES IN DISEASE TRANSMISSION ………………… 23
2.4 FROM GRAPH THEORY TO NETWORKS ………………………….. 25
2.5 BIPARTITE NETWORKS ………………………………………………… 28
2.6 CONTACT NETWORKS …………………………………………………. 29
2.7 MOSQUITO BEHAVIOUR IN CONTACT NETWORKS …………. 32
2.8 CONTACT STRENGTH DETERMINING FACTORS ………………… 35
2.8.1 Important Deductions and the way forward ………………… 37
2.9 STRUCTURAL SIMILARITY RESEARCH ………………………….. 39
2.10 BACKGROUND STUDY OF HITS ALGORITHM ………………….. 43
2.11 WEB SEARCH ENGINE APPLICATIONS IN NON-WEB FIELDS ….. 46
2.12 CRITICAL APPRAISAL OF EXISTING METHODOLOGIES …………… 47
2.13 CHAPTER SUMMARY ………………………………………………….. 49
CHAPTER 3: CONTACT NETWORK MODEL FORMALIZATION ………… 51
3.0 INTRODUCTION …………………………………………………………… 51
3.1 CONTACT NETWORK STRUCTURAL REPRESENTATION …………… 51
3.1.1 Contact Network Structural Definitions …………………………… 52
3.1.2 Malaria Contact Network Structural Problem …………………… 55
3.1.3 Contact Network Construction in Real Life …………………… 56
3.2 MALARIA VECTOR ACTIVITY MODELS …………………………. 59
3.2.1 Malaria Life Cycle Duration Model ………………………….. 60
3.2.2 Malaria Vector Biting Model …………………………………… 61
3.2.3 Malaria Vector Abundance Model …………………………………… 62
3.2.4 Malaria Vector Survival Model …………………………………… 62
3.2.5 Larval Count Estimation Model ……………………………………. 63
3.3 PUBLIC PLACES MODELS …………………………………………… 64
3.3.1 Expected Number of Annual Working Days Model ……………………. 65
3.3.2 Actual Number of Annual Working Days Model ……………………. 66
3.4 HUMAN BEINGS PARAMETERS ……………………………………. 66
3.5 CONTACT NETWORK PARAMETER ASSIGNMENTS ……………. 67
3.6 THE CONTACT STRENGTH MODEL CALCULATIONS ……………. 74
3.7 THE CONTACT STRENGTH NORMALIZATION …………………….. 76
3.8 CHAPTER SUMMARY …………………………………………………… 77
CHAPTER 4: SYSTEM DESIGN, IMPLEMENTATION AND RESULTS …… 79
4.0 INTRODUCTION …………………………………………………………… 79
4.1 SEARCH ALGORITHM FEATURES …………………………………… 80
4.1.1 Weight Matrix Generation by HITS Algorithm in the Web – PS1 …… 81
4.1.2 Weight Matrix Generation by HITS in Malaria Network- PS2 …… 83
4.2 SYSTEM DESIGN …………………………………………………………… 83
4.2.1 SEARCH ENGINE WORKFLOW DESIGN …………………… 84
4.2.1a Inputs Section …………………………………………… 85
4.2.1b Transformation Section …………………………………… 86
4.2.1c Search and Indexing Section …………………………………… 88
4.2.1d Interpretation of Search Result …………………………… 91
4.2.2 EXTENDED CONTRIBUTIONS SECTION …………………… 92
4.2.2a Contact Network Crowd Analysis ………………………….. 92
4.2.2b Contact Network Evaluation Engine ………………………….. 94
4.2.2c Malaria Indirect Transfer Analysis ………………………….. 96
4.2.2d Pyramidal Visualization System …………………………. 99
4.2.3 System Output Section …………………………………………. 102
4.3 SYSTEM IMPLEMENTATION …………………………………………. 102
4.3.1 SYSTEM IMPLEMENTATION ENVIRONMENT ………………….. 103
4.3.2 SYSTEM OPTIMIZATION STRATEGIES …………………. 103
4.3.2a Storage Space Saving through Sparse Matrix Application …... 104
4.3.2b Speed Improvement Benefit …………………………………… 106
4.3.2c Fault Avoidance Strategy …………………………………… 106
4.3.3 IMPLEMENTATION LIMITATIONS AND CHALLENGES …… 107
4.4 CHAPTER SUMMARY …………………………………………………… 107
CHAPTER 5: MODEL VALIDATION …………………………………………… 109
5.0 INTRODUCTION …………………………………………………………… 109
5.1 MODEL VALIDATION FRAMEWORK …………………………………… 110
5.2 BENCHMARK VALIDATION …………………………………………… 113
5.2.1 Benchmark Validation Platform …………………………………… 113
5.2.2 Benchmark Validation Workflow …………………………………… 115
5.2.2a Benchmark Validation Tasks 1 (Loading Data into System) ……. 116
5.2.2b Benchmark Validation Tasks 2 (System Runs and Output) ……. 116
5.2.2c Benchmark Validation Tasks 3 (Error Analysis) ……………. 120
5.2.2d Benchmark Validation Task 4 (Interpretation of Result) ……. 122
5.3 ANALYTICAL VALIDATION ……………………………………………. 122
5.3.1 Opening Time analysis of MCPP …………………………………… 122
5.3.2 Network Crowding and Vector Density Correlation Analysis …… 124
5.3.3 Contact Strength and Vector Density Correlation Analysis …………… 126
5.4 VALIDATION RESULT DISCUSSION …………………………………… 128
5.4.1 Discussion on Benchmark Validation …………………………… 128
5.4.2 Discussion on Analytical Validations …………………………… 128
CHAPTER 6: SUMMARY AND CONCLUSION …………………………. 131
6.0 INTRODUCTION ………………………………………………………….. 131
6.1 SUMMARY OF CURRENT RESEARCH …………………………………. 131
6.2 MAIN CONTRIBUTIONS …………………………………………………. 133
6.3 FUTURE RESEARCH ………………………………………………….. 136
6.3.1 Wide Area Malaria Vector Density Mapping Project …………… 136
6.3.2 Vector-borne Disease Flow Path Modeling …………………………… 136
6.3.3 The Wind, Flood and Malaria Vectors in Contact Networks …………… 137
6.3.4 Partitioned Version of the Contact Network …………………………… 137
6.4 DEPLOYMENT INFORMATION …………………………………………. 137
6.5 CONCLUSION …………………………………………………………. 138
REFERENCES …………………………………………………………………. 139
LIST OF APPENDICES
APPENDIX MP: Extended Implementation Section …………………………….. 165
APPENDIX TT: Tables of Implementation Related-Data ................................... 175
APPENDIX FF: Implementation Flow Charts ……………………………………... 189
APPENDIX IO: Implementation Outputs …………………………………….. 215
APPENDIX TR: Implementation Test Run Messages (Minimal Listing) …….. 226
APPENDIX SC: Implementation Source Code …………………………….. 229
LIST OF TABLES
(These are the tables in the main text. The other tables in the appendices are not listed here)
Table 3.01A: Feasibility Research Summary Table …………………………………… 58
Table 3.01B: Link Matrix (Columns 1-20) …………………………………………… 68
Table 3.01C: Link Matrix (Column 21-40) …………………………………………… 69
Table 3.01D: Bin2dec and Dec2bin Conversions …………………………………… 71
Table 4.03T: Crowd Matrix …………………………………………………………… 93
Table 4.04T: Sample Link Matrix …………………………………………………… 97
Table 4.05T: Sparse Matrix ……………………………………………………………. 97
Table 4.09T: Summary of results ……………………………………………………. 108
Table 5.02T: RMSE Analysis Detailed Calculation Table ……………………………. 121
Table 5.03T: Vector Density vs Crowd Correlation Calculation Table ……………. 125
Table 5.04T: Vector Density vs Contact Strength Correlation Table …………….. 127
LIST OF FIGURES
(These are the figures in the main text. The other figures in the appendices are not listed here)
Fig. 1.01F: Minimal Flowchart of the Methodology …………………………… 12
Fig. 2.01F: Literature Survey Domains …………………………………………… 16
Fig. 2.02F: Developmental Life Cycle of Malaria …………………………………… 18
Fig. 2.03F: Contact Building Block Diagram …………………………………… 19
Fig. 2.04F: Epidemiological Triangle …………………………………………… 20
Fig. 2.05F: Average Global Household Size …………………………………………… 24
Fig. 2.06F: A Sample Graph …………………………………………………… 25
Fig. 2.07F: Graph vis-à-vis Network Structures …………………………………… 26
Fig. 2.08F: A Sample Network Modeled from Simple Graph Structures …………… 27
Fig. 2.09F: Minimal Algorithm on progression from Graph to Network Model …… 28
Fig. 2.10F: A Sample 3P by 5H Contact Network …………………………………… 30
Fig. 2.11F: Single Node Network …………………………………………………… 32
Fig. 2.12F: Mosquito Behavioural Model Facts …………………………………… 34
Fig. 2.13F: Measure of Level of Contacts in Disease Related Researches …………… 35
Fig. 2.14F: Measure of Level of Contacts in Non-Disease Related Researches ……. 36
Fig. 2.15F: A Sample contact network with arbitrary edge weights ……………. 38
Fig. 2.16F: Demonstration of Link and Weight Matrices ……………………………. 38
Fig. 2.17F: Illustrations of Web Graph and Social Network ……………………………. 40
Fig. 2.18F: Transformation into Adjacency Matrix ……………………………………. 42
Fig. 2.19F: Illustration of Dynamic and Static Network ……………………………. 44
Fig. 2.20F: Non-web fields with web algorithm search engines ………………….. 46
Fig. 2.21F: Classification of Malaria vector species ………………………………….. 48
Fig. 3.01F: Model Formalization Coverage Areas ………………………………….. 52
Fig. 3.02F: A Sample Contact Network Diagram ………………………………….. 54
Fig. 3.03F: Public Places used for Vector Existence Feasibility Research ………….. 57
Fig. 3.04F: Contact Strength Model Block Diagram ………………………….. 67
Fig. 3.05F: Partial Sketch of the Model Contact Network …………………………… 77
Fig. 4.01F: System Design and Implementation Coverage Areas …………………… 80
Fig. 4.02F: Search Engine Comparative Features Diagram …………………………… 82
Fig. 4.03F: System Design Framework …………………………………………… 84
Fig. 4.06F: System Workflow …………………………………………………… 85
Fig. 4.07F: Structural Attributes of the Hub and Authority Matrices …………… 87
Fig. 4.08F: Sketch of the Implementation Iteration Steps …………………… 90
Fig. 4.09F: Result derived from Indexing Operation …………………………… 91
Fig. 4.10F: Crowd Analysis Workflow ……………………………………………. 92
Fig. 4.11F: Public Places Crowd Graph ……………………………………………. 93
Fig. 4.12F: Indirect Transfer Analysis Workflow Design ……………………………. 97
Fig. 4.13F: Similarity measures related to the most critical public place P2 ……………. 98
Fig. 4.14F: Pyramidal Visualization Workflow ……………………………………. 99
Fig. 4.15F: Sparse Matrix Transformation into three Linear Vectors ……………. 100
Fig. 4.16F: Sparse Matrix Transformation into three Linear Vectors ……………. 101
Fig. 4.23F: Space saving through use of sparse matrices ……………………………. 104
Fig. 5.01F: Model Validation Coverage Areas …………………………………… 110
Fig. 5.02F: System Validation Framework …………………………………………… 112
Fig. 5.03F: Benchmark Validation Broken into 4 Specific Tasks …………………… 115
Fig. 5.10F: Benchmark Ranking Result (B-Result) …………………………………… 117
Fig. 5.11F: Result of Sorting the M-Result and B-Result Datasets …………………… 119
Fig. 5.12F: Details of the Calibration Operations ……………………………………. 119
Fig. 5.13F: Error analysis datasets (B-Result and M-Result) …………………………….. 120
Fig. 5.14F: MCPP Open Time Analysis Result …………………………………….. 123
LIST OF EQUATIONS
Equation (2.1) Definition of Link Matrix …………………………………… 41
Equation (2.2) Definition of Adjacency Matrix …………………………… 41
Equation (2.3) Hub Matrix Transformation Equation …………………… 45
Equation (2.4) Authority Matrix Transformation Equation …………………… 45
Equation (3.1) Contact Network Components Set Equation …………………… 53
Equation (3.2) Polynomial Fit Temperature Normalization Equation …… 61
Equation (3.3) Malaria Life Cycle Duration Model Equation ……………. 61
Equation (3.4) Malaria Vector Biting Model Equation …………………… 61
Equation (3.5) Malaria Vector Abundance Model Equation …………………… 62
Equation (3.6) Polynomial Fit Elevation Normalization Equation …………… 62
Equation (3.7) Malaria Vector Survival Model Equation …………………… 63
Equation (3.8) Larval Count Estimation Model Equation …………………… 64
Equation (3.9) Expected Number of Annual Working Days Model …………… 65
Equation (3.10) Actual Number of Annual Working Days Model …………… 66
Equation (3.11) Contact Strength Calculation Model …………………………… 74
Equation (3.12) Expanded Contact Strength Calculation Model …………… 75
Equation (3.13) Contact Strength Normalization Equation …………………… 76
Equation (4.1) Contact Strength to Hub Derivation Equation …………… 86
Equation (4.2) Contact Strength to Authority Hub Derivation Equation …… 86
Equation (4.3) Power Method Eigen Equation ……………………………. 88
Equation (4.4) EigenVector Estimation Iteration Equation …………………… 89
Equation (4.5) Contact Strength Max2Max Ratio Evaluation Equation …… 94
Equation (4.6) Crowd Max2Max Ratio Evaluation …………………………… 95
Equation (4.7) Jaccard Similarity Coefficient …………………………… 96
Equation (4.8-13) Space Savings Benefit Calculation Equations …………… 105
Equation (5.1) RMSE Definition Equation …………………………………… 121
Equation (5.2) Network Crowding Vs Vector Density Correlation Equation …... 124
Equation (5.3) Contact Strength and Vector Density Correlation Equation …… 126
LIST OF ABBREVIATIONS
ABBREVIATION MEANING
#EWD Expected Annual Number of Working Days
#YEARMIN Actual Annual Number of Working Days
AUTH Authority Matrix
CnetSimVer1.0 Contact Network Simulation System Version 1.0
CO2 Carbon dioxide
FlowSNxxx Stands for ‘Flow Chart Code xxx’
HITS Hypertext Induced Topical Search.
MATLAB Matrix Laboratory
SCodeSNxxx Source Code Serial number
OUTPUT.LOGx Log generated by UCINET during system runs.
Pidx Public Place Index Column
PPCTx Public Place Close Time for a given public place Px.
PPDTx Public Place Time Duration for a given public place Px.
PPOTx Public Place Open Time for any public place Px.
PPSim Public Places Similarity
Pscore Public Place Rank Score
RMSE Root Mean Square Error
SIM Similarity Measure Function
UNIMAS Universiti Malaysia, Sarawak
xP by wH Contact network of x public places and w human nodes (x,w:integers).
CHAPTER ONE
INTRODUCTION
1.0 OPENING
The domain of this research is Computational Modeling, and the specific area is the
modeling of vector-borne disease transmission using contact network models, with the
particular aim of detecting locations with a possibly high density of infected mosquitoes. To do
this, the research problem needs to be properly constructed as a network model in which the
nodes and edges are clearly defined for the purposes of this detection. The problem-solving
steps are developed, designed and then implemented using an appropriate computer
programming platform. The network structure is transformed into an appropriate format so
that it can serve as the input to the model. The model is run and validated, and a number of
analytical results are generated.
The motivation for this study is as follows. The conventional way of disease
modeling, commonly known as compartmental modeling, is by constructing differential
equations. Unfortunately, such models cannot detect locations with a possibly high
density of infected mosquitoes. Locating the public places that harbour malaria vectors is very
important in the eradication of malaria because, without it, vector control efforts
will be wasted on areas of less importance. Hence, the main contribution of this research is to
demonstrate a new approach to modeling vector-borne disease transmission.
1.1 BACKGROUND OF STUDY
Malaria is a vector-borne disease that results from blood infection by protozoan
parasites of the genus Plasmodium, which are transmitted from one human being to another
by female Anopheles mosquitoes (Richard & Kamini, 2002). The four species of malaria
parasites that infect humans are Plasmodium falciparum, Plasmodium vivax, Plasmodium
malariae and Plasmodium ovale. Malaria is one of the most dangerous and most widespread
tropical diseases, according to Global Risk Forum (2009). As reported by WHO (2008), there
were an estimated 247 million malaria cases worldwide in 2006, causing nearly a million
deaths, mostly of children under 5 years. It has been stated that about 3.3 billion people (about
half of the world's population) are at risk of malaria. Every year, this leads to about 250
million malaria cases and nearly one million deaths. People living in the poorest countries are
the most vulnerable (WHO, 2010). One study identified malaria as one of the root causes of
poverty (Malaria Consortium, 2010). It has been estimated that malaria cuts economic growth
rates by as much as 1.3% in countries with high disease rates (UNDP/World Bank/WHO,
2003). A child dies of malaria every 30 seconds (National Institute of Allergy and Infectious
Disease NIAID, 2010). Historical surveys show that malaria has existed for centuries, and
several eradication efforts have failed to make the desired impact. Cox (2010) also recounted
that for about 2500 years, there was an erroneous belief that malaria resulted from polluted air
rising from swamps. Kelly-Hope & McKenzie (2009) cited malaria as the most serious
mosquito-borne disease. The discoveries of the malaria parasite (Pettersson, 2005) and the
malaria vector (Feachem et al., 2009) in the 19th century are important milestones in malaria
research, promoting “scientific precision” over “trial and error”.
The Bill & Melinda Gates Foundation (2009) pointed out that research towards
eradication of malaria is an unavoidable venture. The goal of this research is to tackle the
issue of malaria transmission through vector detection. This is done by applying a search
engine to a contact network with the aim of detecting the public places that harbour malaria
vectors, and ranking such public places in terms of their vector densities. Public places
(for example, markets and schools) are chosen because they accommodate
more people than average residential homes. Since disease spread
increases with population size, transmission is expected to increase and affect
more human beings through the public places. Vector control itself may not be successful
without reliable scientific tools that detect the locations needing urgent vector control attention.
A conventional approach to studying disease transmission is compartmental
modeling, which employs a system of differential equations (Ladeau et al., 2011). The SIR
Model (Dimitrov & Meyers, 2010), which breaks the population into three compartments -
susceptible, infected and recovered - is in this category. Unfortunately, the compartmental
modeling approach is based on some unrealistic assumptions, one of which is the concept of
homogeneous mixing. This is the assumption that all individuals in a transmission
environment have equal chances of mixing with others, and hence have a uniform probability of
contracting the disease. Network modeling is an improvement over the compartmental approach in
the sense that it has the ability to depict the complexity of the real world (Craft & Caillaud,
2011) by capturing the interactions that lead to disease transmission. Hence, rather than simply
assuming that all individuals have equal chances of contracting the disease, the network
approach takes note of the fact that contacts always vary, and that the probability of disease
transmission is proportional to the level of such contacts. Contact network modeling is rooted
in graph theory.
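To make the homogeneous mixing assumption concrete, the following is a minimal sketch of a standard SIR model advanced with simple Euler steps. It is an illustration only and is not taken from the thesis implementation; the function names and the values of `beta`, `gamma`, `dt` and the initial compartment sizes are hypothetical. The single term `beta * s * i / n` applies the same infection pressure to every susceptible individual, which is exactly the assumption that network models relax.

```python
def sir_step(s, i, r, beta, gamma, dt):
    """Advance the three SIR compartments by one Euler step of size dt."""
    n = s + i + r
    # Homogeneous mixing: every susceptible person faces the same risk,
    # governed only by the overall infected fraction i / n.
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


def simulate(s0, i0, r0, beta, gamma, dt, steps):
    """Return the list of (S, I, R) states, starting with the initial state."""
    state = (s0, i0, r0)
    history = [state]
    for _ in range(steps):
        state = sir_step(*state, beta, gamma, dt)
        history.append(state)
    return history


# Hypothetical parameter values, chosen only for illustration.
history = simulate(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, dt=0.1, steps=1000)
```

Nothing in this model records who meets whom or where, so it cannot point to a location, whereas a contact network keeps that information in its edges.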
Before defining contact networks, it is important to first define graphs and networks,
and point out the relationship between the two concepts. A graph is a mathematical structure
made up of a set of points called nodes that are connected by lines called edges. A network is
a graph where the nodes and edges have been assigned meaningful values. The word
meaningful in this sense implies that the resulting structure automatically becomes associated
with a particular field of study. For instance, a road network is a graph structure where the
nodes represent different cities, the edges represent the roads connecting the cities, and the
edge labels represent the actual distance of the roads. Hence, a graph is a mathematical model
of networks. From the angle of the object oriented paradigm, a network is simply an
instantiation of a graph object.
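The object-oriented reading of this relationship can be sketched as follows. This is a minimal illustration under stated assumptions: the `Graph` class, the city names and the distances are all hypothetical and do not come from the thesis. The class holds only the abstract structure, and the road network is one instance of it whose nodes and edge labels have been given a domain meaning.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """An abstract graph: a set of nodes joined by undirected, labelled edges."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # frozenset({u, v}) -> label

    def add_edge(self, u, v, label=None):
        self.nodes.update((u, v))
        self.edges[frozenset((u, v))] = label

    def label(self, u, v):
        return self.edges[frozenset((u, v))]


# A road network is an instance of the graph: nodes are cities and
# edge labels are road distances in km (hypothetical values).
road_network = Graph()
road_network.add_edge("City A", "City B", label=25.0)
road_network.add_edge("City B", "City C", label=40.0)
```

Because the edges are stored as unordered pairs, `road_network.label("City B", "City A")` returns the same distance as the reverse lookup.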
A contact network is a graph structure where each node represents a person (or
location), and the edges represent contacts among people (or locations) in the network
(Meyers, 2007). In infectious disease epidemiology, a contact network depicts interactions
that can lead to disease transmission. Human infectious diseases are transmitted as a result of
human contact with other infected human beings, locations, or non-human infectious
agents, depending on the disease in question. For instance, lung infections are contracted by
being in locations with particulate air pollution (Fullerton et al., 2008). Malaria transmission
takes place when a human is bitten by infected vectors. The human contact in this case takes
place within a location where these infected vectors thrive. A contact network is therefore a
structure that models a disease transmission environment as a set of nodes and edges, such that
the disease transmits from one node to another through the edges (Salathe & Jones, 2010). In
contact networks, the higher the edge weight (measure of level of possible contact), the higher
the probability of transmission between adjacent nodes (Schumm et al., 2007).
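As a minimal sketch of this idea, a weighted contact network can be stored as a mapping from node pairs to edge weights. The node names (`H1`, `P1`, ...) and the weights below are hypothetical. The sketch only orders contacts by weight; it deliberately does not assume any particular formula for converting a weight into a transmission probability, since none is given at this point in the text.

```python
# Each key is an undirected edge (node, node); each value is its edge weight,
# i.e. a measure of the level of possible contact (hypothetical values).
contacts = {
    ("H1", "P1"): 0.8,
    ("H1", "P2"): 0.2,
    ("H2", "P2"): 0.5,
}


def neighbours_by_weight(network, node):
    """Return the nodes adjacent to `node`, strongest contact first."""
    adjacent = []
    for (u, v), weight in network.items():
        if node == u:
            adjacent.append((v, weight))
        elif node == v:
            adjacent.append((u, weight))
    return sorted(adjacent, key=lambda pair: pair[1], reverse=True)
```

For example, `neighbours_by_weight(contacts, "P2")` lists `H2` before `H1`, reflecting that the stronger contact carries the higher assumed chance of transmission.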
A contact network can be categorized as either homogeneous (single node) or
heterogeneous. A single node contact network is a network in which all the nodes are of the
same type, while a heterogeneous network is one where the nodes are of different types, and
hence of different behavioural attributes. A contagious skin disease such as smallpox can be
modeled using a single node network since transfer is directly from person to person, unlike
in malaria where vectors are involved. The complexity is minimal when modeling such a
disease since only a single node type is involved in the disease transmission environment.
This is different in the case of malaria transmission, which requires a heterogeneous contact
network where the two node types ‘public places’ and ‘human beings’ have a number of
dissimilar attributes. For instance, while human beings usually move about, public places are
usually stationary. Furthermore, since malaria vectors have mobility (they can fly), their
attributes have to be factored into the model, thereby making heterogeneous contact network
modeling more complex.
The research problem here is therefore about the necessity to detect and rank the
public places that account for the infection of the human beings. These are the reservoirs for
infected malaria vectors.
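The ranking task can be illustrated with a generic HITS-style power iteration on a small bipartite contact matrix. This is a sketch under stated assumptions and not the adapted algorithm developed in this thesis: the function names, the matrix values, the node names and the choice of which node type is scored from which are all hypothetical. Rows stand for public places, columns for human beings, and each entry for the contact strength between them.

```python
import math


def normalise(vector):
    """Scale a vector to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else vector


def hits_bipartite(weights, iterations=100):
    """weights[p][h] is the contact strength between place p and human h."""
    n_places, n_humans = len(weights), len(weights[0])
    place = [1.0] * n_places
    human = [1.0] * n_humans
    for _ in range(iterations):
        # A human's score accumulates the scores of the places they are linked to.
        human = normalise([sum(weights[p][h] * place[p] for p in range(n_places))
                           for h in range(n_humans)])
        # A place's score accumulates the scores of the humans linked to it.
        place = normalise([sum(weights[p][h] * human[h] for h in range(n_humans))
                           for p in range(n_places)])
    return place, human


# Hypothetical 3P by 4H contact network.
names = ["P1", "P2", "P3"]
weights = [
    [0.9, 0.8, 0.7, 0.0],
    [0.1, 0.0, 0.2, 0.3],
    [0.0, 0.4, 0.0, 0.5],
]
place_scores, human_scores = hits_bipartite(weights)
ranking = sorted(zip(names, place_scores), key=lambda item: item[1], reverse=True)
```

In this toy network `P1` ranks first because it has the strongest contacts with the most people. In the thesis setting, a higher place score is interpreted as a higher likelihood that the place harbours infected vectors.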
1.2 RESEARCH PROBLEMS
Malaria transmission in public places is a problem that needs scientific intervention.
An article by Rogers (2009) reported that a number of public places, such as bars and
restaurants, had closed outdoor terraces or shut down completely because of what was
described as a “100 billion mosquito invasion”.
Unfortunately, there is a research gap that needs to be filled in terms of the detection
of public places that harbour these vectors. A practical scenario was observed in 2010 when a
team of vector control experts visited UNIMAS to spray the institution against malaria and
dengue vectors. In an interview, they mentioned that they lacked vector detection tools, which
resulted in the team possibly spraying in the wrong places.
A number of tools in existence for vector detection have some associated
disadvantages. One such technology is the laser-wielding robot that detects mosquitoes in the
air and shoots them dead (Robert, 2009). Two serious concerns raised by the public
are that the lasers could be harmful to human beings, and that the technology could
mistakenly kill other insects that are useful in the ecosystem. Vector detection is an issue that
calls for research.
Disease modeling methods that assume population homogeneity have been described
as faulty and unrealistic (Tom & Gerardo, 2009). The term population homogeneity refers to
the assumption that every person within the disease transmission environment has equal
probability of mixing with others and hence getting infected by the disease. An improvement
over this faulty strategy is to build models that emphasize variation of contacts leading to
disease transmission. Network modeling takes into consideration the
variation of contacts in disease transmission, and it is the method proposed in this research
to address the observed deficiency.
The difficulty in modeling of malaria transmission arises due to its complex life cycle.
Fortunately, every malaria transmission involves contacts (blood sucking bites) between
human beings and the vectors. While this scenario could be used to build contact networks for
malaria transmission studies, there are some important issues to be dealt with, one of which is
the fact that public places and human beings have different attributes. This would mean that a
heterogeneous network rather than a single-node network would be more appropriate for
studying malaria transmission in public places. However, Christakis & Fowler (2009) stated
that the complexity associated with generating heterogeneous networks has impeded much
network research, which is an issue that needs to be addressed.
Given that a contact network model is involved, a search engine would be an appropriate
tool for detecting the public places of interest, which is another research gap to be filled.
For this purpose, we propose applying a web search engine algorithm to the contact network for
vector reservoir detection. To the best of our knowledge, no previous research has applied