Universität des Saarlandes
Max-Planck-Institut für Informatik
Solving Linear Programs in MapReduce
Masterarbeit im Fach Informatik
Master's Thesis in Computer Science
von / by
Mahdi Ebrahimi
angefertigt unter der Leitung von / supervised by
Prof. Dr. Gerhard Weikum
betreut von / advised by
Dr. Rainer Gemulla
begutachtet von / reviewers
Prof. Dr. Gerhard Weikum
Dr. Rainer Gemulla
Saarbrücken, May 30, 2011
Non-plagiarism Statement
Hereby I confirm that this thesis is my own work and that I have documented all sources
used.
(Mahdi Ebrahimi)
Saarbrücken, May 30, 2011
Declaration of Consent
Herewith I agree that my thesis will be made available through the library of the Com-
puter Science Department.
(Mahdi Ebrahimi)
Saarbrücken, May 30, 2011
Abstract
Most interesting discrete optimization problems are NP-hard, so an efficient algorithm
that finds optimal solutions to such problems is unlikely to exist. Linear programming plays a
central role in design and analysis of many approximation algorithms. However, linear
program instances in real-world applications grow enormously. In this thesis, we study
the Awerbuch-Khandekar parallel algorithm for approximating linear programs, provide
strategies for efficient realization of the algorithm in MapReduce, and discuss methods
to improve its performance in practice. Further, we characterize numerical properties of
the algorithm by comparing it with partially-distributed optimization methods. Finally,
we evaluate the algorithm on a weighted maximum satisfiability problem generated by
the SOFIE knowledge extraction framework on the complete Academic Corpus.
Acknowledgements
I would like to express my sincere gratitude to Prof. Gerhard Weikum and Dr. Rainer
Gemulla for giving me the opportunity to work under their supervision. Regular discussions
with Dr. Rainer Gemulla were helpful in setting the targets of this work and kept me
motivated throughout the course of this study. I am also thankful to Dr. Mauro
Sozio for initiating this work. Finally, I am grateful to my parents, relatives and friends
for their support.
Contents
Abstract
Acknowledgements
List of Figures
List of Tables
List of Algorithms
1 Introduction
   1.1 Motivation
   1.2 Contribution
   1.3 Outline of the Thesis
   1.4 Related Work
2 Preliminaries
   2.1 Linear Programming
   2.2 Mixed Packing-Covering Linear Programs
   2.3 LP-approximation for Weighted MAX-SAT
   2.4 Binary Search for Optimal Solution
   2.5 MapReduce
   2.6 SOFIE
      2.6.1 Statements
      2.6.2 Rules
      2.6.3 MAX-SAT Model
3 Solving Linear Programs in MapReduce
   3.1 Awerbuch-Khandekar Algorithm (AK)
   3.2 Realization in MapReduce
      3.2.1 MR-MixedPC
      3.2.2 MR-MixedPC-E
      3.2.3 MR-MixedPC-S
4 Experiments
   4.1 Awerbuch-Khandekar Performance Analysis
   4.2 Scalability Test for Sequential LP Solvers
   4.3 Comparison with Partially-distributed Methods
      4.3.1 BFGS
      4.3.2 L-BFGS-B
   4.4 Large-scale Experiments
   4.5 Case Study: SOFIE
      4.5.1 Mid-size Experiment
      4.5.2 Large-scale Experiment
5 Conclusion and Future Work
   5.1 Future Work
6 Appendix A
7 Appendix B
List of Figures
2.1 SOFIE Main Components
4.1 The AK Potential Value for Different Epsilons
4.2 The AK Violation Value for Different Epsilons
4.3 CPlex Runtime
4.4 Eventual Potential Values in Non-converging BFGS Runs
4.5 Potential Values in a Non-converging BFGS Run
4.6 Violation Comparison for Modified AK Algorithm
4.7 MR-MixedPC Job History
4.8 MR-MixedPC-E Job History
4.9 MR-MixedPC-S Job History
4.10 MR-MixedPC Resource Utilization Diagram
4.11 MR-MixedPC-E Resource Utilization Diagram
4.12 MR-MixedPC-S Resource Utilization Diagram
List of Tables
3.1 Parameter Setting for the AK Algorithm
4.1 Comparison with Partially-distributed Methods
4.2 Case Study Results
List of Algorithms
3.1 AK Algorithm for Mixed Packing-Covering
3.2 MR-MixedPC: MapI
3.3 MR-MixedPC: ReduceI
3.4 MR-MixedPC: MapII
3.5 MR-MixedPC: ReduceII
3.6 MR-MixedPC-E: ReduceI, Preprocessing Phase
3.7 MR-MixedPC-E: MapI, Iterative Phase
3.8 MR-MixedPC-E: MapII, Iterative Phase
3.9 lsum
3.10 Numerically Stable AK for Mixed Packing-Covering
6.1 MR-MixedPC-E: ReduceII, Preprocessing Phase
6.2 MR-MixedPC-S: MapI, Iterative Phase
6.3 MR-MixedPC-S: MapII, Iterative Phase
Chapter 1
Introduction
1.1 Motivation
One of the promises of the information technology era is to deploy computers to support
rapid, informed decision making by sifting through large amounts of data. The objective
is to make decisions that achieve some best possible goal. The study of how to make
decisions of this sort has created the field of discrete optimization. Unfortunately, most
interesting discrete optimization problems are NP-hard, so an efficient algorithm that
finds optimal solutions to such problems is unlikely to exist. Linear programs play a central
role in design and analysis of many approximation algorithms [25]. However, linear
program instances in real-world applications grow enormously.
The rapid improvement and availability of cheap, commodity high-performance components
drove a new era in computing that uses networks of computers to
handle large-scale computations [3]. MapReduce is a powerful computational model
that has proved successful in large-scale distributed data analysis [6]. A MapReduce
cluster is easy to run and to maintain, and all the issues related to parallel execution of
algorithms, such as the communication and synchronization of processes, partitioning
and distribution of data, mapping of processes onto processors, and fault tolerance are
automatically handled by the framework. Furthermore, many large-scale applications
such as SOFIE knowledge extraction framework require scanning through huge volumes
of documents, which makes MapReduce a perfect match for them. It is often desirable
to solve the resulting optimization problems in these applications on the same cluster
as the one on which they were generated.
1.2 Contribution
We make the following contributions in our work:
1. We implement the Awerbuch-Khandekar parallel algorithm for solving linear pro-
grams, and study its performance from a practical point of view.
2. We propose strategies for improving the performance of the Awerbuch-Khandekar
algorithm in practice.
3. We study the scalability of state-of-the-art sequential linear program solvers.
4. We compare the performance of the Awerbuch-Khandekar algorithm with partially-
distributed approaches.
5. We propose several realizations for the Awerbuch-Khandekar algorithm in MapRe-
duce, and analyze their performance on large-scale experiments.
6. We analyze the performance of the Awerbuch-Khandekar algorithm on a weighted
maximum satisfiability problem generated by the SOFIE knowledge extraction
framework on the complete Academic Corpus.
7. We evaluate the quality of the solution of the Awerbuch-Khandekar algorithm by
comparing it with FMS* and CPlex methods.
1.3 Outline of the Thesis
This thesis is organized as follows: Preliminary concepts that are referred to in the
rest of the study are elaborated in Chapter 2. The Awerbuch-Khandekar algorithm
and its realization in MapReduce are discussed in Chapter 3. Chapter 4 describes
our experimental settings, and provides the results of various tests on scalability and
performance of the algorithm. Chapter 5 concludes this study and provides directions
for future work.
1.4 Related Work
It was first shown by Papadimitriou and Yannakakis [16] that positive linear programs
can be well-approximated even if the constraint matrix is distributed among a set of de-
cision makers that are not allowed to communicate. There, it was shown that the worst
case approximation ratio is related to the maximum number of variables appearing in
each constraint. At the same time, Luby and Nisan [14] developed a parallel approxi-
mation algorithm that found a feasible value within ε of an optimal feasible solution to
positive linear programs, with a running time polynomial in log(N)/ε, where N is the
number of non-zero coefficients associated with the problem instance. Later, the idea
was pushed further by Bartal et al. [5] by allowing local communication between dis-
tributed decision makers. As a result, a distributed algorithm was obtained that could
achieve a (1 + ε)-approximation to the optimal solution, while using only a polylogarithmic
number of local communication rounds. Later on, Kuhn et al. [11] provided a tight
classification of the trade-off between the amount of local information and the quality
of the solution, and outlined two specific algorithms for small and unbounded message
size distributed environments.
Another work by Young [26] outlined sequential and parallel algorithms that approxi-
mated mixed packing and covering linear programs (in contrast to pure packing or pure
covering, which have only "≤" or "≥" inequalities, but not both), and provided a parallel
algorithm that ran in time polylogarithmic in the input size with a total number of
operations comparable to sequential algorithms. Jansen [10] later provided a parallel
approximation algorithm to solve general mixed packing and covering problems. Awer-
buch and Khandekar [1] introduced the first stateless approximation algorithm for mixed
packing covering linear programs.
The Portable, Extensible Toolkit for Scientific Computation (PETSc) [4] and the Toolkit
for Advanced Optimization (TAO) [7] packages provide facilities for solving linear pro-
grams based on the Message Passing Interface (MPI) standard.
The MapReduce framework proved successful in many large-scale applications. Liu et
al. [13] accomplished nonnegative matrix factorization on web-scale dyadic data, with
matrices of tens of millions of rows by hundreds of millions of columns containing billions of nonzero entries.
Chierichetti et al. [6] reported successful speedups in MapReduce in comparison to sequential
greedy algorithms in solving the Max-Cover problem on five large-scale data sets
derived from Yahoo! logs. Paradies [17] used MapReduce to perform document clus-
tering in the area of entity matching, where documents from various data sources were
matched together. Zhao et al. [27] and Li et al. [12] proposed parallel algorithms for
k-means clustering based on MapReduce.
Chapter 2
Preliminaries
2.1 Linear Programming
One of the promises of the information technology era is to deploy computers to support
rapid, informed decision making by sifting through large amounts of data. Deciding
inventory levels, routing vehicles, and organizing data for efficient retrieval are examples
of everyday problems in today's society. Discrete optimization is a branch of optimization
that studies the question of how to make decisions of this sort to achieve some
best possible objective [25].
Unfortunately, for most interesting discrete optimization problems no efficient algorithm
is known, where an efficient algorithm is one that runs in time polynomial in its input
size. Consequently, for most discrete optimization problems there is likely no algorithm
that finds the optimal solution, in polynomial time, for every instance.
One approach to this problem is to relax the latter requirement, and develop efficient
algorithms that find the optimal solution for the specific problem at hand. The resulting
algorithm is only useful for the special instances it was designed for.
A more common approach is to relax the polynomial time requirement, and find the opti-
mal solution by searching through the complete set of possible solutions. This approach,
however, quickly becomes intractable as the input grows.
By far the most common approach relaxes the requirement of optimality of the solution,
and settles for a "good enough" approximate solution that can be found in tractable
time. Linear programming, frequently abbreviated to LP, plays a central role in design
and analysis of many approximation algorithms, and there has been an enormous study
of various LP-based approximation approaches [25].
The objective in a linear program is to minimize a linear function subject to linear
equality constraints. A standard LP in vector-matrix notation is written as:
Minimize:    c^T x
subject to:  Ax = b,   A ∈ R^{m×n}        (2.1)

where x ∈ R^n_+, c ∈ R^n, and b ∈ R^m [8].
The above equation can be expanded by vector-matrix multiplication as follows:
Minimize:    c_1 x_1 + c_2 x_2 + ··· + c_n x_n
Subject to:  a_11 x_1 + a_12 x_2 + ··· + a_1n x_n = b_1
             a_21 x_1 + a_22 x_2 + ··· + a_2n x_n = b_2
             ⋮
             a_m1 x_1 + a_m2 x_2 + ··· + a_mn x_n = b_m        (2.2)
where c_i and x_i are the i-th elements of the vectors c and x, respectively, and a_ij
represents the j-th element of the i-th row of matrix A. Without loss of generality, the
right-hand side values in Eq. 2.1 can be restricted to non-negative values, i.e., b ∈ R^m_+.
Other variations of Eq. 2.1 are possible: for example, maximizing the objective
function rather than minimizing it, having inequalities in addition to equations, and
allowing negative variables. However, the above form is general enough to capture all
these extensions. For a detailed discussion on the transformation techniques, the reader
is referred to [22].
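To make one such transformation concrete, the following sketch (our own illustration, not code from the thesis) rewrites a small maximization problem with "≤" inequalities into the standard equality form of Eq. 2.1, by negating the objective and appending one slack variable per constraint:

```python
import numpy as np

def to_standard_form(c, A_ub, b_ub):
    """Rewrite  maximize c^T x  s.t.  A_ub x <= b_ub, x >= 0
    as          minimize c'^T x'  s.t.  A' x' = b', x' >= 0,
    by negating the objective and adding one slack variable per row."""
    m, n = A_ub.shape
    A_eq = np.hstack([A_ub, np.eye(m)])       # [A | I] x' = b
    c_eq = np.concatenate([-c, np.zeros(m)])  # max c^T x  ==  min -c^T x
    return c_eq, A_eq, b_ub

# Toy instance: maximize x1 + 2*x2  s.t.  x1 + x2 <= 4,  2*x1 + 0.5*x2 <= 3
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0], [2.0, 0.5]])
b = np.array([4.0, 3.0])
c2, A2, b2 = to_standard_form(c, A, b)
# x' now has n + m = 4 components: the original x followed by the slacks
```

The original maximum is recovered by negating the optimal objective value of the transformed problem; the slack components of x' measure how far each original inequality is from being tight.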
Integer programming is another variation of LP, where constraints requiring integer
variables are allowed. Example integer programs include restricting some variables to
natural numbers, or to a bounded range such as {0, 1}. Unlike linear programming,
integer programs are NP-complete, so no efficient algorithm to solve general integer
programs is likely to exist [25].
An optimal solution x ∈ R^n_+ to Eq. 2.1 is one that minimizes the objective function
while satisfying all constraints. Although there are very efficient sequential algorithms
that find the optimal solution for general linear programs [25], there is currently no
parallel algorithm, to the best of our knowledge, that optimally solves general LPs.
However, there are subclasses of linear programs that have been extensively studied in
distributed settings in recent years [1, 2, 5, 10, 11, 14, 16, 26]. This work is concerned
with Mixed Packing-Covering linear programs.
2.2 Mixed Packing-Covering Linear Programs
Mixed Packing-Covering (Mixed PC) linear programs are an important subclass of LP,
where both less-than-or-equal and greater-than-or-equal constraints are allowed, but the
coefficient matrices are restricted to non-negative values. Both inequality constraints
(≤, ≥) are explicitly denoted in the standard form of a Mixed PC. The former are often
referred to as packing, while the latter are known as covering constraints. Intuitively,
packing constraints are linear inequalities with upper bounds that can not be exceeded,
while covering constraints define inequalities with lower bounds to be satisfied. A Mixed
PC linear program in vector-matrix notation is denoted as follows:
Minimize:    c^T x
subject to:  Ax ≤ b,   A ∈ R^{m×n}_+
             Cx ≥ d,   C ∈ R^{k×n}_+        (2.3)

where x ∈ R^n_+, c ∈ R^n_+, b ∈ R^m_+, and d ∈ R^k_+ [15].
Despite the additional restriction in comparison with general linear programs, Mixed
PC is an expressive model that is able to represent a large family of interesting opti-
mization problems such as set cover, (weighted) maximum satisfiability, maximum cut,
and multicommodity flow, to name but a few.
As an example, we outline the procedure to approximate a weighted maximum satisfia-
bility (Weighted MAX-SAT) problem using Mixed PC linear programs. Later, we will
use this framework to solve MAX-SAT problems in our case studies.
2.3 LP-approximation for Weighted MAX-SAT
DEFINITION 2.1. Given n Boolean variables x_1, …, x_n, where each variable x_i ∈ {0, 1};
m clauses C_1, …, C_m, where each clause C_j is a disjunction of some number of
the variables and their negations; and a nonnegative weight w_j associated with each clause
C_j, the Weighted MAX-SAT problem is the task of finding a truth assignment to the Boolean
variables such that the sum of the weights of the satisfied clauses is maximized [23, 25].
The LP approximation of Weighted MAX-SAT proceeds in three steps. First, the problem is
modeled as a mixed packing-covering integer program as shown in Eq. 2.4.
Maximize:    Σ_{j=1}^{m} w_j z_j
subject to:  Σ_{x_i ∈ X(c_j)} x_i + Σ_{x̄_i ∈ X̄(c_j)} x̄_i ≥ z_j    ∀ c_j ∈ C    (a)
             x_i + x̄_i ≤ 1                                         ∀ x_i ∈ X    (b)
             x_i + x̄_i ≥ 1                                         ∀ x_i ∈ X    (c)
             x_i ∈ {0, 1}                                          ∀ x_i ∈ X    (d)
             z_j ∈ {0, 1}                                          ∀ c_j ∈ C    (e)        (2.4)
Here, the set of all variables and clauses are represented with X and C, respectively.
To address all positive variables participating in clause c_j, we use the notation X(c_j).
Similarly, X̄(c_j) represents the set of negated variables in c_j.
A clause is satisfied if one of the positive variables is set to 1, or one of the negated
variables is set to 0. This is captured in the first constraint in 2.4, tagged with (a).
When a clause cj is satisfied, its corresponding slack variable zj is set to 1, which
in consequence, adds the associated weight of the satisfied clause to the value of the
objective function. Additionally, constraints (b) and (c) jointly assure that of a variable
and its negation, one and only one is set to 1.
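The interplay of these constraints can be illustrated with a small check (the clause representation and the helper name are our own, not from the thesis); since constraints (b) and (c) force x̄_i = 1 − x_i, constraint (a) for a single clause reduces to:

```python
def clause_constraint_holds(assignment, clause_pos, clause_neg, z_j):
    """Check constraint (a) of Eq. 2.4 for one clause: the sum of x_i over
    positive literals plus the sum of xbar_i = 1 - x_i over negated literals
    must be at least z_j.  assignment maps a variable index to 0 or 1."""
    lhs = sum(assignment[i] for i in clause_pos) \
        + sum(1 - assignment[i] for i in clause_neg)
    return lhs >= z_j

# Clause (x0 OR NOT x1) with x0 = 0, x1 = 0: the negated literal satisfies
# the clause, so z_j = 1 is allowed.
clause_constraint_holds({0: 0, 1: 0}, [0], [1], 1)
```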
In the next step, the integer program in 2.4 is relaxed to LP, by switching variable types
from Boolean to fractional ((d) and (e) are replaced with xi ∈ [0, 1] and zj ∈ [0, 1],
respectively), and the resulting LP is solved.
The eventual solution is computed by rounding the fractional variables back to Boolean
values. Rounding each variable to 1 or 0 by flipping a coin that is biased by the fractional
value of the variable guarantees a (1 − 1/e)-approximation for MAX-SAT. A more
sophisticated approach provides a 3/4-approximation for this problem. For a detailed survey of
different rounding schemes and their approximation ratios, the reader is referred to [25].
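The simple biased-coin rounding can be sketched as follows (the variable and clause representations are our own simplifications, not the thesis's data structures):

```python
import random

def round_fractional(x_frac, seed=0):
    """Set each variable to 1 with probability equal to its fractional value."""
    rng = random.Random(seed)
    return {i: 1 if rng.random() < v else 0 for i, v in x_frac.items()}

def satisfied_weight(clauses, assignment):
    """Total weight of satisfied clauses.
    clauses: list of (weight, positive_var_indices, negated_var_indices)."""
    total = 0.0
    for w, pos, neg in clauses:
        if any(assignment[i] == 1 for i in pos) or \
           any(assignment[i] == 0 for i in neg):
            total += w
    return total
```

Because `random.random()` returns values in [0, 1), a variable with fractional value 1.0 is always rounded to 1, and one with value 0.0 always to 0.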
Currently available parallel algorithms for Mixed PC linear programs are limited to find-
ing a feasible solution that satisfies all constraints, regardless of the value of the objective
function. In the following, we describe how binary search can be combined with feasible
solutions to find the optimal solution to the minimization problem in Eq. 2.3. Maximization
problems can be treated similarly.
2.4 Binary Search for Optimal Solution
Given the objective function c^T x and a feasible solution x*, an upper bound for the
objective function is evaluated as u* = c^T x*. Next, we add a new constraint c^T x ≤ u*/2
to the packing constraints, and solve the new system for the next feasible solution. If the
new system turns out to be feasible, u*/2 becomes the new upper bound, and the binary
search continues with the tighter constraint c^T x ≤ u*/4. Otherwise, u* remains the
upper bound, u*/2 becomes the new lower bound, and the binary search continues with the new
constraint c^T x ≤ 3u*/4. This procedure is repeated until the length of the search interval
is less than or equal to the desired precision of the optimal solution.
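The search described above can be sketched with a generic feasibility oracle (`solve_feasible` stands in for an LP-feasibility solver and is an assumption of this sketch, not part of the thesis's implementation):

```python
def binary_search_min(c, solve_feasible, x0, tol=1e-3):
    """Binary search for the minimum of c^T x; assumes a non-negative
    objective (c >= 0, x >= 0, as in Eq. 2.3), so 0 is a valid lower bound.

    solve_feasible(upper) must return a feasible x with c^T x <= upper,
    or None if no such solution exists; x0 is any feasible starting point.
    """
    value = lambda x: sum(ci * xi for ci, xi in zip(c, x))
    lo, hi, best = 0.0, value(x0), x0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        x = solve_feasible(mid)
        if x is not None:            # feasible: tighten the upper bound
            hi, best = value(x), x
        else:                        # infeasible: raise the lower bound
            lo = mid
    return best
```

Each iteration halves the search interval, so the number of oracle calls is logarithmic in the initial objective value divided by the desired precision.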
2.5 MapReduce
Starting in the early 1990s, the rapid improvement and availability of cheap, commodity
high-performance components drove a new era in computing that uses networks
of computers to handle large-scale computations [3].
However, developing distributed applications requires a significant amount of effort to
address regular issues in parallelizing algorithms, such as the communication and syn-
chronization of processes, partitioning and distribution of data, mapping of processes
onto processors, and fault tolerance. Developers often find themselves reimplementing
similar procedures from one application to the other.
In this work, we are going to study distributed LP solving in MapReduce. A MapReduce
cluster is easy to run and to maintain, and all the aforementioned issues related to paral-
lel execution of algorithms are automatically handled by the framework. Hence, it is very
convenient to implement algorithms in MapReduce, since the application developer can
concentrate on the main logic of the algorithm. Furthermore, many applications, includ-
ing SOFIE knowledge extraction framework, require scanning through large amounts of
documents, which makes MapReduce a perfect match for them. It is often more desirable
to solve the resulting optimization problems in these applications on the
same cluster as the one on which they were generated, rather than migrating them to yet
another framework (e.g., a Message Passing cluster).
The MapReduce programming model is based on key/value tuples. The Map function,
written by the user, takes a stream of input key/value pairs and produces a set of
intermediate key/value pairs. Next, the shuffle stage is carried out by the MapReduce
library, which groups together all intermediate values associated with the same intermediate
key. In the Reduce function, also written by the user, intermediate values for the same
key are brought and processed together to produce a possibly smaller set of output
key/value pairs [9].
As an example, the illustrative problem of counting the number of word occurrences in
a large collection of documents is represented in MapReduce as follows: the input
[Figure 2.1: SOFIE Main Components]
key/value pair to the Map function is a document name and its contents. The function scans
through the document and emits each word plus the associated count of the occurrences
of that word in the document. Shuffling groups together occurrences of the same word
in all documents, and passes them to the Reduce function. The Reduce function sums
up all the occurrences, and emits the word and its overall count of occurrences.
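The word-count example can be mimicked in a single process; the map, shuffle, and reduce stages below correspond one-to-one to the description above (a sequential sketch, not an actual MapReduce job):

```python
from itertools import groupby
from operator import itemgetter

def map_fn(doc_name, contents):
    """Emit (word, 1) for every word occurrence in the document."""
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Sum up all occurrence counts for one word."""
    yield (word, sum(counts))

def run_mapreduce(inputs):
    # Map phase: apply map_fn to every (document name, contents) pair.
    intermediate = [kv for name, text in inputs for kv in map_fn(name, text)]
    # Shuffle: group intermediate pairs by key (here via sort + groupby).
    intermediate.sort(key=itemgetter(0))
    output = {}
    for word, group in groupby(intermediate, key=itemgetter(0)):
        for k, v in reduce_fn(word, (count for _, count in group)):
            output[k] = v
    return output
```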
2.6 SOFIE
The Self-Organizing Framework for Information Extraction (SOFIE) is an ontology-oriented
information extraction framework that aims at extracting high-quality ontological facts.
There are three main challenges in any ontology-based information extraction (IE) framework,
including SOFIE: pattern selection (finding meaningful patterns in text),
entity disambiguation (selecting among multiple, possibly ambiguous mappings of words
or phrases in the text to their most probably intended meanings in the ontology), and
consistency checking (scrutinizing a large set of IE-provided noisy candidates against
a trusted core of facts in the ontology). Rather than addressing each of these issues
separately, SOFIE's novel approach solves the three problems simultaneously by casting
them into a single Weighted MAX-SAT problem [19, 21].
In this section, the components of this framework are introduced, and the construction
of its MAX-SAT model is elaborated. We will later return to this problem in case studies
in Chapter 4. SOFIE's main components are depicted in Fig. 2.1 (the figure is borrowed from [19]).
2.6.1 Statements
Statements in SOFIE are relations of arbitrary arity. Each statement is assigned a truth
value of 0 or 1, which is denoted in square brackets. Hence the statement "Albert Einstein
is born in Ulm", for example, is represented as
bornIn(AlbertEinstein, Ulm)[1]
A statement with truth value 1 is a fact. A statement with unknown truth value is
a hypothesis. There are two types of facts in SOFIE, ontological and textual facts.
Ontological facts come from an underlying ontology, e.g. YAGO [20]. Textual facts
are extracted from a given text corpus, and are divided into two categories. The first
category consists of pattern occurrence facts that make assertions about the occurrence of
textual patterns. For example, if the pattern "X went to school in Y" is detected between
Einstein and Germany, the following fact is generated:
patternOcc("X went to school in Y", Einstein, Germany)[1]
The second category uses linguistic techniques to estimate the likelihood of mappings
from words or phrases in the text to entities and relations in the ontology. These estimates
appear in the form of disambiguationPrior facts. As an example, disambiguation priors
for Einstein might look like this²:
disambPrior(Einstein, AlbertEinstein, 0.8)[1]
disambPrior(Einstein, HermannEinstein, 0.2)[1]
SOFIE uses rules to form new hypotheses based on ontological and textual facts. Hypotheses
either concern the disambiguation of entities, e.g. disambiguateAs(Einstein,
AlbertEinstein)[?], concern a certain pattern expressing a certain relation,
e.g. expresses("X lives in Y", LivesInLocation)[?], or express new potential
facts, e.g. developed(Microsoft, JavaProgrammingLanguage)[?].
2.6.2 Rules
Rules encode background knowledge and are represented as logical formulae over a set of
literals. A literal is a statement that can have placeholders for the relation name or
for its entities. Conventionally, uppercase strings are used to represent placeholders.
Following is a sample rule stating that a person who is born in Ulm cannot be born in
London:

bornIn(X, Ulm) =⇒ ¬bornIn(X, London)

² The reported numbers here are imaginary. For a detailed description of how SOFIE estimates these values, please refer to [19].
A rule in SOFIE is defined as follows [19]:
DEFINITION 2.2. Given a set of literals L, a rule over L is one of the following:
• an element of L
• an expression of the form ¬R, where R is a rule over L
• an expression of the form (R1 ∘ R2), where R1 and R2 are rules over L and ∘ ∈ {∧, ∨, =⇒, ⇐⇒}.
A rule is said to be grounded when all placeholders in its literals are replaced by entities.
All occurrences of one placeholder within a grounding must be replaced by the same
entity. Replacing the placeholder X in the aforementioned rule with AlbertEinstein, for
example, generates the following grounded instance:
bornIn(AlbertEinstein, Ulm) =⇒ ¬bornIn(AlbertEinstein, London)
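A grounding step like the one above can be sketched as a textual substitution, assuming (as in the examples here) that placeholders are single uppercase letters; the `ground` helper is our own illustration, not SOFIE's implementation:

```python
import re

def ground(rule, bindings):
    """Replace every single-uppercase-letter placeholder in a rule template
    with its bound entity; all occurrences of one placeholder receive the
    same entity, as required for a valid grounding."""
    return re.sub(r'\b[A-Z]\b',
                  lambda m: bindings.get(m.group(0), m.group(0)),
                  rule)

grounded = ground("bornIn(X, Ulm) => !bornIn(X, London)",
                  {"X": "AlbertEinstein"})
```

The word-boundary pattern `\b[A-Z]\b` matches the isolated placeholder X but leaves entity names such as Ulm or London untouched.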
2.6.3 MAX-SAT Model
Each grounded rule in SOFIE is transformed into one or multiple clauses in the modeling
of the MAX-SAT problem. Associating weights with clauses in MAX-SAT is an adequate
tool for prioritizing clauses according to their importance. This intuition is exploited in
SOFIE to introduce a concept of softness for rules.
Generally speaking, there might be no solution to the MAX-SAT problem in SOFIE
that could satisfy all clauses at the same time. For example, as soon as there are two
disambiguation priors suggesting different disambiguations for the same phrase, one of
the clauses has to be violated.
In contrast to the soft rules, there are also hard rules concerning the consistency of the
solution, which are not allowed to be violated at all. Defined over a set of hypotheses, a
hard rule restricts the maximum number of accepted hypotheses to 1 (e.g., among all
hypotheses suggesting a single person to be born in different places, maximally one can
be true). The set of hypotheses that participate in a hard rule are known as a competitor
set.
Eq. 2.5 shows a Weighted MAX-SAT problem with competitor sets, as it appears in
SOFIE.
Maximize:    Σ_{j=1}^{m} w_j z_j
subject to:  Σ_{x_i ∈ X(c_j)} x_i + Σ_{x̄_i ∈ X̄(c_j)} x̄_i ≥ z_j    ∀ c_j ∈ C    (a)
             Σ_{x_i ∈ s_k} x_i ≤ 1                                 ∀ s_k ∈ S    (b)
             x_i + x̄_i ≤ 1                                         ∀ x_i ∈ X    (c)
             x_i + x̄_i ≥ 1                                         ∀ x_i ∈ X    (d)
             x_i ∈ {0, 1}                                          ∀ x_i ∈ X    (e)
             z_j ∈ {0, 1}                                          ∀ c_j ∈ C    (f)        (2.5)
The set of all statements that appear in clauses composes the set of variables (X), and the
differentiation in the importance of rules is imposed by assigning different weights to the
soft and hard rules. Furthermore, to assure that hard rules are not violated, constraint
(b) is added for each competitor set sk ∈ S, where S is the set of all competitor sets.
Chapter 3
Solving Linear Programs in
MapReduce
In this chapter, we will focus on the Awerbuch-Khandekar (AK) algorithm for solving
linear programs [15]. The algorithm's acceptable time complexity (polylogarithmic in the
number of variables, the number of constraints, and the largest entry in the coefficient
matrices) makes it a promising approach to investigate. Furthermore, it is a simple,
numerically stable algorithm requiring only local information and a small, bounded message size,
which offers nice opportunities for efficient realization in MapReduce. Future work could
further expand this study to compare the AK method with other parallel algorithms
(e.g, [10, 26]).
3.1 Awerbuch-Khandekar Algorithm (AK)
Given a mixed packing-covering linear program
Ax ≤ b,   A ∈ R^{m×n}_+
Cx ≥ d,   C ∈ R^{k×n}_+        (3.1)

where x ∈ R^n_+, b ∈ R^m_+, and d ∈ R^k_+, the Awerbuch-Khandekar algorithm finds a feasible
solution to (3.1). We assume, without loss of generality, that b = 1 and d = 1, where
1 denotes a vector of all 1s of appropriate length. Additionally, all non-zero entries
in the coefficient matrices A and C are assumed to lie in the range [1, M]. Noncomplying
formulations can always be transformed by proper scaling of the rows and columns of the
constraint matrices, as follows.
First, each row i is divided by its associated right-hand side, b_i for packing constraints
or d_i for covering constraints. Next, for each variable x_j, let c_j be the smallest
non-zero coefficient in the j-th column of the packing and covering matrices combined. Then
x_j is replaced by the new variable x̄_j = c_j · x_j, and all coefficients in column j are
divided by c_j. The resulting problem is solved for the new variables x̄_j and complies with
the aforementioned requirements. Recovering the original values is straightforward:
x_j = x̄_j / c_j.
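This normalization can be sketched in NumPy as follows (a minimal sketch on a toy instance of our own; the function name is ours, not part of the thesis implementation):

```python
import numpy as np

def normalize(A, b, C, d):
    """Scale the LP so that the right-hand sides become all-ones and the
    smallest non-zero coefficient of every column becomes 1."""
    A = A / b[:, None]           # step 1: divide each row by its rhs
    C = C / d[:, None]
    M = np.vstack([A, C])
    # step 2: smallest non-zero coefficient c_j of each column j
    c = np.array([M[:, j][M[:, j] > 0].min() if (M[:, j] > 0).any() else 1.0
                  for j in range(M.shape[1])])
    # dividing column j by c_j corresponds to substituting xbar_j = c_j * x_j;
    # the original solution is recovered as x_j = xbar_j / c_j
    return A / c, C / c, c

A = np.array([[2.0, 0.0], [1.0, 4.0]])
b = np.array([2.0, 4.0])
C = np.array([[0.0, 3.0]])
d = np.array([3.0])
An, Cn, c = normalize(A, b, C, d)
```

After the transformation, every non-zero entry of An and Cn is at least 1, as required.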
AK is an iterative algorithm that converges to an ε-feasible solution, if one exists, after a
polylogarithmic number of iterations.

DEFINITION 3.1. A solution x ∈ R^n_+ is said to be ε-feasible if the maximum violation
of the packing and covering constraints is at most ε, i.e., Ax ≤ (1 + ε)·1 and
Cx ≥ (1 − ε)·1.
THEOREM 3.1. Given an initial solution x_0 ∈ R^n_+, the AK solution becomes O(ε)-feasible
within

    poly( log(nmkM) / ε )

iterations. Furthermore, once the solution becomes O(ε)-feasible, it always
remains O(ε)-feasible.
This method is depicted in Algorithm 3.1 (α, β, and δ are constants to be defined later):

Algorithm 3.1: AK Algorithm for Mixed Packing-Covering
input: x ∈ R^n_+
 1  repeat
 2      y(x) ← exp[µ · (Ax − 1)]
 3      z(x) ← exp[µ · (1 − Cx)]
 4      for j = 1 ... n do
 5          R_j ← A_j^T y(x) / C_j^T z(x)
 6          if R_j ≤ 1 − α then
 7              x_j ← max(x_j (1 + β), δ)
 8          else if R_j ≥ 1 + α then
 9              x_j ← x_j (1 − β)
10          end
11      end
12  until Ax ≤ (1 + ε)·1 and Cx ≥ (1 − ε)·1
13  return x
Each iteration of the AK algorithm consists of three main steps. First, an indicator of
the amount of violation of each constraint is computed by evaluating (Ax − 1) and
(1 − Cx) in lines 2 and 3. Next, the ratio R_j, estimating the share of each variable in the
violation of the packing and/or covering constraints, is computed in line 5. (Throughout
this study, the dot (·) is used to represent scalar-vector multiplication.) Finally, based on
the corresponding ratios, new values for all variables are decided. A variable x_j ∈ x is
increased if it has a dominating role in the violation of covering constraints (lines 6 and 7).
Otherwise, if packing constraints are more dominantly violated because of x_j, its value
is decreased (lines 8 and 9). When the violation of constraints on both sides is
acceptable, x_j does not change.
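For illustration, the update of Algorithm 3.1 can be sketched in NumPy as follows (a toy sketch: the instance and the values of µ, α, β, and δ below are ours, chosen ad hoc rather than from Table 3.1):

```python
import numpy as np

def ak_step(x, A, C, mu=2.0, alpha=0.25, beta=0.05, delta=1e-3):
    y = np.exp(mu * (A @ x - 1.0))        # lines 2-3: violation indicators
    z = np.exp(mu * (1.0 - C @ x))
    R = (A.T @ y) / (C.T @ z)             # line 5: per-variable ratio
    inc = np.maximum(x * (1 + beta), delta)   # covering violation dominates
    dec = x * (1 - beta)                      # packing violation dominates
    return np.where(R <= 1 - alpha, inc, np.where(R >= 1 + alpha, dec, x))

def ak(x, A, C, eps=0.1, max_iter=10000):
    for _ in range(max_iter):
        # line 12: stop once the solution is eps-feasible
        if (A @ x <= 1 + eps).all() and (C @ x >= 1 - eps).all():
            break
        x = ak_step(x, A, C)
    return x

A = np.array([[1.0, 1.0]])                    # packing: x1 + x2 <= 1
C = np.array([[2.0, 0.0], [0.0, 2.0]])        # covering: 2 x1 >= 1, 2 x2 >= 1
x = ak(np.array([0.01, 0.01]), A, C)
```

On this instance the multiplicative updates drive both variables up from 0.01 until the iterate falls into the ε-feasible band.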
By following these steps, AK intuitively mimics the gradient descent method. The goal is to
minimize violations by finding the stationary point of an exponential penalty function,
called the potential function.
DEFINITION 3.2. Given x ∈ R^n_+, let y and z be vectors in R^m_+ and R^k_+, defined as

    y(x) = exp[µ · (Ax − 1)]                                   (3.2)
    z(x) = exp[µ · (1 − Cx)]                                   (3.3)

where µ is a constant to be defined later. Letting x_t, y_t, and z_t denote the values of x,
y(x), and z(x) in round t, the potential function is defined as follows:

    Φ_t = 1 · y_t + 1 · z_t                                    (3.4)
Parameters
The choice of the parameters µ, α, β, and δ in the AK algorithm is based on the maximum
accepted violation ε, and is shown in Table 3.1. For an in-depth analysis of the
selection criteria, the reader is referred to [1].
    µ      (1/ε) · ln(mkM/ε)
    α      ε/4
    β      Θ(ε/µ)
    δ      Θ(ε/(µnM))

Table 3.1: Parameter Setting for the AK Algorithm
3.2 Realization in MapReduce
In the last section, the gradual update procedure of the AK algorithm was described,
where new values for the variables are decided based on matrix-vector multiplications.
First, the matrices A and C are multiplied by the vector x to evaluate the violation
indicator functions y and z. Then, the transposed matrices A^T and C^T are multiplied by
the violation vectors to find out whether and how each variable should be modified to reduce
the violations.
In this section, we outline how the MapReduce programming model can be used to implement
the aforementioned operations. We begin with the most straightforward implementation and
then move towards more efficient and numerically stable realizations.
3.2.1 MR-MixedPC
In practice, most matrices are sparse, with only a small fraction of non-zero elements.
Higher scalability and better performance can often be achieved by avoiding the unnecessary
consideration of zero-valued elements. To that end, only the non-zero entries of a sparse
matrix are stored, in triplet format (row, column, value), where row and column indicate the
position of an entry in the matrix, and value contains its value. We adopt a slightly
modified version of this representation in the design of our methods, which has the extra
field origin. Origin indicates whether an entry belongs to the packing or to the covering
matrix. We refer to this data structure as MatrixEntry. A dot (.) is used to access the
fields of a MatrixEntry, e.g., MatrixEntry.row.
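In a language with records, MatrixEntry amounts to a small immutable structure; a Python sketch (the field names follow the text, while the origin encoding is ours):

```python
from dataclasses import dataclass

# Illustrative encoding of the origin field (not prescribed by the thesis).
PACKING, COVERING = "P", "C"

@dataclass(frozen=True)
class MatrixEntry:
    origin: str    # PACKING or COVERING: which matrix the entry comes from
    row: int       # row index within its matrix
    column: int    # column index, i.e., which variable the entry multiplies
    value: float   # the non-zero coefficient itself

e = MatrixEntry(origin=PACKING, row=3, column=7, value=2.5)
```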
Now, consider the multiplication of the constraint matrices by the variable vector x. Using
the newly defined MatrixEntry data structure, we treat the matrices jointly as a set of
MatrixEntry instances, where each instance contains adequate information to identify
its origin, its location, and its value. To perform this multiplication, entries from the
same row must be gathered together. We take advantage of the inherent shuffle stage
in MapReduce to achieve this. The steps are as follows.
Map. The Map function is defined to be an identity function, i.e., it returns each
MatrixEntry that it receives. Since we plan to group together all entries from the same
row, an entry’s row number is used as its key. This function is depicted in Algorithm 3.2.
Reduce. As a result of the map and the shuffle stages, the Reduce function receives all
matrix entries from the same row. The reducer iterates over the entries, and multiplies
each variable by its corresponding coefficient.
Algorithm 3.2: MR-MixedPC: MapI
input: Key, MatrixEntry
1  Key ← MatrixEntry.row
2  Emit(Key, MatrixEntry)
The Reduce function is shown in Algorithm 3.3. Since entries from both the packing and
covering matrices are treated simultaneously, a reducer receives entries from the same row
of both matrices. We handle this by checking the origin of each entry.
Algorithm 3.3: MR-MixedPC: ReduceI
input: Key, MatrixEntryList
 1  sum_p ← 0, sum_c ← 0
 2  foreach MatrixEntry in MatrixEntryList do
 3      if MatrixEntry.origin == Packing then
 4          sum_p ← sum_p + MatrixEntry.value ∗ x[MatrixEntry.column]
 5      else
 6          sum_c ← sum_c + MatrixEntry.value ∗ x[MatrixEntry.column]
 7      end
 8  end
 9  if sum_p > 0 then
10      value ← exp[µ ∗ (sum_p − 1)]
11      Emit((y, Key), value)
12  end
13  if sum_c > 0 then
14      value ← exp[µ ∗ (1 − sum_c)]
15      Emit((z, Key), value)
16  end
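The pair of Algorithms 3.2 and 3.3 can be simulated in-memory in plain Python, with a dictionary standing in for the shuffle stage (a sketch; the entry tuples and helper names are ours):

```python
import math
from collections import defaultdict

def map1(entry):
    # Algorithm 3.2: key each entry by its row number.
    origin, row, col, val = entry
    yield row, entry

def reduce1(row, row_entries, x, mu):
    # Algorithm 3.3: one reducer sees every entry of row `row`,
    # from the packing and the covering matrix simultaneously.
    sum_p = sum_c = 0.0
    for origin, _, col, val in row_entries:
        if origin == "P":
            sum_p += val * x[col]
        else:
            sum_c += val * x[col]
    if sum_p > 0:
        yield ("y", row), math.exp(mu * (sum_p - 1.0))
    if sum_c > 0:
        yield ("z", row), math.exp(mu * (1.0 - sum_c))

def run_job(entries, x, mu):
    shuffle = defaultdict(list)           # stand-in for the shuffle stage
    for e in entries:
        for key, value in map1(e):
            shuffle[key].append(value)
    out = {}
    for row, es in shuffle.items():
        out.update(dict(reduce1(row, es, x, mu)))
    return out

entries = [("P", 0, 0, 1.0), ("P", 0, 1, 1.0), ("C", 0, 0, 2.0)]
out = run_job(entries, x=[0.5, 0.5], mu=1.0)
```

Here row 0 of both matrices is exactly satisfied at x = (0.5, 0.5), so both emitted indicator values equal exp(0) = 1.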
Similarly, a second MapReduce job is used to compute the ratios A_j^T y(x) / C_j^T z(x). The
intuition is the same as for the previous job, except that the transposes of the matrices
have to be used. The implementation is as follows.
Map. The Map function is again an identity function. However, instead of row numbers,
column numbers are used as the intermediate keys. This way, shuffling reconstructs
columns of the matrices instead of their rows. Columns of a matrix are rows of the
transpose of that matrix. Algorithm 3.4 shows the second Map function.
Algorithm 3.4: MR-MixedPC: MapII
input: Key, MatrixEntry
1  Key ← MatrixEntry.column
2  Emit(Key, MatrixEntry)
Reduce. Algorithm 3.5 outlines the second reducer. It performs the multiplications
as before. Additionally, given that entries from the same columns of both the packing and
covering matrices arrive at the same reducer, the function can go further, in lines 10 to
17, and decide on the new value of the variable corresponding to the column at hand.
Algorithm 3.5: MR-MixedPC: ReduceII
input: Key, MatrixEntryList
 1  sum_p ← 0, sum_c ← 0
 2  foreach MatrixEntry in MatrixEntryList do
 3      if MatrixEntry.origin == Packing then
 4          sum_p ← sum_p + MatrixEntry.value ∗ y[MatrixEntry.row]
 5      else
 6          sum_c ← sum_c + MatrixEntry.value ∗ z[MatrixEntry.row]
 7      end
 8  end
 9  j ← Key
10  r ← sum_p / sum_c
11  if r ≤ 1 − α then
12      x_j ← max(x_j (1 + β), δ)
13  end
14  if r ≥ 1 + α then
15      x_j ← x_j (1 − β)
16  end
17  Emit(j, x_j)
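The tail of ReduceII is simply the ratio test of Algorithm 3.1 applied to the column sums; a minimal Python sketch (parameter values ad hoc, helper names ours):

```python
def reduce2(j, column_entries, y, z, x, alpha=0.25, beta=0.05, delta=1e-3):
    # column_entries: (origin, row, column, value) tuples of column j.
    sum_p = sum(v * y[r] for o, r, _, v in column_entries if o == "P")
    sum_c = sum(v * z[r] for o, r, _, v in column_entries if o == "C")
    r = sum_p / sum_c                     # the ratio R_j of Algorithm 3.1
    if r <= 1 - alpha:                    # covering violation dominates
        x[j] = max(x[j] * (1 + beta), delta)
    elif r >= 1 + alpha:                  # packing violation dominates
        x[j] = x[j] * (1 - beta)
    return j, x[j]

# Covering row 0 is strongly violated (z large), so x[0] is increased by 5%.
y, z, x = {0: 1.0}, {0: 4.0}, [0.5]
j, xj = reduce2(0, [("P", 0, 0, 1.0), ("C", 0, 0, 1.0)], y, z, x)
```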
In summary, the matrices in MR-MixedPC are represented as a collection of non-zero entries
that are gathered together during shuffling to reconstruct matrix rows and columns. The
two MapReduce jobs are run repeatedly, one after the other. At the beginning of each
iteration, the master node of the cluster announces the current values of all variables to
the workers. When the first MapReduce job is complete, the corresponding violations for the
current solution are propagated amongst the workers.
MR-MixedPC provides a basic understanding of how the AK algorithm can be realized
in MapReduce, and establishes a baseline for a more efficient implementation,
MR-MixedPC-E.
3.2.2 MR-MixedPC-E
The main observation behind the efficient variant MR-MixedPC-E is that although the packing
and covering matrices are fixed during runtime, shuffling has to be repeated in each
iteration to reconstruct the rows and columns. A considerable speedup can be achieved by
removing these redundant shuffles. To that end, the construction of rows and columns from
matrix entries is removed from the body of the algorithm and performed only once, in a
preprocessing phase before the algorithm starts to iterate.
We introduce two new data structures in MR-MixedPC-E, MatrixRow and MatrixColumn.
As the names suggest, instances of these data structures are meant to contain rows and
columns of sparse matrices. MR-MixedPC-E is composed of two building blocks: a
preprocessing part, which is performed only once when the algorithm starts, and an
iterative part, which runs repeatedly until a solution is found.
Preprocessing Phase
The goal of this step is to generate a row-based as well as a column-based view of the
input matrices that can later be consumed by the iterative part.

We use two MapReduce jobs to implement this. The Map functions are the same as those
in MR-MixedPC. The reducers, however, are simpler. Paired with the first mapper, the
first reducer, shown in Algorithm 3.6, receives a row number together with the list of
non-zero entries in that row, and returns a MatrixRow instance containing the row's
entries. The second reducer works with the next mapper to generate matrix columns in the
same way; its implementation is deferred to Appendix A.
Algorithm 3.6: MR-MixedPC-E: ReduceI, Preprocessing Phase
input: Key, MatrixEntryList
1  MatrixRow row = new MatrixRow()
2  foreach MatrixEntry in MatrixEntryList do
3      row.add(MatrixEntry)
4  end
5  Emit(Key, row)
Iterative Phase
Next, the iterative part starts. Since the matrix rows and columns are already available,
the Map functions implement the multiplications directly, with no further need for shuffling
or reducers. Algorithms 3.7 and 3.8 show the Map functions that compute y and z, and the
ratios A_j^T y(x) / C_j^T z(x), respectively.
At this point, the efficient realization of the AK method in MapReduce is complete.
MR-MixedPC-E omits zero-valued matrix entries by using a sparse matrix representation.
Composed of two pairs of MapReduce jobs, the preprocessing phase loads the input
data into adequate structures that are consumed by the mapper functions of the iterative
part to gradually improve the solution.
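The split into a one-off preprocessing pass and a shuffle-free iterative pass can be sketched in-memory as follows (a stand-in for the actual MapReduce jobs; the names and parameter values are ours):

```python
import math
from collections import defaultdict

def preprocess(entries):
    # One-off pass: build row- and column-oriented views of the matrices,
    # replacing the per-iteration shuffle of MR-MixedPC.
    rows, cols = defaultdict(list), defaultdict(list)
    for e in entries:                     # e = (origin, row, column, value)
        rows[e[1]].append(e)
        cols[e[2]].append(e)
    return rows, cols

def iterate_once(rows, cols, x, mu=1.0, alpha=0.25, beta=0.05, delta=1e-3):
    y, z = {}, {}
    for i, es in rows.items():            # MapI: violation indicators
        sp = sum(v * x[c] for o, _, c, v in es if o == "P")
        sc = sum(v * x[c] for o, _, c, v in es if o == "C")
        if sp > 0:
            y[i] = math.exp(mu * (sp - 1.0))
        if sc > 0:
            z[i] = math.exp(mu * (1.0 - sc))
    for j, es in cols.items():            # MapII: ratio test and update
        sp = sum(v * y[r] for o, r, _, v in es if o == "P")
        sc = sum(v * z[r] for o, r, _, v in es if o == "C")
        ratio = sp / sc
        if ratio <= 1 - alpha:
            x[j] = max(x[j] * (1 + beta), delta)
        elif ratio >= 1 + alpha:
            x[j] *= (1 - beta)
    return x

rows, cols = preprocess([("P", 0, 0, 1.0), ("C", 0, 0, 2.0)])
x = iterate_once(rows, cols, [0.25])
```

Note that `preprocess` runs once, while `iterate_once` can be called repeatedly against the same views.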
Algorithm 3.7: MR-MixedPC-E: MapI, Iterative Phase
input: Key, MatrixRow
 1  sum_p ← 0, sum_c ← 0
 2  foreach MatrixEntry in MatrixRow do
 3      if MatrixEntry.origin == Packing then
 4          sum_p ← sum_p + MatrixEntry.value ∗ x[MatrixEntry.column]
 5      else
 6          sum_c ← sum_c + MatrixEntry.value ∗ x[MatrixEntry.column]
 7      end
 8  end
 9  if sum_p > 0 then
10      value ← exp[µ ∗ (sum_p − 1)]
11      Emit((y, Key), value)
12  end
13  if sum_c > 0 then
14      value ← exp[µ ∗ (1 − sum_c)]
15      Emit((z, Key), value)
16  end
Algorithm 3.8: MR-MixedPC-E: MapII, Iterative Phase
input: Key, MatrixColumn
 1  sum_p ← 0, sum_c ← 0
 2  foreach MatrixEntry in MatrixColumn do
 3      if MatrixEntry.origin == Packing then
 4          sum_p ← sum_p + MatrixEntry.value ∗ y[MatrixEntry.row]
 5      else
 6          sum_c ← sum_c + MatrixEntry.value ∗ z[MatrixEntry.row]
 7      end
 8  end
 9  j ← Key
10  r ← sum_p / sum_c
11  if r ≤ 1 − α then
12      x_j ← max(x_j (1 + β), δ)
13  else if r ≥ 1 + α then
14      x_j ← x_j (1 − β)
15  end
16  Emit(j, x_j)
3.2.3 MR-MixedPC-S
In practice, evaluating the exponential function for very large arguments can lead to
floating-point overflow and numerical instability. A common solution is to downscale the
computations using the log function. The logarithm is a strictly monotonic function, so it
preserves the order of its arguments (i.e., if x_1 < x_2 then log(x_1) < log(x_2)).

Using this property, it is possible to perform the comparisons in lines 6 and 8 of the
original AK algorithm (Algorithm 3.1) in log scale. The goal is to replace the evaluation
of the exponential functions in y(x) and z(x) with a numerically stable version. To this
end, we first introduce Pihlakas's method for the precise calculation of the logarithm of a
sum [18].
The Pihlakas method is based on the identity ln(a + b) = ln(exp[ln(a) − ln(b)] + 1) + ln(b).
We adapt the bivariate method of [18] to implement the multivariate lsum, a numerically
stable function for computing the logarithm of the sum of a list of numbers from their
individual logs. The implementation is given in Algorithm 3.9.

MAX_MANTISSA in Algorithm 3.9 refers to the maximal mantissa value of double-precision
floating-point numbers, and top, add, and len are vector operators: top removes and
returns the last element of a vector, add appends an element to the end of the vector,
and len returns the number of elements.
Algorithm 3.9: lsum
input: v
 1  repeat
 2      ln_a ← top(v)
 3      ln_b ← top(v)
 4      if abs(ln_a − ln_b) ≥ ln(MAX_MANTISSA) then
 5          add(v, max(ln_a, ln_b))
 6      else
 7          add(v, ln(exp[ln_a − ln_b] + 1) + ln_b)
 8      end
 9  until len(v) == 1
10  return top(v)
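Algorithm 3.9 translates almost directly into Python; the stack semantics of top and add are realized with list operations, and MAX_MANTISSA is instantiated for IEEE 754 doubles:

```python
import math

MAX_MANTISSA = 2.0 ** 53    # integer mantissa range of an IEEE 754 double

def lsum(v):
    """log of the sum of a list of numbers, given their individual logs,
    computed pairwise via ln(a + b) = ln(exp[ln a - ln b] + 1) + ln b."""
    v = list(v)
    while len(v) > 1:
        ln_a, ln_b = v.pop(), v.pop()
        if abs(ln_a - ln_b) >= math.log(MAX_MANTISSA):
            # one summand is too small to affect the mantissa of the other
            v.append(max(ln_a, ln_b))
        else:
            v.append(math.log(math.exp(ln_a - ln_b) + 1.0) + ln_b)
    return v[0]

# lsum([ln a, ln b]) equals ln(a + b); it also works where exp would overflow.
small = lsum([math.log(2.0), math.log(3.0)])      # ln(2 + 3) = ln(5)
large = lsum([1000.0, 1000.0])                    # 1000 + ln(2)
```

The second call would be impossible with a direct exp/sum/log evaluation, since exp(1000) overflows a double.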
The next step is to compute ln(R_j):

    ln(R_j) = ln( A_j^T y / C_j^T z )                                                  (3.5)
            = ln(A_j^T y) − ln(C_j^T z)
            = ln Σ_{i=1..m} A_{ji}^T y_i − ln Σ_{i=1..k} C_{ji}^T z_i
            = ln(A_{j1}^T y_1 + ... + A_{jm}^T y_m) − ln(C_{j1}^T z_1 + ... + C_{jk}^T z_k)
            = lsum(ln A_{j1}^T y_1, ..., ln A_{jm}^T y_m)
              − lsum(ln C_{j1}^T z_1, ..., ln C_{jk}^T z_k)
            = lsum(ln A_{j1}^T + µ · (A_1 x − 1), ..., ln A_{jm}^T + µ · (A_m x − 1))
              − lsum(ln C_{j1}^T + µ · (1 − C_1 x), ..., ln C_{jk}^T + µ · (1 − C_k x))

As Eq. (3.5) shows, evaluating ln(R_j) reduces to a series of log, addition, and
multiplication operations, with all exponential evaluations redirected to the numerically
stable lsum function.
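As a sanity check, the stable route of Eq. (3.5) can be compared numerically with a direct evaluation of R_j (a NumPy sketch on a small dense instance of our own; NumPy's logaddexp reduction plays the role of lsum):

```python
import numpy as np

def ln_R(j, A, C, x, mu):
    # Stable route of Eq. (3.5): per-term logs, combined with a
    # log-sum-exp reduction in place of the lsum calls.
    t1 = np.log(A[:, j]) + mu * (A @ x - 1.0)   # ln A_ij + mu (A_i x - 1)
    t2 = np.log(C[:, j]) + mu * (1.0 - C @ x)   # ln C_ij + mu (1 - C_i x)
    return np.logaddexp.reduce(t1) - np.logaddexp.reduce(t2)

A = np.array([[1.0, 2.0], [3.0, 1.0]])
C = np.array([[2.0, 1.0]])
x = np.array([0.3, 0.2])
mu = 1.5
stable = ln_R(0, A, C, x, mu)
direct = np.log((A.T @ np.exp(mu * (A @ x - 1.0)))[0]
                / (C.T @ np.exp(mu * (1.0 - C @ x)))[0])
```

The two values agree to floating-point precision; the stable route, however, never evaluates a bare exponential of a potentially huge argument.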
Putting it all together, the numerically stable AK is depicted in Algorithm 3.10.
Algorithm 3.10: Numerically Stable AK for Mixed Packing-Covering
input: x ∈ R^n_+
 1  repeat
 2      for j = 1 ... n do
 3          t1 ← ln A_j^T + µ · (Ax − 1)
 4          t2 ← ln C_j^T + µ · (1 − Cx)
 5          ln R_j ← lsum(t1) − lsum(t2)
 6          if ln R_j ≤ ln(1 − α) then
 7              x_j ← max(x_j (1 + β), δ)
 8          else if ln R_j ≥ ln(1 + α) then
 9              x_j ← x_j (1 − β)
10          end
11      end
12  until Ax ≤ (1 + ε)·1 and Cx ≥ (1 − ε)·1
13  return x
MR-MixedPC-S adopts the same implementation as MR-MixedPC-E, with minor modifications
to its iterative section: the first mapper computes the logs of y and z, and the second one
consumes these logs to evaluate ln(R_j). The implementation is similar to Algorithms 3.7
and 3.8; the complete outline can be found in Appendix A.
Chapter 4
Experiments
4.1 Awerbuch-Khandekar Performance Analysis
Chapter 3 introduced the AK algorithm for finding a feasible solution to mixed packing-
covering problems. The method is a parallel approximation of a gradient descent approach
that minimizes an exponential penalization of the sum of all violations, referred to as the
potential function. The step size in each iteration is bounded by the maximum accepted
violation ε.
In this section, the connection between the violation of constraints and the value of the
potential function is examined, and the role of ε as an important calibration factor of
the algorithm is quantified.
The experiment is composed of two randomly generated coefficient matrices of equal
size, 1000 × 1000, with 90% randomly selected zero-valued entries per row. This problem
was repeatedly solved by the algorithm for various values of ε, and the value of the
potential function as well as the largest violation in each iteration was recorded. The
complete process was replicated 50 times to ensure that the observations are meaningful;
the average values over all replications are reported here. To provide comparable results,
all experiments were run for 2,000 iterations, even in cases where the expected violation
was achieved earlier.
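The random instances can be generated along the following lines (a sketch; the thesis does not specify its generator, so the construction details and seeds below are assumptions of ours):

```python
import numpy as np

def random_sparse_matrix(m, n, density=0.1, rng=None):
    """m x n matrix with (1 - density) zero-valued entries per row;
    non-zero values drawn uniformly at random."""
    rng = rng if rng is not None else np.random.default_rng(0)
    M = np.zeros((m, n))
    nnz = max(1, int(density * n))        # non-zeros per row
    for i in range(m):
        # choose the positions of the non-zero entries in row i
        cols = rng.choice(n, size=nnz, replace=False)
        M[i, cols] = rng.uniform(0.1, 1.0, size=nnz)
    return M

A = random_sparse_matrix(1000, 1000)
C = random_sparse_matrix(1000, 1000, rng=np.random.default_rng(1))
```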
Figure 4.1 shows the value of the potential function in each iteration for multiple
choices of ε. As the figure shows, the algorithm successfully decreases the potential at
each step until a minimum value is reached, after which no more changes are observed.
Figure 4.2 verifies that decreasing the value of the potential function effectively
reduces the violation of constraints.
[Plot: Potential Value (2000 to 4000) vs. Iteration No. (0 to 2000), curves for eps = 1.0, 0.8, 0.6, 0.4, 0.2]

Figure 4.1: The AK Potential Value for Different Epsilons
As Fig. 4.2 suggests, a smaller violation is achieved at the cost of more iterations. The
reason is that decreasing epsilon also reduces the step size of the gradient descent method,
so more steps are required. However, the eventual violation is often much smaller than the
initially declared value. We exploit this phenomenon in the case studies to speed up the
algorithm in practice, by separating the epsilon parameter from the maximum accepted
violation.
4.2 Scalability Test for Sequential LP Solvers
There are many variants of sequential LP solvers that find the exact solution to linear
programs. In this section we describe an experiment that examines the scalability of some
off-the-shelf LP solvers: LPSolve, Symphony, GLPK, and CPlex. The experiments were run on
a 2.66 GHz Intel Core 2 processor with 4 GB RAM.
[Plot: Max Violation (log scale, 0.01 to 1.00) vs. Iteration No. (0 to 2000), curves for eps = 1.0, 0.8, 0.6, 0.4, 0.2]

Figure 4.2: The AK Violation Value for Different Epsilons
We started with a problem with 200 constraints over 100 variables, containing 90%
zero-valued coefficients per constraint. The values of the non-zero coefficients as well
as the locations of the zero-valued ones were decided using a uniform random number
generator, and the feasibility of the problem was guaranteed by the construction method.
In this experiment, we repeatedly doubled the dimension of the problem and exposed the
instances to the available solvers to measure their runtime. Among LPSolve, Symphony,
and GLPK, LPSolve and Symphony were observed to be the slowest and the fastest,
respectively. However, none of these methods scaled to problems with more than 6400
constraints, as the corresponding matrices could no longer be loaded into memory. With
CPlex, larger problems were tractable, since the method avoids allocating unnecessary
space for zero-valued entries by using a sparse matrix representation in triplet format.
But here the runtime proved to be an issue.
Figure 4.3 plots the CPlex runtime against the dimension of the problem. As can be seen,
the runtime shows an almost 10-fold increase at each step while moving from 200 to
6400 constraints. Doubling the number of constraints from 6400 to 12800 increased the
runtime by a factor of 30.

Figure 4.3: CPlex Runtime
4.3 Comparison with Partially-distributed Methods
This section studies the performance of the AK algorithm in minimizing the potential
function by comparing it with two centralized optimization techniques, Broyden-Fletcher-
Goldfarb-Shanno (BFGS) and its limited-memory, bounded alternative (L-BFGS-B).

The BFGS family of methods are known as quasi-Newton techniques and belong to the
general class of hill-climbing non-linear optimization approaches that seek the stationary
point of a twice continuously differentiable function, where the necessary optimality
condition of a zero gradient is satisfied. On the one hand, these iterative optimization
methods are centralized in the sense that decisions on the new values of the variables in
each iteration have to be taken at a central node that is aware of the values of all
variables. On the other hand, they can be partially distributed, for example by outsourcing
the demanding operations of each iteration to a MapReduce cluster.
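For concreteness, the potential of Definition 3.2 and its gradient, the quantities such a hybrid scheme would evaluate per iteration, can be sketched as follows (the instance and helper names are ours); the finite-difference check at the end illustrates that the analytic gradient is correct:

```python
import numpy as np

def potential(x, A, C, mu):
    # Phi(x) = 1 . y(x) + 1 . z(x) from Definition 3.2
    return np.exp(mu * (A @ x - 1.0)).sum() + np.exp(mu * (1.0 - C @ x)).sum()

def potential_grad(x, A, C, mu):
    # d Phi / dx = mu (A^T y - C^T z); the matrix-vector products inside
    # this gradient are the per-iteration work one could outsource.
    y = np.exp(mu * (A @ x - 1.0))
    z = np.exp(mu * (1.0 - C @ x))
    return mu * (A.T @ y - C.T @ z)

rng = np.random.default_rng(0)
A = rng.uniform(0.0, 1.0, (5, 4))
C = rng.uniform(0.0, 1.0, (3, 4))
x, mu, h = np.full(4, 0.1), 1.0, 1e-6
g = potential_grad(x, A, C, mu)
# central finite difference of the first gradient component
e0 = np.eye(4)[0]
fd = (potential(x + h * e0, A, C, mu) - potential(x - h * e0, A, C, mu)) / (2 * h)
```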
In this experiment, a uniform random number generator was used to construct packing and
covering constraint matrices of dimension 1000 × 1000, containing 90% randomly selected
zero-valued entries per row. The problems were solved by the AK, BFGS, and L-BFGS-B
methods. This procedure was replicated 50 times, and the potential value in each iteration,
as well as the overall number of evaluations of the potential function, was recorded.
4.3.1 BFGS
Starting with BFGS, the algorithm showed a strong dependency on the input data. In 20
out of 50 runs, BFGS managed to converge to the optimal solution in fewer than 1000
iterations, with an average potential value of 1738, achieved after an average of
815 iterations and requiring 12,746 evaluations of the potential function on average.

In the other 30 runs, in which the running processes were forced to stop after 1000
iterations, large variations in the final value of the potential function were observed,
with 1739 and 2,962,659 being the minimum and the maximum values, respectively. The
average value in this case was 455,968, with a large standard deviation of 805,172,
requiring 17,015 evaluations of the potential function on average. Figure 4.4 shows the
diversity of the BFGS results when the method did not converge after 1000 iterations. In
this figure, the first and third quartiles are represented by the two horizontal lines
bounding the box, with a dark horizontal segment at the median, plus a whisker that extends
to the maximum value. Here, the minimum happens to be very close to the first quartile, so
its corresponding whisker is not visible in the picture.
Figure 4.4: Eventual Potential Values in Non-converging BFGS Runs
The AK algorithm with ε = 0.8, on the other hand, successfully converged in all cases
after 250 iterations on average, requiring the same number of evaluations of the potential
function. The average potential was 2200, with minimum, maximum, and standard deviation
of 2170, 2238, and 19, respectively.
A closer look at the steps taken by BFGS reveals the reason for the method's low
convergence rate in this experiment. Since our objective function is exponential, it is
very sensitive to small changes in the values of the variables, which introduces significant
numerical instability into this method. While the AK algorithm is proven to never decrease
the quality of the solution [1], this is obviously not the case with BFGS. As an example,
Figure 4.5 shows the solutions examined by BFGS in a diverging run of our experiment.
Although BFGS never takes steps that make the solution worse, this internally relies on
its fail-safe behavior, which requires testing different candidate solutions before
deciding on the next one.
[Plot: Potential Value (log scale, 1e-07 to 1e+249) vs. Iteration No. (0 to 6000)]

Figure 4.5: Potential values in a Non-converging BFGS run
Chapter 4. Experiments 31
The specific example shown in Fig. 4.5 corresponds to a run that required the evaluation
of the potential function for 16,500 different solutions during 1,000 iterations, where
many potential values were larger than the largest double-precision value on a 64-bit
architecture (we refer to these values as Inf from now on). To be able to draw the
diagram, Inf values had to be removed, leaving us with 6,250 values.
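This overflow can be reproduced in a few lines (a minimal sketch; the values of mu and the violation here are hypothetical, not taken from our runs):

```python
import math

# The largest finite IEEE-754 double is about 1.798e308, i.e. exp(x)
# overflows for x greater than roughly 709.78.
print(math.exp(709))          # still finite, about 8.2e307

# Summing exponential penalty terms exp(mu * (violation - 1)) therefore
# produces Inf as soon as the exponent exceeds ~709.78.
mu, violation = 10.0, 80.0    # hypothetical values
exponent = mu * (violation - 1.0)   # 790 > 709.78
try:
    term = math.exp(exponent)
except OverflowError:
    term = float("inf")       # what IEEE arithmetic (and BFGS) sees
print(term)                   # inf
```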
4.3.2 L-BFGS-B
BFGS's ability to work with Inf values made its application to our exponential
potential function straightforward. However, this convenience came at the cost of
numerical instability. For this reason, we repeated our experiment with
L-BFGS-B. This method does not allow the objective function to assume Inf values,
hence it was expected to exhibit higher numerical stability. Additionally, L-BFGS-B
allows variables to be bounded. By defining a lower bound of zero in our case, we could
ensure that AK and L-BFGS-B solve exactly the same problem.
Fitting an exponential penalty function into L-BFGS-B, however, introduces new
challenges, since small changes in the variables can easily explode the value of the
function, and the function has to be smoothed using numerical methods. Here, we used
the log technique, a common approach.
In general, applying the log function to the potential was examined in two different
ways. The first trial followed the approach in Chapter 3 by working with the log of the
potential function, φ(x) = ln Φ(x). The computations are similar to MR-MixedPC-S,
and are omitted for brevity.
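The log-of-potential computation boils down to the standard log-sum-exp trick; a minimal sketch, with hypothetical exponents:

```python
import math

def lsum(log_terms):
    """Stable log of a sum of exponentials: ln(sum_i exp(t_i)).
    Shifting by the maximum keeps every exp() argument <= 0."""
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

# Hypothetical exponents mu*(A_i x - 1); naive evaluation of
# ln(sum_i exp(t_i)) would overflow for t_i around 1000.
t = [1000.0, 998.0, 500.0]
print(lsum(t))   # ~1000.13, even though exp(1000) itself overflows
```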
Additionally, quasi-Newton approaches, including L-BFGS-B, require the gradient of the
objective function. Following the definition of the new function φ(x), we have

∇φ(x) = ∇Φ(x) / Φ(x)    (4.1)
Since the potential function reappears in its gradient, we first compute ln ∇φ(x) =
ln ∇Φ(x) − ln Φ(x). The effect of the additional log calculation can be reversed later by
computing the exponential of the final result. We have already discussed ln Φ(x).
The next step is to derive ln ∇Φ(x). Following the definition of Φ(x) in Eq. 3.4,
ln ∇Φ(x) = ln [ Σ_{i=1}^m ∇y_i(x) + Σ_{i=1}^k ∇z_i(x) ]    (4.2)
         = ln [ Σ_{i=1}^m ∇ exp(µ [A_i x − 1]) + Σ_{i=1}^k ∇ exp(µ [1 − C_i x]) ]    (4.3)
         = ln [ Σ_{i=1}^m µ y_i(x) A_i + Σ_{i=1}^k (−µ) z_i(x) C_i ]    (4.4)
Since components of the potential function in the form of y_i and z_i are still present in
Eq. 4.4, direct evaluation of the argument of the logarithm in this equation
can still result in numerical instability. At the same time, replacing the log of the
summation with the lsum of the logarithms of the individual elements, as we did for
ln Φ(x), is impossible, because the log function is not defined for negative numbers. In
consequence, this approach proves to be inapplicable to our potential function and the
L-BFGS-B method.
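A two-line check makes the obstacle concrete (the term values are hypothetical stand-ins for µ y_i A_i and −µ z_i C_i):

```python
import math

# The gradient sum in Eq. 4.4 mixes positive terms (mu * y_i * A_i) and
# negative terms (-mu * z_i * C_i). The lsum decomposition needs the log
# of each individual term, which is undefined for the negative ones.
pos_term, neg_term = 2.5, -1.5       # hypothetical gradient terms
print(math.log(pos_term))            # fine
try:
    math.log(neg_term)               # ln of a negative number
except ValueError as e:
    print("undefined:", e)
```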
The next approach was to downscale the input of the potential function instead of the
function itself. We define φ(x) = Φ(ln(1 + x)), where the logarithm is applied
componentwise. The gradient is derived as follows:
∇φ(x) = Σ_{i=1}^m ∇y_i(ln(1 + x)) + Σ_{i=1}^k ∇z_i(ln(1 + x))    (4.5)
       = Σ_{i=1}^m ∇ exp(µ [A_i ln(1 + x) − 1]) + Σ_{i=1}^k ∇ exp(µ [1 − C_i ln(1 + x)])    (4.6)
       = Σ_{i=1}^m µ y_i(ln(1 + x)) A_i (1 / (1 + x)) − Σ_{i=1}^k µ z_i(ln(1 + x)) C_i (1 / (1 + x))    (4.7)
Since the exponential component in the objective function and its gradient still has to be
evaluated in the new approach, the chance that the computations overflow is not removed
completely. As a matter of fact, this approach occasionally resulted in Inf values in
our experiments. A limited workaround in this case was to decrease the non-zero entries in
the coefficient matrices both in count and in scale; however, this obviously might not
be possible in real applications. With this additional limitation, L-BFGS-B successfully
converged in all 50 replications, and reduced the value of the objective function to an
average of 1,819 after 135 iterations, requiring the same number of evaluations of the
potential function.
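A minimal sketch of the input-downscaling idea on a tiny hypothetical instance (the matrices, µ, and x below are made up for illustration, not from the experiment):

```python
import math

def potential(v, A, C, mu):
    """Mixed packing-covering potential:
    Phi(v) = sum_i exp(mu*(A_i.v - 1)) + sum_i exp(mu*(1 - C_i.v))."""
    dot = lambda row, vec: sum(a * b for a, b in zip(row, vec))
    return (sum(math.exp(mu * (dot(a, v) - 1.0)) for a in A) +
            sum(math.exp(mu * (1.0 - dot(c, v))) for c in C))

def phi(x, A, C, mu):
    """Downscaled variant: evaluate Phi at ln(1 + x), componentwise."""
    return potential([math.log(1.0 + xi) for xi in x], A, C, mu)

# Tiny hypothetical instance: one packing row A, one covering row C.
A, C, mu = [[2.0, 1.0]], [[1.0, 1.0]], 5.0
x = [3.0, 3.0]
print(phi(x, A, C, mu))        # orders of magnitude below potential(x, ...)
print(potential(x, A, C, mu))  # the direct evaluation explodes much faster
```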
                    BFGS                    L-BFGS-B   AK
                    Converged   Diverged
No. of Iterations   815         1000        135        250
Potential Value     1738        455968      1819       2200
Potential STD       17          805172      11         19
Table 4.1: Comparison with Partially-distributed Methods
The AK, BFGS, and L-BFGS-B performance results are summarized in Table 4.1. In
summary, BFGS was easy to apply, but it was numerically unstable with our exponential
potential function. L-BFGS-B was applicable only after downscaling the function, and it
was still prone to failure due to overflow. The AK algorithm showed a slightly slower
convergence rate; however, it was highly numerically stable and easy to apply.
4.4 Large-scale Experiments
In this section we move towards a considerably larger problem that does not fit into the
memory of a single machine. The problem contained 160,000 constraints over 80,000
variables with 90% randomly selected zero-valued entries. The resulting coefficient
matrix in sparse triplet format took more than 10 GB of space to store.
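The storage requirement follows from a back-of-the-envelope calculation (a sketch; the 16-byte triplet encoding, two 4-byte ints plus one 8-byte double, is our assumption):

```python
# Size of the coefficient matrix in sparse triplet (row, column, value)
# format; the 10% density matches the experiment.
rows, cols = 160_000, 80_000
nonzeros = rows * cols // 10                   # 10% density -> 1.28e9 entries
bytes_per_triplet = 4 + 4 + 8                  # assumed: int, int, double
size_gb = nonzeros * bytes_per_triplet / 1024**3
print(f"{nonzeros:,} nonzeros, ~{size_gb:.1f} GB")   # well above 10 GB
```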
This experiment was run on an experimental cluster of 4 blade servers. Each server
had a 48-core processor and 50 GB of RAM. The MapReduce cluster was installed using
Hadoop [24]. One of the servers was exclusively devoted to the master node, and
the others were used as workers. Each worker was assigned a maximum capacity of 4
simultaneous Map and 3 Reduce jobs.
Figure 4.7 shows the sequence of Map, Shuffle, and Reduce attempts for MR-
MixedPC over time. Each row in the diagram corresponds to an available slot on a
worker node. Since we have 3 worker nodes in the cluster with 7 slots on each (4 Maps
and 3 Reduces), there are 21 rows in the diagram. Each column corresponds to a complete
MapReduce job. The diagram shows the first 5 iterations of MR-MixedPC. Since
a pair of MapReduce jobs is performed in each iteration, the diagram is composed of
10 columns overall. The figure reveals that the short Map and Reduce attempts in MR-
MixedPC are dominated by long Shuffle stages.
The attempts history for MR-MixedPC-E is depicted in Figure 4.8. The first 2 columns
correspond to the preprocessing phase of the algorithm. The next 10 columns correspond
to the iterative part. As described in Chapter 3, the efficient realization of
MR-MixedPC avoids the unnecessary sort and shuffle operations during iterations by
consuming the results of the preprocessing phase. As can be seen, this effectively
reduces the runtime of the algorithm. For the 5 iterations in this experiment, the completion
time decreased from more than 4,000 to less than 3,000 seconds.
A similar plot for MR-MixedPC-S is shown in Figure 4.9. As expected, the orchestration
of the attempts is exactly the same as for MR-MixedPC-E. However, due to the extra
overhead of calculations in logarithmic scale, the numerically stable implementation is
slightly slower than the efficient version.
Figures 4.10 to 4.12 show the processor, disk, and network resource utilization of a single
worker node. As the figures suggest, MR-MixedPC is the most demanding on all resources,
while storage and network traffic sharply decline during the iterative phase of MR-MixedPC-
E and MR-MixedPC-S. Additionally, processor consumption decreases considerably in
both the efficient and the numerically stable realizations. Compared to the efficient version,
however, the stable version is more CPU-intensive.
4.5 Case Study: SOFIE
As introduced in Chapter 2, ontology-based information extraction is cast into
a Weighted MAX-SAT problem in SOFIE. In this section, we use the AK algorithm to
approximate the MAX-SAT instance generated by SOFIE. We also evaluate the
quality of the solution by comparing it to FMS* (SOFIE's MAX-SAT solver) and the CPlex
method. Textual facts in this experiment were extracted from the Academic Corpus.
The ontological facts were retrieved from the YAGO [20] knowledge base.
In order to scale to relatively large problems, rather than considering the whole corpus
at once, SOFIE splits the textual facts into batches and repeatedly runs over them, one by
one. When running in batch mode, accepted hypotheses in each run are stored in
the knowledge base to be used as new facts in the upcoming runs. The first experiment
was designed to be small enough that SOFIE could be run without batching. The
second experiment used SOFIE with a batch size of 20,000.
4.5.1 Mid-size Experiment
In this experiment 200,000 pattern occurrences were used. The resulting problem contained
86,944 clauses over 92,444 positive and negative variables with 3,575 competitor
sets for 96,071 hypotheses. The average length of the clauses was 1.678230, with 1 and 4
being the minimum and maximum length, respectively. A variable or its negation, on
average, participated in 3.156765 clauses. The most frequent variable was observed in
4,886 clauses, while the least frequent ones were seen only once. The longest competitor
set contained 3 variables; the shortest naturally had 2. The average number of
variables in competitor sets was 2.554126.
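Statistics of this kind can be collected with a few lines over the clause list; a sketch on a hypothetical toy instance (the real numbers in the text come from the SOFIE-generated problem, not from this example):

```python
from collections import Counter

# Each clause is a list of (possibly negated) variable ids.
clauses = [[1], [1, -2], [2, 3, -4], [-1, 4]]   # hypothetical toy data

avg_len = sum(len(c) for c in clauses) / len(clauses)
freq = Counter(abs(lit) for c in clauses for lit in c)
avg_occ = sum(freq.values()) / len(freq)

print(avg_len)                 # average clause length
print(freq.most_common(1))     # most frequent variable and its count
print(avg_occ)                 # average occurrences per variable
```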
It took around 2 hours to generate the hypotheses, while the elapsed time to cast
the hypotheses into the corresponding MAX-SAT formulation was negligible (less than
a minute). The resulting MAX-SAT problem was composed of 136,741 constraints over
179,388 variables.
It turned out to be trivial to compute the optimal solution to this problem with CPlex.
The method was able to solve the problem with Boolean variable types in less than 2
seconds, achieving 95.01208% of the overall weight. The CPlex runtime logs are provided in
Appendix B. On the other hand, the FMS* algorithm took 19 hours to terminate, and
achieved 0.995888 of the optimal solution.
Finally, we approximated this problem with the AK algorithm using the LP-relaxation
technique introduced in Chapter 2. The goal here is to compare the quality of the
AK solution with the other two methods. Since the weight of the optimal solution
was already found by CPlex, we avoided the unnecessary binary search for the optimal
solution, and directly used the optimal weight when removing the objective of the Mixed
PC problem. However, when this value is not known, binary search is inevitable.
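The binary search over the optimal weight can be sketched as follows (a sketch; the feasibility oracle here is a hypothetical stand-in for one full AK run on the Mixed PC instance obtained from a candidate weight):

```python
def binary_search_weight(feasible, lo, hi, tol=1.0):
    """Find, to within tol, the largest weight W for which the Mixed PC
    instance derived from the candidate objective value W is feasible.
    Assumes feasible(lo) is True and feasible(hi) is False."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if feasible(mid):
            lo = mid            # W achievable: search higher
        else:
            hi = mid            # W too ambitious: search lower
    return lo

# Hypothetical oracle: weights up to 2370 are achievable.
w = binary_search_weight(lambda w: w <= 2370.0, 0.0, 10_000.0)
print(w)   # within 1.0 of 2370
```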
We noticed earlier, in the analysis of the algorithm at the beginning of this chapter,
that the number of iterations dramatically increases as the maximum accepted violation
decreases, such that the algorithm did not terminate after 2,000 iterations when the violation
parameter was set to 0.2. However, it was also noticed that, given an ε, the algorithm often
reached considerably smaller violations. There, it was suggested to exploit this phenomenon
to speed up the algorithm by separating the violation parameter from ε. We call this
new parameter ξ. In the following, we summarize the results of the experiments with the AK
algorithm running on the MAX-SAT problem at hand.
As expected, the original algorithm with ε = 0.1 did not terminate after a few
thousand iterations, and had to be stopped. Next, the modified version was started
with ξ = 0.1. The ε was initially set to 8, and was logarithmically decreased over time.
To decide when to decrease epsilon, we adopted the following intuition: since the
algorithm is actually running with larger step sizes than it should, it either reaches
the expected violation, or it gets stuck at some best reachable violation. As a result, we
decreased ε whenever the violation stalled, or started to behave strangely (e.g.,
the violation started to increase).
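The decision rule can be sketched as a small helper (the shrink factor, stall tolerance, and window size below are our own illustrative assumptions, not the exact schedule used in the experiment):

```python
def next_epsilon(eps, history, shrink=0.5, window=20):
    """Shrink eps when the observed violation has stalled or started
    to increase over the last `window` iterations."""
    if len(history) < window:
        return eps
    recent = history[-window:]
    stalled = max(recent) - min(recent) < 1e-6   # violation is stuck
    increasing = recent[-1] > recent[0]          # violation grows again
    return eps * shrink if (stalled or increasing) else eps

# Violation stuck at 0.8 for 20 iterations -> decrease epsilon.
print(next_epsilon(8.0, [0.8] * 20))   # 4.0
```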
Figure 4.6: Violation Comparison for Modified AK Algorithm

Instead of using random initial solutions, as was the case in the last two experiments,
it is sometimes possible to engineer an initial solution for the specific problem at hand. In
the next experiment, setting all variables to 1 turned out to be a good initial solution,
since it effectively reduced the maximum initial violation to 7.69, compared to 43,523 in
the previous runs. This selection followed the simple intuition that the number of constraints
satisfied in this way (the covering constraints) would be much larger than the number
violated (the packing constraints).
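The effect of the starting point on the initial violation can be checked directly; a sketch on a tiny hypothetical instance (the matrices are made up, chosen so that covering rows dominate as in our MAX-SAT encoding):

```python
import random

def max_violation(A, C, x):
    """Largest constraint violation of the mixed system
    A x <= 1 (packing), C x >= 1 (covering) at point x."""
    dot = lambda row: sum(a * v for a, v in zip(row, x))
    packing = max((dot(a) - 1.0 for a in A), default=0.0)
    covering = max((1.0 - dot(c) for c in C), default=0.0)
    return max(packing, covering, 0.0)

# Tiny hypothetical instance: many covering rows, one packing row.
A = [[0.5, 0.5, 0.0]]
C = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

random.seed(0)
x_rand = [random.uniform(0.0, 10.0) for _ in range(3)]
print(max_violation(A, C, [1.0, 1.0, 1.0]))  # all-ones start: small
print(max_violation(A, C, x_rand))           # random start: much larger
```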
Figure 4.6 compares the violation of the above methods in each iteration. As can be
seen, the performance was sharply improved by modifying epsilon, such that the algorithm
converged to the expected maximum violation after 570 iterations. The number
of iterations further decreased to 216 when the modified initial solution was used.
After rounding from fractional to Boolean, the AK solution satisfied 81.11134% of the
overall weight, which was 0.853695 of the optimal solution.
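A minimal sketch of such a rounding step (simple 0.5-threshold rounding for illustration; not necessarily the exact scheme of Chapter 2):

```python
def round_solution(x_frac, threshold=0.5):
    """Map a fractional LP solution back to Boolean assignments:
    a variable is set true iff its fractional value reaches the threshold."""
    return [1 if v >= threshold else 0 for v in x_frac]

print(round_solution([0.9, 0.2, 0.5, 0.61]))   # [1, 0, 1, 1]
```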
4.5.2 Large-scale Experiment
In this experiment 1,500,000 textual facts from the complete Academic Corpus were
used. The generated problem contained 1,161,170 clauses over 1,083,898 positive and
negative variables with 39,123 competitor sets for 836,939 hypotheses. The longest
competitor set had 3 variables, and the average length of all competitor sets was 2.547683.
                                           Mid-size                        Large-scale
                                           CPlex     FMS*     AK           CPlex     FMS*      AK
Weight of Satisfied Clauses (% of total)   95.01208  94.6214  81.11134     93.37597  85.62238  78.26682
Approximation Ratio (% of optimal weight)  100       99.5888  85.3695      100       91.69638  83.81902
Table 4.2: Case Study Results
Due to the large numbers of clauses and variables, computing further statistics was
not straightforward.

Generating the hypotheses took around 122 hours. The corresponding MAX-SAT problem
was composed of 1,742,242 constraints over 2,245,068 variables, and was constructed
in roughly 5 minutes.
The optimal solution to this problem with Boolean variable types was computed with
CPlex in the course of 101.4 seconds. The optimal solution satisfied 93.37597%
of all weights. The FMS* solution was found after 63 hours, and satisfied 85.62238%
of the overall weight, which is 0.9169638 of the optimal solution. The AK algorithm
with varying epsilon and the designed initial solution terminated after 535 iterations, and
satisfied 78.26682% of all weights, which was 0.8381902 of the optimal solution. Table 4.2
summarizes the results.
[Figure 4.7: MR-MixedPC Job History — map, shuffle/sort, reduce, and failed attempts over time (s) per node/slot]
[Figure 4.8: MR-MixedPC-E Job History — map, shuffle/sort, reduce, and failed attempts over time (s) per node/slot]
[Figure 4.9: MR-MixedPC-S Job History — map, shuffle/sort, reduce, and failed attempts over time (s) per node/slot]
[Figure 4.10: MR-MixedPC Resource Utilization Diagram — job history plus average CPU utilization (user/sys/wait), disk traffic (read/write MB/s), and network traffic (local/remote read and write) over time]
[Figure 4.11: MR-MixedPC-E Resource Utilization Diagram — job history plus average CPU utilization (user/sys/wait), disk traffic (read/write MB/s), and network traffic (local/remote read and write) over time]
[Figure 4.12: MR-MixedPC-S Resource Utilization Diagram — job history plus average CPU utilization (user/sys/wait), disk traffic (read/write MB/s), and network traffic (local/remote read and write) over time]
Chapter 5
Conclusion and Future Work
Many discrete optimization problems are NP-hard. In consequence, for most interesting
problems, no algorithm can be expected to find the optimal solution in polynomial time
for every instance. By far the most common approach to this problem relaxes the
requirement of optimality, and settles for a "good enough" approximate solution that
can be found in tractable time. Linear programming plays a central role in the design
and analysis of many approximation algorithms. LP problem instances in real-world
applications, however, tend to grow enormously.
The rapid improvement and availability of cheap, commodity high-performance components
has been the driving force behind a new era in computing, in which networks of computers
are used to handle large-scale computations.
Our Work: In this work, the performance of the Awerbuch-Khandekar (AK) parallel method
for solving linear programs was examined, and a modification to speed up the algorithm
in practice was suggested. In comparison with partially-distributed optimization
techniques, AK exhibited numerical stability and comparable performance. Next, the
implementation of the algorithm in MapReduce was considered, and opportunities for an
efficient realization of the algorithm were studied.
Finally, the solutions of the algorithm on mid-size and large-scale MAX-SAT problems,
generated by the SOFIE knowledge extraction framework, were compared to the FMS* and
CPlex methods. Despite the large dimensions of the problems, both instances proved
to be trivial, such that CPlex was able to find the optimal solution in both cases in
the course of a few minutes. Further investigation of the clauses revealed that, despite
the large dimensions of the problems, the average number of variables in each clause
was less than 2, which was due to the specific, short-length standard rules in the SOFIE
framework. The same phenomenon was observed in the competitor sets, with an average of
2.55 variables in each. Furthermore, the ratio of the number of competitor sets to the
number of clauses in both instances was less than 0.05.
On the other hand, FMS* took 19 hours without batching on the mid-size problem, and 63
hours with batching on the large-scale problem, with the approximation ratio degrading in
the latter instance from 99.58% to 91.69%. Ideally, when running over batches, the flow of
information is bidirectional, meaning that previously accepted hypotheses are seen in
forthcoming batches, and at the same time, wrongly accepted hypotheses from previous
batches are rejected based on the new evidence in the current batch. However, when
SOFIE accepts a hypothesis, it is stored in the knowledge base and is treated as a fact
by the upcoming batches. As a result, wrongly accepted hypotheses cannot be rejected,
and they may even seed the generation and acceptance of further wrong hypotheses.
The modified AK algorithm with a designed initial solution delivered substantial performance
improvements on the MAX-SAT instances, and achieved well above 80% of the optimal
weight in both cases.
5.1 Future Work
Our work can be further expanded to consider other parallel LP solvers, in order to
compare their strengths and weaknesses with the AK algorithm that we used. Also,
a more concrete analysis of the choice of the initial solution and of the modification scheme
for ε in the modified version of the algorithm could help generalize this method
to other problem instances. Finally, it would be interesting to apply this framework to
other case studies with possibly non-trivial, large problem instances.
Chapter 6
Appendix A.
Algorithm 6.1: MR-MixedPC-E: ReduceII, Preprocessing Phase
input: Key, MatrixEntryList

MatrixColumn column = new MatrixColumn()
foreach MatrixEntry in MatrixEntryList do
    column.add(MatrixEntry)
end
Emit(Key, column)
Algorithm 6.2: MR-MixedPC-S: MapI, Iterative Phase
input: Key, MatrixRow

sum_p ← 0, sum_c ← 0
foreach MatrixEntry in MatrixRow do
    if MatrixEntry.origin == Packing then
        sum_p ← sum_p + MatrixEntry.value ∗ x[MatrixEntry.column]
    else
        sum_c ← sum_c + MatrixEntry.value ∗ x[MatrixEntry.column]
    end
end
if sum_p > 0 then
    value ← µ ∗ (sum_p − 1)
    Emit((y, Key), value)
end
if sum_c > 0 then
    value ← µ ∗ (1 − sum_c)
    Emit((z, Key), value)
end
Algorithm 6.3: MR-MixedPC-S: MapII, Iterative Phase
input: Key, MatrixColumn

foreach MatrixEntry in MatrixColumn do
    if MatrixEntry.origin == Packing then
        y[MatrixEntry.row] ← ln(MatrixEntry.value) + y[MatrixEntry.row]
    else
        z[MatrixEntry.row] ← ln(MatrixEntry.value) + z[MatrixEntry.row]
    end
end
j ← Key
lnr ← lsum(y) − lsum(z)
if lnr ≤ ln(1 − α) then
    x_j ← max(x_j (1 + β), δ)
else if lnr ≥ ln(1 + α) then
    x_j ← x_j (1 − β)
end
Emit(j, x_j)
Chapter 7
Appendix B.
CPlex Log for Mid-size MAX-SAT Problem:
Rcplex: num variables=188198 num constraints=138326
Tried aggregator 3 times.
MIP Presolve eliminated 6898 rows and 28891 columns.
Aggregator did 103537 substitutions.
Reduced MIP has 27891 rows, 55770 columns, and 105854 nonzeros.
Reduced MIP has 45359 binaries, 10411 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.74 sec.
Clique table members: 1509.
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: none, using 1 thread.
Root relaxation solution time = 0.20 sec.
Nodes Cuts/
Node Left Objective IInf Best Integer Best Node ItCnt Gap
* 0+ 0 2370076.0215 0 ---
0 0 5284574.1144 179 2370076.0215 5284574.1144 3426 122.97%
* 0+ 0 5284056.7538 5284574.1144 3426 0.01%
CPlex Log for Large-scale MAX-SAT Problem:
Rcplex: num variables=2245068 num constraints=1742242
Presolve has eliminated 0 rows and 0 columns...
Presolve has eliminated 29205 rows and 256314 columns...
Aggregator has done 1208457 substitutions...
Presolve has eliminated 38907 rows and 256344 columns...
Aggregator has done 1208457 substitutions...
Presolve has improved bounds 104685 times...
Presolve has eliminated 183366 rows and 510277 columns...
Aggregator has done 1315678 substitutions...
Presolve has improved bounds 104685 times...
Tried aggregator 5 times.
MIP Presolve eliminated 187999 rows and 514159 columns.
Aggregator did 1316592 substitutions.
Reduced MIP has 237651 rows, 414317 columns, and 814772 nonzeros.
Reduced MIP has 378166 binaries, 36151 generals, 0 SOSs, and 0 indicators.
Presolve time = 81.21 sec.
Clique table members: 21128.
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: none, using 1 thread.
Root relaxation solution time = 6.69 sec.
Nodes Cuts/
Node Left Objective IInf Best Integer Best Node ItCnt Gap
* 0+ 0 3.69162e+07 0 ---
0 0 5.56022e+07 2233 3.69162e+07 5.56022e+07 52813 50.62%
* 0+ 0 5.55954e+07 5.56022e+07 52813 0.01%
0 0 5.55999e+07 1045 5.55954e+07 Cuts: 675 53425 0.01%
Zero-half cuts applied: 649
Gomory fractional cuts applied: 21
Bibliography
[1] Baruch Awerbuch and Rohit Khandekar. Stateless algorithms for mixed packing
and covering linear programs with polylogarithmic convergence, 2009. manuscript.
[2] Baruch Awerbuch and Rohit Khandekar. Stateless distributed gradient descent for
positive linear programs. SIAM J. Comput., 38(6):2468–2486, 2009.
[3] Mark Baker and Rajkumar Buyya. Cluster computing: The commodity supercom-
puter. Softw., Pract. Exper., 29(6):551–576, 1999.
[4] S. Balay, K. Buschelman, V. Eijkhout, W. Gropp, D. Kaushik, M. Knepley, L. Curf-
man McInnes, B. Smith, and H. Zhang. PETSc users manual, 2010.
[5] Yair Bartal, John W. Byers, and Danny Raz. Fast, distributed approximation
algorithms for positive linear programming with applications to flow control. SIAM
J. Comput., 33(6):1261–1279, 2004.
[6] Flavio Chierichetti, Ravi Kumar, and Andrew Tomkins. Max-cover in map-reduce.
In WWW, pages 231–240, 2010.
[7] Lois Curfman McInnes, Jorge J. Moré, Todd Munson, and Jason Sarich. TAO users
manual, 2010.
[8] George B. Dantzig and Mukund N. Thapa. Linear Programming 1: Introduction,
chapter The Linear Programming Problem. Springer, 1997.
[9] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on
large clusters. In OSDI, pages 137–150, 2004.
[10] Klaus Jansen. Approximation algorithm for the mixed fractional packing and cov-
ering problem. SIAM Journal on Optimization, 17(2):331–352, 2006.
[11] Fabian Kuhn, Thomas Moscibroda, and Roger Wattenhofer. The price of being
near-sighted. In SODA, pages 980–989, 2006.
[12] Hai-Guang Li, Gong-Qing Wu, Xuegang Hu, Jing Zhang, Lian Li, and Xindong
Wu. K-means clustering with bagging and mapreduce. In HICSS, pages 1–8, 2011.
[13] Chao Liu, Hung chih Yang, Jinliang Fan, Li-Wei He, and Yi-Min Wang. Distributed
nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce.
In WWW, pages 681–690, 2010.
[14] Michael Luby and Noam Nisan. A parallel approximation algorithm for positive
linear programming. In STOC, pages 448–457, 1993.
[15] Faraz Makari Manshadi, Baruch Awerbuch, Rohit Khandekar, Julian Mestre,
Mauro Sozio, and Gerhard Weikum. Message-passing and map-reduce algorithms
for mixed packing-covering optimization with applications in data management.
Technical report under preparation.
[16] Christos H. Papadimitriou and Mihalis Yannakakis. Linear programming without
the matrix. In STOC, pages 121–129, 1993.
[17] Marcus Paradies. An efficient blocking technique for reference matching using
mapreduce. Datenbank-Spektrum, 11(1):47–49, 2011.
[18] Roland Pihlakas. Method for calculating precise logarithm of a sum, 2007.
manuscript.
[19] Fabian M. Suchanek. Automated Construction and Growth of a Large Ontology.
PhD thesis, Saarland University, 2009.
[20] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: A large ontol-
ogy from wikipedia and wordnet. J. Web Sem., 6(3):203–217, 2008.
[21] Fabian M. Suchanek, Mauro Sozio, and Gerhard Weikum. Sofie: a self-organizing
framework for information extraction. WWW, pages 631–640, 2009.
[22] Leonard W. Swanson. Linear Programming Basic Theory and Applications, chapter
Generalized Linear Programming Problems. McGRAW-HILL, 1985.
[23] Vijay V. Vazirani. Approximation Algorithms, chapter Maximum Satisfiability.
Springer, 2003.
[24] Tom White. Hadoop - The Definitive Guide: MapReduce for the Cloud. O’Reilly,
2009. ISBN 978-0-596-52197-4.
[25] David P. Williamson and David B. Shmoys. The Design of Approximation Algo-
rithms. Cambridge University Press, 2010.
[26] Neal E. Young. Sequential and parallel algorithms for mixed packing and covering.
In FOCS, pages 538–546, 2001.
[27] Weizhong Zhao, Huifang Ma, and Qing He. Parallel k-means clustering based on
mapreduce. In CloudCom, pages 674–679, 2009.