Universität des Saarlandes
Max-Planck-Institut für Informatik
Solving Linear Programs in MapReduce
Masterarbeit im Fach Informatik
Master's Thesis in Computer Science
von / by
Mahdi Ebrahimi
angefertigt unter der Leitung von / supervised by
Prof. Dr. Gerhard Weikum
betreut von / advised by
Dr. Rainer Gemulla
begutachtet von / reviewers
Prof. Dr. Gerhard Weikum
Dr. Rainer Gemulla
Saarbrücken, May 30, 2011
Non-plagiarism Statement
Hereby I confirm that this thesis is my own work and that I have documented all sources
used.
(Mahdi Ebrahimi)
Saarbrücken, May 30, 2011
Declaration of Consent
Herewith I agree that my thesis will be made available through the library of the Com-
puter Science Department.
(Mahdi Ebrahimi)
Saarbrücken, May 30, 2011
Abstract
Most interesting discrete optimization problems are NP-hard, so an efficient algorithm
that finds optimal solutions to such problems is unlikely to exist. Linear programming plays a
central role in design and analysis of many approximation algorithms. However, linear
program instances in real-world applications grow enormously. In this thesis, we study
the Awerbuch-Khandekar parallel algorithm for approximating linear programs, provide
strategies for efficient realization of the algorithm in MapReduce, and discuss methods
to improve its performance in practice. Further, we characterize numerical properties of
the algorithm by comparing it with partially-distributed optimization methods. Finally,
we evaluate the algorithm on a weighted maximum satisfiability problem generated by
the SOFIE knowledge extraction framework on the complete Academic Corpus.
Acknowledgements
I would like to express my sincere gratitude to Prof. Gerhard Weikum and Dr. Rainer
Gemulla for giving me the opportunity to work under their supervision. Regular discussions
with Dr. Rainer Gemulla were helpful in setting the targets of this work and kept me
motivated throughout the course of this study. I am also thankful to Dr. Mauro
Sozio for initiating this work. Finally, I am grateful to my parents, relatives and friends
for their support.
Contents
Abstract
Acknowledgements
List of Figures
List of Tables
List of Algorithms
1 Introduction
   1.1 Motivation
   1.2 Contribution
   1.3 Outline of the Thesis
   1.4 Related Work
2 Preliminaries
   2.1 Linear Programming
   2.2 Mixed Packing-Covering Linear Programs
   2.3 LP-approximation for Weighted MAX-SAT
   2.4 Binary Search for Optimal Solution
   2.5 MapReduce
   2.6 SOFIE
      2.6.1 Statements
      2.6.2 Rules
      2.6.3 MAX-SAT Model
3 Solving Linear Programs in MapReduce
   3.1 Awerbuch-Khandekar Algorithm (AK)
   3.2 Realization in MapReduce
      3.2.1 MR-MixedPC
      3.2.2 MR-MixedPC-E
      3.2.3 MR-MixedPC-S
4 Experiments
   4.1 Awerbuch-Khandekar Performance Analysis
   4.2 Scalability Test for Sequential LP Solvers
   4.3 Comparison with Partially-distributed Methods
      4.3.1 BFGS
      4.3.2 L-BFGS-B
   4.4 Large-scale Experiments
   4.5 Case Study: SOFIE
      4.5.1 Mid-size Experiment
      4.5.2 Large-scale Experiment
5 Conclusion and Future Work
   5.1 Future Work
6 Appendix A
7 Appendix B
List of Figures
2.1 SOFIE Main Components
4.1 The AK Potential Value for Different Epsilons
4.2 The AK Violation Value for Different Epsilons
4.3 CPlex Runtime
4.4 Eventual Potential Values in Non-converging BFGS Runs
4.5 Potential Values in a Non-converging BFGS Run
4.6 Violation Comparison for Modified AK Algorithm
4.7 MR-MixedPC Job History
4.8 MR-MixedPC-E Job History
4.9 MR-MixedPC-S Job History
4.10 MR-MixedPC Resource Utilization Diagram
4.11 MR-MixedPC-E Resource Utilization Diagram
4.12 MR-MixedPC-S Resource Utilization Diagram
List of Tables
3.1 Parameter Setting for the AK Algorithm
4.1 Comparison with Partially-distributed Methods
4.2 Case Study Results
List of Algorithms
3.1 AK Algorithm for Mixed Packing-Covering
3.2 MR-MixedPC: MapI
3.3 MR-MixedPC: ReduceI
3.4 MR-MixedPC: MapII
3.5 MR-MixedPC: ReduceII
3.6 MR-MixedPC-E: ReduceI, Preprocessing Phase
3.7 MR-MixedPC-E: MapI, Iterative Phase
3.8 MR-MixedPC-E: MapII, Iterative Phase
3.9 lsum
3.10 Numerically Stable AK for Mixed Packing-Covering
6.1 MR-MixedPC-E: ReduceII, Preprocessing Phase
6.2 MR-MixedPC-S: MapI, Iterative Phase
6.3 MR-MixedPC-S: MapII, Iterative Phase
Chapter 1
Introduction
1.1 Motivation
One of the promises of the information technology era is to deploy computers to support
rapid, informed decision making by sifting through large amounts of data. The objective
is to make decisions that achieve some best possible goal. The study of how to make
decisions of this sort has created the field of discrete optimization. Unfortunately, most
interesting discrete optimization problems are NP-hard, so an efficient algorithm that
finds optimal solutions to such problems is unlikely to exist. Linear programs play a central
role in design and analysis of many approximation algorithms [25]. However, linear
program instances in real-world applications grow enormously.
The rapid improvement and availability of cheap, commodity high-performance components
drove a new era in computing that uses networks of computers to
handle large-scale computations [3]. MapReduce is a powerful computational model
that has proved successful in large-scale distributed data analysis [6]. A MapReduce
cluster is easy to run and to maintain, and all the issues related to parallel execution of
algorithms, such as the communication and synchronization of processes, partitioning
and distribution of data, mapping of processes onto processors, and fault tolerance are
automatically handled by the framework. Furthermore, many large-scale applications
such as SOFIE knowledge extraction framework require scanning through huge volumes
of documents, which makes MapReduce a perfect match for them. It is often desirable
to solve the resulting optimization problems in these applications on the same cluster
as the one on which they were generated.
1.2 Contribution
We make the following contributions in our work:
1. We implement the Awerbuch-Khandekar parallel algorithm for solving linear pro-
grams, and study its performance from a practical point of view.
2. We propose strategies for improving the performance of the Awerbuch-Khandekar
algorithm in practice.
3. We study the scalability of state-of-the-art sequential linear program solvers.
4. We compare the performance of the Awerbuch-Khandekar algorithm with partially-
distributed approaches.
5. We propose several realizations for the Awerbuch-Khandekar algorithm in MapRe-
duce, and analyze their performance on large-scale experiments.
6. We analyze the performance of the Awerbuch-Khandekar algorithm on a weighted
maximum satisfiability problem generated by the SOFIE knowledge extraction
framework on the complete Academic Corpus.
7. We evaluate the quality of the solution of the Awerbuch-Khandekar algorithm by
comparing it with FMS* and CPlex methods.
1.3 Outline of the Thesis
This thesis is organized as follows: Preliminary concepts that are referred to in the
rest of the study are elaborated in Chapter 2. The Awerbuch-Khandekar algorithm
and its realization in MapReduce are discussed in Chapter 3. Chapter 4 describes
our experimental settings, and provides the results of various tests on scalability and
performance of the algorithm. Chapter 5 concludes this study and provides directions
for future work.
1.4 Related Work
It was first shown by Papadimitriou and Yannakakis [16] that positive linear programs
can be well-approximated even if the constraint matrix is distributed among a set of de-
cision makers that are not allowed to communicate. There, it was shown that the worst
case approximation ratio is related to the maximum number of variables appearing in
each constraint. At the same time, Luby and Nisan [14] developed a parallel approxi-
mation algorithm that found a feasible value within ε of an optimal feasible solution to
positive linear programs, with a running time polynomial in log(N)/ε, where N is the
number of non-zero coefficients associated with the problem instance. Later, the idea
was pushed further by Bartal et al. [5] by allowing local communication between dis-
tributed decision makers. As a result, a distributed algorithm was obtained that could
achieve a (1 + ε)-approximation to the optimal solution, while using only a polylogarithmic
number of local communication rounds. Later on, Kuhn et al. [11] provided a tight
classification of the trade-off between the amount of local information and the quality
of the solution, and outlined two specific algorithms for small and unbounded message
size distributed environments.
Another work by Young [26] outlined sequential and parallel algorithms that approxi-
mated mixed packing and covering linear programs (in contrast to pure packing or pure
covering, which have only "≤" or "≥" inequalities, but not both), and provided a parallel
algorithm that ran in time polylogarithmic in the input size with a total number of
operations comparable to sequential algorithms. Jansen [10] later provided a parallel
approximation algorithm to solve general mixed packing and covering problems. Awer-
buch and Khandekar [1] introduced the first stateless approximation algorithm for mixed
packing covering linear programs.
The Portable, Extensible Toolkit for Scientific Computation (PETSc) [4] and the Toolkit
for Advanced Optimization (TAO) [7] packages provide facilities for solving linear pro-
grams based on the Message Passing Interface (MPI) standard.
The MapReduce framework proved successful in many large-scale applications. Liu et
al. [13] accomplished nonnegative matrix factorization on web-scale dyadic data, with
matrices of tens of millions of rows by hundreds of millions of columns containing billions of nonzero entries.
Chierichetti et al. [6] reported successful speedups in MapReduce in comparison to sequential
greedy algorithms in solving the Max-Cover problem on five large-scale data sets
derived from Yahoo! logs. Paradies [17] used MapReduce to perform document clus-
tering in the area of entity matching, where documents from various data sources were
matched together. Zhao et al. [27] and Li et al. [12] proposed parallel algorithms for
k-means clustering based on MapReduce.
Chapter 2
Preliminaries
2.1 Linear Programming
One of the promises of the information technology era is to deploy computers to support
rapid, informed decision making by sifting through large amounts of data. Deciding
inventory levels, routing vehicles, and organizing data for efficient retrieval are examples
of everyday problems in today's society. Discrete optimization is a branch of optimization
that studies the question of how to make decisions of this sort to achieve some
best possible objective [25].
Unfortunately, for most interesting discrete optimization problems no efficient algorithm
is known, where an efficient algorithm is one that runs in time polynomial in its input
size. Consequently, for most discrete optimization problems there is likely no algorithm
that finds the optimal solution, in polynomial time, for every instance.
One approach to this problem is to relax the latter requirement, and develop efficient
algorithms that find the optimal solution for the specific problem at hand. The resulting
algorithm is only useful for the special instances it was designed for.
A more common approach is to relax the polynomial time requirement, and find the opti-
mal solution by searching through the complete set of possible solutions. This approach,
however, quickly becomes intractable as the input grows.
By far the most common approach relaxes the requirement of optimality of the solution,
and settles for a "good enough" approximate solution that can be found in tractable
time. Linear programming, frequently abbreviated to LP, plays a central role in design
and analysis of many approximation algorithms, and there has been an enormous study
of various LP-based approximation approaches [25].
The objective in a linear program is to minimize a linear function subject to linear
equality constraints. A standard LP in vector-matrix notation is written as:
Minimize:    c^T x
subject to:  Ax = b,   A ∈ R^{m×n}        (2.1)

where x ∈ R^n_+, c ∈ R^n, and b ∈ R^m [8].
The above equation can be expanded by vector-matrix multiplication as follows:
Minimize:    c_1 x_1 + c_2 x_2 + ··· + c_n x_n
Subject to:  a_11 x_1 + a_12 x_2 + ··· + a_1n x_n = b_1
             a_21 x_1 + a_22 x_2 + ··· + a_2n x_n = b_2
             ⋮
             a_m1 x_1 + a_m2 x_2 + ··· + a_mn x_n = b_m        (2.2)
where c_i and x_i are the i-th elements of the vectors c and x, respectively, and a_ij
represents the j-th element of the i-th row of matrix A. Without loss of generality, the
right-hand side values in Eq. 2.1 can be restricted to non-negative values, i.e., b ∈ R^m_+.
Other variations of Eq. 2.1 are possible: for example, maximizing the objective
function rather than minimizing it, having inequalities in addition to equations, and
allowing negative variables. However, the above form is general enough to capture all
these extensions. For a detailed discussion on the transformation techniques, the reader
is referred to [22].
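To make one such transformation concrete, the following sketch (our own illustration, not code from the thesis) rewrites a small maximization problem with "≤" inequalities into the standard equality form of Eq. 2.1, by negating the objective and appending one slack variable per constraint:

```python
import numpy as np

def to_standard_form(c, A_ub, b_ub):
    """Rewrite  maximize c^T x  s.t.  A_ub x <= b_ub, x >= 0
    as          minimize c'^T x'  s.t.  A' x' = b', x' >= 0,
    by negating the objective and adding one slack variable per row."""
    m, n = A_ub.shape
    A_eq = np.hstack([A_ub, np.eye(m)])       # [A | I] x' = b
    c_eq = np.concatenate([-c, np.zeros(m)])  # max c^T x  ==  min -c^T x
    return c_eq, A_eq, b_ub

# Toy instance: maximize x1 + 2*x2  s.t.  x1 + x2 <= 4,  2*x1 + 0.5*x2 <= 3
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0], [2.0, 0.5]])
b = np.array([4.0, 3.0])
c2, A2, b2 = to_standard_form(c, A, b)
# x' now has n + m = 4 components: the original x followed by the slacks
```

The original maximum is recovered by negating the optimal objective value of the transformed problem; the slack components of x' measure how far each original inequality is from being tight.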
Integer programming is another variation of LP, where constraints requiring integer
variables are allowed. Example integer programs include restricting some variables to
natural numbers, or to a bounded range such as {0, 1}. Unlike linear programming,
integer programs are NP-complete, so no efficient algorithm to solve general integer
programs is likely to exist [25].
An optimal solution x ∈ R^n_+ to Eq. 2.1 is one that minimizes the objective function
while satisfying all constraints. Although there are very efficient sequential algorithms
that find the optimal solution for general linear programs [25], there is currently no
parallel algorithm, to the best of our knowledge, that optimally solves general LPs.
However, there are subclasses of linear programs that have been extensively studied in
distributed settings in recent years [1, 2, 5, 10, 11, 14, 16, 26]. This work is concerned
with Mixed Packing-Covering linear programs.
2.2 Mixed Packing-Covering Linear Programs
Mixed Packing-Covering (Mixed PC) linear programs are an important subclass of LP,
where both less-than-or-equal and greater-than-or-equal constraints are allowed, but the
coefficient matrices are restricted to non-negative values. Both inequality constraints
(≤, ≥) are explicitly denoted in the standard form of a Mixed PC. The former are often
referred to as packing, while the latter are known as covering constraints. Intuitively,
packing constraints are linear inequalities with upper bounds that can not be exceeded,
while covering constraints define inequalities with lower bounds to be satisfied. A Mixed
PC linear program in vector-matrix notation is denoted as follows:
Minimize:    c^T x
subject to:  Ax ≤ b,   A ∈ R^{m×n}_+
             Cx ≥ d,   C ∈ R^{k×n}_+        (2.3)

where x ∈ R^n_+, c ∈ R^n_+, b ∈ R^m_+, and d ∈ R^k_+ [15].
Despite the additional restriction in comparison with general linear programs, Mixed
PC is an expressive model that is able to represent a large family of interesting opti-
mization problems such as set cover, (weighted) maximum satisfiability, maximum cut,
and multicommodity flow, to name but a few.
As an example, we outline the procedure to approximate a weighted maximum satisfia-
bility (Weighted MAX-SAT) problem using Mixed PC linear programs. Later, we will
use this framework to solve MAX-SAT problems in our case studies.
2.3 LP-approximation for Weighted MAX-SAT
DEFINITION 2.1. Given n Boolean variables x_1, …, x_n, where each variable x_i ∈ {0, 1};
m clauses C_1, …, C_m, where each clause C_j is a disjunction of some number of
the variables and their negations; and a nonnegative weight w_j associated with each clause
C_j, the Weighted MAX-SAT problem is the task of finding a truth assignment to the Boolean
variables such that the sum of the weights of the satisfied clauses is maximized [23, 25].
The LP approximation of Weighted MAX-SAT proceeds in three steps. First, the problem is
modeled as a mixed packing-covering integer program as shown in Eq. 2.4.
Maximize:    Σ_{j=1}^{m} w_j z_j
subject to:  Σ_{x_i ∈ X(c_j)} x_i + Σ_{x̄_i ∈ X̄(c_j)} x̄_i ≥ z_j    ∀ c_j ∈ C    (a)
             x_i + x̄_i ≤ 1                                         ∀ x_i ∈ X    (b)
             x_i + x̄_i ≥ 1                                         ∀ x_i ∈ X    (c)
             x_i ∈ {0, 1}                                          ∀ x_i ∈ X    (d)
             z_j ∈ {0, 1}                                          ∀ c_j ∈ C    (e)        (2.4)
Here, the set of all variables and clauses are represented with X and C, respectively.
To address all positive variables participating in clause c_j, we use the notation X(c_j).
Similarly, X̄(c_j) represents the set of negated variables in c_j.
A clause is satisfied if one of the positive variables is set to 1, or one of the negated
variables is set to 0. This is captured in the first constraint in 2.4, tagged with (a).
When a clause cj is satisfied, its corresponding slack variable zj is set to 1, which
in consequence, adds the associated weight of the satisfied clause to the value of the
objective function. Additionally, constraints (b) and (c) jointly assure that of a variable
and its negation, one and only one is set to 1.
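The interplay of these constraints can be illustrated with a small check (the clause representation and the helper name are our own, not from the thesis); since constraints (b) and (c) force x̄_i = 1 − x_i, constraint (a) for a single clause reduces to:

```python
def clause_constraint_holds(assignment, clause_pos, clause_neg, z_j):
    """Check constraint (a) of Eq. 2.4 for one clause: the sum of x_i over
    positive literals plus the sum of xbar_i = 1 - x_i over negated literals
    must be at least z_j.  assignment maps a variable index to 0 or 1."""
    lhs = sum(assignment[i] for i in clause_pos) \
        + sum(1 - assignment[i] for i in clause_neg)
    return lhs >= z_j

# Clause (x0 OR NOT x1) with x0 = 0, x1 = 0: the negated literal satisfies
# the clause, so z_j = 1 is allowed.
clause_constraint_holds({0: 0, 1: 0}, [0], [1], 1)
```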
In the next step, the integer program in 2.4 is relaxed to LP, by switching variable types
from Boolean to fractional ((d) and (e) are replaced with xi ∈ [0, 1] and zj ∈ [0, 1],
respectively), and the resulting LP is solved.
The eventual solution is computed by rounding the fractional variables back to Boolean
values. Rounding each variable to 1 or 0 by flipping a coin that is biased by the fractional
value of the variable guarantees a (1 − 1/e)-approximation for MAX-SAT. A more
sophisticated approach provides a 3/4-approximation for this problem. For a detailed survey of
different rounding schemes and their approximation ratios, the reader is referred to [25].
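The simple biased-coin rounding can be sketched as follows (the variable and clause representations are our own simplifications, not the thesis's data structures):

```python
import random

def round_fractional(x_frac, seed=0):
    """Set each variable to 1 with probability equal to its fractional value."""
    rng = random.Random(seed)
    return {i: 1 if rng.random() < v else 0 for i, v in x_frac.items()}

def satisfied_weight(clauses, assignment):
    """Total weight of satisfied clauses.
    clauses: list of (weight, positive_var_indices, negated_var_indices)."""
    total = 0.0
    for w, pos, neg in clauses:
        if any(assignment[i] == 1 for i in pos) or \
           any(assignment[i] == 0 for i in neg):
            total += w
    return total
```

Because `random.random()` returns values in [0, 1), a variable with fractional value 1.0 is always rounded to 1, and one with value 0.0 always to 0.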
Currently available parallel algorithms for Mixed PC linear programs are limited to find-
ing a feasible solution that satisfies all constraints, regardless of the value of the objective
function. In the following, we describe how binary search can be combined with feasible
solutions to find the optimal solution to the minimization problem in Eq. 2.3. Maximization
problems can be treated similarly.
2.4 Binary Search for Optimal Solution
Given the objective function c^T x and a feasible solution x*, an upper bound for the
objective function is evaluated as u* = c^T x*. Next, we add a new constraint c^T x ≤ u*/2
to the packing constraints, and solve the new system for the next feasible solution. If the
new system turns out to be feasible, u*/2 becomes the new upper bound, and the binary
search continues with the tighter constraint c^T x ≤ u*/4. Otherwise, u* remains the
upper bound, u*/2 becomes the new lower bound, and the binary search continues with the new
constraint c^T x ≤ 3u*/4. This procedure is repeated until the length of the search interval
is less than or equal to the desired precision of the optimal solution.
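The search described above can be sketched with a generic feasibility oracle (`solve_feasible` stands in for an LP-feasibility solver and is an assumption of this sketch, not part of the thesis's implementation):

```python
def binary_search_min(c, solve_feasible, x0, tol=1e-3):
    """Binary search for the minimum of c^T x; assumes a non-negative
    objective (c >= 0, x >= 0, as in Eq. 2.3), so 0 is a valid lower bound.

    solve_feasible(upper) must return a feasible x with c^T x <= upper,
    or None if no such solution exists; x0 is any feasible starting point.
    """
    value = lambda x: sum(ci * xi for ci, xi in zip(c, x))
    lo, hi, best = 0.0, value(x0), x0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        x = solve_feasible(mid)
        if x is not None:            # feasible: tighten the upper bound
            hi, best = value(x), x
        else:                        # infeasible: raise the lower bound
            lo = mid
    return best
```

Each iteration halves the search interval, so the number of oracle calls is logarithmic in the initial objective value divided by the desired precision.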
2.5 MapReduce
Starting in the early 1990s, the rapid improvement and availability of cheap, commodity
high-performance components drove a new era in computing that uses networks
of computers to handle large-scale computations [3].
However, developing distributed applications requires a significant amount of effort to
address regular issues in parallelizing algorithms, such as the communication and syn-
chronization of processes, partitioning and distribution of data, mapping of processes
onto processors, and fault tolerance. Developers often find themselves reimplementing
similar procedures from one application to the other.
In this work, we are going to study distributed LP solving in MapReduce. A MapReduce
cluster is easy to run and to maintain, and all the aforementioned issues related to paral-
lel execution of algorithms are automatically handled by the framework. Hence, it is very
convenient to implement algorithms in MapReduce, since the application developer can
concentrate on the main logic of the algorithm. Furthermore, many applications, includ-
ing SOFIE knowledge extraction framework, require scanning through large amounts of
documents, which makes MapReduce a perfect match for them. It is often more desirable
to solve the resulting optimization problems in these applications on the
same cluster as the one on which they were generated, rather than migrating them to yet
another framework (e.g., a Message Passing cluster).
The MapReduce programming model is based on key/value tuples. The Map function,
written by the user, takes a stream of input key/value pairs and produces a set of
intermediate key/value pairs. Next, the shuffle stage is carried out by the MapReduce
library, which groups together all intermediate values associated with the same intermediate
key. In the Reduce function, also written by the user, intermediate values for the same
key are brought and processed together to produce a possibly smaller set of output
key/value pairs [9].
As an example, the illustrative problem of counting the number of word occurrences in
a large collection of documents is represented in MapReduce as follows: the input
[Figure 2.1: SOFIE Main Components]
key/value pair to the Map function is a document name and its contents. The function scans
through the document and emits each word plus the associated count of the occurrences
of that word in the document. Shuffling groups together occurrences of the same word
in all documents, and passes them to the Reduce function. The Reduce function sums
up all the occurrences, and emits the word and its overall count of occurrences.
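The word-count example can be mimicked in a single process; the map, shuffle, and reduce stages below correspond one-to-one to the description above (a sequential sketch, not an actual MapReduce job):

```python
from itertools import groupby
from operator import itemgetter

def map_fn(doc_name, contents):
    """Emit (word, 1) for every word occurrence in the document."""
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Sum up all occurrence counts for one word."""
    yield (word, sum(counts))

def run_mapreduce(inputs):
    # Map phase: apply map_fn to every (document name, contents) pair.
    intermediate = [kv for name, text in inputs for kv in map_fn(name, text)]
    # Shuffle: group intermediate pairs by key (here via sort + groupby).
    intermediate.sort(key=itemgetter(0))
    output = {}
    for word, group in groupby(intermediate, key=itemgetter(0)):
        for k, v in reduce_fn(word, (count for _, count in group)):
            output[k] = v
    return output
```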
2.6 SOFIE
The Self-Organizing Framework for Information Extraction (SOFIE) is an ontology-oriented
information extraction framework that aims at extracting high-quality ontological facts.
There are three main challenges in any ontology-based information extraction (IE) framework,
including SOFIE: pattern selection (finding meaningful patterns in text),
entity disambiguation (selecting among multiple, possibly ambiguous mappings of words
or phrases in the text to their most probably intended meanings in the ontology), and
consistency checking (scrutinizing a large set of IE-provided noisy candidates against
a trusted core of facts in the ontology). Rather than addressing each of these issues
separately, SOFIE's novel approach solves the three problems simultaneously by casting
them into a single Weighted MAX-SAT problem [19, 21].
In this section, the components of this framework are introduced, and the construction
of its MAX-SAT model is elaborated. We will later return to this problem in case studies
in Chapter 4. SOFIE's main components are depicted in Fig. 2.1 (the figure is borrowed from [19]).
2.6.1 Statements
Statements in SOFIE are relations of arbitrary arity. Each statement is assigned a truth
value of 0 or 1, which is denoted in square brackets. Hence the statement "Albert Einstein
is born in Ulm", for example, is represented as
bornIn(AlbertEinstein, Ulm)[1]
A statement with truth value 1 is a fact. A statement with unknown truth value is
a hypothesis. There are two types of facts in SOFIE, ontological and textual facts.
Ontological facts come from an underlying ontology, e.g. YAGO [20]. Textual facts
are extracted from a given text corpus, and are divided into two categories. The first
category consists of pattern occurrence facts that make assertions about the occurrence of
textual patterns. For example, if the pattern "X went to school in Y" is detected between
Einstein and Germany, the following fact is generated:
patternOcc("X went to school in Y", Einstein, Germany)[1]
The second category uses linguistic techniques to estimate the likelihood of mappings
from words or phrases in the text to entities and relations in the ontology. These estimates
appear in the form of disambiguationPrior facts. As an example, disambiguation priors
for Einstein might look like this²:
disambPrior(Einstein, AlbertEinstein, 0.8)[1]
disambPrior(Einstein, HermannEinstein, 0.2)[1]
SOFIE uses rules to form new hypotheses based on ontological and textual facts. Hypotheses
either concern the disambiguation of entities, e.g. disambiguateAs(Einstein,
AlbertEinstein)[?], concern a certain pattern expressing a certain relation,
e.g. expresses("X lives in Y", LivesInLocation)[?], or express new potential
facts, e.g. developed(Microsoft, JavaProgrammingLanguage)[?].
2.6.2 Rules
Rules encode background knowledge and are represented as logical formulae over a set of
literals. A literal is a statement that can have placeholders for the relation name or
for its entities. Conventionally, uppercase strings are used to represent placeholders.
Following is a sample rule stating that a person who is born in Ulm cannot be born in
London:

bornIn(X, Ulm) =⇒ ¬bornIn(X, London)

² The reported numbers here are imaginary. For a detailed description of how SOFIE estimates these values, please refer to [19].
A rule in SOFIE is defined as follows [19]:
DEFINITION 2.2. Given a set of literals L, a rule over L is one of the following:
• an element of L
• an expression of the form ¬R, where R is a rule over L
• an expression of the form (R1 ∘ R2), where R1 and R2 are rules over L and ∘ ∈ {∧, ∨, =⇒, ⇐⇒}.
A rule is said to be grounded when all placeholders in its literals are replaced by entities.
All occurrences of one placeholder within a grounding must be replaced by the same
entity. Replacing the placeholder X in the aforementioned rule with AlbertEinstein, for
example, generates the following grounded instance:
bornIn(AlbertEinstein, Ulm) =⇒ ¬bornIn(AlbertEinstein, London)
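A grounding step like the one above can be sketched as a textual substitution, assuming (as in the examples here) that placeholders are single uppercase letters; the `ground` helper is our own illustration, not SOFIE's implementation:

```python
import re

def ground(rule, bindings):
    """Replace every single-uppercase-letter placeholder in a rule template
    with its bound entity; all occurrences of one placeholder receive the
    same entity, as required for a valid grounding."""
    return re.sub(r'\b[A-Z]\b',
                  lambda m: bindings.get(m.group(0), m.group(0)),
                  rule)

grounded = ground("bornIn(X, Ulm) => !bornIn(X, London)",
                  {"X": "AlbertEinstein"})
```

The word-boundary pattern `\b[A-Z]\b` matches the isolated placeholder X but leaves entity names such as Ulm or London untouched.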
2.6.3 MAX-SAT Model
Each grounded rule in SOFIE is transformed into one or multiple clauses in the modeling
of the MAX-SAT problem. Associating weights with clauses in MAX-SAT is an adequate
tool for prioritizing clauses according to their importance. This intuition is exploited in
SOFIE to introduce a concept of softness for rules.
Generally speaking, there might be no solution to the MAX-SAT problem in SOFIE
that could satisfy all clauses at the same time. For example, as soon as there are two
disambiguation priors suggesting different disambiguations for the same phrase, one of
the clauses has to be violated.
In contrast to the soft rules, there are also hard rules concerning the consistency of the
solution, which are not allowed to be violated at all. Defined over a set of hypotheses, a
hard rule restricts the maximum number of accepted hypotheses to 1 (e.g., among all
hypotheses suggesting a single person to be born in different places, maximally one can
be true). The set of hypotheses that participate in a hard rule are known as a competitor
set.
Eq. 2.5 shows a Weighted MAX-SAT problem with competitor sets, as it appears in
SOFIE.
Maximize:    Σ_{j=1}^{m} w_j z_j
subject to:  Σ_{x_i ∈ X(c_j)} x_i + Σ_{x̄_i ∈ X̄(c_j)} x̄_i ≥ z_j    ∀ c_j ∈ C    (a)
             Σ_{x_i ∈ s_k} x_i ≤ 1                                 ∀ s_k ∈ S    (b)
             x_i + x̄_i ≤ 1                                         ∀ x_i ∈ X    (c)
             x_i + x̄_i ≥ 1                                         ∀ x_i ∈ X    (d)
             x_i ∈ {0, 1}                                          ∀ x_i ∈ X    (e)
             z_j ∈ {0, 1}                                          ∀ c_j ∈ C    (f)        (2.5)
The set of all statements that appear in clauses composes the set of variables (X), and the
differentiation in the importance of rules is imposed by assigning different weights to the
soft and hard rules. Furthermore, to assure that hard rules are not violated, constraint
(b) is added for each competitor set sk ∈ S, where S is the set of all competitor sets.
Chapter 3
Solving Linear Programs in
MapReduce
In this chapter, we will focus on the Awerbuch-Khandekar (AK) algorithm for solving
linear programs [15]. The algorithm's acceptable time complexity (polylogarithmic in the
number of variables, the number of constraints, and the largest entry in the coefficient
matrices) makes it a promising approach to investigate. Furthermore, it is a simple,
numerically stable algorithm requiring only local information and a small, bounded message size,
which offers nice opportunities for efficient realization in MapReduce. Future work could
further expand this study to compare the AK method with other parallel algorithms
(e.g, [10, 26]).
3.1 Awerbuch-Khandekar Algorithm (AK)
Given a mixed packing-covering linear program
Ax ≤ b,   A ∈ R^{m×n}_+
Cx ≥ d,   C ∈ R^{k×n}_+        (3.1)

where x ∈ R^n_+, b ∈ R^m_+, and d ∈ R^k_+, the Awerbuch-Khandekar algorithm finds a feasible
solution to (3.1). We assume, without loss of generality, that b = 1 and d = 1, where
1 denotes a vector of all 1s of appropriate length. Additionally, all non-zero entries
in the coefficient matrices A and C are assumed to lie in the range [1, M]. Noncomplying
formulations can always be transformed by proper scaling of the rows and columns of the
constraint matrices, as follows.
First, each row i is divided by its associated right-hand side, b_i for packing constraints
or d_i for covering constraints. Next, for each variable x_j, let c_j be the smallest
non-zero coefficient in the j-th column of the packing and covering matrices combined. Then
x_j is replaced by the new variable x̄_j = c_j · x_j, and all coefficients in column j are
divided by c_j. The resulting problem is solved for the new variables x̄_j and complies with
the aforementioned requirements. Recovering the original values is straightforward:
x_j = x̄_j / c_j.
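This normalization can be sketched in NumPy as follows (a minimal sketch on a toy instance of our own; the function name is ours, not part of the thesis implementation):

```python
import numpy as np

def normalize(A, b, C, d):
    """Scale the LP so that the right-hand sides become all-ones and the
    smallest non-zero coefficient of every column becomes 1."""
    A = A / b[:, None]           # step 1: divide each row by its rhs
    C = C / d[:, None]
    M = np.vstack([A, C])
    # step 2: smallest non-zero coefficient c_j of each column j
    c = np.array([M[:, j][M[:, j] > 0].min() if (M[:, j] > 0).any() else 1.0
                  for j in range(M.shape[1])])
    # dividing column j by c_j corresponds to substituting xbar_j = c_j * x_j;
    # the original solution is recovered as x_j = xbar_j / c_j
    return A / c, C / c, c

A = np.array([[2.0, 0.0], [1.0, 4.0]])
b = np.array([2.0, 4.0])
C = np.array([[0.0, 3.0]])
d = np.array([3.0])
An, Cn, c = normalize(A, b, C, d)
```

After the transformation, every non-zero entry of An and Cn is at least 1, as required.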
AK is an iterative algorithm that converges to an ε-feasible solution, if one exists, after a
polylogarithmic number of iterations.

DEFINITION 3.1. A solution x ∈ R^n_+ is said to be ε-feasible if the maximum violation
of the packing and covering constraints is at most ε, i.e., Ax ≤ (1 + ε)·1 and
Cx ≥ (1 − ε)·1.
THEOREM 3.1. Given an initial solution x_0 ∈ R^n_+, the AK solution becomes O(ε)-feasible
within

    poly( log(nmkM) / ε )

iterations. Furthermore, once the solution becomes O(ε)-feasible, it always
remains O(ε)-feasible.
This method is depicted in Algorithm 3.1 (α, β, and δ are constants to be defined later):

Algorithm 3.1: AK Algorithm for Mixed Packing-Covering
input: x ∈ R^n_+
 1  repeat
 2      y(x) ← exp[µ · (Ax − 1)]
 3      z(x) ← exp[µ · (1 − Cx)]
 4      for j = 1 ... n do
 5          R_j ← A_j^T y(x) / C_j^T z(x)
 6          if R_j ≤ 1 − α then
 7              x_j ← max(x_j (1 + β), δ)
 8          else if R_j ≥ 1 + α then
 9              x_j ← x_j (1 − β)
10          end
11      end
12  until Ax ≤ (1 + ε)·1 and Cx ≥ (1 − ε)·1
13  return x
Each iteration of the AK algorithm consists of three main steps. First, an indicator of
the amount of violation of each constraint is computed by evaluating (Ax − 1) and
(1 − Cx) in lines 2 and 3. Next, the ratio R_j, estimating the share of each variable in the
violation of the packing and/or covering constraints, is computed in line 5. (Throughout
this study, the dot (·) is used to represent scalar-vector multiplication.) Finally, based on
the corresponding ratios, new values for all variables are decided. A variable x_j ∈ x is
increased if it has a dominating role in the violation of covering constraints (lines 6 and 7).
Otherwise, if packing constraints are more dominantly violated because of x_j, its value
is decreased (lines 8 and 9). When the violation of constraints on both sides is
acceptable, x_j does not change.
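For illustration, the update of Algorithm 3.1 can be sketched in NumPy as follows (a toy sketch: the instance and the values of µ, α, β, and δ below are ours, chosen ad hoc rather than from Table 3.1):

```python
import numpy as np

def ak_step(x, A, C, mu=2.0, alpha=0.25, beta=0.05, delta=1e-3):
    y = np.exp(mu * (A @ x - 1.0))        # lines 2-3: violation indicators
    z = np.exp(mu * (1.0 - C @ x))
    R = (A.T @ y) / (C.T @ z)             # line 5: per-variable ratio
    inc = np.maximum(x * (1 + beta), delta)   # covering violation dominates
    dec = x * (1 - beta)                      # packing violation dominates
    return np.where(R <= 1 - alpha, inc, np.where(R >= 1 + alpha, dec, x))

def ak(x, A, C, eps=0.1, max_iter=10000):
    for _ in range(max_iter):
        # line 12: stop once the solution is eps-feasible
        if (A @ x <= 1 + eps).all() and (C @ x >= 1 - eps).all():
            break
        x = ak_step(x, A, C)
    return x

A = np.array([[1.0, 1.0]])                    # packing: x1 + x2 <= 1
C = np.array([[2.0, 0.0], [0.0, 2.0]])        # covering: 2 x1 >= 1, 2 x2 >= 1
x = ak(np.array([0.01, 0.01]), A, C)
```

On this instance the multiplicative updates drive both variables up from 0.01 until the iterate falls into the ε-feasible band.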
By following these steps, AK intuitively mimics the gradient descent method. The goal is to
minimize violations by finding the stationary point of an exponential penalty function,
called the potential function.
DEFINITION 3.2. Given x ∈ R^n_+, let y and z be vectors in R^m_+ and R^k_+, defined as

    y(x) = exp[µ · (Ax − 1)]                                   (3.2)
    z(x) = exp[µ · (1 − Cx)]                                   (3.3)

where µ is a constant to be defined later. Letting x_t, y_t, and z_t denote the values of x,
y(x), and z(x) in round t, the potential function is defined as follows:

    Φ_t = 1 · y_t + 1 · z_t                                    (3.4)
Parameters
The choice of the parameters µ, α, β, and δ in the AK algorithm is based on the maximum
accepted violation ε, and is shown in Table 3.1. For an in-depth analysis of the
selection criteria, the reader is referred to [1].
    µ      (1/ε) · ln(mkM/ε)
    α      ε/4
    β      Θ(ε/µ)
    δ      Θ(ε/(µnM))

Table 3.1: Parameter Setting for the AK Algorithm
3.2 Realization in MapReduce
In the last section, the gradual update procedure of the AK algorithm was described,
where new values for the variables are decided based on matrix-vector multiplications.
First, the matrices A and C are multiplied by the vector x to evaluate the violation
indicator functions y and z. Then, the transposed matrices A^T and C^T are multiplied by
the violation vectors to find out whether and how each variable should be modified to reduce
the violations.
In this section, we outline how the MapReduce programming model can be used to implement
the aforementioned operations. We begin with the most straightforward implementation and
then move towards more efficient and numerically stable realizations.
3.2.1 MR-MixedPC
In practice, most matrices are sparse, with only a small fraction of non-zero elements.
Higher scalability and better performance can often be achieved by avoiding the unnecessary
consideration of zero-valued elements. To that end, only the non-zero entries of a sparse
matrix are stored, in triplet format (row, column, value), where row and column indicate the
position of an entry in the matrix, and value contains its value. We adopt a slightly
modified version of this representation in the design of our methods, which has the extra
field origin. Origin indicates whether an entry belongs to the packing or to the covering
matrix. We refer to this data structure as MatrixEntry. A dot (.) is used to access the
fields of a MatrixEntry, e.g., MatrixEntry.row.
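In a language with records, MatrixEntry amounts to a small immutable structure; a Python sketch (the field names follow the text, while the origin encoding is ours):

```python
from dataclasses import dataclass

# Illustrative encoding of the origin field (not prescribed by the thesis).
PACKING, COVERING = "P", "C"

@dataclass(frozen=True)
class MatrixEntry:
    origin: str    # PACKING or COVERING: which matrix the entry comes from
    row: int       # row index within its matrix
    column: int    # column index, i.e., which variable the entry multiplies
    value: float   # the non-zero coefficient itself

e = MatrixEntry(origin=PACKING, row=3, column=7, value=2.5)
```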
Now, consider the multiplication of the constraint matrices by the variable vector x. Using
the newly defined MatrixEntry data structure, we treat the matrices jointly as a set of
MatrixEntry instances, where each instance contains adequate information to identify
its origin, its location, and its value. To perform this multiplication, entries from the
same row must be gathered together. We take advantage of the inherent shuffle stage
in MapReduce to achieve this. The steps are as follows.
Map. The Map function is defined to be an identity function, i.e., it returns each
MatrixEntry that it receives. Since we plan to group together all entries from the same
row, an entry’s row number is used as its key. This function is depicted in Algorithm 3.2.
Reduce. As a result of the map and the shuffle stages, the Reduce function receives all
matrix entries from the same row. The reducer iterates over the entries, and multiplies
each variable by its corresponding coefficient.
Algorithm 3.2: MR-MixedPC: MapI
input: Key, MatrixEntry
1  Key ← MatrixEntry.row
2  Emit(Key, MatrixEntry)
The Reduce function is shown in Algorithm 3.3. Since entries from both the packing and
covering matrices are treated simultaneously, a reducer receives entries from the same row
of both matrices. We handle this by checking the origin of each entry.
Algorithm 3.3: MR-MixedPC: ReduceI
input: Key, MatrixEntryList
 1  sum_p ← 0, sum_c ← 0
 2  foreach MatrixEntry in MatrixEntryList do
 3      if MatrixEntry.origin == Packing then
 4          sum_p ← sum_p + MatrixEntry.value ∗ x[MatrixEntry.column]
 5      else
 6          sum_c ← sum_c + MatrixEntry.value ∗ x[MatrixEntry.column]
 7      end
 8  end
 9  if sum_p > 0 then
10      value ← exp[µ ∗ (sum_p − 1)]
11      Emit((y, Key), value)
12  end
13  if sum_c > 0 then
14      value ← exp[µ ∗ (1 − sum_c)]
15      Emit((z, Key), value)
16  end
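The pair of Algorithms 3.2 and 3.3 can be simulated in-memory in plain Python, with a dictionary standing in for the shuffle stage (a sketch; the entry tuples and helper names are ours):

```python
import math
from collections import defaultdict

def map1(entry):
    # Algorithm 3.2: key each entry by its row number.
    origin, row, col, val = entry
    yield row, entry

def reduce1(row, row_entries, x, mu):
    # Algorithm 3.3: one reducer sees every entry of row `row`,
    # from the packing and the covering matrix simultaneously.
    sum_p = sum_c = 0.0
    for origin, _, col, val in row_entries:
        if origin == "P":
            sum_p += val * x[col]
        else:
            sum_c += val * x[col]
    if sum_p > 0:
        yield ("y", row), math.exp(mu * (sum_p - 1.0))
    if sum_c > 0:
        yield ("z", row), math.exp(mu * (1.0 - sum_c))

def run_job(entries, x, mu):
    shuffle = defaultdict(list)           # stand-in for the shuffle stage
    for e in entries:
        for key, value in map1(e):
            shuffle[key].append(value)
    out = {}
    for row, es in shuffle.items():
        out.update(dict(reduce1(row, es, x, mu)))
    return out

entries = [("P", 0, 0, 1.0), ("P", 0, 1, 1.0), ("C", 0, 0, 2.0)]
out = run_job(entries, x=[0.5, 0.5], mu=1.0)
```

Here row 0 of both matrices is exactly satisfied at x = (0.5, 0.5), so both emitted indicator values equal exp(0) = 1.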
Similarly, a second MapReduce job is used to compute the ratios A_j^T y(x) / C_j^T z(x). The
intuition is the same as for the previous job, except that the transposes of the matrices
have to be used. The implementation is as follows.
Map. The Map function is again an identity function. However, instead of row numbers,
column numbers are used as the intermediate keys. This way, shuffling reconstructs
columns of the matrices instead of their rows. Columns of a matrix are rows of the
transpose of that matrix. Algorithm 3.4 shows the second Map function.
Algorithm 3.4: MR-MixedPC: MapII
input: Key, MatrixEntry
1  Key ← MatrixEntry.column
2  Emit(Key, MatrixEntry)
Reduce. Algorithm 3.5 outlines the second reducer. It performs the multiplications
as before. Additionally, given that entries from the same columns of both the packing and
covering matrices arrive at the same reducer, the function can go further, in lines 10 to
17, and decide on the new value of the variable corresponding to the column at hand.
Algorithm 3.5: MR-MixedPC: ReduceII
input: Key, MatrixEntryList
 1  sum_p ← 0, sum_c ← 0
 2  foreach MatrixEntry in MatrixEntryList do
 3      if MatrixEntry.origin == Packing then
 4          sum_p ← sum_p + MatrixEntry.value ∗ y[MatrixEntry.row]
 5      else
 6          sum_c ← sum_c + MatrixEntry.value ∗ z[MatrixEntry.row]
 7      end
 8  end
 9  j ← Key
10  r ← sum_p / sum_c
11  if r ≤ 1 − α then
12      x_j ← max(x_j (1 + β), δ)
13  end
14  if r ≥ 1 + α then
15      x_j ← x_j (1 − β)
16  end
17  Emit(j, x_j)
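The tail of ReduceII is simply the ratio test of Algorithm 3.1 applied to the column sums; a minimal Python sketch (parameter values ad hoc, helper names ours):

```python
def reduce2(j, column_entries, y, z, x, alpha=0.25, beta=0.05, delta=1e-3):
    # column_entries: (origin, row, column, value) tuples of column j.
    sum_p = sum(v * y[r] for o, r, _, v in column_entries if o == "P")
    sum_c = sum(v * z[r] for o, r, _, v in column_entries if o == "C")
    r = sum_p / sum_c                     # the ratio R_j of Algorithm 3.1
    if r <= 1 - alpha:                    # covering violation dominates
        x[j] = max(x[j] * (1 + beta), delta)
    elif r >= 1 + alpha:                  # packing violation dominates
        x[j] = x[j] * (1 - beta)
    return j, x[j]

# Covering row 0 is strongly violated (z large), so x[0] is increased by 5%.
y, z, x = {0: 1.0}, {0: 4.0}, [0.5]
j, xj = reduce2(0, [("P", 0, 0, 1.0), ("C", 0, 0, 1.0)], y, z, x)
```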
In summary, the matrices in MR-MixedPC are represented as a collection of non-zero entries
that are gathered together during shuffling to reconstruct matrix rows and columns. The
two MapReduce jobs are run repeatedly, one after the other. At the beginning of each
iteration, the master node of the cluster announces the current values of all variables to
the workers. When the first MapReduce job is complete, the corresponding violations for the
current solution are propagated amongst the workers.
MR-MixedPC provides a basic understanding of how the AK algorithm can be realized
in MapReduce, and establishes a baseline for a more efficient implementation,
MR-MixedPC-E.
3.2.2 MR-MixedPC-E
The main observation behind the efficient variant MR-MixedPC-E is that although the packing
and covering matrices are fixed during runtime, shuffling has to be repeated in each
iteration to reconstruct the rows and columns. A considerable speedup can be achieved by
removing these redundant shuffles. To that end, the construction of rows and columns from
matrix entries is removed from the body of the algorithm and performed only once, in a
preprocessing phase before the algorithm starts to iterate.
We introduce two new data structures in MR-MixedPC-E, MatrixRow and MatrixColumn.
As the names suggest, instances of these data structures are meant to contain rows and
columns of sparse matrices. MR-MixedPC-E is composed of two building blocks: a
preprocessing part, which is performed only once when the algorithm starts, and an
iterative part, which runs repeatedly until a solution is found.
Preprocessing Phase
The goal of this step is to generate a row-based as well as a column-based view of the
input matrices that can later be consumed by the iterative part.

We use two MapReduce jobs to implement this. The Map functions are the same as those
in MR-MixedPC. The reducers, however, are simpler. Paired with the first mapper, the
first reducer, shown in Algorithm 3.6, receives a row number together with the list of
non-zero entries in that row, and returns a MatrixRow instance containing the row's
entries. The second reducer works with the next mapper to generate matrix columns in the
same way; its implementation is deferred to Appendix A.
Algorithm 3.6: MR-MixedPC-E: ReduceI, Preprocessing Phase
input: Key, MatrixEntryList
1  MatrixRow row = new MatrixRow()
2  foreach MatrixEntry in MatrixEntryList do
3      row.add(MatrixEntry)
4  end
5  Emit(Key, row)
Iterative Phase
Next, the iterative part starts. Since the matrix rows and columns are already available,
the Map functions implement the multiplications directly, with no further need for shuffling
or reducers. Algorithms 3.7 and 3.8 show the Map functions that compute y and z, and the
ratios A_j^T y(x) / C_j^T z(x), respectively.
At this point, the efficient realization of the AK method in MapReduce is complete.
MR-MixedPC-E omits zero-valued matrix entries by using a sparse matrix representation.
Composed of two pairs of MapReduce jobs, the preprocessing phase loads the input
data into adequate structures that are consumed by the mapper functions of the iterative
part to gradually improve the solution.
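The split into a one-off preprocessing pass and a shuffle-free iterative pass can be sketched in-memory as follows (a stand-in for the actual MapReduce jobs; the names and parameter values are ours):

```python
import math
from collections import defaultdict

def preprocess(entries):
    # One-off pass: build row- and column-oriented views of the matrices,
    # replacing the per-iteration shuffle of MR-MixedPC.
    rows, cols = defaultdict(list), defaultdict(list)
    for e in entries:                     # e = (origin, row, column, value)
        rows[e[1]].append(e)
        cols[e[2]].append(e)
    return rows, cols

def iterate_once(rows, cols, x, mu=1.0, alpha=0.25, beta=0.05, delta=1e-3):
    y, z = {}, {}
    for i, es in rows.items():            # MapI: violation indicators
        sp = sum(v * x[c] for o, _, c, v in es if o == "P")
        sc = sum(v * x[c] for o, _, c, v in es if o == "C")
        if sp > 0:
            y[i] = math.exp(mu * (sp - 1.0))
        if sc > 0:
            z[i] = math.exp(mu * (1.0 - sc))
    for j, es in cols.items():            # MapII: ratio test and update
        sp = sum(v * y[r] for o, r, _, v in es if o == "P")
        sc = sum(v * z[r] for o, r, _, v in es if o == "C")
        ratio = sp / sc
        if ratio <= 1 - alpha:
            x[j] = max(x[j] * (1 + beta), delta)
        elif ratio >= 1 + alpha:
            x[j] *= (1 - beta)
    return x

rows, cols = preprocess([("P", 0, 0, 1.0), ("C", 0, 0, 2.0)])
x = iterate_once(rows, cols, [0.25])
```

Note that `preprocess` runs once, while `iterate_once` can be called repeatedly against the same views.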
Algorithm 3.7: MR-MixedPC-E: MapI, Iterative Phase
input: Key, MatrixRow
 1  sum_p ← 0, sum_c ← 0
 2  foreach MatrixEntry in MatrixRow do
 3      if MatrixEntry.origin == Packing then
 4          sum_p ← sum_p + MatrixEntry.value ∗ x[MatrixEntry.column]
 5      else
 6          sum_c ← sum_c + MatrixEntry.value ∗ x[MatrixEntry.column]
 7      end
 8  end
 9  if sum_p > 0 then
10      value ← exp[µ ∗ (sum_p − 1)]
11      Emit((y, Key), value)
12  end
13  if sum_c > 0 then
14      value ← exp[µ ∗ (1 − sum_c)]
15      Emit((z, Key), value)
16  end
Algorithm 3.8: MR-MixedPC-E: MapII, Iterative Phase
input: Key, MatrixColumn
 1  sum_p ← 0, sum_c ← 0
 2  foreach MatrixEntry in MatrixColumn do
 3      if MatrixEntry.origin == Packing then
 4          sum_p ← sum_p + MatrixEntry.value ∗ y[MatrixEntry.row]
 5      else
 6          sum_c ← sum_c + MatrixEntry.value ∗ z[MatrixEntry.row]
 7      end
 8  end
 9  j ← Key
10  r ← sum_p / sum_c
11  if r ≤ 1 − α then
12      x_j ← max(x_j (1 + β), δ)
13  else if r ≥ 1 + α then
14      x_j ← x_j (1 − β)
15  end
16  Emit(j, x_j)
3.2.3 MR-MixedPC-S
In practice, evaluating the exponential function for very large arguments can lead to
floating-point overflow and numerical instability. A common solution is to downscale the
computations using the log function. The logarithm is a strictly monotonic function, so it
preserves the order of its arguments (i.e., if x_1 < x_2 then log(x_1) < log(x_2)).

Using this property, it is possible to perform the comparisons in lines 6 and 8 of the
original AK algorithm (Algorithm 3.1) in log scale. The goal is to replace the evaluation
of the exponential functions in y(x) and z(x) with a numerically stable version. To this
end, we first introduce Pihlakas's method for the precise calculation of the logarithm of a
sum [18].
The Pihlakas method is based on the identity ln(a + b) = ln(exp[ln(a) − ln(b)] + 1) + ln(b).
We adapt the bivariate method of [18] to implement the multivariate lsum, a numerically
stable function for computing the logarithm of the sum of a list of numbers from their
individual logs. The implementation is given in Algorithm 3.9.

MAX_MANTISSA in Algorithm 3.9 refers to the maximal mantissa value of double-precision
floating-point numbers, and top, add, and len are vector operators: top removes and
returns the last element of a vector, add appends an element to the end of the vector,
and len returns the number of elements.
Algorithm 3.9: lsum
input: v
 1  repeat
 2      ln_a ← top(v)
 3      ln_b ← top(v)
 4      if abs(ln_a − ln_b) ≥ ln(MAX_MANTISSA) then
 5          add(v, max(ln_a, ln_b))
 6      else
 7          add(v, ln(exp[ln_a − ln_b] + 1) + ln_b)
 8      end
 9  until len(v) == 1
10  return top(v)
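Algorithm 3.9 translates almost directly into Python; the stack semantics of top and add are realized with list operations, and MAX_MANTISSA is instantiated for IEEE 754 doubles:

```python
import math

MAX_MANTISSA = 2.0 ** 53    # integer mantissa range of an IEEE 754 double

def lsum(v):
    """log of the sum of a list of numbers, given their individual logs,
    computed pairwise via ln(a + b) = ln(exp[ln a - ln b] + 1) + ln b."""
    v = list(v)
    while len(v) > 1:
        ln_a, ln_b = v.pop(), v.pop()
        if abs(ln_a - ln_b) >= math.log(MAX_MANTISSA):
            # one summand is too small to affect the mantissa of the other
            v.append(max(ln_a, ln_b))
        else:
            v.append(math.log(math.exp(ln_a - ln_b) + 1.0) + ln_b)
    return v[0]

# lsum([ln a, ln b]) equals ln(a + b); it also works where exp would overflow.
small = lsum([math.log(2.0), math.log(3.0)])      # ln(2 + 3) = ln(5)
large = lsum([1000.0, 1000.0])                    # 1000 + ln(2)
```

The second call would be impossible with a direct exp/sum/log evaluation, since exp(1000) overflows a double.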
The next step is to compute ln(R_j):

    ln(R_j) = ln( A_j^T y / C_j^T z )                                                  (3.5)
            = ln(A_j^T y) − ln(C_j^T z)
            = ln Σ_{i=1..m} A_{ji}^T y_i − ln Σ_{i=1..k} C_{ji}^T z_i
            = ln(A_{j1}^T y_1 + ... + A_{jm}^T y_m) − ln(C_{j1}^T z_1 + ... + C_{jk}^T z_k)
            = lsum(ln A_{j1}^T y_1, ..., ln A_{jm}^T y_m)
              − lsum(ln C_{j1}^T z_1, ..., ln C_{jk}^T z_k)
            = lsum(ln A_{j1}^T + µ · (A_1 x − 1), ..., ln A_{jm}^T + µ · (A_m x − 1))
              − lsum(ln C_{j1}^T + µ · (1 − C_1 x), ..., ln C_{jk}^T + µ · (1 − C_k x))

As Eq. (3.5) shows, evaluating ln(R_j) reduces to a series of log, addition, and
multiplication operations, with all exponential evaluations redirected to the numerically
stable lsum function.
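As a sanity check, the stable route of Eq. (3.5) can be compared numerically with a direct evaluation of R_j (a NumPy sketch on a small dense instance of our own; NumPy's logaddexp reduction plays the role of lsum):

```python
import numpy as np

def ln_R(j, A, C, x, mu):
    # Stable route of Eq. (3.5): per-term logs, combined with a
    # log-sum-exp reduction in place of the lsum calls.
    t1 = np.log(A[:, j]) + mu * (A @ x - 1.0)   # ln A_ij + mu (A_i x - 1)
    t2 = np.log(C[:, j]) + mu * (1.0 - C @ x)   # ln C_ij + mu (1 - C_i x)
    return np.logaddexp.reduce(t1) - np.logaddexp.reduce(t2)

A = np.array([[1.0, 2.0], [3.0, 1.0]])
C = np.array([[2.0, 1.0]])
x = np.array([0.3, 0.2])
mu = 1.5
stable = ln_R(0, A, C, x, mu)
direct = np.log((A.T @ np.exp(mu * (A @ x - 1.0)))[0]
                / (C.T @ np.exp(mu * (1.0 - C @ x)))[0])
```

The two values agree to floating-point precision; the stable route, however, never evaluates a bare exponential of a potentially huge argument.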
Putting it all together, the numerically stable AK is depicted in Algorithm 3.10.
Algorithm 3.10: Numerically Stable AK for Mixed Packing-Covering
input: x ∈ R^n_+
 1  repeat
 2      for j = 1 ... n do
 3          t1 ← ln A_j^T + µ · (Ax − 1)
 4          t2 ← ln C_j^T + µ · (1 − Cx)
 5          ln R_j ← lsum(t1) − lsum(t2)
 6          if ln R_j ≤ ln(1 − α) then
 7              x_j ← max(x_j (1 + β), δ)
 8          else if ln R_j ≥ ln(1 + α) then
 9              x_j ← x_j (1 − β)
10          end
11      end
12  until Ax ≤ (1 + ε)·1 and Cx ≥ (1 − ε)·1
13  return x
MR-MixedPC-S adopts the same implementation as MR-MixedPC-E, with minor modifications
to its iterative section: the first mapper computes the logs of y and z, and the second one
consumes these logs to evaluate ln(R_j). The implementation is similar to Algorithms 3.7
and 3.8; the complete outline can be found in Appendix A.
Chapter 4
Experiments
4.1 Awerbuch-Khandekar Performance Analysis
Chapter 3 introduced the AK algorithm for finding a feasible solution to mixed packing-
covering problems. The method is a parallel approximation of a gradient descent approach
that minimizes an exponential penalization of the sum of all violations, referred to as the
potential function. The step size in each iteration is bounded by the maximum accepted
violation ε.
In this section, the connection between the violation of constraints and the value of the
potential function is examined, and the role of ε as an important calibration factor of
the algorithm is quantified.
The experiment is composed of two randomly generated coefficient matrices of equal
size, 1000 × 1000, with 90% randomly selected zero-valued entries per row. This problem
was repeatedly solved by the algorithm for various values of ε, and the value of the
potential function as well as the largest violation in each iteration was recorded. The
complete process was replicated 50 times to ensure that the observations are meaningful;
the average values over all replications are reported here. To provide comparable results,
all experiments were run for 2,000 iterations, even in cases where the expected violation
was achieved earlier.
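The random instances can be generated along the following lines (a sketch; the thesis does not specify its generator, so the construction details and seeds below are assumptions of ours):

```python
import numpy as np

def random_sparse_matrix(m, n, density=0.1, rng=None):
    """m x n matrix with (1 - density) zero-valued entries per row;
    non-zero values drawn uniformly at random."""
    rng = rng if rng is not None else np.random.default_rng(0)
    M = np.zeros((m, n))
    nnz = max(1, int(density * n))        # non-zeros per row
    for i in range(m):
        # choose the positions of the non-zero entries in row i
        cols = rng.choice(n, size=nnz, replace=False)
        M[i, cols] = rng.uniform(0.1, 1.0, size=nnz)
    return M

A = random_sparse_matrix(1000, 1000)
C = random_sparse_matrix(1000, 1000, rng=np.random.default_rng(1))
```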
Figure 4.1 shows the value of the potential function in each iteration for multiple
choices of ε. As the figure shows, the algorithm successfully decreases the potential at
each step until a minimum value is reached, after which no more changes are observed.
Figure 4.2 verifies that decreasing the value of the potential function effectively
reduces the violation of constraints.
[Plot: Potential Value (2000 to 4000) vs. Iteration No. (0 to 2000), curves for eps = 1.0, 0.8, 0.6, 0.4, 0.2]

Figure 4.1: The AK Potential Value for Different Epsilons
As Fig. 4.2 suggests, a smaller violation is achieved at the cost of more iterations. The
reason is that decreasing epsilon also reduces the step size of the gradient descent method,
so more steps are required. However, the eventual violation is often much smaller than the
initially declared value. We exploit this phenomenon in the case studies to speed up the
algorithm in practice, by separating the epsilon parameter from the maximum accepted
violation.
4.2 Scalability Test for Sequential LP Solvers
There are many variants of sequential LP solvers that find the exact solution to linear
programs. In this section we describe an experiment that examines the scalability of some
off-the-shelf LP solvers: LPSolve, Symphony, GLPK, and CPlex. The experiments were run on
a 2.66 GHz Intel Core 2 processor with 4 GB RAM.
[Plot: Max Violation (log scale, 0.01 to 1.00) vs. Iteration No. (0 to 2000), curves for eps = 1.0, 0.8, 0.6, 0.4, 0.2]

Figure 4.2: The AK Violation Value for Different Epsilons
We started with a problem with 200 constraints over 100 variables, containing 90%
zero-valued coefficients per constraint. The values of the non-zero coefficients as well
as the locations of the zero-valued ones were decided using a uniform random number
generator, and the feasibility of the problem was guaranteed by the construction method.
In this experiment, we repeatedly doubled the dimension of the problem and exposed the
instances to the available solvers to measure their runtime. Among LPSolve, Symphony,
and GLPK, LPSolve and Symphony were observed to be the slowest and the fastest,
respectively. However, none of these methods scaled to problems with more than 6400
constraints, as the corresponding matrices could no longer be loaded into memory. With
CPlex, larger problems were tractable, since the method avoids allocating unnecessary
space for zero-valued entries by using a sparse matrix representation in triplet format.
But here the runtime proved to be an issue.
Figure 4.3 plots the CPlex runtime against the dimension of the problem. As can be seen,
the runtime shows an almost 10-fold increase at each step while moving from 200 to
6400 constraints. Doubling the number of constraints from 6400 to 12800 increased the
runtime by a factor of 30.

Figure 4.3: CPlex Runtime
4.3 Comparison with Partially-distributed Methods
This section studies the performance of the AK algorithm in minimizing the potential
function by comparing it with two centralized optimization techniques, Broyden-Fletcher-
Goldfarb-Shanno (BFGS) and its limited-memory, bounded alternative (L-BFGS-B).

The BFGS family of methods are known as quasi-Newton techniques and belong to the
general class of hill-climbing non-linear optimization approaches that seek the stationary
point of a twice continuously differentiable function, where the necessary optimality
condition of a zero gradient is satisfied. On the one hand, these iterative optimization
methods are centralized in the sense that decisions on the new values of the variables in
each iteration have to be taken at a central node that is aware of the values of all
variables. On the other hand, they can be partially distributed, for example by outsourcing
the demanding operations of each iteration to a MapReduce cluster.
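For concreteness, the potential of Definition 3.2 and its gradient, the quantities such a hybrid scheme would evaluate per iteration, can be sketched as follows (the instance and helper names are ours); the finite-difference check at the end illustrates that the analytic gradient is correct:

```python
import numpy as np

def potential(x, A, C, mu):
    # Phi(x) = 1 . y(x) + 1 . z(x) from Definition 3.2
    return np.exp(mu * (A @ x - 1.0)).sum() + np.exp(mu * (1.0 - C @ x)).sum()

def potential_grad(x, A, C, mu):
    # d Phi / dx = mu (A^T y - C^T z); the matrix-vector products inside
    # this gradient are the per-iteration work one could outsource.
    y = np.exp(mu * (A @ x - 1.0))
    z = np.exp(mu * (1.0 - C @ x))
    return mu * (A.T @ y - C.T @ z)

rng = np.random.default_rng(0)
A = rng.uniform(0.0, 1.0, (5, 4))
C = rng.uniform(0.0, 1.0, (3, 4))
x, mu, h = np.full(4, 0.1), 1.0, 1e-6
g = potential_grad(x, A, C, mu)
# central finite difference of the first gradient component
e0 = np.eye(4)[0]
fd = (potential(x + h * e0, A, C, mu) - potential(x - h * e0, A, C, mu)) / (2 * h)
```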
In this experiment, a uniform random number generator was used to construct packing and
covering constraint matrices of dimension 1000 × 1000, containing 90% randomly selected
zero-valued entries per row. The problems were solved by the AK, BFGS, and L-BFGS-B
methods. This procedure was replicated 50 times, and the potential value in each iteration,
as well as the overall number of evaluations of the potential function, was recorded.
4.3.1 BFGS
Starting with BFGS, the algorithm showed a strong dependency on the input data. In 20
out of 50 runs, BFGS managed to converge to the optimal solution in fewer than 1000
iterations, with an average potential value of 1738, achieved after an average of
815 iterations and requiring 12,746 evaluations of the potential function on average.

In the other 30 runs, in which the running processes were forced to stop after 1000
iterations, large variations in the final value of the potential function were observed,
with 1739 and 2,962,659 being the minimum and the maximum values, respectively. The
average value in this case was 455,968, with a large standard deviation of 805,172,
requiring 17,015 evaluations of the potential function on average. Figure 4.4 shows the
diversity of the BFGS results when the method did not converge after 1000 iterations. In
this figure, the first and third quartiles are represented by the two horizontal lines
bounding the box, with a dark horizontal segment at the median, plus a whisker that extends
to the maximum value. Here, the minimum happens to be very close to the first quartile, so
its corresponding whisker is not visible in the picture.
Figure 4.4: Eventual Potential Values in Non-converging BFGS Runs
The AK algorithm with ε = 0.8, on the other hand, successfully converged in all cases
after 250 iterations on average, requiring the same number of evaluations of the potential
function. The average potential was 2200, with minimum, maximum, and standard deviation
of 2170, 2238, and 19, respectively.
A closer look at the steps taken by BFGS reveals the reason for the method's low
convergence rate in this experiment. Since our objective function is exponential, it is
very sensitive to small changes in the values of the variables, which introduces significant
numerical instability into this method. While the AK algorithm is proven to never decrease
the quality of the solution [1], this is obviously not the case with BFGS. As an example,
Figure 4.5 shows the solutions examined by BFGS in a diverging run of our experiment.
Although BFGS never takes steps that make the solution worse, this internally relies on
its fail-safe behavior, which requires testing different candidate solutions before
deciding on the next one.
[Plot: Potential Value (log scale, 1e-07 to 1e+249) vs. Iteration No. (0 to 6000)]

Figure 4.5: Potential values in a Non-converging BFGS run
Chapter 4. Experiments 31
The specific example shown in Fig. 4.5 corresponds to a run that required the evaluation
of the potential function for 16,500 different solutions during 1,000 iterations, where
many potential values were larger than the largest double-precision value on a 64-bit
architecture (we refer to these values as Inf from now on). To be able to draw the
diagram, Inf values had to be removed, leaving us with 6,250 values.
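This overflow can be reproduced in a few lines (a minimal sketch; the values of mu and the violation here are hypothetical, not taken from our runs):

```python
import math

# The largest finite IEEE-754 double is about 1.798e308, i.e. exp(x)
# overflows for x greater than roughly 709.78.
print(math.exp(709))          # still finite, about 8.2e307

# Summing exponential penalty terms exp(mu * (violation - 1)) therefore
# produces Inf as soon as the exponent exceeds ~709.78.
mu, violation = 10.0, 80.0    # hypothetical values
exponent = mu * (violation - 1.0)   # 790 > 709.78
try:
    term = math.exp(exponent)
except OverflowError:
    term = float("inf")       # what IEEE arithmetic (and BFGS) sees
print(term)                   # inf
```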
4.3.2 L-BFGS-B
BFGS's ability to work with Inf values made its application to our exponential
potential function straightforward. However, this convenience came at the cost of
numerical instability. For this reason, we repeated our experiment with
L-BFGS-B. This method does not allow the objective function to assume Inf values,
hence it was expected to exhibit higher numerical stability. Additionally, L-BFGS-B
allows variables to be bounded. By defining a lower bound of zero in our case, we could
ensure that AK and L-BFGS-B solve exactly the same problem.
Fitting an exponential penalty function into L-BFGS-B, however, introduces new
challenges, since small changes in the variables can easily explode the value of the
function, and the function has to be smoothed using numerical methods. Here, we used
the log technique, a common approach.
In general, applying the log function to the potential was examined in two different
ways. The first trial followed the approach in Chapter 3 by working with the log of the
potential function, φ(x) = ln Φ(x). The computations are similar to MR-MixedPC-S,
and are omitted for brevity.
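The log-of-potential computation boils down to the standard log-sum-exp trick; a minimal sketch, with hypothetical exponents:

```python
import math

def lsum(log_terms):
    """Stable log of a sum of exponentials: ln(sum_i exp(t_i)).
    Shifting by the maximum keeps every exp() argument <= 0."""
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

# Hypothetical exponents mu*(A_i x - 1); naive evaluation of
# ln(sum_i exp(t_i)) would overflow for t_i around 1000.
t = [1000.0, 998.0, 500.0]
print(lsum(t))   # ~1000.13, even though exp(1000) itself overflows
```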
Additionally, quasi-Newton approaches, including L-BFGS-B, require the gradient of the
objective function. Following the definition of the new function φ(x), we have

∇φ(x) = ∇Φ(x) / Φ(x)    (4.1)
Since the potential function reappears in its gradient, we first compute ln ∇φ(x) =
ln ∇Φ(x) − ln Φ(x). The effect of the additional log calculation can be reversed later by
computing the exponential of the final result. We have already discussed ln Φ(x).
The next step is to derive ln ∇Φ(x). Following the definition of Φ(x) in Eq. 3.4,
ln ∇Φ(x) = ln [ Σ_{i=1}^m ∇y_i(x) + Σ_{i=1}^k ∇z_i(x) ]    (4.2)
         = ln [ Σ_{i=1}^m ∇ exp(µ [A_i x − 1]) + Σ_{i=1}^k ∇ exp(µ [1 − C_i x]) ]    (4.3)
         = ln [ Σ_{i=1}^m µ y_i(x) A_i + Σ_{i=1}^k (−µ) z_i(x) C_i ]    (4.4)
Since components of the potential function in the form of y_i and z_i are still present in
Eq. 4.4, direct evaluation of the argument of the logarithm in this equation
can still result in numerical instability. At the same time, replacing the log of the
summation with the lsum of the logarithms of the individual elements, as we did for
ln Φ(x), is impossible, because the log function is not defined for negative numbers. In
consequence, this approach proves to be inapplicable to our potential function and the
L-BFGS-B method.
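A two-line check makes the obstacle concrete (the term values are hypothetical stand-ins for µ y_i A_i and −µ z_i C_i):

```python
import math

# The gradient sum in Eq. 4.4 mixes positive terms (mu * y_i * A_i) and
# negative terms (-mu * z_i * C_i). The lsum decomposition needs the log
# of each individual term, which is undefined for the negative ones.
pos_term, neg_term = 2.5, -1.5       # hypothetical gradient terms
print(math.log(pos_term))            # fine
try:
    math.log(neg_term)               # ln of a negative number
except ValueError as e:
    print("undefined:", e)
```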
The next approach was to downscale the input of the potential function instead of the
function itself. We define φ(x) = Φ(ln(1 + x)), where the logarithm is applied
componentwise. The gradient is derived as follows:
∇φ(x) = Σ_{i=1}^m ∇y_i(ln(1 + x)) + Σ_{i=1}^k ∇z_i(ln(1 + x))    (4.5)
       = Σ_{i=1}^m ∇ exp(µ [A_i ln(1 + x) − 1]) + Σ_{i=1}^k ∇ exp(µ [1 − C_i ln(1 + x)])    (4.6)
       = Σ_{i=1}^m µ y_i(ln(1 + x)) A_i (1 / (1 + x)) − Σ_{i=1}^k µ z_i(ln(1 + x)) C_i (1 / (1 + x))    (4.7)
Since the exponential component in the objective function and its gradient still has to be
evaluated in the new approach, the chance that the computations overflow is not removed
completely. As a matter of fact, this approach occasionally resulted in Inf values in
our experiments. A limited workaround in this case was to decrease the non-zero entries in
the coefficient matrices both in count and in scale; however, this obviously might not
be possible in real applications. With this additional limitation, L-BFGS-B successfully
converged in all 50 replications, and reduced the value of the objective function to an
average of 1,819 after 135 iterations, requiring the same number of evaluations of the
potential function.
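A minimal sketch of the input-downscaling idea on a tiny hypothetical instance (the matrices, µ, and x below are made up for illustration, not from the experiment):

```python
import math

def potential(v, A, C, mu):
    """Mixed packing-covering potential:
    Phi(v) = sum_i exp(mu*(A_i.v - 1)) + sum_i exp(mu*(1 - C_i.v))."""
    dot = lambda row, vec: sum(a * b for a, b in zip(row, vec))
    return (sum(math.exp(mu * (dot(a, v) - 1.0)) for a in A) +
            sum(math.exp(mu * (1.0 - dot(c, v))) for c in C))

def phi(x, A, C, mu):
    """Downscaled variant: evaluate Phi at ln(1 + x), componentwise."""
    return potential([math.log(1.0 + xi) for xi in x], A, C, mu)

# Tiny hypothetical instance: one packing row A, one covering row C.
A, C, mu = [[2.0, 1.0]], [[1.0, 1.0]], 5.0
x = [3.0, 3.0]
print(phi(x, A, C, mu))        # orders of magnitude below potential(x, ...)
print(potential(x, A, C, mu))  # the direct evaluation explodes much faster
```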
                    BFGS                    L-BFGS-B   AK
                    Converged   Diverged
No. of Iterations   815         1000        135        250
Potential Value     1738        455968      1819       2200
Potential STD       17          805172      11         19
Table 4.1: Comparison with Partially-distributed Methods
The AK, BFGS, and L-BFGS-B performance results are summarized in Table 4.1. In
summary, BFGS was easy to apply, but it was numerically unstable with our exponential
potential function. L-BFGS-B was applicable only after downscaling the function, and it
was still prone to failure due to overflow. The AK algorithm showed a slightly slower
convergence rate; however, it was highly numerically stable and easy to apply.
4.4 Large-scale Experiments
In this section we move towards a considerably larger problem that does not fit into the
memory of a single machine. The problem contained 160,000 constraints over 80,000
variables with 90% randomly selected zero-valued entries. The resulting coefficient
matrix in sparse triplet format took more than 10 GB of space to store.
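The storage requirement follows from a back-of-the-envelope calculation (a sketch; the 16-byte triplet encoding, two 4-byte ints plus one 8-byte double, is our assumption):

```python
# Size of the coefficient matrix in sparse triplet (row, column, value)
# format; the 10% density matches the experiment.
rows, cols = 160_000, 80_000
nonzeros = rows * cols // 10                   # 10% density -> 1.28e9 entries
bytes_per_triplet = 4 + 4 + 8                  # assumed: int, int, double
size_gb = nonzeros * bytes_per_triplet / 1024**3
print(f"{nonzeros:,} nonzeros, ~{size_gb:.1f} GB")   # well above 10 GB
```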
This experiment was run on an experimental cluster of 4 blade servers. Each server
had a 48-core processor and 50 GB of RAM. The MapReduce cluster was installed using
Hadoop [24]. One of the servers was exclusively devoted to the master node, and
the others were used as workers. Each worker was assigned a maximum capacity of 4
simultaneous Map and 3 Reduce jobs.
Figure 4.7 shows the sequence of Map, Shuffle, and Reduce attempts for MR-
MixedPC over time. Each row in the diagram corresponds to an available slot on a
worker node. Since we have 3 worker nodes in the cluster with 7 slots on each (4 Maps
and 3 Reduces), there are 21 rows in the diagram. Each column corresponds to a complete
MapReduce job. The diagram shows the first 5 iterations of MR-MixedPC. Since
a pair of MapReduce jobs is performed in each iteration, the diagram is composed of
10 columns overall. The figure reveals that the short Map and Reduce attempts in MR-
MixedPC are dominated by long Shuffle stages.
The attempts history for MR-MixedPC-E is depicted in Figure 4.8. The first 2 columns
correspond to the preprocessing phase of the algorithm. The next 10 columns correspond
to the iterative part. As described in Chapter 3, the efficient realization of
MR-MixedPC avoids the unnecessary sort and shuffle operations during iterations by
consuming the results of the preprocessing phase. As can be seen, this effectively
reduces the runtime of the algorithm. For the 5 iterations in this experiment, the completion
time decreased from more than 4,000 to less than 3,000 seconds.
A similar plot for MR-MixedPC-S is shown in Figure 4.9. As expected, the orchestration
of the attempts is exactly the same as for MR-MixedPC-E. However, due to the extra
overhead of calculations in logarithmic scale, the numerically stable implementation is
slightly slower than the efficient version.
Figures 4.10 to 4.12 show the processor, disk, and network resource utilization of a single
worker node. As the figures suggest, MR-MixedPC is the most demanding on all resources,
while storage and network traffic sharply decline during the iterative phase of MR-MixedPC-
E and MR-MixedPC-S. Additionally, processor consumption decreases considerably in
both the efficient and the numerically stable realizations. Compared to the efficient version,
however, the stable version is more CPU-intensive.
4.5 Case Study: SOFIE
As introduced in Chapter 2, ontology-based information extraction is cast into
a Weighted MAX-SAT problem in SOFIE. In this section, we use the AK algorithm to
approximate the MAX-SAT instance generated by SOFIE. We also evaluate the
quality of the solution by comparing it to FMS* (SOFIE's MAX-SAT solver) and the CPlex
method. Textual facts in this experiment were extracted from the Academic Corpus.
The ontological facts were retrieved from the YAGO [20] knowledge base.
In order to scale to relatively large problems, rather than considering the whole corpus
at once, SOFIE splits the textual facts into batches and repeatedly runs over them, one by
one. When running in batch mode, accepted hypotheses in each run are stored in
the knowledge base to be used as new facts in the upcoming runs. The first experiment
was designed to be small enough that SOFIE could be run without batching. The
second experiment used SOFIE with a batch size of 20,000.
4.5.1 Mid-size Experiment
In this experiment 200,000 pattern occurrences were used. The resulting problem contained
86,944 clauses over 92,444 positive and negative variables with 3,575 competitor
sets for 96,071 hypotheses. The average length of the clauses was 1.678230, with 1 and 4
being the minimum and maximum length, respectively. A variable or its negation, on
average, participated in 3.156765 clauses. The most frequent variable was observed in
4,886 clauses, while the least frequent ones were seen only once. The longest competitor
set contained 3 variables; the shortest naturally had 2. The average number of
variables in competitor sets was 2.554126.
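Statistics of this kind can be collected with a few lines over the clause list; a sketch on a hypothetical toy instance (the real numbers in the text come from the SOFIE-generated problem, not from this example):

```python
from collections import Counter

# Each clause is a list of (possibly negated) variable ids.
clauses = [[1], [1, -2], [2, 3, -4], [-1, 4]]   # hypothetical toy data

avg_len = sum(len(c) for c in clauses) / len(clauses)
freq = Counter(abs(lit) for c in clauses for lit in c)
avg_occ = sum(freq.values()) / len(freq)

print(avg_len)                 # average clause length
print(freq.most_common(1))     # most frequent variable and its count
print(avg_occ)                 # average occurrences per variable
```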
It took around 2 hours to generate the hypotheses, while the elapsed time to cast
the hypotheses into the corresponding MAX-SAT formulation was negligible (less than
a minute). The resulting MAX-SAT problem was composed of 136,741 constraints over
179,388 variables.
It turned out to be trivial to compute the optimal solution to this problem with CPlex.
The method was able to solve the problem with Boolean variable types in less than 2
seconds, achieving 95.01208% of the overall weight. The CPlex runtime logs are provided in
Appendix B. On the other hand, the FMS* algorithm took 19 hours to terminate, and
achieved 0.995888 of the optimal solution.
Finally, we approximated this problem with the AK algorithm using the LP-relaxation
technique introduced in Chapter 2. The goal here is to compare the quality of the
AK solution with the other two methods. Since the weight of the optimal solution
was already found by CPlex, we avoided the unnecessary binary search for the optimal
solution, and directly used the optimal weight when removing the objective of the Mixed
PC problem. However, when this value is not known, binary search is inevitable.
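The binary search over the optimal weight can be sketched as follows (a sketch; the feasibility oracle here is a hypothetical stand-in for one full AK run on the Mixed PC instance obtained from a candidate weight):

```python
def binary_search_weight(feasible, lo, hi, tol=1.0):
    """Find, to within tol, the largest weight W for which the Mixed PC
    instance derived from the candidate objective value W is feasible.
    Assumes feasible(lo) is True and feasible(hi) is False."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if feasible(mid):
            lo = mid            # W achievable: search higher
        else:
            hi = mid            # W too ambitious: search lower
    return lo

# Hypothetical oracle: weights up to 2370 are achievable.
w = binary_search_weight(lambda w: w <= 2370.0, 0.0, 10_000.0)
print(w)   # within 1.0 of 2370
```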
We noticed earlier, in the analysis of the algorithm at the beginning of this chapter,
that the number of iterations dramatically increases as the maximum accepted violation
decreases, such that the algorithm did not terminate after 2,000 iterations when the violation
parameter was set to 0.2. However, it was also noticed that, given an ε, the algorithm often
reached considerably smaller violations. There, it was suggested to exploit this phenomenon
to speed up the algorithm by separating the violation parameter from ε. We call this
new parameter ξ. In the following, we summarize the results of the experiments with the AK
algorithm running on the MAX-SAT problem at hand.
As expected, the original algorithm with ε = 0.1 did not terminate after a few
thousand iterations, and had to be stopped. Next, the modified version was started
with ξ = 0.1. The ε was initially set to 8, and was logarithmically decreased over time.
To decide when to decrease epsilon, we adopted the following intuition: since the
algorithm is actually running with larger step sizes than it should, it either reaches
the expected violation, or it gets stuck at some best reachable violation. As a result, we
decreased ε whenever the violation stalled, or started to behave strangely (e.g.,
the violation started to increase).
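The decision rule can be sketched as a small helper (the shrink factor, stall tolerance, and window size below are our own illustrative assumptions, not the exact schedule used in the experiment):

```python
def next_epsilon(eps, history, shrink=0.5, window=20):
    """Shrink eps when the observed violation has stalled or started
    to increase over the last `window` iterations."""
    if len(history) < window:
        return eps
    recent = history[-window:]
    stalled = max(recent) - min(recent) < 1e-6   # violation is stuck
    increasing = recent[-1] > recent[0]          # violation grows again
    return eps * shrink if (stalled or increasing) else eps

# Violation stuck at 0.8 for 20 iterations -> decrease epsilon.
print(next_epsilon(8.0, [0.8] * 20))   # 4.0
```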
Figure 4.6: Violation Comparison for Modified AK Algorithm

Instead of using random initial solutions, as was the case in the last two experiments,
it is sometimes possible to engineer an initial solution for the specific problem at hand. In
the next experiment, setting all variables to 1 turned out to be a good initial solution,
since it effectively reduced the maximum initial violation to 7.69, compared to 43,523 in
the previous runs. This selection followed the simple intuition that the number of constraints
satisfied in this way (the covering constraints) would be much larger than the number
violated (the packing constraints).
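The effect of the starting point on the initial violation can be checked directly; a sketch on a tiny hypothetical instance (the matrices are made up, chosen so that covering rows dominate as in our MAX-SAT encoding):

```python
import random

def max_violation(A, C, x):
    """Largest constraint violation of the mixed system
    A x <= 1 (packing), C x >= 1 (covering) at point x."""
    dot = lambda row: sum(a * v for a, v in zip(row, x))
    packing = max((dot(a) - 1.0 for a in A), default=0.0)
    covering = max((1.0 - dot(c) for c in C), default=0.0)
    return max(packing, covering, 0.0)

# Tiny hypothetical instance: many covering rows, one packing row.
A = [[0.5, 0.5, 0.0]]
C = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

random.seed(0)
x_rand = [random.uniform(0.0, 10.0) for _ in range(3)]
print(max_violation(A, C, [1.0, 1.0, 1.0]))  # all-ones start: small
print(max_violation(A, C, x_rand))           # random start: much larger
```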
Figure 4.6 compares the violation of the above methods in each iteration. As can be
seen, the performance was sharply improved by modifying epsilon, such that the algorithm
converged to the expected maximum violation after 570 iterations. The number
of iterations further decreased to 216 when the modified initial solution was used.
After rounding from fractional to Boolean, the AK solution satisfied 81.11134% of the
overall weight, which was 0.853695 of the optimal solution.
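A minimal sketch of such a rounding step (simple 0.5-threshold rounding for illustration; not necessarily the exact scheme of Chapter 2):

```python
def round_solution(x_frac, threshold=0.5):
    """Map a fractional LP solution back to Boolean assignments:
    a variable is set true iff its fractional value reaches the threshold."""
    return [1 if v >= threshold else 0 for v in x_frac]

print(round_solution([0.9, 0.2, 0.5, 0.61]))   # [1, 0, 1, 1]
```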
4.5.2 Large-scale Experiment
In this experiment 1,500,000 textual facts from the complete Academic Corpus were
used. The generated problem contained 1,161,170 clauses over 1,083,898 positive and
negative variables with 39,123 competitor sets for 836,939 hypotheses. The longest
competitor set had 3 variables, and the average length of all competitor sets was 2.547683.
                                           Mid-size                        Large-scale
                                           CPlex     FMS*     AK           CPlex     FMS*      AK
Weight of Satisfied Clauses (% of total)   95.01208  94.6214  81.11134     93.37597  85.62238  78.26682
Approximation Ratio (% of optimal weight)  100       99.5888  85.3695      100       91.69638  83.81902
Table 4.2: Case Study Results
Due to the large numbers of clauses and variables, computing further statistics was
not straightforward.

Generating the hypotheses took around 122 hours. The corresponding MAX-SAT problem
was composed of 1,742,242 constraints over 2,245,068 variables, and was constructed
in roughly 5 minutes.
The optimal solution to this problem with Boolean variable types was computed with
CPlex in the course of 101.4 seconds. The optimal solution satisfied 93.37597%
of all weights. The FMS* solution was found after 63 hours, and satisfied 85.62238%
of the overall weight, which is 0.9169638 of the optimal solution. The AK algorithm
with varying epsilon and the designed initial solution terminated after 535 iterations, and
satisfied 78.26682% of all weights, which was 0.8381902 of the optimal solution. Table 4.2
summarizes the results.
[Figure 4.7: MR-MixedPC Job History — map, shuffle/sort, reduce, and failed attempts over time (s) per node/slot]
[Figure 4.8: MR-MixedPC-E Job History — map, shuffle/sort, reduce, and failed attempts over time (s) per node/slot]
[Figure 4.9: MR-MixedPC-S Job History — map, shuffle/sort, reduce, and failed attempts over time (s) per node/slot]
[Figure 4.10: MR-MixedPC Resource Utilization Diagram — job history plus average CPU utilization (user/sys/wait), disk traffic (read/write MB/s), and network traffic (local/remote read and write) over time]
[Figure 4.11: MR-MixedPC-E Resource Utilization Diagram — job history plus average CPU utilization (user/sys/wait), disk traffic (read/write MB/s), and network traffic (local/remote read and write) over time]
[Figure 4.12: MR-MixedPC-S Resource Utilization Diagram — job history plus average CPU utilization (user/sys/wait), disk traffic (read/write MB/s), and network traffic (local/remote read and write) over time]
Chapter 5
Conclusion and Future Work
Many discrete optimization problems are NP-hard. In consequence, for most interesting
problems, no algorithm can be expected to find the optimal solution in polynomial time
for every instance. By far the most common approach to this problem relaxes the
requirement of optimality, and settles for a "good enough" approximate solution that
can be found in tractable time. Linear programming plays a central role in the design
and analysis of many approximation algorithms. LP problem instances in real-world
applications, however, tend to grow enormously.
The rapid improvement and availability of cheap, commodity high-performance components
has been the driving force behind a new era in computing, in which networks of computers
are used to handle large-scale computations.
Our Work: In this work, the performance of the Awerbuch-Khandekar (AK) parallel method
for solving linear programs was examined, and a modification to speed up the algorithm
in practice was suggested. In comparison with partially-distributed optimization
techniques, AK exhibited numerical stability and comparable performance. Next, the
implementation of the algorithm in MapReduce was considered, and opportunities for an
efficient realization of the algorithm were studied.
Finally, the solutions of the algorithm on mid-size and large-scale MAX-SAT problems,
generated by the SOFIE knowledge extraction framework, were compared to the FMS* and
CPlex methods. Despite the large dimensions of the problems, both instances proved
to be trivial, such that CPlex was able to find the optimal solution in both cases in
the course of a few minutes. Further investigation of the clauses revealed that, despite
the large dimensions of the problems, the average number of variables in each clause
was less than 2, which was due to the specific, short-length standard rules in the SOFIE
framework. The same phenomenon was observed in the competitor sets, with an average of
2.55 variables in each. Furthermore, the ratio of the number of competitor sets to the
number of clauses in both instances was less than 0.05.
On the other hand, FMS* took 19 hours without batching on the mid-size problem, and 63
hours with batching on the large-scale problem, with the approximation ratio degrading in
the latter instance from 99.58% to 91.69%. Ideally, when running over batches, the flow of
information is bidirectional, meaning that previously accepted hypotheses are seen in
forthcoming batches, and at the same time, wrongly accepted hypotheses from previous
batches are rejected based on the new evidence in the current batch. However, when
SOFIE accepts a hypothesis, it is stored in the knowledge base and is treated as a fact
by the upcoming batches. As a result, wrongly accepted hypotheses cannot be rejected,
and they may even seed the generation and acceptance of further wrong hypotheses.
The modified AK algorithm with a designed initial solution delivered substantial performance
improvements on the MAX-SAT instances, and achieved well above 80% of the optimal
weight in both cases.
5.1 Future Work
Our work can be further expanded to consider other parallel LP solvers, in order to
compare their strengths and weaknesses with the AK algorithm that we used. Also,
a more concrete analysis of the choice of the initial solution and of the modification scheme
for ε in the modified version of the algorithm could help generalize this method
to other problem instances. Finally, it would be interesting to apply this framework to
other case studies with possibly non-trivial, large problem instances.
Chapter 6
Appendix A.
Algorithm 6.1: MR-MixedPC-E: ReduceII, Preprocessing Phase
input: Key, MatrixEntryList

MatrixColumn column = new MatrixColumn()
foreach MatrixEntry in MatrixEntryList do
    column.add(MatrixEntry)
end
Emit(Key, column)
Algorithm 6.2: MR-MixedPC-S: MapI, Iterative Phase
input: Key, MatrixRow

sum_p ← 0, sum_c ← 0
foreach MatrixEntry in MatrixRow do
    if MatrixEntry.origin == Packing then
        sum_p ← sum_p + MatrixEntry.value ∗ x[MatrixEntry.column]
    else
        sum_c ← sum_c + MatrixEntry.value ∗ x[MatrixEntry.column]
    end
end
if sum_p > 0 then
    value ← µ ∗ (sum_p − 1)
    Emit((y, Key), value)
end
if sum_c > 0 then
    value ← µ ∗ (1 − sum_c)
    Emit((z, Key), value)
end
Algorithm 6.3: MR-MixedPC-S: MapII, Iterative Phase
input: Key, MatrixColumn

foreach MatrixEntry in MatrixColumn do
    if MatrixEntry.origin == Packing then
        y[MatrixEntry.row] ← ln(MatrixEntry.value) + y[MatrixEntry.row]
    else
        z[MatrixEntry.row] ← ln(MatrixEntry.value) + z[MatrixEntry.row]
    end
end
j ← Key
lnr ← lsum(y) − lsum(z)
if lnr ≤ ln(1 − α) then
    x_j ← max(x_j (1 + β), δ)
else if lnr ≥ ln(1 + α) then
    x_j ← x_j (1 − β)
end
Emit(j, x_j)
Chapter 7
Appendix B.
CPlex Log for Mid-size MAX-SAT Problem:
Rcplex: num variables=188198 num constraints=138326
Tried aggregator 3 times.
MIP Presolve eliminated 6898 rows and 28891 columns.
Aggregator did 103537 substitutions.
Reduced MIP has 27891 rows, 55770 columns, and 105854 nonzeros.
Reduced MIP has 45359 binaries, 10411 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.74 sec.
Clique table members: 1509.
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: none, using 1 thread.
Root relaxation solution time = 0.20 sec.
Nodes Cuts/
Node Left Objective IInf Best Integer Best Node ItCnt Gap
* 0+ 0 2370076.0215 0 ---
0 0 5284574.1144 179 2370076.0215 5284574.1144 3426 122.97%
* 0+ 0 5284056.7538 5284574.1144 3426 0.01%
CPlex Log for Large-scale MAX-SAT Problem:
Rcplex: num variables=2245068 num constraints=1742242
Presolve has eliminated 0 rows and 0 columns...
Presolve has eliminated 29205 rows and 256314 columns...
Aggregator has done 1208457 substitutions...
Presolve has eliminated 38907 rows and 256344 columns...
Aggregator has done 1208457 substitutions...
Presolve has improved bounds 104685 times...
Presolve has eliminated 183366 rows and 510277 columns...
Aggregator has done 1315678 substitutions...
Presolve has improved bounds 104685 times...
Tried aggregator 5 times.
MIP Presolve eliminated 187999 rows and 514159 columns.
Aggregator did 1316592 substitutions.
Reduced MIP has 237651 rows, 414317 columns, and 814772 nonzeros.
Reduced MIP has 378166 binaries, 36151 generals, 0 SOSs, and 0 indicators.
Presolve time = 81.21 sec.
Clique table members: 21128.
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: none, using 1 thread.
Root relaxation solution time = 6.69 sec.
Nodes Cuts/
Node Left Objective IInf Best Integer Best Node ItCnt Gap
* 0+ 0 3.69162e+07 0 ---
0 0 5.56022e+07 2233 3.69162e+07 5.56022e+07 52813 50.62%
* 0+ 0 5.55954e+07 5.56022e+07 52813 0.01%
0 0 5.55999e+07 1045 5.55954e+07 Cuts: 675 53425 0.01%
Zero-half cuts applied: 649
Gomory fractional cuts applied: 21
Bibliography
[1] Baruch Awerbuch and Rohit Khandekar. Stateless algorithms for mixed packing
and covering linear programs with polylogarithmic convergence, 2009. manuscript.
[2] Baruch Awerbuch and Rohit Khandekar. Stateless distributed gradient descent for
positive linear programs. SIAM J. Comput., 38(6):2468–2486, 2009.
[3] Mark Baker and Rajkumar Buyya. Cluster computing: The commodity supercom-
puter. Softw., Pract. Exper., 29(6):551–576, 1999.
[4] S. Balay, K. Buschelman, V. Eijkhout, W. Gropp, D. Kaushik, M. Knepley, L. Curf-
man McInnes, B. Smith, and H. Zhang. PETSc users manual, 2010.
[5] Yair Bartal, John W. Byers, and Danny Raz. Fast, distributed approximation
algorithms for positive linear programming with applications to flow control. SIAM
J. Comput., 33(6):1261–1279, 2004.
[6] Flavio Chierichetti, Ravi Kumar, and Andrew Tomkins. Max-cover in map-reduce.
In WWW, pages 231–240, 2010.
[7] Lois Curfman McInnes, Jorge J. Moré, Todd Munson, and Jason Sarich. TAO users
manual, 2010.
[8] George B. Dantzig and Mukund N. Thapa. Linear Programming 1: Introduction,
chapter The Linear Programming Problem. Springer, 1997.
[9] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on
large clusters. In OSDI, pages 137–150, 2004.
[10] Klaus Jansen. Approximation algorithm for the mixed fractional packing and cov-
ering problem. SIAM Journal on Optimization, 17(2):331–352, 2006.
[11] Fabian Kuhn, Thomas Moscibroda, and Roger Wattenhofer. The price of being
near-sighted. In SODA, pages 980–989, 2006.
[12] Hai-Guang Li, Gong-Qing Wu, Xuegang Hu, Jing Zhang, Lian Li, and Xindong
Wu. K-means clustering with bagging and mapreduce. In HICSS, pages 1–8, 2011.
[13] Chao Liu, Hung chih Yang, Jinliang Fan, Li-Wei He, and Yi-Min Wang. Distributed
nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce.
In WWW, pages 681–690, 2010.
[14] Michael Luby and Noam Nisan. A parallel approximation algorithm for positive
linear programming. In STOC, pages 448–457, 1993.
[15] Faraz Makari Manshadi, Baruch Awerbuch, Rohit Khandekar, Julian Mestre,
Mauro Sozio, and Gerhard Weikum. Message-passing and map-reduce algorithms
for mixed packing-covering optimization with applications in data management.
Technical report under preparation.
[16] Christos H. Papadimitriou and Mihalis Yannakakis. Linear programming without
the matrix. In STOC, pages 121–129, 1993.
[17] Marcus Paradies. An efficient blocking technique for reference matching using
mapreduce. Datenbank-Spektrum, 11(1):47–49, 2011.
[18] Roland Pihlakas. Method for calculating precise logarithm of a sum, 2007.
manuscript.
[19] Fabian M. Suchanek. Automated Construction and Growth of a Large Ontology.
PhD thesis, Saarland University, 2009.
[20] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: A large ontol-
ogy from wikipedia and wordnet. J. Web Sem., 6(3):203–217, 2008.
[21] Fabian M. Suchanek, Mauro Sozio, and Gerhard Weikum. Sofie: a self-organizing
framework for information extraction. WWW, pages 631–640, 2009.
[22] Leonard W. Swanson. Linear Programming Basic Theory and Applications, chapter
Generalized Linear Programming Problems. McGRAW-HILL, 1985.
[23] Vijay V. Vazirani. Approximation Algorithms, chapter Maximum Satisfiability.
Springer, 2003.
[24] Tom White. Hadoop - The Definitive Guide: MapReduce for the Cloud. O’Reilly,
2009. ISBN 978-0-596-52197-4.
[25] David P. Williamson and David B. Shmoys. The Design of Approximation Algo-
rithms. Cambridge University Press, 2010.
[26] Neal E. Young. Sequential and parallel algorithms for mixed packing and covering.
In FOCS, pages 538–546, 2001.
[27] Weizhong Zhao, Huifang Ma, and Qing He. Parallel k-means clustering based on
mapreduce. In CloudCom, pages 674–679, 2009.