
BSC THESIS (15 ECTS)

A case study in geometric algebra: Fitting room models to 3D point clouds

Author: Moos HUETING
Supervisors: Dr. Marcel WORRING, Dr. Daniël FONTIJNE
July 15, 2011

Abstract

Many geometrical problems exist which have been researched thoroughly, but always using classical methods such as linear algebra as a framework for the problem. As linear algebra is an algebra based on coordinates and numbers as basic elements of computation, this leads to long-winded and non-universal code. Geometric algebra is an alternative formalism in which geometric objects are the basic elements of computation. Using this formalism to represent geometrical problems can often yield more readable and more compact code. In this paper we present a case study of such a problem, specifically fitting room models to 3D point clouds, and the advantages geometric algebra has over classical methods in solving this problem.

Contents

1 Introduction
2 Context
3 Research question
4 Method
    4.1 Geometric algebra
        4.1.1 Overview of geometric algebra
        4.1.2 On compactness of expression
        4.1.3 Example: Plane through three points, two methods
    4.2 RANSAC
    4.3 Hough transform
        4.3.1 Nearest Neighbour Hough Transform
        4.3.2 Unique representation
    4.4 3D to 2D
5 Experiments
    5.1 Software
        5.1.1 GA implementation
        5.1.2 Generating the datasets
    5.2 Data
        5.2.1 Artificial set
        5.2.2 Real set
    5.3 RANSAC
    5.4 Hough transform
    5.5 Hough transform, 3D to 2D
    5.6 On computational speed
6 Conclusions
7 Discussion

1 Introduction

When tackling geometrical problems, one of the most common approaches is to create a model of the problem in classical linear algebra, solving it in that formalism using numbers as the basic elements of computation. This method has been used for a long time and yields satisfactory results (see section 2). However, the use of numbers as basic elements in a geometric problem generates long-winded expressions which are in most cases far from intuitive. Here lies a possibility for improvement. A cleaner way of representing geometric problems would arise if there were a way in which we could represent the objects about which we are reasoning more directly. Geometric algebra (GA) [1], an alternative algebra for representing and computing with geometric objects and problems, fills that void. In GA, complete subspaces such as planes and lines are the elements of computation. As a result, computations can be done directly on these elements, without the need for manually manipulating any of their coordinates. This creates a compactness of expression which generates clean and compact code. The advantages of this algebra over classical methods are easily expressed but hard to clarify with an in-depth theoretical analysis. After all, going very deep into the nuts and bolts of the algebra will most likely not demonstrate the compactness of expression which is apparent when we deal with it on the surface. As a result, we have chosen to use a case study to show the advantages of using GA over classical methods in geometric problems.

For the case study, we tackle the problem of creating 3D models from multiple 2D representations. This is a prime example of a geometrical problem which has been discussed using classical formalisms many times (see section 2). As we want to show the advantages of using GA for problems with a geometrical context, this problem makes for a good case study.

2D representations of the environment such as photographs have many uses, but are quite limited. The world we live in is not two-dimensional. By definition, when creating a 2D representation of a 3D world¹, information is lost. Using 3D models of the environment thus creates new possibilities.

Creating 3D models of any kind of environment from scratch using manual modeling software is expensive, and the process takes too much time to be used effectively on a regular basis. 2D imagery, on the other hand, is easy to create and quick to come by. Using multiple 2D views of an environment to create 3D models thus removes the difficulties posed by the direct way of generating these models.

This method of reconstruction is twofold. First, using multiple view geometry, a 3D point cloud is created from the 2D imagery. Then, using this 3D point cloud, a 3D surface reconstruction of the captured scene is created. The generation of these point clouds is fairly straightforward and can be done in many different ways. In the following section we will briefly discuss the research done on this topic. After the point cloud has been generated, the step of fitting the point cloud to a 3D surface is more challenging and has been the subject of many diverse approaches. All past efforts used classical, linear algebra based techniques. We will discuss the same problem, but using the formalism of GA instead.

In section 2, we will describe previous work done on generating 3D surfaces from reconstructed point clouds, and briefly mention the work done on generating the point clouds themselves. Afterwards, we will detail the research question posed in this paper, presenting the added value given by this work. In section 4 we will describe the methods used in this work, combining geometric algebra with classic techniques like RANSAC and Hough transforms. The experiments done with these methods and the results generated are presented in section 5. The conclusions that can be drawn from the research will be discussed in section 6. Finally, in section 7 we will discuss possible future work and relevant improvements.

¹ A 3D world in which the dimensions are independent, which they are in our world.

2 Context

The problem in our case study is not a new one. Much research has been done on how to most effectively reconstruct 3D models from 2D imagery. This process is twofold. The first step deals with reconstructing a 3D point cloud from the captured 2D data (in most cases photography), also called multiple view geometry, after which the actual surface model of the environment is reconstructed from that point cloud in the second step.

The standard work on multiple view geometry was written by Hartley & Zisserman [2], which describes the complete process of generating 3D point clouds from 2D imagery. One recent novel method for generating a point cloud from multiple 2D representations of an environment was presented by Esteban [3]. It is based on creating a simulated stereo vision model using a single camera, where two photographs taken shortly after each other in time represent 2 virtual cameras, which are used as input for a stereo vision model. This removes the need for a dual camera system when reconstructing 3D environments in real time.

When the point cloud has been constructed, it should be fitted to actual surface models to regenerate the environment from which the original data was captured. This particular problem has been studied numerous times. One method for regenerating the actual surface model of the environment from the generated point cloud is based on RANSAC. It was presented by Schnabel et al. [4]. They proposed a modified version of RANSAC (which we will discuss in section 4.2) specifically tailored to finding a number of different shapes in a point cloud.

Another method which is often used for finding specific shapes within a dataset is the Hough transform, which we will discuss in section 4.3. Although commonly used for detecting lines and circles in 2D datasets, it can also be used in a 3D environment or point clouds derived from them. Recently, Borrmann et al. [5] devised a method of implementing the Hough transform in 3D environments, especially designed for finding planes.

We have seen that many methods have been proposed for resolving the problem at hand, and although a good deal of research has already been performed, there has been no single mention of geometric algebra. The compactness of expression inherent to GA when dealing with geometric problems should be apparent when applying it to our case.

Although geometric algebra was first mentioned and discussed by Grassmann [6], we have turned mostly to a quite recent book by Dorst et al. [1] for the geometric algebra used in our research. It gives a complete overview of the algebra as well as how it can be applied in a computer driven setting.


3 Research question

Much research has been conducted on reconstructing a 3D surface model from a reconstructed 3D point cloud. However, in all research done, the basic elements of computation are real numbers. This makes for tedious equations and computations, as the surfaces and points in question are only parametrized by their numerical constituents. It would be efficient if the points and surfaces could be viewed as elements themselves. This is where geometric algebra comes in.

Geometric algebra is an algebra in which different kinds of geometrical shapes and objects, such as lines and planes, are the basic elements of computation. When they are created, different operators are available which can easily compute a number of relations between the terms of the equation, such as the distance between two points or the intersection of a plane and a line. This algebra does not make new computations possible, but it does simplify many computations significantly, which means that tasks involving geometrical computations will often be more compactly represented compared to using classical methods.

The question addressed in this paper arises from these points. Although the problem at hand has been solved with classical techniques many times before, does the use of geometric algebra result in more compact equations and code? Furthermore, does the use of geometric algebra present any other advantages that one does not readily acknowledge? In section 4 we will give a comparison of the code when using geometric algebra and the code that solves the same problem with classical methods, and present any other findings regarding the difference between them. Section 5 will show some results we have gathered using an implementation we created using the methods described in section 4.


4 Method

4.1 Geometric algebra

When one needs to represent geometric objects, geometric algebra offers an alternative to the algebraic approach. In geometric algebra, geometric entities are the basic elements of computation and can be handled without working with the coordinate constituents of the objects. Because of its geometric nature, many problems concerning the manipulation of geometric space and objects yield more intuitive computations than when using a classical representation.

For reference, we discuss some (but not all) of the operators and objects available in geometric algebra. For a more in-depth resource on the workings of GA we refer to Dorst et al. [1].

4.1.1 Overview of geometric algebra

Basis vectors. Similar to linear algebra, the most basic elements of GA are the basis vectors of which the directional space is comprised. In 3D space, these are e1, e2, e3 and each of these corresponds to one of the x, y, z directions².

The specific model we deal with (Conformal Geometric Algebra or CGA) expands the directional space with 2 extra dimensions, which correspond to the point in the origin o (the origin can be chosen arbitrarily) and the point at infinity ∞. This point at infinity is a point which all lines and planes have in common and which does not change under Euclidean transformations. By making this point explicit, the algebraic patterns in geometrical statements become more universal [1].

Outer product. The outer product, also called the wedge product, is denoted as ∧ and spans the subspace comprised of its constituents. For example, e1 ∧ e2 denotes the subspace spanned by e1 and e2, that is, all linear combinations of the two. Such an outer product results in a blade, and its dimensionality is called its grade.³ This product is defined over all elements of GA and is purely algebraic.

Figure 1: Basis vectors e1 and e2.

Figure 2: The blade which results from the outer product between e1 and e2: e1 ∧ e2. It represents the subspace spanned by the constituents (e1, e2). In a 2D space, this is of course the complete space, in which case the blade is called the pseudoscalar.

² It is possible to use a representational space in which the basis vectors correspond to completely different directions, but we will not cover that possibility here.
³ An element comprised of blades of different grades is called a multivector.

A blade with the same grade as the representational space is called a pseudoscalar and is denoted as I_n, where n is the grade of the pseudoscalar. All pseudoscalars of the same space are scalar multiples of each other (see figure 2).

Contraction. The contraction is a more abstract product and has been expressed by Dorst et al. [1] as:

    The contraction A on B of a blade A of grade a and a blade B of grade b is a specific sub-blade of B of grade b − a perpendicular to A, with a weight proportional to the norm of B and to the norm of the projection of A onto B.

It can be used to ‘take a certain subspace out of another subspace’. For example, we can use it to retrieve one of the original vectors from which a blade has been previously made up:

\[ A = e_1 \wedge e_2 \;\rightarrow\; e_1 \rfloor A = e_2 \tag{1} \]

For vectors, the contraction is quite similar to the more familiar dot product from linear algebra. In the model we use (conformal geometric algebra), though, the dot product has to be extended to the added dimensions of o and ∞, which is where it differs from the classical dot product. The result table is listed in figure 3.

        o    e1   e2   e3   ∞
    o   0    0    0    0   -1
    e1  0    1    0    0    0
    e2  0    0    1    0    0
    e3  0    0    0    1    0
    ∞  -1    0    0    0    0

Figure 3: Table of outcomes for the contraction between basis vectors.

Note that the rules for using the contraction are not as straightforward as this example may make one believe. An in-depth discussion of these algebraic rules is offered in [1].

Geometric product. The geometric product is the fundamental product of geometric algebra and all other products are derived from it. It is simply denoted by a space: a b means the geometric product between a and b. For vectors, it is simply defined as

\[ a\,b = a \rfloor b + a \wedge b \tag{2} \]

In more concrete terms, we can say that the geometric product between two elements contains every relationship between those elements (for example the distance between them, the angle between them, their containment relationship, etc.).

The definition of the geometric product for objects of higher grade (dimensionality) is too involved to list here. More details may be found in Dorst et al. [1].

Dual form. Any expression in geometric algebra has a one-to-one mapping with its dual form. This means that any object in GA can be expressed in 2 ways: directly and dually. The dual form of an object is denoted by the operator ∗ and can be computed from the direct form using a simple equation:

\[ P^* = P \rfloor I_n^{-1} \tag{3} \]

Conversely, the direct form can be retrieved from the dual form using undualization:

\[ P = P^* \rfloor I_n, \quad \text{also written as} \quad P = (P^*)^{-*} \tag{4} \]

The geometric interpretation of the dual form is not always easily found, but for many objects the two different representations both correspond to classical well-known representations. For example, the direct form of a plane is the outer product (which spans a subspace between elements as we have just discussed) between three points on the plane and the point at infinity, while its dual form uses a combination of the plane's normal vector and distance to the origin to define it fully, a representation which users of linear algebra should be familiar with.

Conformal geometric algebra. Geometric algebra is an algebra which can be implemented using many different models. The model we use is called conformal geometric algebra, which is specifically designed for Euclidean geometry and its transformations. All Euclidean transformations (those comprised of rotations, reflections, translations and their compositions) can be expressed using the versor product.

A versor is simply an object which represents some transformation. The creation of such a versor is often quite simple, and we will encounter such a computation in section 4.4. The transformation can be applied by computing the versor product. With a versor V and a to-be-transformed object O this is done as follows:

\[ O_t = V O V^{-1} \tag{5} \]

Here V^{-1} is the inverse of the versor. All orthogonal transformations (i.e. transformations that preserve angles and lengths of vectors) can be represented like this, and in CGA all Euclidean transformations are orthogonal. This creates a very compact way of expressing quite complex transformations in a universal way.

Moreover, in CGA, points can be expressed explicitly and differently from vectors. While a simple vector v comprised of multiples of the 3 basic direction vectors (e1, e2, e3) denotes a direction in space, we would like to make an explicit representation of an actual point in space. In CGA, we represent such a point with

\[ p = o + v + \tfrac{1}{2}\,(v \rfloor v)\,\infty \tag{6} \]

This representation, combined with the contraction table listed before in figure 3, means that the contraction of two points directly encodes the distance between them. Working out the product term by term, only the v1 ⌋ v2, o ⌋ ∞ and ∞ ⌋ o terms survive, which gives

\[ p_1 \rfloor p_2 = -\tfrac{1}{2}\,D^2 \tag{7} \]

where D is the Euclidean distance between the two points. This distance measure also means that we can easily check if two points are identical: if the above expression returns 0, the points in question are clearly in the same position and thus the same point.

4.1.2 On compactness of expression

The advantages of geometric algebra we will show are based on its compactness of expression. However, it must be noted that this does not simply constitute a shorter way of writing down the problem definition. After all, using natural language we can easily define the complete problem at hand as “fit a room model to this given point cloud”. The difference lies in the fact that the representation given by geometric algebra is completely deterministic: following the rules of the algebra (of which we have listed a selection in section 4.1.1), one can calculate the result of any single expression. There are no external functions involved other than the basic rules for calculating the different products in the algebra.

Solving the problem stated as “fit a room model to this given point cloud” is obviously not as straightforward. We are thus talking about compactness of expression while retaining the possibility of directly calculating the result of the expression.

4.1.3 Example: Plane through three points, two methods

We will try to make the workings of geometric algebra more concrete using an example. Here we will list a geometric problem together with its solution using classical methods. Then, for comparison, we solve the same problem using GA to show its compactness of expression.

Two geometrical operations we need for our case study are creating the plane P through three given points p1, p2, p3 and calculating the distance between any arbitrary point and such a plane (see section 4.2). In linear algebra, given these three points, one first calculates the normal of the plane:

\[ n = (p_1 - p_2) \times (p_1 - p_3) \tag{8} \]

where × denotes the cross product. This normal vector combined with any of the three points (p_d) defines the plane fully. Calculating the distance between an arbitrary point p_a and this plane is a relatively involved operation. One first calculates the vector w from p_d to p_a, and then projects this vector onto the normal n. The length of the resulting vector is equal to the distance D of the point to the plane:

\[ D = |\mathrm{proj}_n\, w| = \frac{|n \cdot w|}{|n|} \tag{9} \]

Even though these computations are relatively cheap to perform, quite some mathematics is involved and the process is not intuitive.

In conformal geometric algebra, we can use the outer product (∧) to span a subspace S using as many constituents as necessary. A sphere, for example, is defined by any four points on its surface. Remembering that the point at infinity is common to all planes and lines (section 4.1), with three given points p1, p2, p3 the plane through these points is thus created with the simple equation

\[ P = p_1 \wedge p_2 \wedge p_3 \wedge \infty \tag{10} \]

This fully defines the plane. Using this representation, one can easily calculate the distance between P and an arbitrary point p. To do so, the plane is converted to dual form, after which the distance is given using the contraction [1]:

\[ D = p \rfloor P^* \tag{11} \]

This method, although not necessarily⁴ faster (see section 5.6), is more intuitive and generates code that is clear and easy to maintain.
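The two routes of this example can be compared numerically with a small Python sketch. It is only an illustration and not the GAIGEN-based implementation used in this work. All function names are hypothetical. Conformal vectors are stored as plain coefficient tuples on (o, e1, e2, e3, ∞), and the dual plane is assumed to be the vector n̂ + d∞ (unit normal plus distance to the origin), built here directly from the classical normal instead of by dualizing the outer product of equation (10).

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D vectors."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    """Classical dot product."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Classical cross product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_normal(p1, p2, p3):
    """Equation (8): n = (p1 - p2) x (p1 - p3)."""
    return cross(sub(p1, p2), sub(p1, p3))

def distance_classical(pa, p1, p2, p3):
    """Equation (9): D = |n . w| / |n|, with w = pa - pd and pd = p1."""
    n = plane_normal(p1, p2, p3)
    w = sub(pa, p1)
    return abs(dot(n, w)) / math.sqrt(dot(n, n))

# Conformal vectors are stored as coefficients on (o, e1, e2, e3, inf).
def cga_point(v):
    """Equation (6): p = o + v + 1/2 (v . v) inf."""
    return (1.0, v[0], v[1], v[2], 0.5 * dot(v, v))

def cga_inner(a, b):
    """Inner product of two conformal vectors, following figure 3:
    o . inf = inf . o = -1, e_i . e_i = 1, everything else 0."""
    return (a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
            - a[0] * b[4] - a[4] * b[0])

def cga_dual_plane(p1, p2, p3):
    """Normalized dual plane n_hat + d inf. Shortcut (assumption): built from
    the classical normal rather than by dualizing p1 ^ p2 ^ p3 ^ inf."""
    n = plane_normal(p1, p2, p3)
    length = math.sqrt(dot(n, n))
    n_hat = tuple(c / length for c in n)
    return (0.0, n_hat[0], n_hat[1], n_hat[2], dot(n_hat, p1))

def distance_cga(pa, p1, p2, p3):
    """Equation (11) restricted to vectors; abs() drops the side of the plane."""
    return abs(cga_inner(cga_point(pa), cga_dual_plane(p1, p2, p3)))
```

For the plane z = 0 through (0, 0, 0), (1, 0, 0), (0, 1, 0) and the point (2, 3, 5), both `distance_classical` and `distance_cga` return 5.0. The conformal route reduces to a single inner product once the plane is available in dual form.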

4.2 RANSAC

RANSAC is an iterative method used to estimate parameters of some (mathematical) model making use of a set of datapoints containing outliers. First published in 1981 by Fischler and Bolles [7], it has seen quite some variations, but the core has remained the same.

The assumption upon which RANSAC is based is that a dataset contains valid datapoints and outliers. In an iterative manner, datapoints are randomly selected from the set and a model is fitted to those points. An error measure is calculated from that model given the rest of the points, and noted. Then the process starts over again. This process is repeated a number of times, after which the best model (with the lowest error measure) is returned as the right model.

RANSAC can be described textually as presented in algorithm 1.

Algorithm 1 RANSAC
    1. Select at random the minimum number of points necessary to determine the model parameters
    2. Create model from these points
    3. Determine how many points in the total set of points lie within a predefined threshold θ of the model
    4. If this model has a lower error measure than the current best model, save it
    5. Repeat steps 1 through 4 a predetermined number of times N
    6. Return best model

A plane is defined by just three points, and thus for the problem at hand we select 3 points at random from the dataset and generate the plane through these points. In section 4.1 this was shown to be defined as

\[ P = p_1 \wedge p_2 \wedge p_3 \wedge \infty \tag{12} \]

This is a basic element of computation and the original elements from which the plane was constructed are not needed for any further computation involving the plane.⁵ The distance D between the plane P and any arbitrary point p was defined as

\[ D = P^* \rfloor p \tag{13} \]

where P∗ is the dual form of P. These two computations are performed iteratively as shown in algorithm 1. The results gathered using this method are presented in section 5.3.

⁴ This depends on the efficiency of the implementation.
⁵ The original elements are conversely also impossible to recreate from the combined representation.
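The loop of algorithm 1 can be sketched for planes as follows. This is a minimal Python illustration that uses the classical normal and offset representation rather than the CGA expressions above. The names `fit_plane`, `point_plane_distance` and `ransac_plane` are hypothetical, and the number of inliers is assumed to serve as the quality measure (more inliers standing in for a lower error).

```python
import math
import random

def fit_plane(p1, p2, p3):
    """Plane through three points as (unit normal n, offset d) with n . x = d.
    Returns None when the points are (nearly) collinear."""
    u = [b - a for a, b in zip(p1, p2)]
    v = [b - a for a, b in zip(p1, p3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(c * c for c in n))
    if length < 1e-12:
        return None
    n = tuple(c / length for c in n)
    return n, sum(a * b for a, b in zip(n, p1))

def point_plane_distance(p, plane):
    """Unsigned distance between point p and a plane (n, d)."""
    n, d = plane
    return abs(sum(a * b for a, b in zip(n, p)) - d)

def ransac_plane(points, threshold, iterations, rng=None):
    """Algorithm 1 for planes. Assumption: the inlier count is the score,
    so a higher count plays the role of a lower error measure."""
    rng = rng or random.Random()
    best_plane, best_inliers = None, -1
    for _ in range(iterations):                       # step 5
        plane = fit_plane(*rng.sample(points, 3))     # steps 1 and 2
        if plane is None:
            continue                                  # degenerate sample, skip
        inliers = sum(1 for p in points
                      if point_plane_distance(p, plane) <= threshold)  # step 3
        if inliers > best_inliers:                    # step 4
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers                   # step 6
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing thresholds or iteration counts on the same dataset.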

4.3 Hough transform

The Hough transform is a technique used for feature extraction, mostly seen in image analysis. It provides a method for finding imperfect instances of a certain class of shapes within a dataset using a voting procedure. This voting procedure is carried out in parameter space, whose dimensionality is equal to the number of unknown parameters of the shape class to be detected. The idea behind the transform is relatively straightforward: for each point in the dataset, the shapes that can be formed containing that point are generated. An accumulator array stores the occurrences of these shapes using their parameters. If a certain shape is present in the dataset, all points in the set that lie on this shape should cluster around its parameters in the accumulator array. In the end, the local maxima in the accumulator space correspond to shapes found in the dataset.

In its most basic form, the Hough transform can be described as in algorithm 2.

Algorithm 2 Hough Transform
    A ← {}
    for all p ∈ dataset do
        par ← parameters of p
        A[par] ← A[par] + 1
    end for
    return local maxima in A
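As an illustration of the voting procedure, the sketch below applies algorithm 2 to the familiar 2D case of line detection. The choice of shape class is an assumption made for brevity: lines are written in normal form ρ = x cos θ + y sin θ, and the function names, the number of sampled angles and the bin width `rho_step` are all hypothetical.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=0.1):
    """Let every 2D point vote for the lines rho = x cos(theta) + y sin(theta)
    passing through it. Returns the accumulator as a Counter keyed by
    (theta index, rho index)."""
    accumulator = Counter()
    for x, y in points:
        for i in range(n_theta):                  # every sampled line direction
            theta = math.pi * i / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            accumulator[(i, round(rho / rho_step))] += 1
    return accumulator

def strongest_line(accumulator, n_theta=180, rho_step=0.1):
    """Return (theta, rho, votes) for the accumulator cell with the most votes."""
    (i, j), votes = accumulator.most_common(1)[0]
    return math.pi * i / n_theta, j * rho_step, votes
```

Each point casts `n_theta` votes, so the cost grows with the product of the dataset size and the resolution of the parameter space.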

This process is computationally quite expensive. For each point in the dataset, a quite large number of parametrized shapes have to be generated (the number of shapes generated for each point dictates the precision with which shapes can be detected). With the task at hand, a dataset with 50000 points is not out of the ordinary. This yields a computation which is infeasibly expensive. Another version of the Hough transform called the Randomized Hough Transform (RHT), presented in 1990 by Xu [8], removes this problem. For a shape class defined by n parameters, instead of passing through each point, n points are selected at random and mapped to 1 point in the accumulator array. This procedure is then repeated. After some time, the accumulator array will show local maxima at the parameters corresponding to shapes in the dataset.

Algorithm 3 Randomized Hough Transform for planes
    A ← {}
    repeat
        p1, p2, p3 ← random selection of three points from dataset
        par ← parameters of plane (normal and distance to origin) defined by p1, p2, p3
        A[par] ← A[par] + 1 {Here A[par] can be a new cell or an already existing cell with a maximum distance to the current parameters of δ}
    until accumulator array has clear maxima
    return local maxima in A
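Algorithm 3 can be sketched in Python as follows. Several simplifications are assumptions of this sketch and not part of the algorithm as stated: the loop runs for a fixed number of iterations instead of testing for clear maxima, cells are formed by rounding each parameter to a grid of width δ (a coarse stand-in for the "existing cell within distance δ" rule), and the sign of the normal is fixed by a simple convention so that equal planes share a cell. Function names are hypothetical.

```python
import math
import random
from collections import Counter

def plane_parameters(p1, p2, p3):
    """Unit normal n and distance d to the origin (n . x = d) of the plane
    through three points, or None if the points are (nearly) collinear."""
    u = [b - a for a, b in zip(p1, p2)]
    v = [b - a for a, b in zip(p1, p3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    if length < 1e-12:
        return None
    n = [c / length for c in n]
    d = sum(a * b for a, b in zip(n, p1))
    # n and -n describe the same plane; make the component of largest
    # magnitude positive (an assumed convention for this sketch).
    k = max(range(3), key=lambda i: abs(n[i]))
    if n[k] < 0:
        n, d = [-c for c in n], -d
    return n, d

def randomized_hough_planes(points, delta, iterations, rng=None):
    """Vote with planes through random point triples. Assumptions: a fixed
    iteration budget, and cells obtained by rounding to a grid of width delta."""
    rng = rng or random.Random()
    accumulator = Counter()
    for _ in range(iterations):
        params = plane_parameters(*rng.sample(points, 3))
        if params is None:
            continue                               # degenerate triple, no vote
        n, d = params
        cell = tuple(round(c / delta) for c in (*n, d))
        accumulator[cell] += 1
    return accumulator
```

Calling `most_common(k)` on the returned accumulator gives the k cells with the most votes. Multiplying a cell's indices by `delta` recovers approximate plane parameters.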

The error threshold δ specified in algorithm 3 above is introduced because the dataset used could be quite noisy, making the parametrized planes not identical. In our case, even though two parametrized planes could represent the same real plane in the room environment, they could have parameters which are not identical because of measurement noise. This way these planes would still fall in the same accumulator cell.

4.3.1 Nearest Neighbour Hough Transform

To make the process even faster, we have opted for not choosing the three points from the dataset at random, but first creating a table listing the 2 nearest neighbours for each of the datapoints. Then, the accumulator array is filled by passing over each point in the dataset and creating the plane through it and its two nearest neighbours. This increases the speed significantly, as the probability that 3 points that lie very close together are part of the same plane is much higher than for a selection of 3 random points from the set.

4.3.2 Unique representation

For storing the generated planes in the accumulator array, we need to make sure that the planes generated are unique: a plane generated from a set of three points S should render the exact same representation as a plane generated from another set of three points S′ on that same plane, otherwise a local maximum will never form in the parameter space. Such a unique representation is easily derived from the representation used in the previous section.

The dual form of a plane in CGA is a simple vector, in which the e1, e2, e3 components (the Euclidean part) denote the normal direction and the ∞ component is proportional to the distance of the plane from the origin. If the plane is normalized, the ∞ component is equal to the distance to the origin. When the plane is normalized, just 3 values need to be saved in the accumulator array in order to uniquely store the plane:

\[ P_n = \frac{P^*}{\sqrt{P^* \rfloor P^*}} \tag{14} \]

Now just two Euclidean components and the ∞ component need to be saved. The third Euclidean component can be retrieved by acknowledging that because the plane is normalized, the following equation must hold for its Euclidean components:

\[ \sqrt{e_1^2 + e_2^2 + e_3^2} = 1 \tag{15} \]

The fact that three components need to be saved is not surprising. In classical techniques, the most commonly used unique representation of planes is one where the angle θ of the normal with the (x, y) plane is saved together with the angle φ of the normal with the (x, z) plane and the distance to the origin. These are exactly the degrees of freedom a plane in 3D has, so our unique representation in GA cannot possibly get any more compact.

There is one more issue which we need to take into account: a plane defined by normal vector E is in our case the same as the plane defined by normal vector −E. By specifying that we want the largest component of the normal vector to always be positive, we circumvent this problem. If the constructed plane does not satisfy this constraint, we can simply multiply by −1 to get the representation we want.

4.4 3D to 2D

Although the methods listed above are correct and should render good results (setting aside noise in the data), when looking at the problem closely we should notice that we are essentially dealing with just 2 dimensions: the walls of a room stand straight up and are (most of the time) not tilted. Certainly we should use this information to our advantage.

One method of using this extra piece of information is by looking at the data from above and treating that view as a 2D dataset, in which lines need to be found instead of planes. However, the data generated from multiple view geometry methods are often tilted: it is very likely that the pictures taken were not completely level with the horizontal axis, and thus looking directly from above does not correspond with looking at the room directly from above. This should be corrected first.

By first looking for the bottom or top plane (i.e., floor or ceiling) in the dataset, we can then rotate the complete dataset so that this plane is level with the horizontal plane. Then, we can look from above as mentioned before and will be left with a lower-dimensional problem.

Finding the bottom or top plane by means of RANSAC or the Hough transform can be done by selecting the points used for generating the planes from a small portion of the set which has the lowest or highest vertical component (e2 in our model). This plane P should then be rotated to be level with the horizontal plane. With p1 as the point corresponding with e1 and p3 corresponding with e3, the horizontal plane H is defined as

\[ H = o \wedge p_1 \wedge p_3 \wedge \infty \tag{16} \]

Now, as pointed out in section 4.1.1, we can create a versor to perform the rotation we want. In geometric algebra, the versor rotating one object A to another B can be computed as

\[ V = 1 + BA \tag{17} \]

which maps to our problem as

\[ V = 1 + HP \tag{18} \]

This is the complete definition of the versor. It can be applied to each point in the dataset, and the result will be the dataset rotated so that the found plane is level with the horizontal plane.

Now that the point cloud has been rotated appropriately, we can project it onto the horizontal plane. The horizontal plane can be defined in dual form by its normal (e2). Afterwards we can span the line perpendicular to the horizontal plane through a single point p in the cloud using the outer product, spanning another subspace:

\[ L = e_2 \wedge p \wedge \infty \tag{19} \]

The projection of the point p on the horizontal plane is then given by the meet operation, which is defined as

\[ P_{proj} = L^* \rfloor P^{-*} \tag{20} \]

where −∗ means undualization (section 4.1.1). Doing this operation for all points in the point cloud resultsin the cloud being projected onto the horizontal plane, which is what we strived for.Here we see an incredible difference between linear algebra and geometric algebra. The same problem canbe tackled in linear algebra, but is much more involved. We list it here for comparison.In linear algebra, planes are not direct objects of computation and are represented by a combination ofdifferent vectors, describing the angle in space and the location. In our problem we thus want to rotate theangle vectors describing the found plane to the angle vectors describing the horizontal plane. The rotationof one vector ~v1 to another vector ~v2 is a process comprised of two distinct steps. First, the axis and angleof rotation need to be computed. Then, using this axis and angle, a matrix can be computed which performsthe wanted rotation.Given the two vectors, the axis of rotation is calculated using the cross product, which returns a vectorperpendicular to both constituents:~a = ~v1 × ~v2 (21)

We will need to normalize this axis vector before we can use it:

\[ \vec{a} = \frac{\vec{a}}{\|\vec{a}\|} \tag{22} \]

Then, we calculate the angle between these vectors using the dot product:

\[ \phi = \operatorname{acos}\left( \frac{\vec{v}_1 \cdot \vec{v}_2}{\|\vec{v}_1\| \, \|\vec{v}_2\|} \right) \tag{23} \]

where acos is the arc cosine. If we denote the x, y and z components of the normalized axis vector as x, y, z respectively, the following matrix performs the rotation:

\[ R = \begin{pmatrix}
(1 - \cos\phi)\,x^2 + \cos\phi & (1 - \cos\phi)\,xy - \sin(\phi)\,z & (1 - \cos\phi)\,xz + \sin(\phi)\,y \\
(1 - \cos\phi)\,xy + \sin(\phi)\,z & (1 - \cos\phi)\,y^2 + \cos\phi & (1 - \cos\phi)\,yz - \sin(\phi)\,x \\
(1 - \cos\phi)\,xz - \sin(\phi)\,y & (1 - \cos\phi)\,yz + \sin(\phi)\,x & (1 - \cos\phi)\,z^2 + \cos\phi
\end{pmatrix} \tag{24} \]
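To make the linear algebra route concrete, equations (21) to (24) can be written out as a short NumPy sketch. The function name `rotation_matrix` and the example vectors are assumptions for illustration only; the clipping of the cosine is an added numerical safeguard that the equations do not mention.

```python
import numpy as np

def rotation_matrix(v1, v2):
    """Matrix rotating the direction of v1 onto the direction of v2."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    axis = np.cross(v1, v2)                              # eq. (21)
    axis_norm = np.linalg.norm(axis)
    if axis_norm < 1e-12:
        raise ValueError("v1 and v2 are parallel; the rotation axis is undefined")
    x, y, z = axis / axis_norm                           # eq. (22)
    cos_phi = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    phi = np.arccos(np.clip(cos_phi, -1.0, 1.0))         # eq. (23)
    c, s = np.cos(phi), np.sin(phi)
    k = 1.0 - c
    return np.array([                                    # eq. (24)
        [k * x * x + c,     k * x * y - s * z, k * x * z + s * y],
        [k * x * y + s * z, k * y * y + c,     k * y * z - s * x],
        [k * x * z - s * y, k * y * z + s * x, k * z * z + c],
    ])

# Hypothetical example: rotate a found normal onto the second basis vector.
found_normal = np.array([1.0, 2.0, 2.0])
target = np.array([0.0, 1.0, 0.0])
R = rotation_matrix(found_normal, target)
```

For a cloud stored as rows of an (N, 3) array, the rotation is applied as `cloud @ R.T`.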


This should then be applied to each point in the point cloud. Afterwards, the projection onto the horizontal plane is done by “throwing away” the vertical component, which can be achieved using the following matrix:

\[ P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{25} \]

Compare this equation and the steps before it with the simple versor and the versor product listed above. Clearly, the problem is in this case expressed much more compactly in geometric algebra, which leads to much cleaner code.
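The projection matrix of equation (25) is simple enough to check directly. The following minimal sketch takes P exactly as printed (third component dropped) and combines it with an optional rotation matrix; the helper name `flatten` is hypothetical.

```python
import numpy as np

# Projection matrix of eq. (25): keeps the first two components, zeroes the third.
P = np.diag([1.0, 1.0, 0.0])

def flatten(cloud, R=np.eye(3)):
    """Rotate every row of an (N, 3) cloud by R, then project it with P."""
    cloud = np.asarray(cloud, dtype=float)
    return cloud @ (P @ R).T

flat = flatten([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

Multiplying P and R once and applying the product to the whole cloud avoids a per-point loop.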


5 Experiments

We implemented the methods presented in section 4 and tested the resulting implementations on different datasets. Here we present the results for two datasets in particular, one of which was generated artificially and another which was reconstructed from pictures of an actual room using a multiple view geometry algorithm.

5.1 Software

5.1.1 GA implementation

For our geometric algebra expressions we used GAIGEN by Daniël Fontijne, which is a code generator for geometric algebra. It was written in C and outputs C code. For ease of use, we created a coupling between Python and the C code generated by GAIGEN. This coupling results in a significantly slower implementation than a pure C implementation, but makes implementing the proposed algorithms straightforward. Implementing the algorithms directly in C would significantly increase their speed.

5.1.2 Generating the datasets

For generating the datasets, a combination of Microsoft Photosynth [9] and PMVS [10] was used. As input they use pictures of an environment, and the output is a reconstructed 3D point cloud of the environment. The discussion of these programs is outside the scope of this work.

5.2 Data

5.2.1 Artificial set

The first dataset we used was created artificially: two 2D images of a computer-generated room were given to a multiple view geometry algorithm along with handcrafted feature pairs, resulting in a very clean dataset with little noise. It consists of roughly 25,000 data points in 3D space.

Figure 4: Artificial dataset (see footnote 6), front view
Figure 5: Artificial dataset, side view

As can be seen especially in the side view, the back plane consists of many points, and is expected to be easily found using any of the methods used.


5.2.2 Real set

The second dataset used was generated from a number of pictures taken of a room filled with furniture and other objects. This results in many points inside the room being added to the point cloud, which, for our algorithms, is simply a form of noise.

Figure 6: Real dataset, front view
Figure 7: Real dataset, side view

5.3 RANSAC
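For orientation, a minimal NumPy sketch of RANSAC plane detection is given here. It is not the GA implementation evaluated in this section: the function names, the distance threshold, the iteration count and the synthetic test cloud are all assumptions chosen for illustration, and the plane is represented classically by a unit normal n and offset d with n·x = d.

```python
import numpy as np

def plane_through(p0, p1, p2):
    """Plane (unit normal n, offset d) through three points, or None if collinear."""
    n = np.cross(p1 - p0, p2 - p0)
    length = np.linalg.norm(n)
    if length < 1e-12:
        return None
    n = n / length
    return n, np.dot(n, p0)

def ransac_plane(cloud, threshold=0.05, iterations=200, seed=0):
    """Return the sampled plane with the most points within `threshold` of it."""
    rng = np.random.default_rng(seed)
    best_plane = None
    best_inliers = np.zeros(len(cloud), dtype=bool)
    for _ in range(iterations):
        sample = cloud[rng.choice(len(cloud), size=3, replace=False)]
        plane = plane_through(*sample)
        if plane is None:
            continue
        n, d = plane
        inliers = np.abs(cloud @ n - d) < threshold    # point-to-plane distances
        if inliers.sum() > best_inliers.sum():
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# Synthetic stand-in for a room wall: 300 points on z = 1 plus 100 clutter points.
rng = np.random.default_rng(1)
wall = np.column_stack([rng.uniform(-1, 1, (300, 2)), np.ones(300)])
clutter = np.column_stack([rng.uniform(-1, 1, (100, 2)), rng.uniform(-1, 0.5, 100)])
cloud = np.vstack([wall, clutter])
(normal, offset), inliers = ransac_plane(cloud)
```

Because the three sample points are drawn at random, repeated runs with different seeds can return different planes first, which is the behaviour reported for the artificial dataset.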

We implemented RANSAC as presented in section 4.2 using GA, and ran the resulting implementation on our two datasets.

As expected, the back plane of the artificial dataset was easily discovered. After the back plane, the right wall was the next most likely plane to be detected, but this result varied. In some cases, the bottom plane was found first (see figure 8). This variation is due to the random aspect of the algorithm.

Unfortunately, RANSAC proved to be insufficiently robust for the much noisier real dataset. As can be seen in figure 9, planes were found that do not correspond at all to the walls of the room. This was to be expected, as the non-empty room has many data points in its point cloud corresponding to furniture and other objects present in the environment.

5.4 Hough transform

We followed the Hough transform algorithm as listed in section 4.3, and in particular the variant specified in section 4.3.1, which uses the nearest neighbours of points to increase the speed of the algorithm.

The results were quite promising. With the artificial dataset, 4 of the 5 expected planes were easily found, with the left plane sometimes being overlooked. As can be seen in the dataset (section 5.2), this is also the wall that is the least represented in the point cloud.

Footnote 6: N.B.: Only 1 in every 100 points is shown.

Figure 8: RANSAC run on the artificial dataset. In this particular instance, the bottom and top planes were found quite well.

Figure 9: RANSAC run on the real dataset. As can be seen, the dataset is far too noisy to be successfully processed.

More importantly, the Hough transform results on the real dataset are much better than those of RANSAC. Although the side walls are still not found, the top and bottom planes are found quite well.


Figure 10: Hough run on the artificial dataset.

Figure 11: Hough run on the real dataset. The ceiling and floor of the room are generated quite well.

5.5 Hough transform, 3D to 2D

As described in section 4.4, we have implemented a Hough transform which first finds the bottom or top plane, which it then rotates to be level with the horizontal plane. Then the whole point cloud is projected onto the horizontal plane. The resulting flattened datasets are shown below in figures 12 and 13.

Figure 12: Flattened artificial dataset
Figure 13: Flattened real dataset

Using these flattened datasets, we could run the Hough transform again, but this time trying to find lines which correspond to the walls of the room. The results of this second Hough transform can be seen in figures 14 and 15. The green lines in both images correspond to actual walls in the dataset. The yellow line in the real set corresponds to a quite accurate line within the dataset, but one which is not an actual wall. The other (blue) lines are caused by data points which do not correspond to any wall and should thus not appear in a perfect version.

Interestingly, the right wall is found quite well in the artificial set, something that didn’t happen with the original Hough transform (see section 5.4). However, the left wall is not found as one of the top lines in the 2D set.

The biggest improvement can be seen in the real set: not a single wall was found in the original Hough transform (only the ceiling and floor), but the left wall is found perfectly in this 2D version.
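The line search on the flattened cloud can be illustrated with a standard Hough transform for 2D lines in the normal form rho = x cos(theta) + y sin(theta). This is a minimal sketch of the textbook accumulator, not the nearest-neighbour variant of section 4.3.1; the function name, the bin sizes and the synthetic wall are assumptions for illustration.

```python
import numpy as np

def hough_lines(points, n_theta=180, rho_step=0.05, top_k=3):
    """Return the top_k (theta, rho, votes) cells of a 2D line accumulator."""
    points = np.asarray(points, dtype=float)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # rho for every point/angle pair, shape (N, n_theta).
    rhos = points[:, :1] * np.cos(thetas) + points[:, 1:2] * np.sin(thetas)
    rho_max = np.abs(rhos).max() + rho_step
    n_rho = int(np.ceil(2 * rho_max / rho_step))
    bins = ((rhos + rho_max) / rho_step).astype(int)
    accumulator = np.zeros((n_theta, n_rho), dtype=int)
    theta_idx = np.broadcast_to(np.arange(n_theta), bins.shape)
    np.add.at(accumulator, (theta_idx, bins), 1)         # one vote per pair
    best = np.argsort(accumulator, axis=None)[::-1][:top_k]
    t_idx, r_idx = np.unravel_index(best, accumulator.shape)
    return [(thetas[t], (r + 0.5) * rho_step - rho_max, accumulator[t, r])
            for t, r in zip(t_idx, r_idx)]

# Synthetic flattened wall: 50 points on the vertical line x = 0.5.
wall_points = np.column_stack([np.full(50, 0.5), np.linspace(-1.0, 1.0, 50)])
lines = hough_lines(wall_points, top_k=1)
```

Points that do not lie on a wall still vote, so clutter can raise spurious accumulator peaks; this matches the extra lines reported for the real set.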

Figure 14: The 2D Hough transform run on the flattened artificial dataset
Figure 15: The 2D Hough transform run on the flattened real dataset

5.6 On computational speed

We have seen that the representational power of geometric algebra exceeds that of linear algebra when it comes to geometric problems. However, it must be noted that geometric algebra is not a set of algorithms but a formalism. This means that in and of itself geometric algebra will not offer an increase in computational speed over methods based on linear algebra.

At the time of writing, computer hardware is optimized for computing linear algebra expressions (see footnote 7). No such hardware optimizations are available for geometric algebra, although research on them has been done (Mishra and Wilson [11], [12]).

Footnote 7: A graphics card is in essence nothing more than a very quick matrix multiplier.

6 Conclusions

The classic algorithms that have been discussed in section 2 work well as they stand. However, using geometric algebra significantly increases the compactness of expression. This could already be seen with our implementations of RANSAC and the Hough transform, but was most apparent when the full power of geometric algebra could be used when converting the originally three-dimensional problem to a two-dimensional one.

Overall, the methods used for solving the problem at hand were not sufficiently powerful to offer a complete start-to-finish foolproof 3D reconstruction implementation. However, the results were promising (especially those of the Hough transform), and could be the starting point for more intricate algorithms, all the while using the compactness of expression of geometric algebra to keep the code clean.

It must be kept in mind that geometric algebra is only a formalism and thus methods incorporating it are not inherently quicker than those based on classical methods.

7 Discussion

The results were not unexpected. As a formalism tailored specifically to geometric problems, geometric algebra seemed to fit the problem at hand like a glove. Our expectation of much more compact code was met, as we have seen in the previous sections. However, the full power of geometric algebra has not been revealed yet. Although a significant improvement over classical methods with respect to compactness of code has been shown for the 3D-to-2D Hough transform, many more intricate details of geometric algebra and their advantages in geometric problems have, because of the nature of the case, not been touched on. A case study involving a higher-level geometric problem could be the basis of a better display of representational power.

Furthermore, the generality of geometric algebra means we could easily extend the algorithms discussed to rooms that are not strictly planar but may have spherical components, without a great increase in representational complexity. More research into this could reveal an even larger difference between the representational power of classical methods and that of geometric algebra for geometric problems.

The datasets used were very noisy and thus did not yield the results we would have liked. Revising the datasets to contain less noise, or researching methods of cleaning up the data, could resolve this issue and is a good start for future efforts.

When the planes generated are sufficiently accurate, the corners of the room still have to be extracted. Simply intersecting all the planes is not enough, as figure 16 demonstrates.

By computing what area of the plane is actually supported by the dataset it could be possible to differentiate between actual corners and regular plane intersections. In figure 16 for example, plane P will not find any support along the line between point A and point B, thus the intersection at A could be reasoned to be just an intersection, and not an actual corner. As all steps in this procedure (see footnote 8) are quite easily represented in geometric algebra, implementing this is a suggestion for future research.

Implementation-wise, as it stands, the speed of the algorithms implemented could be significantly increased by porting the implementation to a lower-level language like C. The translation steps currently necessary to switch between C and Python are an enormous bottleneck for speed. Although this will not yield novel results, it will make it possible to process new datasets which are currently too large to handle.
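The support test proposed in this discussion for telling real corners from mere plane intersections could be sketched classically as follows. This is only an illustration of the idea under stated assumptions: planes are given as a unit normal n and offset d with n·x = d, and the function names, the radius and the tiny example cloud are hypothetical. A GA version would express the same steps with the meet and point-to-plane distances.

```python
import numpy as np

def intersection_line(plane_a, plane_b):
    """Line (point, unit direction) shared by two planes, or None if parallel."""
    (n1, d1), (n2, d2) = plane_a, plane_b
    direction = np.cross(n1, n2)
    length = np.linalg.norm(direction)
    if length < 1e-12:
        return None
    # Point on both planes that is closest to the origin along the line.
    point = np.linalg.solve(np.array([n1, n2, direction]),
                            np.array([d1, d2, 0.0]))
    return point, direction / length

def support_along(point, direction, cloud, start, end, radius):
    """Count cloud points within `radius` of the line segment t in [start, end]."""
    rel = np.asarray(cloud, dtype=float) - point
    t = rel @ direction
    dist = np.linalg.norm(rel - np.outer(t, direction), axis=1)
    return int(np.count_nonzero((dist < radius) & (t >= start) & (t <= end)))

# Hypothetical example: the planes x = 0 and y = 0 meet in the z axis.
plane_x = (np.array([1.0, 0.0, 0.0]), 0.0)
plane_y = (np.array([0.0, 1.0, 0.0]), 0.0)
line_point, line_dir = intersection_line(plane_x, plane_y)
cloud = np.array([[0.0, 0.0, 0.1], [0.0, 0.0, 0.5], [1.0, 1.0, 1.0]])
supported = support_along(line_point, line_dir, cloud, 0.0, 1.0, radius=0.05)
```

A segment with a support count near zero, like the stretch between A and B in figure 16, would then be treated as a plain intersection rather than a room edge.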

Footnote 8: Finding support by calculating distances between points and planes, and finding the intersecting lines between planes.

Figure 16: When simply calculating plane intersections, all the room corners do get returned, but also some intersections which are not actual corners.


List of Figures

1 Basis vectors e1 and e2
2 Outer product of e1 and e2
3 Table of outcomes for the contraction between basis vectors
4 Artificial dataset, front view
5 Artificial dataset, side view
6 Real dataset, front view
7 Real dataset, side view
8 RANSAC run on the artificial dataset. In this particular instance, the bottom and top planes were found quite well
9 RANSAC run on the real dataset. As can be seen, the dataset is way too noisy to be successfully processed
10 Hough run on the artificial dataset.
11 Hough run on the real dataset. The ceiling and floor of the room are generated quite well.
12 Flattened artificial dataset
13 Flattened real dataset
14 The 2D Hough transform run on the flattened artificial dataset
15 The 2D Hough transform run on the flattened real dataset
16 When simply calculating plane intersections, all the room corners do get returned, but also some intersections which are not actual corners


References

[1] L. Dorst, D. Fontijne, and S. Mann. Geometric algebra for computer science: an object-oriented approach to geometry. Morgan Kaufmann, 2009.
[2] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521623049, 2000.
[3] I. Esteban, J. Dijk, and F. Groen. Automatic 3D modeling of the urban landscape. In Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), 2010 International Congress on, pages 421–428. IEEE, 2010.
[4] R. Schnabel, R. Wahl, and R. Klein. Efficient RANSAC for Point-Cloud Shape Detection. In Computer Graphics Forum, volume 26, pages 214–226. Wiley Online Library, 2007.
[5] D. Borrmann, J. Elseberg, et al. The 3D Hough Transform for Plane Detection in Point Clouds: A Review and a new Accumulator Design. 3DR Express, 2011.
[6] H. Grassmann. Die lineale Ausdehnungslehre ein neuer Zweig der Mathematik. O. Wigand, 1844.
[7] R.C. Bolles and M.A. Fischler. A RANSAC-based approach to model fitting and its application to finding cylinders in range data. In International Joint Conference on Artificial Intelligence, pages 637–643. Citeseer, 1981.
[8] L. Xu, E. Oja, and P. Kultanen. A new curve detection method: randomized Hough transform (RHT). Pattern Recognition Letters, 11(5):331–338, 1990.
[9] Microsoft Photosynth. http://photosynth.net/.
[10] Y. Furukawa and J. Ponce. PMVS. http://grail.cs.washington.edu/software/pmvs/.
[11] B. Mishra and P. Wilson. Color edge detection hardware based on geometric algebra. In Visual Media Production, 2006. CVMP 2006. 3rd European Conference on, pages 115–121. IET, 2006.
[12] B. Mishra and P. Wilson. Hardware implementation of a geometric algebra processor core. In IMACS International Conference on Applications of Computer Algebra, 2005.
[13] G. Vosselman, S. Dijkman, et al. 3D building model reconstruction from point clouds and ground plans. International Archives of Photogrammetry Remote Sensing and Spatial Information Sciences, 34(3/W4):37–44, 2001.

