Analytical Mechanics - Personal Web Space - UMBC

A Guided Tour of

Analytical Mechanicswith animations in MAPLE and MATHEMATICA

Rouben RostamianDepartment of Mathematics and Statistics

[email protected]

December 1, 2014

Contents

Preface vii

1 An introduction through examples 11.1 The simple pendulum à la Newton . . . . . . . . . . . . . . . . . . . . . . 11.2 The simple pendulum à la Euler . . . . . . . . . . . . . . . . . . . . . . . . 31.3 The simple pendulum à la Lagrange . . . . . . . . . . . . . . . . . . . . . . 31.4 The double pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Work and potential energy 9Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3 A single particle in a conservative force field 133.1 The principle of conservation of energy . . . . . . . . . . . . . . . . . . . 133.2 The scalar case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143.3 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163.4 The phase portrait of a simple pendulum . . . . . . . . . . . . . . . . . . 16Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4 The Kapitsa pendulum 194.1 The inverted pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194.2 Averaging out the fast oscillations . . . . . . . . . . . . . . . . . . . . . . . 194.3 Stability analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

5 Calculus of variations 255.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

5.1.1 A straight line is the shortest path . . . . . . . . . . . . . . 255.2 The brachistochrone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255.3 Mathematical preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5.3.1 Basic lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275.3.2 The variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.4 The central problem of the calculus of variations . . . . . . . . . . . . . 305.5 The invariance of Euler’s equation under change of coordinates . . . 325.6 The solution of Problem 5.1 . . . . . . . . . . . . . . . . . . . . . . . . . . 335.7 The solution of Problem 5.1 in polar coordinates . . . . . . . . . . . . . 345.8 The solution of Problem 5.2 . . . . . . . . . . . . . . . . . . . . . . . . . . 355.9 A variational problem in two unknowns . . . . . . . . . . . . . . . . . . 365.10 Lagrange multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

iii

iv Contents

5.11 Calculus of variations with pointwise constraints . . . . . . . . . . . . . 385.12 Calculus of variations with integral constraints . . . . . . . . . . . . . . 41Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

6 Lagrangian mechanics 456.1 Newtonian mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456.2 Holonomic constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466.3 Generalized coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496.4 Virtual displacements, virtual work, and generalized force . . . . . . . 506.5 External versus reaction forces . . . . . . . . . . . . . . . . . . . . . . . . . 526.6 The equations of motion for a holonomic system . . . . . . . . . . . . . 53Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

7 Angular velocity 57Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

8 The moment of inertia tensor 598.1 A brief introduction to tensor algebra . . . . . . . . . . . . . . . . . . . . 59

8.1.1 Tensor algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . 598.1.2 Connection with R3 and 3× 3 matrices . . . . . . . . . . . 618.1.3 Symmetric tensors . . . . . . . . . . . . . . . . . . . . . . . . 63

8.2 The moment of inertia tensor . . . . . . . . . . . . . . . . . . . . . . . . . 648.3 Translation of the origin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658.4 The principal moments of inertia . . . . . . . . . . . . . . . . . . . . . . . 67Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

9 Constraint reactions 69Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

10 The Gibbs-Appell formulation of dynamics 7510.1 Gibbs-Appell according to Lurie [9] . . . . . . . . . . . . . . . . . . . . . 75

10.1.1 Acceleration in generalized coordinates . . . . . . . . . . . 7510.1.2 Ideal constraints and the fundamental equation of dy-

namics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7510.1.3 Virtual work and generalized force . . . . . . . . . . . . . . 7610.1.4 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7710.1.5 Virtual displacements . . . . . . . . . . . . . . . . . . . . . . 7810.1.6 Back to the fundamental equation: Part 1 . . . . . . . . . 7810.1.7 Back to the fundamental equation: Part 2 . . . . . . . . . 7910.1.8 The Gibbs-Appell equations of motion . . . . . . . . . . . 7910.1.9 Quasi-velocities . . . . . . . . . . . . . . . . . . . . . . . . . . 8010.1.10 Appell’s equations of motion in terms of quasi-velocities 81

10.2 Gibbs-Appell according to Gantmacher [10] . . . . . . . . . . . . . . . . 8110.2.1 Pseudocoordinates . . . . . . . . . . . . . . . . . . . . . . . . 8210.2.2 Work and generalized forces . . . . . . . . . . . . . . . . . . 8310.2.3 Newton’s equations in pseudocoordinates . . . . . . . . . 8410.2.4 The energy of the acceleration . . . . . . . . . . . . . . . . . 84

10.3 A modification noted by Desloge . . . . . . . . . . . . . . . . . . . . . . . 8510.4 The simple pendulum via Gibbs-Appell . . . . . . . . . . . . . . . . . . . 86

10.5 The Caplygin sleigh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

Contents v

10.6 The Caplygin sleigh revisited . . . . . . . . . . . . . . . . . . . . . . . . . . 8810.7 The problem from page 63 of Gantmacher . . . . . . . . . . . . . . . . . 8910.8 Rigid body dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

10.8.1 Three frames of reference . . . . . . . . . . . . . . . . . . . . 9310.8.2 The energy of acceleration for a rigid body . . . . . . . . 93

10.9 The rolling coin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9410.9.1 The three frames . . . . . . . . . . . . . . . . . . . . . . . . . . 9410.9.2 The angular velocity . . . . . . . . . . . . . . . . . . . . . . . 9410.9.3 The no-slip constraint . . . . . . . . . . . . . . . . . . . . . . 9710.9.4 The acceleration of the coin’s center . . . . . . . . . . . . . 9710.9.5 The rotational acceleration . . . . . . . . . . . . . . . . . . . 98

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

11 Quaternions 10111.1 The quaternion algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10111.2 The geometry of the quaternions . . . . . . . . . . . . . . . . . . . . . . . 102

11.2.1 The reflection operator . . . . . . . . . . . . . . . . . . . . . 10211.2.2 The rotation operator . . . . . . . . . . . . . . . . . . . . . . 103

11.3 Angular velocity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10611.4 A differential equation for the quaternion rotation . . . . . . . . . . . . 10711.5 Unbalanced ball rolling on a horizontal plane . . . . . . . . . . . . . . . 108

11.5.1 The no-slip condition . . . . . . . . . . . . . . . . . . . . . . 10811.5.2 The Gibbs function and the equations of motion . . . . 109

Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

A Maple basics 113A.1 Configuring MAPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113A.2 The execution group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113A.3 MAPLE key bindings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114A.4 Expression sequences, lists, and sets . . . . . . . . . . . . . . . . . . . . . . 114A.5 Selecting and removing subsets . . . . . . . . . . . . . . . . . . . . . . . . . 115A.6 Solving equations symbolically . . . . . . . . . . . . . . . . . . . . . . . . . 116A.7 Solving equations numerically . . . . . . . . . . . . . . . . . . . . . . . . . 116A.8 The eval() function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117A.9 Expressions and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117A.10 Vectors and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118A.11 Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119A.12 Solving differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . 120

A.12.1 Solving differential equations symbolically . . . . . . . . . 120A.12.2 Solving differential equations numerically . . . . . . . . . 121

A.13 Plotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121A.13.1 Plotting a single function . . . . . . . . . . . . . . . . . . . . 121A.13.2 Plotting multiple function together . . . . . . . . . . . . . 122A.13.3 Parametric plot . . . . . . . . . . . . . . . . . . . . . . . . . . 122A.13.4 Plotting points and more . . . . . . . . . . . . . . . . . . . . 123A.13.5 Overlaying multiple plots . . . . . . . . . . . . . . . . . . . . 123A.13.6 Reflecting a plot . . . . . . . . . . . . . . . . . . . . . . . . . . 124

A.14 The Euler–Lagrange equations . . . . . . . . . . . . . . . . . . . . . . . . . 125A.15 The animation of a simple pendulum . . . . . . . . . . . . . . . . . . . . . 125

vi Contents

Preface

By “solving a problem” I mean performing all the steps laid out below:

1. Select configuration parameters.

2. Define the position vectors r1,r2, . . . of the point masses in terms of the generalizedcoordinates q1, q2, . . . .

3. Compute the velocities of the point masses:

vi = ri =∑

j

∂ ri

∂ q j

q j , i = 1,2, . . . .

4. Compute the kinetic energy T = 12

∑

i mi‖vi‖2, the potential energy V , and theLagrangian L= T −V .

5. Form the equations of motion (a system of second order differential equations(DEs)) in the unknowns q1(t ), q2(t ), . . .:

d

d t

∂ L

∂ q j

=∂ L

∂ q j

, j = 1,2, . . . .

If done by hand, this step would be the most labor-intensive part of the calculations.The calculations can get unbearably complex and can easily lead to formulas thatfill more than one page. Fortunately we can relegate the tedious computations to acomputer algebra system (CAS) such as MAPLE or MATHEMATICA. I use MAPLE

in my own work, therefore I will use that for the purposes of this class. I believethat MATHEMATICA has the equivalent capabilities, and you are welcome to useit instead, if you so prefer. I have, however, no experience with MATHEMATICA,therefore I cannot help you there.

6. Solve the system of DEs. Except for a few special cases, such system are generally notsolvable in terms of elementary function. One solves them numerically with thehelp of specialized software. MAPLE and MATHEMATICA offer that functionalityas well.

The software replaces the continues time variable t by a closely spaced “time ticks”t0, t1, t2, . . . which span the time interval of interest, say [0,T ], and then it ap-plies some rather sophisticated numerical algorithms to evaluate the unknownsq1(t ), q2(t ), . . . at those time ticks. The result may be presented as:

(a) a table of numbers; but that’s not very illuminating, so it’s rarely done thatway;

vii

viii Preface

(b) as a set of plots of q j versus t . This is the most common way. Both MAPLE

and MATHEMATICA can do this easily; or

(c) as a computer animation, which is the most “user friendly” choice but whichtakes some work—and a certain amount of know-how—to produce. I willshow you how to do this in MAPLE.

Chapter 1

An introduction throughexamples

This chapter introduces some of basic ideas involved in Lagrangian formulation of dy-namics through examples. You will need to take some of the statements and formulas forgranted, since they won’t be formally introduced until several chapters later. The ideahere is to acquire some “gut feeling” for the subject which will help motivate some of theabstract concepts to come.

1.1 The simple pendulum à la Newton

A pendulum, specifically a simple pendulum, is a massless rigid rod of fixed length ℓ, oneend of which is attached to, and can swing about, an immobile pivot, and to the other endof which is attached a point of mass m, called the bob.1 The force of gravity tends to pullthe pendulum down so that to bring the free end to the lowest possible position, calledthe pendulum’s stable equilibrium configuration. A pendulum can stay motionless in thestable equilibrium configuration forever. If disturbed slightly away from the equilibrium,however, it will oscillate back and forth about it, indefinitely in principle if there are nofrictional/dissipative effects. Figure 1.1 shows a simple pendulum at a generic positionwhere the rod makes an angle ϕ relative to the vertical.

The pendulum may also be balanced in an inverted position, obtained by turning itupward about the pivot by 180 degrees (remember that the connecting rod is rigid.) Thatposition, which admittedly is difficult to achieve in practice, is called the pendulum’s un-stable equilibrium configuration. A pendulum can stay motionless in the unstable equilib-rium configuration forever, in principle. If disturbed slightly away from that equilibrium,however, it will move away from it in general.

The stable and unstable equilibria are the only possible equilibrium position of a sim-ple pendulum. The pendulum cannot stay motionless at an angle, say at 45 degrees, rela-tive to the vertical.

A pendulum’s initial condition completely determine its future motion. I am assum-ing here that the only external action on the pendulum is the force of gravity. The initialcondition consist of a pair of data items, one being the initial angle that the rod makesrelative to the vertical, and the other is the initial velocity which is the bob is set intomotion.

As a specific instance, consider the case where the rot’s initial angle is zero, and the

1The pendulum of a grandfather clock is a reasonably good example of such a pendulum, albeit the rod isnot massless, and the mass attached to the end of it is not literally a point mass.

1

2 Chapter 1. An introduction through examples

ℓ

ϕ

m g

x

y

ϕ

w = m gj

−τer

er

eϕ i

j

Figure 1.1: On the left is a depiction of the physical shape of the pendulum. On theright we see the pendulum’s mathematical model given by the position vec-tor r. The force of the weight has been decomposed into components par-allel and perpendicular to the motion.

bob’s initial velocity is small. Then the pendulum will oscillate back and forth about thestable configuration, similar to what we see in a grandfather clock. If the initial velocityis slightly larger, the pendulum will undergo wider oscillations. If the initial velocity islarge enough, the pendulum will not oscillate at all. It will swing about pivot, reach theunstable equilibrium position, go past it, fall down from the other side, and return to itsinitial position, having made a complete 360 degree rotation about the pivot. At this pointthe pendulum finds itself in the same condition that it had at the initial time, therefore itwill repeat what it did the first time around. In the absence of energy dissipating factors,the rotations about the pivot will continue indefinitely.

To make a mathematical model of the pendulum, we introduce the Cartesian coordi-nates xy with the origin at the pendulum’s pivot, and the y axis pointing down. We alsointroduce the unit vectors i and j along the x and y axes, and the unit vectors er along thependulum’s rod and eϕ which is perpendicular to it, as shown in Figure 1.1. The vectors

er and eϕ may be expressed as linear combinations of the vectors i and j:

er = i sinϕ+ j cosϕ, eϕ =−icosϕ+ j sinϕ.

Furthermore, let us observe that their time derivatives are related through

er = iϕ cosϕ− jϕ sinϕ,=−ϕeϕ, eϕ = iϕ sinϕ+ jϕ cosϕ,= ϕer . (1.1)

The bob’s position vector r(t ) relative to the origin is r = ℓer , therefore, the bob’svelocity v = r and acceleration a= v may be computed easily with the help of (1.1):

v = r = (ℓer )· =−ℓϕeϕ , a= v = (−ℓϕeϕ)· =−ℓϕeϕ − ℓϕeϕ =−ℓϕeϕ − ℓϕ2er .

We see that the bob’s acceleration has a component along eϕ and another along er .

Newton’s law of motion asserts that ma = F , where F is the resultant of all forcesacting on the bob. Referring to Figure 1.1 we see that the forces acting on the bob consistof weight w and the tension −τer along the rod.2 It follows that

m(−ℓϕeϕ − ℓϕ2er ) =w−τer .

2The assertion that the force exerted on the bob by the rod lies along the rod requires justification. See thenext section for elaboration.

1.2. The simple pendulum à la Euler 3

Upon replacing w with its decomposition w = m gj = (m g cosϕ)er+(m g sinϕ)eϕ, and

collecting the coefficients of er and and eϕ, we arrive at

mℓϕ+m g sinϕ

eϕ +

mℓϕ2+m g cosϕ−τ

er = 0.

Since er and eϕ are orthogonal, hence linearly independent, each of the expressions in

the square brackets is zero. We conclude that

mℓϕ+m g sinϕ = 0, mℓϕ2+m g cosϕ−τ = 0. (1.2)

The first equation is a second order differential equation in ϕ. It has a unique solutionfor any prescribed initial condition pair ϕ(0) and ϕ(0), although the solution is not ex-pressible in terms of elementary functions. In practice, one solves the equation througha numerical approximation algorithm on a computer. Once the solution is obtained, itmay be substituted in the second equation to evaluate the tension τ in the pendulum’srod, should it be of interest.

1.2 The simple pendulum à la Euler

In the previous section we assumed, without explanation, that the force within the pen-dulum’s rod points along the rod; see Figure 1.1 where that force is shown as the vector−τer .

That assumption seems to be so “obvious” that many textbooks on mechanics and itsapplications present it without as much as a comment. A close scrutiny, however, showsthat the assumption is far from obvious, and in fact, it is not a logical consequence ofany of Newton’s laws of motion. Antman [3] presents a critical analysis of this issue andconcludes that the proper approach is through an application of Euler’s law of motion,which states that the rate of change of the pendulum’s angular momentum equals theresultant torque applied to it.

1.3 The simple pendulum à la Lagrange

In this section we rederive the previous section’s differential equation of motion of simplependulum through Lagrange’s analytical approach. We no longer need the vectors er

and eϕ. Instead, we write the bob’s position vector r directly in terms of its i and j

components:r = (ℓ sinϕ)i+(ℓcosϕ)j,

and then differentiate to find the velocity

v = r = (ℓϕ cosϕ)i− (ℓϕ sinϕ)j.

It follows that that ‖v‖2 = ℓ2ϕ2.To proceed further, we introduce a few definitions and assertions whose motivations

and explanations will emerge only in subsequent chapters.

• A the kinetic energy T of a point mass m moving with velocity v is T = 12 m‖v‖2.

In the case of the pendulum this becomes T = 12 mℓ2ϕ2.

• The potential energy V of a point mass m in a constant gravitational field equalsm g h where g is the acceleration due to gravity, and h is its height above an arbitrar-ily selected reference point. In the case of the pendulum, the elevation of the bob rel-ative to the lowest point in its path is h = ℓ(1−cosϕ), therefore V = m gℓ(1−cosϕ).


• The Lagrangian L of a mechanical system is the difference between its kinetic andpotential energies, that is, L= T −V . In the case of the pendulum we have:

L(ϕ, ϕ) =1

2mℓ2ϕ2−m gℓ(1− cosϕ). (1.3)

As the notation above indicates, we are viewing the Lagrangian L as a function twovariables ϕ and ϕ. It should be emphasized that ϕ and ϕ are considered independentvariables here. If you find the notation ϕ confusing in that regard, consider renaming ittoω, as in

L(ϕ,ω) =1

2mℓ2ω2−m gℓ(1− cosϕ).

Now L is a function of two independent variables ϕ andω.The Lagrangian completely characterizes a mechanical system. It incorporates the

system’s parameters, geometry, and physics, all in one neat bundle. Beyond this point theanalysis of the system’s motion is pure calculus—or analysis, as Lagrange called it in hisMécanique Analytique—with no need to refer to the system’s components and geometry.

According to the theory that Lagrange developed, the equation of motion of a me-chanical system whose Lagrangian depends on two variables ϕ and ϕ, is given by

d

d t

∂ L

∂ ϕ

=∂ L

∂ ϕ. (1.4)

In the case of pendulum we have:

∂ L

∂ ϕ= mℓ2ϕ,

∂ L

∂ ϕ=−m gℓ sinϕ,

and therefore the equation of motion is

(mℓ2ϕ)· =−m gℓ sinϕ,

or equivalently,

ϕ+g

lsinϕ = 0, (1.5)

which agrees with the first equation in (1.2). The second of those equations may be ob-tained through the Lagrangian approach as well, but we will not get into that right now.

1.4 The double pendulum

A double pendulum is obtained by suspending a second pendulum from the bob of a firstpendulum, as shown in the left diagram in Figure 1.2. The double pendulum’s geometricconfiguration is specified through the two angles ϕ1 and ϕ2 that the rods make relative tothe vertical.

To make a mathematical model of a double pendulum, we follow the ideas sketchedin the previous section. Specifically, we introduce the xy Cartesian coordinates and theunit vectors i and j as shown in Figure 1.2, and then express the position vectors r1 andr2 of the two bobs in terms of their components relative to i and j:

r1 = (ℓ1 sinϕ1)i+(ℓ1 cosϕ1)j, r2 = r1+(ℓ2 sinϕ2)i+(ℓ2 cosϕ2)j. (1.6)

1.4. The double pendulum 5

ℓ1

m1 g

ϕ1ℓ2

m2 g

ϕ2

x

y

r1

m1 gj

ϕ1

ℓ2

m2 gj

ϕ2

i

j r2

Figure 1.2: On the left is a depiction of the physical shape of the double pendulum. Onthe right we see the pendulum’s mathematical model given by the positionvectors r1 and r2 or the two bobs.

Then we find the velocities of the bobs through differentiation:

v1 = (ℓ1ϕ1 cosϕ1)i− (ℓ1ϕ1 sinϕ1)j, v2 = v1+(ℓ2ϕ2 cosϕ2)i− (ℓ2ϕ2 sinϕ2)j.

We see that ‖v1‖2 = ℓ21ϕ

21 . Computing ‖v2‖2 takes only a little bit more work. We observe

that v2 = v1+ v, where v = (ℓ2ϕ2 cosϕ2)i− (ℓ2ϕ2 sinϕ2)j. Therefore

‖v2‖2 = ‖v1‖2+ ‖v‖2+ 2v1 · v= ℓ2

1ϕ21 + ℓ

22ϕ

22 + 2

(ℓ1ϕ1 cosϕ1)i− (ℓ1ϕ1 sinϕ1)j

·

(ℓ2ϕ2 cosϕ2)i− (ℓ2ϕ2 sinϕ2)j

= ℓ21ϕ

21 + ℓ

22ϕ

22 + 2ℓ1ℓ2ϕ1ϕ2(cosϕ1 cosϕ2+ sinϕ1 sinϕ2).

= ℓ21ϕ

21 + ℓ

22ϕ

22 + 2ℓ1ℓ2ϕ1ϕ2 cos(ϕ2−ϕ1).

We conclude that the double pendulum’s kinetic energy is

T =1

2m1ℓ

21ϕ

21 +

1

2m2

ℓ21ϕ

21 + ℓ

22ϕ

22 + 2ℓ1ℓ2ϕ1ϕ2 cos(ϕ2−ϕ1)

=1

2(m1+m2)ℓ

21ϕ

21 +

1

2m2ℓ

22ϕ

22 +m2ℓ1ℓ2ϕ1ϕ2 cos(ϕ2−ϕ1).

As to the potential energy, let us recall that a mass’s potential energy in a constantgravitational field is the product of its weight and its elevation above a certain referencepoint. In the case of a double pendulum, it is easiest to set the reference point at the originof the coordinates; see Figure 1.2. Then the j components of the vectors r1 and r2 providethe elevations of the bobs below the reference point, therefore their elevations above thereference point will require a sign reversal. Referring to (1.6) we see that

V =−m1 g cosϕ1−m2 g

ℓ1 cosϕ1+ ℓ2 cosϕ2

=−(m1+m2)g cosϕ1−m2 gℓ2 cosϕ2.

Thus, the double pendulum’s Lagrangian, L= T −V , takes the form

L(ϕ1,ϕ2, ϕ1, ϕ2) =1

2(m1+m2)ℓ

21ϕ

21 +

1

2m2ℓ

22ϕ

22 +m2ℓ1ℓ2ϕ1ϕ2 cos(ϕ2−ϕ1)

+ (m1+m2)g cosϕ1+m2 gℓ2 cosϕ2.

In the previous section’s simple pendulum, the Lagrangian L(ϕ, ϕ) was a functiontwo variables. In the present case, the Lagrangian L(ϕ1,ϕ2, ϕ1, ϕ2) is a function of four


variables. In general, if a mechanical system’s geometric configuration is specified throughn variables q1, . . . , qn , then its Lagrangian is a function of 2n variables q1, . . . , qn , q1, . . . , qn .The equivalent of the single equation of motion (1.4) now is a system of n equations, calledthe mechanical system’s Euler–Lagrange equations:

d

d t

∂ L

∂ qi

=∂ L

∂ qi

, i = 1, . . . , n.

The variable q1, . . . , qn are called the system’s generalized coordinates.Applied to the case of double pendulum, the Euler–Lagrange equations lead to

d

d t

∂ L

∂ ϕ1

=∂ L

∂ ϕ1

,d

d t

∂ L

∂ ϕ2

=∂ L

∂ ϕ2

.

To evaluate these explicitly, we begin by computing

∂ L

∂ ϕ1

= (m1+m2)ℓ21ϕ1+m2ℓ1ℓ2ϕ2 cos(ϕ2−ϕ1),

∂ L

∂ ϕ2

=m2ℓ22ϕ2+m2ℓ1ℓ2ϕ1 cos(ϕ2−ϕ1),

∂ L

∂ ϕ1

=m2ℓ1ℓ2ϕ1ϕ2 sin(ϕ2−ϕ1)− (m1+m2)g sinϕ1,

∂ L

∂ ϕ2

=−m2ℓ1ℓ2ϕ1ϕ2 sin(ϕ2−ϕ1)−m2 gℓ2 sinϕ2.

We conclude that the differential equations of motion are

(m1+m2)ℓ21ϕ1+m2ℓ1ℓ2ϕ2 cos(ϕ2−ϕ1)

·

= m2ℓ1ℓ2ϕ1ϕ2 sin(ϕ2−ϕ1)− (m1+m2)g sinϕ1,

m2ℓ22ϕ2+m2ℓ1ℓ2ϕ1 cos(ϕ2−ϕ1)

·

=−m2ℓ1ℓ2ϕ1ϕ2 sin(ϕ2−ϕ1)−m2 gℓ2 sinϕ2.

Exercises

1.1. Pendulum with a mobile pivot. Figure 1.3 shows a pendulum whose pivot isallowed to move horizontally without friction. The pivot has mass m1 while thebob has mass m2. Find the equations of motion of the pendulum.

1.2. A spherical pendulum. The motion of the simple pendulum of length ℓ intro-duced in this chapter was confined to a single vertical plane, and therefore thependulum’s bob moved along a circular arc of radius ℓ. If off-plane motions arepermitted, then the bob will move on a sphere of radius ℓ centered at the pivot. Inthat setting the pendulum is called a spherical pendulum; see Figure 1.4.Derive the equations of motion of the spherical pendulum.

1.3. Bead on a spinning hoop. A circular wire hoop of radius R spins about a verticaldiameter at a constant angular velocity Ω. A bead of mass m can slide without fric-tion along the hoop. The hoop’s radius that connects to the bead makes an angleof ϕ(t ) with respect to the vertical; see Figure 1.5. Find the differential equationsatisfied by ϕ.

Exercises 7

x

y

x

ℓ

m1

m2

ϕ

Figure 1.3: Pendulum with a horizontally mobile pivot (Exercise 1).

xy

z

ij

k

ℓ

θ

ϕ

Figure 1.4: A spherical pendulum (Exercise 2).

1.4. A governor mechanism. Figure 1.6 is a schematic drawing of a (simplified) Wattgovernor which was invented for the automatic control of the speed of steam en-gines. Our version consists of four massless rigid links of length ℓ each, hinged attheir ends to form a rhombus. The vertex O remains motionless, while the sleeveat vertex S can slide on the device’s vertical shaft, thereby change the rhomus’sshape. Two balls of mass m1 each are attached to the vertices A and B . The sleeve’smass is m2. The entire assembly rotates at a constant angular speed Ω about thevertical shaft. Find the differential equation satisfied by the angle ϕ marked on thediagram.

1.5. Two masses on a string. A particle P of mass m1 lies on a smooth horizontal tableand is attached to a long, inextensible string which passes through a smooth holeO in the table and hangs down. The other end of the string carries a particle Q ofmass m2; see the illustration in Figure 1.7.The particle P is positioned at the point (a, 0,0) in the xy z coordinates shown, andgiven a horizontal initial velocity perpendicular to the x axis. Find the differentialequations of motion.Hint: Let ρ(t ) and ϕ(t ) be P ’s position at time t in polar coordinates as seen inFigure 1.7. The equations of motions constitute a system of differential in ρ(t ) andϕ(t ) .


R

m g

ϕ

Ω

Figure 1.5: Bead on a rotating hoop (Exercise 3).

ℓ

ℓ ℓ

ℓϕ

Ω

O

A

m1

B

m1

Sm2

Figure 1.6: A simplified Watt governor (Exercise 4).

x

y

z

ρ(t )

Pm1

Q m2

ϕ(t )

Figure 1.7: The point P slides on the table. The point Q moves vertically (Exercise 5).

Chapter 2

Work and potential energy

Work is the product of force and its displacement. To be precise, the infinitesimal workdW performed in displacing a force F by an infinitesimal distance dr is dW = F · dr.If the point of the application of the force moves along a path C in space, then the workperformed along the path is the line integral

W =

∫

C

F · dr. (2.1)

If you are repositioning a massive desk in an office, for example, then work measures theamount of effort exerted by you in performing the task.

Expanding upon the moving of the desk scenario, suppose that you intend to movethe desk from a point A to a point B . It should be obvious that the amount of workperformed will vary, depending on the path along which you move the desk between Aand B . Chances are that the shortest (straight line) path will require lesser effort than along path that winds around the office.

There are many interesting and important situations where, unlike the moving of thedesk example, the work performed in goings from a point A to a point B is independent ofthe path taken between A and B . The most elementary example is the raising or loweringof a weight. To see how it works, set up a Cartesian coordinate system in space so thatthe x and y axes are horizontal, and the z axis points up. Let ra = (xa , ya , za) and rb =(xb , yb , zb ) be the position vectors3 of the starting and ending points A and B , and letr = ⟨x, y, z⟩ be the position vector of a generic point along a path C (A,B)with endpointsA and B . Suppose that we move an object of mass m along that path. The force of theobject’s weight is F = ⟨0,0,−m g ⟩, where g is the acceleration of gravity. The workperformed along the path is

W =

∫

C (A,B)

F · dr

=

∫

C (A,B)

⟨0,0,−m g ⟩ · ⟨d x, d y, d z⟩ =∫

C (A,B)

−m g d z =−m g (zb − za).

We see that the work in moving the weight from A to B is expressed in terms the z co-ordinates of the endpoints, thus it is the same on all conceivable paths that go from Ato B .

3A position vector of a point P (x , y, ) is the vector r = ⟨x , y, z⟩ that extends from the origin to the point P .

9

10 Chapter 2. Work and potential energy

To generalize, consider a (possibly position dependent) force field F (r)with the prop-erty that the work performed in going from a given point A to an arbitrary point r in spaceis independent of the path from A to r. Let us write V (r) for the negative of the value ofthat integral, that is,

V (r) =−∫

C (A,r)

F (r′) · dr′. (2.2)

The function V defined this way is called the potential function, or simply the potential, ofthe the vector fieldF . Equivalently, the vector fieldF is said to be derived from a potential.In (2.2) I have written r′ for the dummy variable of integration in order to distinguish itfrom the position vector r which designates the path’s endpoint. The minus sign doesnot have a deep significance; it’s convenient to build it into the definition since it leads tomore pleasing forms of general statements, such as the one on conservation of energy.

Theorem 2.1. Consider a continuous vector field F defined in an open and connected do-main D in the n-dimensional space, and suppose that F possesses a potential function V asin (2.2). Then V is differentiable and F (r) =−∇V (r).

Proof. By definition, the gradient ∇V of a function V at a point r is the vector with theproperty that for any unit vector e, the directional derivative of V in the direction of e isgiven by ∇V (r) ·e. That is,

∇V (r) ·e= limh→0

V (r+ he)−V (r)

h.

To simplify the discussion, let us write P and Q for the points in space correspondingto the position vectors r and r+ he, as illustrated in Figure 2.1. Pick a path C (A,r) toevaluate V at P , then extend that path as a straight line segment to Q to evaluate V at Q.Then the difference V (Q)−V (P ) amounts to an integration along the straight segmentPQ:

V (r+ he)−V (r) =V (Q)−V (P ) =−∫ h

0

F (r+ ξ e) ·edξ ,

Then it follows that

V (r+ he)−V (r)

h=− 1

h

∫ h

0

F (r+ ξ e) ·edξ =−F (r+ ξ e) ·e

for some ξ ∈ (0, h), the latter assertion being a consequence of the Mean Value Theoremfor integrals; see e.g., Stewart [4].

As h goes to zero, so does ξ because ξ ∈ (0, h). It follows that

∇V (r) ·e= limh→0

V (r+ he)−V (r)

h=−F (r) ·e.

Since this holds for every e, it follows that ∇V (r) =−F (r).

Remark 2.1. Let is point out the roles that the theorem’s technical assumptions play inthe proof:

• The continuity of the vector field F enters at the point where the Mean Value The-orem is applied. That theorem is not true without continuity.

11

A

P Q

r

r+

he

e

C(A

,r)

Figure 2.1: If V (P ) is evaluated by integration along the path C (A,r), then V (Q)maybe evaluated by integrating along that same path, and then continuing alongthe straight line segment PQ of length h in the direction e.

• The assumption that the domainD is connected is needed to ensure that a path maybe established between any pair of points in D . That’s what enabled us to sketchthe curve C (A,r) that connects the points A and P in Figure 2.1.

• The assumption that the domain D is open means that a ball of positive radius maybe placed around any point P ∈ D so that the ball lies entirely within D . It’s thatproperty which ensures that moving away from P by a small distance h, as we didin Figure 2.1, we land safely on a point Q which lies within D .

Remark 2.2. Adding a constant to the potential function V does not affect the equalityF (r) =∇V (r). Thus, a vector field’s potential, if it has one, is defined modulo an additiveconstant.

Example 2.2. Earlier in the this section we observed that the force field corresponding toan object’s weight is F = ⟨0,0,−m g ⟩. We see that F (r) = −∇V (r) where V (x, y, z) =m g z. We will use m g z as the potential of a weight throughout these notes. Note theeffect of the minus sign in (2.2); in its absence the potential of a weight would have been−m g z.

Example 2.3. In the previous example we assumed that the acceleration of gravity g is aconstant. That’s a good assumption if the changes in height during the motion are smallrelative to the radius of the Earth. In general, the gravitational force that a point mass Mexerts on a point mass m drops as the inverse square of the distance. Specifically, Newton’slaw of gravitation says

F (r) =−

GM m

‖r‖2

r

‖r‖ . (2.3)

where r is m’s position vector relative to M , and G is the universal gravitational constant.The inverse square law is manifested through the ‖r‖2 term that appears in the denomi-nator inside the parentheses. The factor r/‖r‖ is a unit vector that points from M to m.It is possible to show (see Exercise ??) that F is derived from a potential.

Theorem 2.4. Suppose the force field F is derived from a potential V . Then the work per-

12 Chapter 2. Work and potential energy

formed in moving the force along any path from a point ra to rb is given by

W =V (ra)−V (rb ). (2.4)

Proof. We have F (r) =−∇V (r) therefore

F · dr =−∇V (r) · dr =−D∂ V

∂ x1

, . . .∂ V

∂ xn

E

· ⟨d x1, . . . , d xn⟩

=−∂ V

∂ x1

d x1 + · · ·+∂ V

∂ xn

d xn

=−dV ,

therefore

W =

∫

C

F · dr =∫ rb

ra

−dV =V (ra)−V (rb ).

Exercises

2.1. Verify that the gravitational potential field F (r) in (2.3) is derived from the poten-tial

V (r) =GM m

‖r‖ .

Chapter 3

A single particle in aconservative force field

3.1 The principle of conservation of energy

Newton’s law of motion, F = mr relates the acceleration r of a point of constant massm subjected to a force F . If the force is derived from a potential V (r), that is, F =−∇V ,then the law of motion takes the form

mr =−∇V (r). (3.1)

Multiplying this through by the velocity r

mr · r =−∇V (r) · r,

and then rearranging1

2m(r · r)·+

V (r)·= 0,

we arrive atd

d t

1

2m‖r‖2+V (r)

= 0,

which tells us that the quantity

E =1

2m‖r‖2+V (r) (3.2)

remains constant during the motion. The constant E is called the particle’s mechanicalenergy (or just the energy for short). The first term on the right-hand side of (3.2) is calledthe kinetic energy; the second term is called the potential energy. The constancy of E in amotion is called the principle of conservation of energy.

Remark 3.1. The kinetic and potential energies don’t remain constant during the mo-tion; it’s their sum that does. Therefore the reduction of one is accompanied by the in-crease of the other. It helps to think of this as a conversion of one form of energy tothe other. The myriad of motion phenomena encountered in our daily experiences aremanifestations of such interplay between the kinetic and potential energies.

Remark 3.2. The conservation of the total energy E is a consequence of the assumptionthat the force fieldF is derived from a potential. This should explain the alternative name,

13

14 Chapter 3. A single particle in a conservative force field

a conservative force field, which is commonly used to refer to a force field derived from apotential.

Remark 3.3. Had we chosen against putting the minus sign in the definition (2.2),the principle of conservation of energy would have stated that the difference between thekinetic and potential energies remains constant, which is not as appealing as saying thattheir sum remains a constant.

3.2 The scalar case

The rest of this chapter is devoted to a study of the scalar version of equation (3.1), thatis,

mx =−V ′(x), (3.3)

where x is scalar, and V ′ is the derivative of a potential V . In addition to the obviousapplications in one-dimensional dynamics, this equation occurs in quite a number of othercontext which are far from one-dimensional motions. The equation of motion of a simplependulum (1.5), for instance, falls in this category, but the motion is certainly not linear.We will more on this in Section 3.4.

The previous section’s statement on conservation of energy, which in the scalar casetakes the form

E =1

2mx2 +V (x), (3.4)

is a first order differential equation in the unknown x(t ), and which may be solved, in

principle, through a separation of variables. We have x2 =2m

E −V (x)

, therefore x =

±q

2m

E −V (x)

, and hence

∫d x

Æ

2

E −V (x) =±

∫

d t =±t +C .

Expect for the most trivial cases, the integral on the left is impossible to evaluate interms of elementary functions. It is possible, however, to obtain quite an adequate “feel”for the solution, without performing any integration at all, through a phase plane analysisof the equation.

To explain the idea, consider the potential function V whose graph is shown in Fig-ure 3.1(a). Regard the solution x(t ) of the differential equation (3.3) as the abscissa of apoint P that moves along the horizontal axis in that figure according to the equation’sdynamics. Then the point Q with coordinates

x(t ),V (x(t )

slides on the graph of Vaccordingly. The length of the line segment PQ equals the potential energy V (x). We ex-tend that segment upward to a point R so the the length of QR equals the kinetic energy12 mx2. Since the sum of the kinetic and potential energies remains a constant E duringthe motion, the locus of the point R is the horizontal line V = E , as marked on the figure.

The point Q cannot rise above the horizontal line V = E during the motion because

the nonnegative length of the line segment QR (which equals 12 mx2) prevents it. Conse-

quently, the motion of Q is confined to the graph’s red-colored arc. We refer to that arc asa potential well corresponding to the energy E and we think of Q as a point that has falleninto the well and is unable to get out.

At the edges of the potential well the potential energy equals E and the kinetic en-ergy, and therefore the velocity x, are zero. In the interior, where the potential energy

3.2. The scalar case 15

x

V (x)

x

x

E

Q

P

R

12 mx2

V (x)(a)

(b)

Figure 3.1: The dynamics of the equation x +V ′(x) = 0 is completely determined bythe potential function V . The figure on top shows the graph of V (x), andan energy well corresponding to an energy level E . The coordinate x is con-fined to the energy well shown in red. Since the total energy is conserved,as the potential energy drops below E within the well, the kinetic energyincreases, resulting in the phase portrait shown in the bottom figure.

drops below E , the kinetic energy, and therefore the velocity squared, x2, are positive.We conclude that as we traverse the potential well from left to right, the velocity beginsat zero, increases gradually (in absolute value) to a maximum, then drops back to zero atthe rightmost end. The sign of the velocity may be positive or negative since the onlyinformation we are getting from Figure 3.1(a) is on the square of the velocity.

This observation leads to the red oval in the diagram shown in Figure 3.1(b). Thehorizontal axis in that figure is the same as the x in Figure 3.1(a). The vertical axis isthe velocity x . Observe that at the leftmost and rightmost points of the oval, whichcorrespond to the extremes of the potential well, the velocity is zero, and in between inrises to a maximum (or falls to a minimum), as we expect. The oval is symmetric withrespect to the x axis because a given x2 yields two velocities ±|x |.

The red oval constructed in the previous discourse depends on the choice of the energylevel E . It should be clear that lowering E slightly will shrink the oval, and raising Eslightly, will expand it. The black curves in Figure 3.1(b) are the result of selecting variousvalues of E .

Figure 3.1(b) is called the phase diagram or phase portrait of the differential equation (3.3).The curves in it are called orbits. An alternative to the geometric construction of the or-bits carried out above, we may equally well view them as implicitly defined curves in thex–x plane through the equation (3.4). Varying the parameter E produces the family of allorbits, some of which are shown in Figure 3.1(b).

Yet another way of viewing the orbits is as level curves of the the surface defined bythe function

E(x, x) =1

2mx2 +V (x).

Two views of the surface E(x, x) corresponding to the potential V (x) of Figure 3.1(a) are


Figure 3.2: Two views of the energy surface E(x, x) corresponding to the potential V (x)of Figure 3.1(a). The graph on the right has been turned upside down tobring out some of the details which are hidden in the conventional view onthe left.

shown in Figure 3.2.

3.3 Stability

We have seen that the dynamics of the equation (3.3) dictate that a hypothetical particletrapped in an energy well cannot escape. The closer the energy level E is to the bottomof the well, the lesser room there is for the particle to maneuver.4 In the extreme casewhen the particle’s energy matches that of the well’s bottom, the particle cannot moveat all. We express this by saying that the bottom of the well is a stable equilibrium pointof the differential equation (3.3). If the energy level is increases just slightly, the particlewill move along an oval around the equilibrium point. Referring to the construction ofthe phase portrait from the potential V , it should be evident that a local minimum ofV corresponds to a stable equilibrium. Similarly, a local maximum of V corresponds to asaddle on the energy surface (see Figure 3.2) therefore a local maximum of V correspondsto an unstable equilibrium.

3.4 The phase portrait of a simple pendulum

The previous section’s treatment of the scalar equation of motion has greater applicabilitythan it may appear on the surface. The variable x(t ) there need not be the coordinate of amoving point along a straight line. For instance, according to (1.5) on page 4, the motionof a simple pendulum is described by the differential equation

ϕ+g

ℓsinϕ = 0, (3.5)

whereϕ = ϕ(t ) is the angle the the pendulum’s rod makes relative to the downward point-ing vertical, as depicted in Figure 3.3 on the left. Although the motion of the pendulumtakes places in two dimensions, the equation of motion (3.5) is exactly of the form (3.3)with V ′(ϕ) = g/ℓ sinϕ, hence V (ϕ) = g/ℓ(1− cosϕ), therefore the analysis of the pre-

4I am assuming a round-bottomed energy well here. In a flat-bottomed energy well the particle can movearound no matter how shallow the well.

Exercises 17

ℓϕ

m g

ϕ

V (ϕ) = 1− cosϕ

0−2π −π π 2π

1

2

ϕ

ϕ

Figure 3.3: The pendulum is shown on the left. The graph of the potential functionV (ϕ) = g/ℓ(1− cosϕ) (with g/ℓ = 1) is shown at top right. The corre-sponding phase portrait is shown at bottom right. The function V has aperiod of 2π, therefore it would have sufficed to limit the plots to the range−π≤ ϕ ≤π. Outside of that range, things repeat by periodicity.

vious section applies.5 The graph of V (ϕ) and the resulting phase portrait, constructedaccording to the previous section’s guidelines, are shown in Figure 3.3 on the right.

The configuration of a pendulum is completely specified by the angle ϕ at any instant.The configuration corresponding to ϕ + 2kπ is exactly the same as that of ϕ for anyinteger k. In other words, the pendulum’s configuration is determined by ϕ mod 2π. Inparticular, the left and right edges of the phase portrait in Figure 3.3 correspond to thesame configuration. This is best visualized by wrapping the phase portrait into a cylinderand gluing the left and right edges together. This is illustrated in Figure 3.4.

Exercises

3.1. Analyze the stability of the spinning hoop of Exercise 3. Show that the lowerequilibrium is stable if Ω is small, and unstable if it is large. Find the value of Ωwhere the transition takes place.

3.2. Plot representative phase portraits for the two cases of the problem above.

5As noted earlier, the potential function is defined within an additive arbitrary constant, therefore the “1”in 1− cosϕ is immaterial; its inclusion makes V (0) = 0 which is nice but of no special consequence.


Figure 3.4: Two views of the pendulum’s phase portrait as wrapped into a cylinder toemphasize that the pendulum’s configuration depends on ϕ mod 2π.

Chapter 4

The Kapitsa pendulum

4.1 The inverted pendulum

The pendulum in Figure 4.1 consists of a massless rod of length ℓ, a point mass m as thebob, and a pivot which oscillates vertically according to y = a cosωt , where a andω areprescribed constants. In Exercise 1 you will show that the equation of motion is

ϕ+g

ℓsinϕ+

aω2

ℓsinϕ cosωt = 0, (4.1)

where ϕ is the angle of the rod relative to the vertical, as shown.If the amplitude a of the pivot’s oscillation is small, and if ω is not too large, then

we expect the system to behave similar to an ordinary pendulum, albeit with somewhatjittery oscillations. In particular, the lower equilibrium ϕ = 0 would be stable and theupper equilibrium ϕ = π would be unstable. A graph of the the solution ϕ(t ) of thependulum’s equation of motion with smallish a andω is shown on Figure 4.1.

It is the purpose of this chapter to show that asω is increases beyond a critical thresh-old, the pendulum’s behavior changes drastically. Specifically, the lower equilibrium be-comes unstable and the upper equilibrium becomes stable. Thus, the pendulum turnsaround by 180 degrees, points upward, and oscillates about the ϕ =π position! Figure 4.2shows the solution of the pendulum’s equation of motion for a relatively fast ω. Notethat the oscillation now take place about the upper equilibrium ϕ =π. The pendulum isstanding upright, pointing up!

4.2 Averaging out the fast oscillations

To explain this interesting phenomenon, Introduce the a hypothetical “nominal rod”which connects the origin to the bob, and let ψ be its angle relative to the vertical, asshown in the schematic diagram in Figure 4.1. The let δ = ϕ−ψ be the angle betweenthe real rod and the nominal rod. According to the Law of Sines applied to the triangleshown in the figure, we have:

sinψ

ℓ=

sinδ

a cosωt

that is,

sinδ =a

ℓsinψcosωt .

19

20 Chapter 4. The Kapitsa pendulum

x

y

δ

ψa cosωt

pivot’s nominal position

m g

ℓ

rod’s nominal position

ϕ

−π6

π6

ϕ(t )

t

Figure 4.1: The pivot oscillates vertically according to a cosωt about the pendulum’snominal pivot. In the schematic diagram on the left the pivot’s displacementis exaggerated; we assume that a/ℓ is very small in our computations. Whenthe pendulum’s arm makes a “nominal” angle ψ with the vertical, the angleactually oscillates rapidly in the range ψ±δ . The graph on the right is thatof the angle ϕ(t ) obtained by solving the differential equation (4.1) withparameters ℓ= 1, g = 1, a = 0.05, andω = 20, and initial conditionsϕ(0) =5π/6, ϕ(0) = 0. The oscillation about the lower equilibrium position (ϕ =0) is stable since ω2 < 2gℓ/a.

ϕ(t )

t

π4

π2

3π4

π

5π4

Figure 4.2: The solution of the differential equation (4.1) with parameters ℓ= 1, g = 1,a = 0.05, and ω = 40, and initial conditions ϕ(0) = π/6, ϕ(0) = 0. Theoscillation about the upper equilibrium position (ϕ =π) is stable sinceω2 >2gℓ/a. The figure on the right is an enlarged copy of a portion of the graphon the left. We see that ϕ(t ) consists of high-frequency, small-amplitudeoscillations riding on a slowly oscillating function.

Then the assumption a ≪ ℓ implies that sinδ ≪ 1, therefore sinδ ≈ δ. We concludethat

δ ≈ a

ℓsinψcosωt . (4.2)

We are going to need δ’s second derivative soon, so let’s calculate it right now. We

4.2. Averaging out the fast oscillations 21

have:

δ ≈ a

ℓ

ψcosψcosωt −ω sinψ sinωt

,

δ ≈ a

ℓ

ψcosψcosωt − ψ2 sinψcosωt − 2ωψ cosψ sinωt −ω2 sinψcosωt

.

We are interested in high frequency oscillations of the pivot, that is, ω≫ 1. Therefore,the term withω2 in the expression above dominates the rest. We conclude that

δ ≈−aω2

ℓsinψcosωt .

Now we go to the differential equation (4.1) and replace ϕ by ψ+δ, and replace sinϕwith its Taylor series approximation

sinϕ = sin(ψ+δ)≈ sinψ+δ cosψ.

We get:

ψ+ δ +g

ℓ

sinψ+δ cosψ

+aω2

ℓ

sinψ+δ cosψ

cosωt = 0.

We multiply out everything and replace δ with the expression obtained above, and arriveat

ψ− aω2

ℓsinψcosωt+

g

ℓsinψ+

g

ℓδ cosψ+

aω2

ℓsinψcosωt+

aω2

ℓδ cosψcosωt = 0.

The second and fifth terms cancel, leaving us with

ψ+g

ℓsinψ+

g

ℓδ cosψ+

aω2

ℓδ cosψ sinωt = 0.

Then we substitute for δ from (4.2):

ψ+g

ℓsinψ+

a g

ℓ2sinψcosψcosωt +

a2ω2

ℓ2sinψcosψcos2ωt = 0. (4.3)

In the graphs of ϕ(t ) in figures 4.1 and 4.2 we see that the period 2π/ω of the pivot’soscillations is much smaller than the oscillations of the pendulum itself. Consequently,within one such time period, the value of ψ and its derivatives are essentially constants.On the basis of this observations, we average (4.3) over one 2π/ω period, where we regardψ as constant. Since

1

2π/ω

∫ 2π/ω

0

cosωt d t = 0,1

2π/ω

∫ 2π/ω

0

cos2ωt d t =1

2,

we get

ψ+g

ℓsinψ+

a2ω2

2ℓ2sinψcosψ= 0,

or the equivalent

ψ+g

ℓ

h

1+a2ω2

2ℓgcosψ

i

sinψ= 0. (4.4)


In comparison with the equation (3.5) of the motion of an ordinary unforced pendu-lum, the Kapitsa pendulum sees an “effective acceleration of gravity” given by

gh

1+a2ω2

2ℓgcosψ

i

.

If the pendulum’s motion is in the 0<ψ<π/2 regime, the quantity in the square bracketsis greater than 1, therefore vibrating the support is tantamount to increasing the acceler-ation of gravity.6 If, however, the pendulum’s motion is in the π/2 < ψ < π regime,then the effective acceleration of gravity may become negative if the coefficient of cosψis sufficiently large. The latter will happen ifω is sufficiently large. That’s tantamount toreversing the direction of gravity, which sort of explains why the pendulum turns upright.

4.3 Stability analysis

The effective equation of motion (4.4) of the Kapitsa pendulum is of the type (3.3) whichwas studied in Chapter 3. Comparing the two, we see that

V ′(ψ) =g

ℓsinψ+

a2ω2

2ℓ2sinψcosψ, (4.5)

whence

V (ψ) =g

ℓ(1− cosψ)+

a2ω2

4ℓ2sin2ψ (4.6)

The analysis presented in Chapter 3 is based entirely on the shape of V ’s graph. Thereforewe proceed to analyze the shape.

The equation’s equilibria are the roots of the equation V ′(ψ) = 0. Upon factorizingthe equation as

h g

ℓ+

a2ω2

2ℓ2cosψ

i

sinψ

we see that the roots are the solutions of

sinψ= 0 and cosψ=− 2g l

a2ω2.

The first equation yields ψ = 0 and ψ = π as roots. (It suffices to look for roots in the

0≤ψ≤π range.) The second equation yields a root ψ given by

ψ= cos−1

− 2g l

a2ω2

(4.7)

if and only if ω2 > 2gℓ/a2. (If that ratio equals to 1 then the root is π, which duplicateswhat we have already found.)

Table 4.1 lists the critical points of the function V (ψ), along with the values of V , V ′,and V ′′ at those points. We see that:

• V ′′(0) > 0 regardless of the parameter values, therefore the hanging-down equilib-rium, ψ= 0, is always stable;

6Don’t take this literally; the acceleration of gravity in (3.5) is a constant while the effective acceleration ofgravity in (4.4) depends on cosψ, therefore is not a constant.

Exercises 23

ψ 0 ψ π

V (ψ) 0 V (ψ)2g

ℓ

V ′(ψ) 0 0 0

V ′′(ψ)g

l

ha2ω2

2gℓ+ 1

i

−a2ω2

2ℓ2sin2ψ

g

l

ha2ω2

2gℓ− 1

i

Table 4.1: The analysis of the critical points of the function V (ψ) defined in (4.6). The

critical point ψ, defined in (4.7) exists if and only ifω2 > 2gℓ/a2.

Figure 4.3: Two representative graphs of the function V (ψ) in (4.6) with the parametersℓ = 1, g = 1, a = 0.05. On the left we have taken ω = 20, which leads to2g l/(a2ω2) = 2 > 1. This corresponds to a stable equilibrium at ψ = 0and an unstable equilibrium at ψ = π. On the left we have taken ω = 40,which leads to 2g l/(a2ω2) = 1/2< 1. This corresponds to stable equilibriaat ψ= 0 and ψ=π, and an unstable equilibrium at ψ= 2π/3.

• V ′′(ψ)< 0 regardless of the parameter values, therefore the equilibrium ψ= 0, if itexists, is unstable; and

• if ω2 < 2gℓ/a2 then V ′′(π) < 0, therefore the inverted equilibrium, ψ = π, isunstable; but if ω2 > 2gℓ/a2 then V ′′(π) > 0, therefore the inverted equilibrium,ψ=π, is stable.

Figure 4.3 shows the graphs of V (ψ) for two representative cases. The graphs areplotted over the range [−2π, 2π] to give a clear sense of their nature; only the range [0,π]is of true relevance to us.

Exercises

4.1. Derive the equation of motion (4.1) of Kapitsa’s pendulum.


4.2. Horizontally oscillating pivot. Consider a pendulum similar to Kapitsa’s, butwhose pivot oscillates horizontally rather than vertically. Derive the equation ofmotion and do a stability analysis.

Chapter 5

Calculus of variations

5.1 Introduction

5.1.1 A straight line is the shortest path

That a straight line is the shortest path that connects a pair of points in the Euclidean planeseems so obvious that it hardly needs elaboration. It is somewhat surprising, therefore,that when the question is formulated as a minimization problem, the conclusion, thoughtrue, is far from immediate. Let us look into the details.

Fix two points P1(x1, y1) and P2(x2, y2) in the Cartesian coordinate plane, where x2 >x1, and consider a function f whose graph goes through the points P1 and P2. The lengthof the graph is given by the usual formula from calculus:

ℓ( f ) =∫ x2

x1

Æ

1+ f ′(x)2 d x. (5.1)

The notation ℓ( f ) emphasizes that the length depends on the choice of the function f .Now the question of the path of minimal length between two points may stated as follows:

Problem 5.1. Find a function f whose graph goes through the points P1 and P2, andwhich minimizes the length ℓ( f ).

Calculus of variations is the branch of mathematics that deals with that and similarquestions.

5.2 The brachistochrone

Set up a Cartesian coordinate system in a vertical plane, with a horizontal x axis and adownward pointing y axis. Fix a point P2(x2, y2) as shown in Figure 5.1 and consider asmooth function f whose graph passes through the point P1 at the origin and P2. Thinkof the curve shown there as a slide in a playground. A particle slides without friction fromP1 (with zero initial velocity) and arrives at P2. Can we tell how long it takes?

Yes. The particle’s position vector, relative to the origin of the coordinate systemis r =

x, f (x)

, therefore its velocity is v = r =

x , f ′(x)x

, where a superposed dot

indicates the time derivative. It follows that ‖v‖2 =

1+ f ′(x)2

x2, therefore the particle’s

25

26 Chapter 5. Calculus of variations

x

y

P1(0,0)

P2(x2, y2)

Figure 5.1: How long does it take to slide from P1 to P2?

mechanical energy, which is the sum of its kinetic and potential energies, is given by

E =1

2m

1+ f ′(x)2

x2−m g f (x),

where m is the particle’s mass, and g is the acceleration of gravity. In particular, since theinitial velocity is zero, the mechanical energy at the initial time zero. Furthermore, sincethere is no friction, mechanical energy is conserved, therefore

1

2m

1+ f ′(x)2

x2−m g f (x) = 0 for all t .

This is a first order differential equation for the unknown x(t ). To solve, we isolate x :

x =±√√√ 2g f (x)

1+ f ′(x)2.

Let us consider the forward motion of the particle only, therefore select the “+” sign.Since x = d x/d t , we get

d x

d t=

√√√ 2g f (x)

1+ f ′(x)2.

Separate the variables:

d t =

√√√1+ f ′(x)2

2g f (x),

and integrate between t = 0 when x = 0, and t = T when x = x2. We have

∫ T

0

d t =

∫ x2

0

√√√1+ f ′(x)2

2g f (x)d x.

We conclude that the time T to go from P1 to P2 is given by

T ( f ) =

∫ x2

0

√√√1+ f ′(x)2

2g f (x)d x, (5.2)

where we have written T as T ( f ) to emphasize the dependence of the elapsed time on thefunction f , that is, on the slide’s shape. This leads to the following question, known asthe brachistochrone problem:

5.3. Mathematical preliminaries 27

Problem 5.2. Find a function f whose graph goes through the points P1 and P2, andwhich minimizes the travel time T ( f ).

In other words, we are looking for the shape of a slide that will let us slide from P1 toP2 in the fastest possible way.

A historical remark

The brachistochrone problem (from the Greek brachistos+chronos; brachistos = short-est, chronos= time) was proposed by Johann Benoulli in 1696 in Acta Eruditorum, one ofthe earliest Eupopean scientific journals. His solution, along with those of several otherprominent mathematicians, was published in the journal in the following year. The var-ious techiniques developed for analyzing the problem were distilled, decades later, in thecapable hands of Euler and Lagrange, into what became known as the calculus of varia-tions.

5.3 Mathematical preliminaries

Before moving on to the analysis of the problems introduced in the previous section, andthat of the calculus of variations in general, we collect here a few elementary mathematicalconcepts and results which facilitates the subsequent treatment. The presentation herefollows closely that of Gelfand and Fomin [5].

5.3.1 Basic lemmas

We write C [a, b ] for the set of continuous functions defined on the closed interval [a, b ].For any nonnegative integer n, we write C n[a, b ] for the the set of those functions inC [a, b ] whose derivatives of order up to n are in C [a, b ]. In particular, C 0[a, b ] =C [a, b ].

Lemma 5.1. If α ∈C [a, b ] and if

∫ b

a

α(x)h ′(x)d x = 0 for all h in

C 1[a, b ] : h(a) = h(b ) = 0

, (5.3)

then α is constant on [a, b ].

Proof. Define the constant c through

∫ b

a

α(x)− c

d x = 0,

and let

h(x) =

∫ x

a

α(s)− c

d s .

Then h meets the requirements of (5.3). We also have h ′(x) = α(x)− c , therefore

∫ b

a

α(x)− c2

d x =

∫ b

a

α(x)− c

h ′(x)d x =

∫ b

a

α(x)h ′(x)d x −∫ b

a

c h ′(x)d x = 0,

whence α≡ c .


Lemma 5.2. Let α and β be in C [a, b ] and

∫ b

a

α(x)h(x)+β(x)h ′(x)

d x = 0 for all h in

C 1[a, b ] : h(a) = h(b ) = 0

. (5.4)

Then β is differentiable, and β′(x) = α(x) for all x in [a, b ].

Proof. Let

A(x) =

∫ x

a

α(s)d s ,

then calculate as follows, applying integration by parts:

∫ b

a

α(x)h(x)d x =

∫ b

a

A′(x)h(x)d x =A(x)h(x)

b

a−∫ b

a

A(x)h ′(x)d x

=−∫ b

a

A(x)h ′(x)d x.

Therefore (5.4) may be written as

∫ b

a

−A(x)+β(x)

h ′(x)d x = 0 for all h in

C 1[a, b ] : h(a) = h(b ) = 0

.

Then from Lemma 5.1 it follows that

−A(x)+β(x) ≡ constant on [a, b ],

which implies, in particular, that β ∈ C 1[a, b ] since A∈ C 1[a, b ]. Upon differentiationwe see that β′(x) =A′(x) = α(x).

5.3.2 The variation

An affine subspace is a subset of a linear spaceX obtained by translating a subspace ofXparallel to itself. The illustration in Figure 5.2 shows a subspaceN ofX which has beentranslated to produce the affine subspace A . Thus, if x1 and x2 are inA , then x1 − x2

is in N . Equivalently, if x0 ∈ A , then every x ∈ A is of the form x = x0 + h, whereh ∈N .

Definition 5.3. LetX ,N , andA be as above, and suppose thatX is equipped with a norm‖ · ‖. Consider the (generally nonlinear) function J :A → R. The variation of J at a pointx ∈A is a linear function Jx :N →R, if one exists, with the property that

J (x + h) = J (x)+ Jx (h)+ ε‖h‖, for all h ∈N , (5.5)

where ε→ 0 as ‖h‖→ 0.

Remark 5.1. The variation of a function is closely related to the derivative. In fact, thetwo concepts coincide ifN =X .

Remark 5.2. The traditional notation for the variation of J is δJ . However the δsymbol is assigned too many diverse roles in in mechanics. I prefer the more innocuous

5.3. Mathematical preliminaries 29

X

O

N

A

Figure 5.2: The oval represents a normed space X with O as the origin. The affinesubpaceA is obtained by a parallel translation of the subspaceN .

notation Jx , which has the additional advantage that it makes explicit the point x at whichit is evaluated.

Theorem 5.4. The variation of a function, if it exists, is unique.

Proof. Suppose that Jx and Jx are both variations of J at x. Then

J (x + h) = J (x)+ Jx (h)+ ε1‖h‖,J (x + h) = J (x)+ Jx (h)+ ε2‖h‖,

for all h ∈N . It follows that

Jx(h)− Jx (h) = (ε1− ε2)‖h‖

for all h inN .Fix h ∈N arbitrarily, and set hn = h/n. Then we have

Jx(hn)− Jx (hn) = (ε1− ε2)‖hn‖,

which has two consequences. First, by the linearity of Jx and Jx , and the homogeneity ofthe norm we have:

Jx(hn) =1

nJx (h), Jx(hn) =

1

nJx (h), ‖hn‖=

1

n‖h‖,

therefore

Jx(h)− Jx (h) = (ε1− ε2)‖h‖.Second, h/n → 0 as n →∞, therefore ε1 and ε2 go to zero as n →∞. It follows that

Jx(h)− Jx (h) = 0 for all h ∈N , therefore Jx = Jx .

Definition 5.5. Let J :A → R be as in Defintion 5.3. We say J is stationary at x ∈A ifJx(h) = 0 for all h ∈N .


Note that if J is stationary at x, then it follows from (5.5) that

J (x + h)− J (x)

‖h‖ = ε→ 0, as h→ 0,

which corresponds to the familiar statement from elementary calculus which says that afunction is stationary when its derivative is zero.

5.4 The central problem of the calculus of variations

The minimum length problem of subsection 5.1.1, and the minimum time problem ofsubsection 5.2, are special cases of the following general problem.

LetX =C 1[a, b ] equipped with the norm

‖y‖1 = maxa≤x≤b

|y(x)|, |y ′(x)|

, y ∈X . (5.6)

Define the subspaceN ofX through

N =

y ∈X : y(a) = y(b ) = 0

, (5.7)

and the affine subspaceA ofX through

A =

y ∈X : y(a) =A, y(b ) = B

, (5.8)

where A and B are given. Note that A is a parallel translation of N ; the difference ofany two elements ofA is inN .

Let’s say we are given a function F : R3→R, which we will assume is twice continu-ously differentiable. Define the function J :A →R through

J (y) =

∫ b

a

F

x, y(x), y ′(x)

d x, y ∈A , (5.9)

and then introduce the central problem of the calculus of variations:

Problem 5.3. Find y ∈A so that J is stationary at y.

Remark 5.3. The function J is defined over the domain A which consists of a setof functions. This is unlike the common functions studied in elementary calculus wherefunction are typically defined over R or Rn . In the modern theory of functions, a functionis a mapping from a set (the domain) to another set (the range). The domain and rangemay be sets of any type, therefore the fact that the domain of J is a set of functions is of noway special. However, in the historical context in which the calculus of variations arose,a function whose domain was anything other than Rn was an oddity, therefore such afunction was called a functional, implying a “function-like object”. In modern times thereis no particular justification for such a special terminology, nevertheless functions like Jare often called functionals, carrying on the tradition.

Remark 5.4. Section 5.1.1’s curve of minimal length (equation (5.1)) corresponds to

F (x, y, y ′) =p

1+ y ′2,

5.4. The central problem of the calculus of variations 31

and Section 5.2’s brachistochrone problem (equation (5.2)) corresponds to

F (x, y, y ′) =

√√√1+ y ′2

2g y.

Note that we are using y ′ as a formal argument here—the prime in y ′ does not mean thederivative. The two functions F shown above could have been given equally well as

F (x, y, z) =p

1+ z2, F (x, y, z) =

√√√1+ z2

2g y,

but the y ′ notation is a useful reminder that the y ′ is a placeholder for the actual derivativey ′(x) in the integrand of in J in (5.9).

Theorem 5.6. Consider the function J :A →R defined in (5.9). If J is stationary at y ∈A ,then

Fy −d

d xFy ′ = 0, (5.10)

where a subscript to F means the partial derivative of F with respect to the correspondingargument.

Proof. Let us say J is stationary at y ∈A . From the Taylor series expansion of F abouty:

F (x, y + h, y ′+ h ′) = F (x, y, y ′)+ Fy (x, y, y ′)h + Fy ′ (x, y, y ′)h ′

+1

2Fyy (x, y, y ′)h2+ Fyy ′ (x, y, y ′)h h ′+

1

2Fy ′y ′ (x, y, y ′)h ′2+ · · · .

Integrating over [a, b ] we obtain

J (y + h) = J (y)+

∫ b

a

Fy (x, y, y ′)h + Fy ′ (x, y, y ′)h ′

d t + ε‖h‖1,

where the norm ‖ · ‖1 is defined in (5.6), and ε → 0 as ‖h‖1 → 0. Then it follows fromDefintion 5.3 that the variation of J is given by

Jy (h) =

∫ b

a

Fy (x, y, y ′)h + Fy ′ (x, y, y ′)h ′

d t .

Since J is stationary at y, we have∫ b

a

Fy (x, y, y ′)h + Fy ′ (x, y, y ′)h ′

d t = 0, for all h inN .

Then according to Lemma 5.2, Fy ′ (x, y, y ′) is differnetiable and dd x

Fy ′ (x, y, y ′) = Fy (x, y, y ′).

Remark 5.5. Equations (5.10) is called the Euler equation corresponding to the functionJ . In expanded form it looks like this:

∂ F

∂ y

x,y(x),y ′ (x)−

d

d x

∂ F

∂ y ′

x,y(x),y ′(x)

= 0, x ∈ [a, b ].


This is a second order differential equation in y. It is to be solved for y ∈A which suppliesthe boundary conditions y(a) =A and y(b ) = B in accordance with (5.8).

In the frequently occurring special case where the function F (x, y, y ′) in (5.9) has noexplicit dependence on x, that is, when J is of the form

J (y) =

∫ b

a

F

y(x), y ′(x)

d x, y ∈ S, (5.11)

then the second order differential (5.10) may be reduced to a first order equation, thusresulting in significant simplification. We state this as the following:

Theorem 5.7. If y is a stationary point of J defined in (5.11), then

F − y ′Fy ′ =C (5.12)

where C is a constant.

Proof. To show that the left-hand side of (5.12) is a constant, let us compute its derivative:

d

d x

F

y(x), y ′(x)

− y ′(x)Fy ′

y(x), y ′(x)

= Fy

y(x), y ′(x)

y ′(x)+ Fy ′

y(x), y ′(x)

y ′′(x)

− y ′′(x)Fy ′

y(x), y ′(x)

− y ′(x)d

d x

Fy ′

y(x), y ′(x)

.

The second and third terms on the right-hand side cancel, and we are left with

d

d x

F

y(x), y ′(x)

− y ′(x)Fy ′

y(x), y ′(x)

=

Fy

y(x), y ′(x)

− d

d x

Fy ′

y(x), y ′(x)

y ′(x).

The right-hand side is zero due to (5.10), completing the proof.

5.5 The invariance of Euler’s equation under change ofcoordinates

A remarkable property of Euler’s equation is its invariance under change of coordinatesin the following sense. Consder a change of variables from (x, y) to (u, v), where

x =X (u, v), y =Y (u, v),

and suppose that the transformation is smooth and one-to-one, which in particular impliesthat Jacobian of the transformation is nonzero:

det

Xu XvYu Yv

6= 0.

5.6. The solution of Problem 5.1 33

A subscript indicates partial derivative, as in Xu =∂ X∂ u

. A curve given by the equationy = y(x) in the xy-plane corresponds to a curve v = v(u) in the uv-plane. In particular,we have

d x =Xu d u +Xv d v, d y = Yu d u +Yv d v,

which has two immediates consequences. First, we have d x = (Xu +Xv v ′)d u, and sec-ond,

y ′ =d y

d x=

Yu d u +Yv d v

Xu d u +Xv d v=

Yu +Yv v ′

Xu +Xv v ′.

With the aid of these, we change the variables from x and y to u and v in (5.9):

∫ b

a

F

x, y(x), y ′(x)

d x =

∫ b

a

F

X (u, v),Y (u, v),Yu +Yv v ′

Xu +Xv v ′

(Xu +Xv v ′)d u,

where a and b are the images of a and b under our transformation. Thus, we set

F (u, v, v ′) = F

X (u, v),Y (u, v),Yu +Yv v ′

Xu +Xv v ′

(Xu +Xv v ′),

and define

J (v) =

∫ b

a

F (u, v, v ′)d v.

Remark 5.6. The rest of the argument requires the introduction of the concept ofvariational derivative which we haven’t done. See [5, Section 8] for the rest. But for now,perhaps the following heuristic argument will do.

Review Section 5.4 to verify for yourself that neither the statement of the problem (5.9),nor its solution (5.10) refer to any particular coordinate system. Therefore the x and ynotation notwithstanding, the entire treatment is independent of a choice of coordinates.For instance, if F (θ,ρ,ρ′) is constructed in polar coordinates (θ,ρ), then the stationarypoint is given by the differential equation

Fρ−d

dθFρ′ = 0.

This is an absolutely crucial observation in the development of the theory of Lagrangianmechanics.

5.6 The solution of Problem 5.1

To illustrate the previous section’s general development, let us apply it to solve Prob-

lem 5.1. In view of to (5.1), we set F (x, y, y ′) =p

1+ y ′2, therefore

∂ F

∂ y= 0,

∂ F

∂ y ′=

y ′p

1+ y ′2,

whenced

d x

∂ F

∂ y ′

x,y(x),y ′ (x)

=d

d x

y ′(x)

p

1+ y ′(x)2

,


and the differential equation (5.10) takes the form

d

d x

y ′(x)

p

1+ y ′(x)2

= 0.

Solving this is quite straightforward. The derivative of the parenthesized quantity is zero,therefore the parenthesized quantity is a constant, say c :

y ′(x)p

1+ y ′(x)2= c .

It follows that y ′(x) = cp

1+ y ′(x)2, therefore y ′(x)2 = c2(1 + y ′(x)2), whence (1 −c2)y ′(x)2 = c2, which implies that y ′(x) is constant, therefore the graph of y(x) is a straightline, as asserted.

Remark 5.7. Since F =p

1+ y ′2 has no explicit dependence on x in this case, we couldhave solved the problem by applying the alternative formula (5.12) which would have leadto

p

1+ y ′2− y ′y ′

p

1+ y ′2=C .

This simplifies to Cp

1+ y ′2 =−1, whence we conclude that y ′ is a constant, as before.

5.7 The solution of Problem 5.1 in polar coordinates

In Section 5.5 we noted that the Euler equations are independent of the choice of coor-dinates. To illustrate this, let us solve Problem 5.1, that is, determining the shortest pathbetween two points, in polar coordinates.

Thus, let ρ(θ) be a function in the polar coordinates (θ,ρ) so that its graphs passesthrough the given points P1(θ1,ρ1) and P2(θ2,ρ2). In calculus it is shown that the path’slength is obtained through the formula

ℓ(ρ) =∫ θ2

θ1

Æ

ρ(θ)2+ρ′(θ)2 dθ,

therefore F (θ,ρ,ρ′) =p

ρ2+ρ′2. Note that F has no explicit dependence on θ. Fromhere we get

∂ F

∂ ρ′=

ρ′p

ρ2+ρ′2,

therefore the Euler equation (5.12) takes the form

p

ρ2+ρ′2− ρ′2p

ρ2+ρ′2= c .

Multiplying the equation through byp

ρ2+ρ′2 and simplifying, we get cp

ρ2+ρ′2 = ρ2,

whence c2(ρ2 + ρ′2) = ρ4, and consequently, cρ′ = ρp

ρ2− c2, which may be solvedthrough separation of variables:

∫cdρ

ρp

ρ2− c2=

∫

dθ = θ+ k ,

5.8. The solution of Problem 5.2 35

where k is the constant of integration.The integral on the left-hand side is may be evaluated with the help of the change of

variables ρ= c/u. We have dρ=−c/u2 d u, therefore∫

cdρ

ρp

ρ2− c2=

∫c(− c

u2 )d u

cu

Æ

( cu)2− c2

=−∫

d up1− u2

=−cos−1 u =−cos−1 c

ρ.

We conclude that the solution of the differential equation is = cos−1 cρ = θ+ k, that is,

c

ρ= cos(−θ− k) = cos(θ+ k) = cos k cosθ− sin k sinθ.

Multiplying through by ρ noting that x = ρ cosθ and y −ρ sinθ, where x and y are theCartesian coordinates corresponding the the polar (θ,ρ), the solution reduces to

c = x cos k + y sin k ,

which is the equation of a straight line.

5.8 The solution of Problem 5.2

In view of (5.2), we set

F (x, y, y ′) =

√√√1+ y ′2

y.

I have dropped the factor of 2g in the denominator a constant multiplier does not affectthe location of a function’s minimum. Note that F does not have an explicit dependenceon x, therefore the special Euler formula (5.12) yields,

√√√1+ y ′2

y− y ′2p

y(1+ y ′2=C ,

which simplifies to

y(1+ y ′2) =1

C 2.

Introducing the alternative constant a = 1/C 2, and then solving for y ′ we obtain

y ′ =p

a− yy.

Upon separating the variables and integrating we get∫ s

y

a− yd y =

∫

d x = x + b . (∗)

The integral on the left-hand side may be evaluated with the help of a change of variablesof the form y = a sin2θ = a

2 (1 − cosθ). With hindsight, the change of variables y =

a sin2 θ2 produces a better-looking result, and that’s what we will use. We have d y =

a sin θ2 cos θ2 dθ, and

∫ sy

a− yd y =

∫√√√√

a sin2 θ2

a− a sin2 θ2

a sinθ

2cos

θ

2dθ

= a

∫

sin2 θ

2dθ =

a

2

∫

(1− cosθ)dθ =a

2(θ− sinθ).


From (∗) we conclude that a2 (θ − sinθ) = x + b . Thus, we have arrived at the general

solution of our Euler equation in parametric form

x =a

2(θ− sinθ)− b ,

y =a

2(1− cosθ).

The constant a and b may be determined form the problem’s auxiliary conditions. Re-ferring to Figure 5.1, we see that the solution curve is to go through the points (0,0) and(x2, y2). The first of these conditions is fulfilled by taking b = 0. The second conditionrequires that

x2 =a

2(θ− sin θ),

y2 =a

2(1− cos θ),

where θ is the value of θ at the point P2. Upon solving this system for the unknowns a

and θ, we arrive at the solution of the brachistochrone problem, expressed parametricallyas follows:

x =a

2(θ− sinθ),

y =a

2(1− cosθ).

You will recognized that these are the parametric equations of a cycloid. Therefore thefastest playground slide is a cycloid that goes through the slide’s start and finish points.

5.9 A variational problem in two unknowns

This section needs to be completely rewritten!

The variational problem defined in equation (5.9) involves a single unknown functiony = y(x). The following generalization involves two unknowns, x(t ) and y(t ):

J (x, y) =

∫ t2

t1

F

t , x(t ), y(t ), x(t ), y(t )

d t , (5.13)

where a superposed dot indicates a derivative with respect to t .Consider the set X of all smooth functions from (t1, t2) to R such that x(t1) = a1,

y(t1) = b1, x(t2) = a2, y(t3) = b2, where a1, b1, a2, b2 are given.

Problem 5.4. Find functions x ∈X and y ∈X which minimize J (x, y).

Following the technique introduced in Section 5.4, it is not too hard to prove thefollowing theorem whose proof will be left as an exercise:

Theorem 5.8. Consider the function J : X ×X → R defined in (5.13). If J achieves aminimum at (x, y) ∈ X ×X , then x and y satisfy the following system of second orderdifferential equation

Fx −d

d tFx = 0, Fy −

d

d tFy = 0. (5.14)

5.10. Lagrange multipliers 37

5.10 Lagrange multipliers

This section is not really necessary; consider removing it.

This section reviews the Lagrange multipliers technique which is usually covered ina multivariable calculus course. If you are familiar with the concept, have a quick lookthrough this section nevertheless, since the methodology developed here caries in a par-allel fashion to the subsequent sections.

Problem 5.5. Let the functions f : R3→R and g : R3→R be given. Find the minimumof f subject to g (x, y, z) = 0.

Geometrically, we think of g (x, y, z) = 0 as a surface S in R3, and we are asking forthe minimum of f on that surface.

Solution. Let’s say f achieves a minimum at a point P0(x0, y0, z0) ∈ S. The rest of thesolution falls into two parts, which analyze the properties of g and f at P0.

Part 1: The analysis of g

Let C be an arbitrary curve in S which passes through P0, and let’s write

x0+α(s), y0 +

β(s), z0 + γ (s)

for the parametric representation of C . The requirement that C passesthrough P0 implies that α(0) = β(0) = γ (0) = 0. The requirement that C lies within Simplies that g

x0+α(s), y0 +β(s), z0 + γ (s)

= 0 for all s , and consequently

∂ g

∂ xα′(s)+

∂ g

∂ yβ′(s)+

∂ g

∂ zγ ′(s) = 0, for all s ,

or equivalently,D∂ g

∂ x,∂ g

∂ y,∂ g

∂ z

E

·

α′(s),β′(s),γ ′(s)

= 0.

This says that in particular at the point P0 where s = 0 we have

∇gP0⊥

α′(0),β′(0),γ ′(0)

. (5.15)

The vector

α′(0),β′(0),γ ′(0)

. is tangent to the curve C at the point P0. Since C is arbi-

trary, the vector

α′(0),β′(0),γ ′(0)

. can be anything. We conclude that∇gP0

is perpen-

dicular to all tangent vectors of all curves that pass through P0, therefore it is perpendicularto the surface S at P0.

Part 2: The analysis of f

We introduce the function

ϕ(s) = f

x0+α(s), y0 +β(s), z0 + γ (s)

= 0

which expresses the values taken on by the function f on the curve C . Since f achieves aminimum at P0 on S, the ϕ(s) achieves a minimum at s = 0, and therefore ϕ′(0) = 0. Wehave

ϕ′(s) =D∂ f

∂ x,∂ f

∂ y,∂ f

∂ z

E

·

α′(s),β′(s),γ ′(s)

,


therefore∇ f

P0⊥

α′(0),β′(0),γ ′(0)

. (5.16)

We conclude that ∇ fP0

is perpendicular to all tangent vectors of all curves that pass

through P0, therefore it is perpendicular to the surface S at P0.

Part 3: Conclusion

We have seen that∇g

P0and∇ f

P0

are both perpendicular to the surface S at P0. It follows

that those two vectors are parallel, that is, there exists a scalar λ so that∇ f = λ∇g at P0.The solution of Problem 5.5 then reduces to solving the system of 4 equations

∂ f

∂ x= λ

∂ g

∂ x,

∂ f

∂ y= λ

∂ g

∂ y,

∂ f

∂ z= λ

∂ g

∂ z, g (x, y, z) = 0,

for the four unknowns x, y, z,λ.

Remark 5.8. The statement and conclusion of Problem 5.5 generalizes to Rn in theobvious way.

Example 5.9. Find the point on the ellipse x2/9+ y2/4= 1 which is closest to the point(a, b ).

Solution. We let f (x, y) = (x−a)2+(y− b )2 be the square of the distance from any point(x, y) to the point (a, b ); and g (x.y) = x2/9+y2/4−1. Then the problem may be restatedas: minimize f subject to g (x, y) = 0. Therefore we need to solve the nonlinear systemof equations

2(x− a) =2

9λx, 2(y − b ) =

1

2λy,

x2

9+

y2

4= 1.

There is no simple solution to this system; the usual method of elimination leads to afourth degree polynomial equation. If a and b are given as numbers, then we may solvethe system numerically.

5.11 Calculus of variations with pointwise constraints

This section needs to be completely rewritten!Gelfand & Fomin’s approach calls for the introduction of the variational derivative

Here is a variant of Problem 5.4 with an added constraint.Consider the set X of all smooth functions from (t1, t2) to R such that x(t1) = a1,

y(t1) = b1, x(t2) = a2, y(t3) = b2, where a1, b1, a2, b2 are given.

Problem 5.6. The functions L : R5→R and G : R2→R are given. We define

J (x, y) =

∫ t2

t1

L

t , x(t ), y(t ), x(t ), y(t )

d t . (5.17)

Find functions x ∈X and y ∈X which minimize J (x, y) subject to G(x, y) = 0.

5.11. Calculus of variations with pointwise constraints 39

Solution. Let α and β be a pair of arbitrary smooth scalar-valued functions defined onsome interval (−δ,δ), and such that α(0) =β(0) = 0.

Let ξ and η be a pair of arbitrary smooth scalar-valued function defined on the interval(t1, t2) which vanish at t1 and t2,

Suppose the constrained minimum of J is achieved at x ∈X and y ∈ X . Then, if α,β, ξ , and η are such that

G

x(t )+α(s)ξ (t ), y(t )+β(s)η(t )) = 0, −δ ≤ s ≤ δ, t1 ≤ t ≤ t2, (5.18)

we haveJ (x +αξ , y +βη)≥ J (x, y).

Upon differentiation (5.18) with respect to s and then setting s to zero we get

∂ G

∂ xξ (t )α′(0)+

∂ G

∂ yη(t )β′(0) = 0.

Now integrate the result

α′(0)∫ t2

t1

∂ G

∂ xξ (t )d t +β′(0)

∫ t2

t1

∂ G

∂ yη(t )d t = 0.

We conclude that the vectors ⟨α′(0),β′(0)⟩ and

D∫ t2

t1

∂ G

∂ xξ (t )d t ,

∫ t2

t1

∂ G

∂ yη(t )d t

E

(5.19)

are orthogonal.Next, let us introduce ϕ : (−δ,δ)→R via

ϕ(s) = J (x +αξ , y +βη)

=

∫ t2

t1

L

t , x(t )+α(s)ξ (t ), y(t )+β(s)η(t ), x(t )+α(s)ξ (t ), y(t )+β(s)η(t )

d t .

Then the previous inequality takes the form ϕ(s)≥ ϕ(0). which indicates that ϕ achievesa minimum at 0. We conclude that ϕ′(0) = 0. We proceed, therefore, to calculate ϕ′(s).We have:

ϕ′(s) =∫ t2

t1

h∂ L

∂ xξ (t )α′(s)+

∂ L

∂ yη(t )β′(s)+

∂ L

∂ xξ (t )α′(s)+

∂ L

∂ yη(t )β′(s)

i

d t

= α′(s)∫ t2

t1

∂ L

∂ xξ (t )+

∂ L

∂ xξ (t )

d t +β′(s)∫ t2

t1

∂ L

∂ yη(t )+

∂ L

∂ yη(t )

d t .

= α′(s)∫ t2

t1

∂ L

∂ x− d

d t

∂ L

∂ x

ξ (t )d t +β′(s)∫ t2

t1

∂ L

∂ y− d

d t

∂ L

∂ y

η(t )d t

Then from ϕ′(0) = 0 we get that the vectors ⟨α′(0),β′(0)⟩ and

D∫ t2

t1

∂ L

∂ x− d

d t

∂ L

∂ x

ξ (t )d t ,

∫ t2

t1

∂ L

∂ y− d

d t

∂ L

∂ y

η(t )d tE

(5.20)

are orthogonal.


Since the vectors (5.19) and (5.20) are both orthogonal to ⟨α′(0),β′(0)⟩, and since thelatter is arbitrary, we conclude that the two former vectors are parallel, therefore, thereexists a function λ(t ) so that

d

d t

∂ L

∂ x− ∂ L

∂ x= λ

∂ G

∂ x,

d

d t

∂ L

∂ y− ∂ L

∂ y= λ

∂ G

∂ y. (5.21)

Example 5.10. Find the curve of shortest length in R3 that lies on the cylinder x2+y2 = 1and connects the points ??.

Solution. Let the shortest curve be parametrized as

x(t ), y(t ), z(t )

, t1 ≤ t ≤ t2. Thecurve’s length is

∫ t2

t1

p

x2+ y2+ z2 d t .

Therefore we let

L

t , x(t ), y(t ), z(t ), x(t ), y(t ), z(t )

=p

x2+ y2+ z2, G(x, y, z) = x2+ y2− 1,

and apply the equations (5.21). We have

∂ L

∂ x=

xp

x2+ y2+ z2,

∂ L

∂ x= 0,

∂ G

∂ x= 2x.

These lead to the system of equations

xp

x2+ y2+ z2

·= 2λx,

yp

x2+ y2+ z2

·= 2λy,

zp

x2+ y2+ z2

·= 0,

x2+ y2 = 1.

If the curve is parametrized by the arc length s , then x2 + y2 + z2 = 1, therefore theequations simplify to

x = 2λx, y = 2λy, z = 0, x2+ y2 = 1, (5.22)

where a superposed dot means differentiation with respect to s . The z equation integratesreadily to z(s) = as + b .

To continue, let us differentiate the constraint equation twice. We get x x + y y = 0and x2 + x x + y2 + y y = 0. For the second derivatives we substitute from (5.22) to getx2+ 2λx2 + y2+ 2λy2 = 0, which we rearrange into

2λ(x2+ y2)+ x2+ y2 = 0.

This further simplifies by observing that:

5.12. Calculus of variations with integral constraints 41

1. x2+ y2 = 1 (the constraint equation);

2. x2+ y2+ z2 = 1 (the arc length parametrization) therefore x2+ y2 = 1− z2; and

3. z = a.

We conclude that 2λ= 1− a2, therefore

λ=1

2(1− a2).

It just happens that in this case λ is a constant; in general we expect λ to be a function.Going back to the differential equations in (5.22), we have

x = (1− a2)x, y = (1− a2)y,

which yields the solution

x(s) = cos(p

1− a2 s + c), y(s) = sin(p

1− a2 s + c). z(s) = as + b ,

which we recognize as the parametric equations of a helix.

5.12 Calculus of variations with integral constraints

This section needs to be completely rewritten!Gelfand & Fomin’s approach calls for the introduction of the variational derivative

Consider the set X of all smooth functions from (t1, t2) to R such that x(t1) = a1,x(t2) = a2, where a1, and a2 are given. Similarly, let Y be the set of all smooth functionsfrom (t1, t2) to R such that y(t1) = b1, y(t2) = b2, where b1, and b2 are given.

Problem 5.7. The functions L : R5→R and G : R5→R are given. We define

J (x.y) =

∫ t2

t1

L

t , x(t ), y(t ), x(t ), y(t )

d t . (5.23)

Find the functions x ∈X and y ∈Y which minimize J (x, y) subject to

∫ t2

t1

G

t , x(t ), y(t ), x(t ), y(t ))

d t = 0.

With a technique similar to those in the previous sections, it is not difficult to see thatthe solution of the problem is given by

d

d t

∂

∂ x(L+λG) =

∂

∂ x(L+λG),

d

d t

∂

∂ y(L+λG) =

∂

∂ y(L+λG), (5.24)

where λ is a constant to be determined.


Example 5.11. Find the closed curve of a prescribed length ℓ in the plane which enclosesthe largest possible area.

Solution. Suppose the curve is given in parametric form as

x(t ), y(t )

, 0 ≤ t ≤ 1, withx(0) = x(1) and y(0) = y(1). The area enclosed by the curve, according to Green’s Theo-rem, is

1

2

∫ 1

0

(xy − y x)d t (5.25)

and the curve’s length is∫ 1

0

p

x2+ y2 d t . (5.26)

Therefore we wish to maximize (5.25) subject to (5.26). We apply the equation (5.24) tosolve this. We have:

L

t , x(t ), y(t ), x(t ), y(t )

=1

2(xy − y x)

and

G

t , x(t ), y(t ), x(t ), y(t )

=p

x2+ y2,

therefore

L+λG =1

2(xy − y x)+λ

p

x2+ y2,

whence

∂

∂ x(L+λG) =−1

2y +

λxp

x2+ y2,

∂

∂ y(L+λG) =

1

2y +

λyp

x2+ y2

∂

∂ x(L+λG) =

1

2y

∂

∂ y(L+λG) =−1

2x.

If we pick the arc length, s , for parametrization, we get x2 + y2 = 1, therefore equa-tions (5.24) lead to

λx = y, λy =−x .

Integrating once we obtain

λx = y + c1, λy =−x + c2.

We solve the resulting system of ODEs and arrive at

x(s) =Asin1

λs +B cos

1

λs + c2,

y(s) =Acos1

λs −B sin

1

λs − c1,

which we recognize as a parametric equation of a circle.

Exercises 43

x

y

(x1, y1)

(x2, y2)

Figure 5.3: The graph of y(x) is revolved about the x axis (left) to produce a surface ofrevolution (right) (Exercise 1).

Exercises

5.1. Minimal surface. Find the function y(x), x1 ≤ x ≤ x2, whose graph passes throughthe points (x1, y1) and (x2, y2) in the Cartesian plane, and the surface of revolutionobtained by rotating the graph about the x axis has the least area. See Figure 5.3.

5.2. Derive the Euler equations (5.14). Hint: Suppose (x, y) is a minimizing pair. Thenfor any pair of scalars α,β ∈R, and any pair of functions ξ and η which vanish att1 and t2, we have:

J (x +αξ , y +βη)≥ J (x, y).

For fixed ξ and η let

ϕ(α,β) = J (x +αξ , y +βη).

Argue that ϕ achieves a minimum at (0,0), then proceed as on the proof of Theo-rem 5.6.

5.3. The geodesic. Find the differential equations corresponding to the shortest curvelying on the surface z = f (x, y) in R3, which connects the points P1

x1, y1, f (x1, y1)

and P2

x2, y2, f (x2, y2)

. Such a curve is called a geodesic. Hint: Any such curve maybe represented parametrically as

r(t)=

u(t ), v(t ), f (u(t ), v(t )

, 0≤ t ≤ 1,

such that

u(0) = x1, v(0) = y1, u(1) = x2, v(1) = y2.

Then we have

dr(t)=

u , v,∂ f

∂ uu +

∂ f

∂ vv

d t .

Let d s be the infinitesimal curve length corresponding to the increment d t . Wehave

d s2 = ‖dr(t)‖2 =

u2+ v2+∂ f

∂ uu +

∂ f

∂ vv2

d t 2,


therefore the curve’s length is

L(u, v, u , v) =

∫ 1

0

§h

1+∂ f

∂ u

2i

u2+h

1+∂ f

∂ v

2i

v2+ 2∂ f

∂ u

∂ f

∂ vu vª1/2

d t .

5.4. Apply the previous exercise’s general result to the special case where z = f (x, y) =p1− x2, that is, the surface is (a part of) a circular cylinder. Conclude that geodesics

are helixes.

Chapter 6

Lagrangian mechanics

6.1 Newtonian mechanics

Let r(t ) be the position vector at time t of a particle (point mass) of constant mass mmoving in the three-dimensional space under the influence of a force f (t ). Accordingto Newton, the equation of motion is mr = f , where, to simplify the notation, I havewritten r and f for r(t ) and f (t ). A superimposed dot on a variable indicates the timederivative of that variable. Thus, r is the particle’s velocity and r is its acceleration.

The motion of a collection of N particles is given as a set N vectorial equations

mk rk = fk , k = 1,2, . . . ,N , (6.1)

where mk is the mass of the kth particle, rk is its position vector, and fk is the resultantof all forces acting on mk .

Example 6.1. Consider and idealized “dumbbell” consisting of two particles of massesm1 and m2, connected with a rigid massless rod, as shown in Figure 6.1(a). In the freeflight of the dumbbell, as when it is tossed up in the air, the force exerted on m1 is theresultant of the (known) weight vector m1g and the (unknown) push/pull f12 the rod.That is, f1 = m1g+f12.

Let us write rk = ⟨rk ,1, rk ,2, rk ,3⟩ and fk = ⟨ fk ,1, fk ,2, fk ,3⟩ for the Cartesian represen-tations of the vectors rk and fk . Then the N vector equations above may equivalently beviewed as 3N scalar equations

mk rk , j = fk , j , j = 1,2,3, k = 1,2, . . . ,N . (6.2)

The following obvious trick flattens the doubly-indexed variables into singly-indexquantities and results in a significant algebraic simplification. We introduce the vectors

x= ⟨ r1,1, r1,2, r1,3︸︷︷︸

r1

, r2,1, r2,2, r2,3︸︷︷︸

r2

, . . . rN ,1, rN ,2, rN ,3︸︷︷︸

rN

⟩, (6.3a)

f = ⟨ f1,1, f1,2, f1,3︸︷︷︸

f1

, f2,1, f2,2, f2,3︸︷︷︸

f2

, . . . fN ,1, fN ,2, fN ,3︸︷︷︸

fN

⟩, (6.3b)

m= ⟨ m1, m1, m1︸︷︷︸

m1

, m2, m2, m2︸︷︷︸

m2

, . . . , mN , mN , mN︸︷︷︸

mN

⟩, (6.3c)

45

46 Chapter 6. Lagrangian mechanics

ℓ

m1g

m2gf12

f21ℓ12

ℓ23

ℓ31m1g

m2g

m3g

f12

f21f23

f32

f13f31

(a) A idealized dumbbell (b) a rigid triangle

Figure 6.1: This idealized dumbbell on the left consists of two points masses connectedthrough a massless rigid rod of length ℓ. In free flight, the force exerted onm1 is the resultant of the weight vector m1g and the push/pull of the rodf12. The rigid triangle on the right consists of three point masses connectedthrough a massless rigid rods.

and write (6.2) as

mi xi = fi , i = 1,2, . . . , 3N . (6.4)

The change from (6.2) to (6.4) may seem merely cosmetic, but it entails a major changeof philosophy and opens the doors to Lagrangian mechanics, as we shall see. Specifically,we view (6.4) as the differential equation of a motion of a point x in R3N . Accordingto (6.3a), knowing the position of the single point x ∈ R3N is equivalent to knowingthe positions of the N points r1, r2, . . . , rN in the (physical) three-dimensional space.Thus, the study of the motion of a system of N points in the three-dimensional space isequivalent to the study of the motion of a single point in the abstract R3N . Specifying an x

in the R3N amounts to specifying the geometrical configuration of the particle system.

Definition 6.2. The 3N-dimensional space introduced above is called the mechanical system’sCartesian configuration space. In analogy with Newton’s equations of motion, the vectorsx, f , m defined in (6.3) are called the position, the force, and the mass of the single abstract“particle” moving in the configuration space.

Remark 6.1. Although it is tempting to think of the equation of motion (6.4) as ageneralization of Newton’s equation mx= f to R3N , the analogy is imperfect. The truegeneralization would have been

mxi = fi , i = 1,2, . . . , 3N ,

involving only a single m. In contrast, (6.4)’s fictitious “particle” exhibits different massesalong different coordinate directions.

6.2 Holonomic constraints

The motion of a particle in the three-dimensional physical space traces a curve, as in thearc of a thrown ball, or the orbit of a planet. The motion of N particle then traces Ncurves in the three-dimensional space. The position x in configuration space, definedin (6.3a), merges the coordinates of the N particles into one, therefore the motion of theentire N -particle system appears as a single curve in the configuration space. We call that

6.2. Holonomic constraints 47

curve the system’s orbit in the configuration space. When there is no risk of confusion, wewill simply call it the orbit.

If there are no impediments in placing the particles independently in arbitrary po-sitions in space, then the orbit of the system of N particle may reach any point in theconfiguration space—all is needed is the application of an appropriate force to get there.If, however, the relative movements of the points are constrained, as in the dumbbell ofFigure 6.1(a), only a subset of the configuration space may be reached.

Example 6.3. Let r1 = ⟨r1,1, r1,2, r1,3⟩ and r2 = ⟨r2,1, r2,2, r2,3⟩. be the position vectors of

the dumbbell of Figure 6.1(a). Then, according to (6.3a) the position vector x ∈ R6 isgiven by

x= ⟨r1,1, r1,2, r1,3, r2,1, r2,2, r2,3⟩.The constraint of the fixed length ℓ of the connecting rod is expressed as ‖r1 − r2‖ = ℓ,or more explicitly, as

(r1,1− r2,1)2+(r1,2− r2,2)

2+(r1,3− r2,3)2 = ℓ2,

that is,(x1− x4)

2+(x2− x5)2+(x3− x6)

2− ℓ2 = 0. (6.5)

This defines a 5-dimensional “surface”—a manifold is the technical term—embedded in R6.The point orbit cannot roam arbitrarily in R6; it is constrained to stay on that manifold.

Example 6.4. Figure 6.1(b) shows three point masses connected with three massless rigidrods, and thus forming a rigid triangle. The position vectors ri , i = 1,2,3, of the massesare constrained through the three constraint equations

‖r1−r2‖= ℓ12, ‖r2−r3‖= ℓ23, ‖r3−r1‖= ℓ31,

which, in terms of the extended variable

x= ⟨r1,1, r1,2, r1,3, r2,1, r2,2, r2,3, r3,1, r3,2, r3,3, ⟩= ⟨x1, x2, x3, x4, x5, x6, x7, x8, x9, ⟩

take on the form

(x1− x4)2+(x2− x5)

2+(x3− x6)2 = ℓ2

12,

(x4− x7)2+(x6− x8)

2+(x7− x9)2 = ℓ2

23,

(x7− x1)2+(x8− x2)

2+(x9− x3)2 = ℓ2

31.

These confine the triangle’s orbit in the configuration space to a 6-dimensional manifoldembedded in R9.

Example 6.5. Reconsider the previous example with a added twist. Suppose that thetriangle’s rods are equipped with remote-controlled motors with may be activated to varythe rods’ lengths as desired during the flight. The previous example’s constraint equationstake the form

(x1− x4)2+(x2− x5)

2+(x3− x6)2 = ℓ12(t )

2,

(x4− x7)2+(x6− x8)

2+(x7− x9)2 = ℓ23(t )

2,

(x7− x1)2+(x8− x2)

2+(x9− x3)2 = ℓ31(t )

2,


where ℓ12(t ), ℓ23(t ), and ℓ31(t ) are given. The manifoldM in this case is a 6-dimensionalmanifold embedded in R9 whose shape changes with time.

Example 6.6. Recall the bead on the rotating hoop of Exercise 3 on page 6. With theobvious choice of the xy z coordinates, the position vector of the bead is

r = ⟨R sinϕ cosΩt , R sinϕ sinΩt , R cosϕ⟩.The manifold M in this case is a the spinning hoop itself. Geometrically it is a one-dimensional spinning object (a circle) embedded in R3. It is given by the pair of equations

x sinΩt = y cosΩt ,

x2+ y2+ z2 = a2.

The first equation is that of plane that contains the z axis and spins about it with anangular velocity of Ω. The second equation is that of a sphere of radius a centered at theorigin. The intersection of the two objects is the spinning hoop.

In general, a system of N particles subject to M constraint equations of the form

ϕi (x, t ) = 0, i = 1,2, . . . , M , (6.6)

where ϕi : R3N ×R→R, i = 1,2, . . . , M . These define a (3N −M )-dimensional manifoldM embedded in R3N . The system’s possible orbits are confined to lie in that manifold.GenerallyM may move/deform with time, as it was the case in Examples 6.5 and 6.6.However, if the equations (6.6) are independent of time, as it was the case in Examples 6.3and 6.4, thenM remains unchanged during the motion. That corresponds to a set ofconstraints of the form

ϕi (x) = 0, i = 1,2, . . . , M , (6.6’)

Constraints of type (6.6) are not the most general. Some very interesting mechanicalsystems impose constraints on the velocity, x, as in ϕi (x, x, t ) = 0. The rolling of a coinon the floor, for instance, has a constraint that depends on x, therefore (6.6) is inadequatefor that purpose.

Definition 6.7. Constraints of the type (6.6) are called holonomic. All other types of con-straints are called nonholonomic.

Definition 6.8. A mechanical system whose only constraints are of the holonomic type iscalled a holonomic system.

We will begin our study of Lagrangian dynamics with holonomic systems. Nonholo-nomic constraints will be brought up in the later chapters.

Remark 6.2. You may be interested to know that holonomic constraints of type (6.6)are called rheonomic while those of type (6.6’) are called scleronomic. I prefer to call themwith the more user-friendly terms “time-dependent” and “time-independent” instead.

Remark 6.3. The term holonomic was introduced by Hertz in [6]:

§123. A material system between whose possible positions all conceivablecontinuous motions are also possible motions is called a holonomous system.

The term means that such a system obeys integral (ὅλος) laws (νόμος), whereasmaterial systems in general obey only differential conditions.

6.3. Generalized coordinates 49

Admittedly that definition is rather vague, but its meaning is clarified further down:

§132. When from the differential equations of a material system an equalnumber of finite equations between the coordinates of the system can be de-duced, the system is holonomous.

By “finite equations” he means algebraic, as opposed to differential, equations.

6.3 Generalized coordinates

In a holonomic system of N particles subject to M holonomic constraints, the 3N Carte-sian components of the position vector x are not quite suitable for the analysis of motion—they cannot serve as independent variables since they are interrelated through the Mconstraint equations (6.6). A much better approach is to parametrize the n = (3N −M )-dimensional configuration manifoldM through a suitably chosen n independent vari-ables q1, q2, . . . , qn , called the system’s generalized coordinates. The parametrization is cer-tainly not unique, however in practice there often is an “obvious” choice. We write q

when we wish to refer to the n variables q1, q2, . . . , qn collectively.The parameters q form a (generally curvilinear) coordinate system onM . Since the

motion’s orbit lies inM , the system’s state as a function of time may be expressed in termsof q(t ). The purpose of analytical mechanics is to express Newton’s equations of motion (6.4)in terms of the generalized coordinates q.

Remark 6.4. A familiar example curvilinear coordinates, albeit not directly related tomechanics, is the system of addressing locations on the surface of the Earth through theirlongitude λ and latitude ϕ. In this context,M is the Earth’s surface, and λ and ϕ are thecoordinates q1 and q2.

Any q identifies a point on the manifoldM . SinceM is embedded in R3N , it alsoidentifies a point x ∈R3N . That is, the system’s configuration vector x is a function of q.We write this is a x= x(q, t ), or in components:

xi = xi (q, t ), i = 1,2, . . . , 3N . (6.7)

The t in this equations accounts for the possible motion/deformation of the manifold re-lated to time-dependent constraints (6.6). In the case of time-independent constraints (6.6’),M is independent of time, and (6.7) reduces to

xi = xi (q), i = 1,2, . . . , 3N . (6.7’)

Differentiating the q to x mapping of (6.7) with respect to time, we obtain an expres-sion for the velocities in terms of generalized coordinates:

xi =∑

k

∂ xi (q, t )

∂ qk

qk +∂ xi (q, t )

∂ t. (6.8)

[The last term will be absent in the case of (6.7’).] Let us observe that although the po-sition xi is a function of q and t only, the velocity xi is a function of q, q, and t . Let’srecord this here for future reference:

xi = xi (q, q, t ), i = 1,2, . . . 3N . (6.9)

The following theorem establishes a couple of very useful mathematical identities:


Theorem 6.9. Let xi and xi be as in (6.7) and (6.9). Then for any i = 1,2, . . . , 3N andj = 1,2, . . . , M, we have:

∂ xi (q, q, t )

∂ q j

=∂ xi (q, t )

∂ q j

. (6.10)

∂ xi (q, q, t )

∂ q j

=d

d t

∂ xi (q, t )

∂ q j

(6.11)

Proof. The assertion (6.10) is an immediate consequence of (6.8). As to (6.11), it’s a matterof differentiating (6.8) with respect to q j and then exchanging the differentiation order in

the resulting second order partial derivatives:

∂ xi (q, q, t )

∂ q j

=∑

k

∂ 2xi (q, t )

∂ q j∂ qk

qk +∂ 2xi (q, t )

∂ q j∂ t

=∑

k

∂

∂ qk

∂ xi (q, t )

∂ q j

qk +∂

∂ t

∂ xi (q, t )

∂ q j

=d

d t

∂ xi (q, t )

∂ q j

.

6.4 Virtual displacements, virtual work, and generalized force

Figure 6.2 depicts a representation of the orbit of a system of N particles on a manifoldM embedded in R3N . Pick an arbitrary point, let’s say x, of the orbit and then considerthe tangent at that point to the manifold. We explore that tangent through infinitesimalexcursions away from x. Such excursions are called virtual displacements and commonlywritten as δx. I should emphasize that we are viewing the whole picture as a fossil frozenin time. The excursions have nothing to do with the system’s motion which will continuealong the predetermined orbit once we unfreeze the time. The “δx” notation is used todistinguish between virtual displacements and the actual differential of the motion dx.

The obvious way of producing a virtual displacement is through incrementing thegeneralized coordinates q. A change in q amounts to a displacement within the manifoldM . Therefore the differential δq is a displacement withinM ’s tangent. In view of (6.7),we have:

δxi =n∑

j=1

∂ xi

∂ q j

δq j . (6.12)

Let f be the force vector, see (6.3b), at the point x. Under a virtual displacement δx,the force performs a work δW , called virtual work, given by

δW = f ·δx=3N∑

i=1

fiδxi =3N∑

i=1

fi

n∑

j=1

∂ xi

∂ q j

δq j

=n∑

j=1

3N∑

i=1

fi

∂ xi

∂ q j

δq j =n∑

j=1

Q jδq j .

Letting

Q j =3N∑

i=1

fi

∂ xi

∂ q j

, j = 1,2, . . . , n, (6.13)

6.4. Virtual displacements, virtual work, and generalized force 51

x

the true orbit

virtual displacements at x

the manifoldM

Figure 6.2: The system’s orbit lies on a manifoldM determined by the holonomic con-straints. Virtual displacements at x are tangent to the manifold.

the virtual work is now expressed as

δW = f ·δx=Q ·δq.

The vector Q is called the generalized force at x. The component Q j is called the compo-

nent of the generalized force along the generalized coordinate q j .

Example 6.10. Consider the simple pendulum of Figure 1.1. The position vector r =⟨ℓ sinϕ,ℓcosϕ⟩, therefore the vector x (see (6.3a)) is

x= ⟨x1, x2⟩= ⟨ℓ sinϕ,ℓcosϕ⟩,

and the constraint is x21+x2

2 −ℓ2, therefore the configuration manifoldM coincides withthe circle swept by the pendulum’s bob, embedded in the configuration space R2. Theangle ϕ plays the role of the generalized coordinate in this case; any value of ϕ identifiesa point onM . Let us write ϕ and Qϕ instead of q1 and Q1 for clarity. The force vector is

⟨0, m g ⟩. We compute the generalized force by applying (6.13):

Qϕ = f1∂ x1

∂ ϕ+ f2

∂ x2

∂ ϕ

= 0× (ℓcosϕ)+m g × (−ℓ sinϕ) =−m gℓ sinϕ.

Observe that Qϕ turns out to be equal to the moment of the weight vector f about the

pendulum’s pivot.

Example 6.11. Consider the double-pendulum of Figure 1.2. The position vectors of itstwo masses are given by

r1 = ⟨ℓ1 sinϕ,ℓ1 cosϕ⟩, r2 = r1+ ⟨ℓ2 sinψ,ℓ2 cosψ⟩,


therefore the vectors x and f (see (6.3)) are

x= ⟨x1, x2, x3, x4⟩= ⟨ℓ1 sinϕ,ℓ1 cosϕℓ1 sinϕ+ ℓ2 sinψ,ℓ1 cosϕ+ ℓ2 cosψ⟩,

f = ⟨ f1, f2, f3, f4⟩= ⟨0, m1 g , 0, m2 g ⟩.

The configuration space is R4 in this case. The two constraints

(x1− x2)2 = ℓ2

1, (x3− x4)2 = ℓ2

2

result in a two-dimensional configuration manifoldM embedded in R4. The angles ϕand ψ serve as generalized coordinates onM . Let us write ϕ and ψ for the generalizedcoordinates instead of the generic q1 and q2, for clarity. We write Qϕ and Qψ for the

corresponding generalized forces. forces instead of Q1 and Q2 By applying (6.13) we get

Qϕ = f1∂ x1

∂ ϕ+ f2

∂ x2

∂ ϕ+ f3

∂ x3

∂ ϕ+ f4

∂ x4

∂ ϕ=−m1 gℓ1 sinϕ−m2 gℓ2 sinϕ,

Qψ = f1∂ x1

∂ ψ+ f2

∂ x2

∂ ψ+ f3

∂ x3

∂ ψ+ f4

∂ x4

∂ ψ=−m2 gℓ2 sinψ.

6.5 External versus reaction forces

In equation (6.1) the force fk applied to particle k is the resultant of all forces acting onthat particle. For instance, in the triangular system of Figure 6.1(b), forces applied to m1

consist of m1g + f12 + f13. The fist term is the gravitational force applied to m1, thatis its weight, which is known. We call it an external force. The other two are generateddynamically within the rods, and are unknowns to be determined. We call then internalforces or more frequently, constraint reactions because they arise due to the unchanginglengths of the rods.

The constraint reactions get eliminated in the Lagrangian formulation as we shall see.Their elimination reduces the problem’s unknowns, and hence simplifies the equationsof motion significantly. In anticipation of that development, we write the total force fk

in (6.1) as fk + f ′k, where, with some abuse of notations, we have recycled the notation

fk to signify the external forces only, and f ′ the internal forces, applied to the particle k.After flattening the vectors in accordance with (6.3), equation (6.4) takes on the form

mi xi = fi + f ′i , i = 1,2, . . . , 3N . (6.14)

The argument that leads to the elimination of the constraint reactions proceeds asfollows. The orbit of (6.14) lies in the constraint manifoldM in R3N . External forcesapplied to the particles push and pull the point x in a direction tangent toM . But whatkeeps x from flying away from M ? The manifold holds it back, that’s what! If, forexample, x speeds over a round protrusion on the manifold, centrifugal forces will tendto pull it away from the manifold. The manifold, however, exerts just the right amountof opposite force, the constraint reaction, which holds x attached toM . In the physicalspace, that is R3, the manifold’s reactions manifests itself as the forces that develop in thesystem’s interconnecting links, such as f12 and f13 noted above.

The crucial observation that leads to the elimination of the constraint reactions fromthe equations of motion is that the constraint reaction is orthogonal to the constraint man-ifold. If it weren’t, then it would have a component tangent to the manifold, which will

6.6. The equations of motion for a holonomic system 53

then perform work during the motion. But such a behavior is uncharacteristic of a passiveconstraint surface, so we disallow it.

Since a virtual displacement δx is tangent to the constraint manifold (see Figure 6.2),the orthogonality of the reaction force f ′ toM is expressed naturally as f ′ ·δx = 0 forall virtual displacements δx, or in expanded form

3N∑

i=1

f ′i δxi = 0 for all virtual displacements δx.

We note that, however, that due to (6.12)

3N∑

i=1

f ′i δxi =3N∑

i=1

f ′i

n∑

j=1

∂ xi

∂ q j

δq j

!

=n∑

j=1

3N∑

i=1

f ′i∂ xi

∂ q j

δq j ,

therefore

n∑

j=1

3N∑

i=1

f ′i∂ xi

∂ q j

δq j = 0 for all virtual displacements δq,

from which it follows that

3N∑

i=1

f ′i∂ xi

∂ q j

= 0, j = 1,2, . . . , n. (6.15)

6.6 The equations of motion for a holonomic system

At this point, the motion of a system consisting of N point masses and M holonomicconstraints has been encapsulated into the 3N +M equations (6.14) and (6.6) in the un-knowns xi and f ′i . It is the goal of this section to re-express the equations of motions as asystem of only n = 3N −M differential equations for the n generalized coordinates q asthe unknowns. We begin with multiplying the equation (6.14) by ∂ xi∂ q j and summingover i :

3N∑

i=1

mi xi

∂ xi

∂ q j

=3N∑

i=1

fi

∂ xi

∂ q j

+3N∑

i=1

f ′i∂ xi

∂ q j

.

The second summation on the right-hand side is zero due to (6.15). The first summationon the right-hand side is the generalized force Q j ; see (6.13). Therefore obtain

3N∑

i=1

mi xi

∂ xi

∂ q j

=Q j , j = 1,2, . . . , n. (6.16)

To simplify the left-hand side, we begin with a preliminary preparation. The kineticenergy of the system is

T (x) =3N∑

k=1

1

2mk x2

k .

I have written T rather than the usual T for a reason which will become obvious shortly.Now observe that for any i

∂ T (x)

∂ xi

=∂

∂ xi

3N∑

k=1

1

2mk x2

k = mi xi ,


therefored

d t

∂ T (x)

∂ xi

= mi xi .

Then, the left-hand side of (6.16) may be calculated as

3N∑

i=1

mi xi

∂ xi

∂ q j

=3N∑

i=1

d

d t

∂ T (x)

∂ xi

∂ xi

∂ q j

=3N∑

i=1

d

d t

∂ T (x)

∂ xi

∂ xi

∂ q j

−3N∑

i=1

∂ T (x)

∂ xi

d

d t

∂ xi

∂ q j

,

where in the last step we have used the differentiation formula u ′v = (uv)′− uv ′. Nowapply (6.10) to the first summation on the right-hand side, and apply (6.11) to the secondsummation, to get

3N∑

i=1

mi xi

∂ xi

∂ q j

=3N∑

i=1

d

d t

∂ T (x)

∂ xi

∂ xi

∂ q j

−3N∑

i=1

∂ T (x)

∂ xi

∂ xi

∂ q j

=d

d t

3N∑

i=1

∂ T (x)

∂ xi

∂ xi

∂ q j

−3N∑

i=1

∂ T (x)

∂ xi

∂ xi

∂ q j

.

Let us recall that the Cartesian velocity components x and the generalized velocitycomponents q are related through (6.8). Therefore the kinetic energy, which we havetaken to be a function of x, may equally well be expressed as a function of q, q, and t .

We write the latter as T to distinguish it from the former T :

T (x) = T (q, q, t ),

and then note that by the chain rule

∂ T (q, q, t )

∂ q j

=3N∑

i=1

∂ T (x)

∂ xi

∂ xi

∂ q j

and∂ T (q, q, t )

∂ q j

=3N∑

i=1

∂ T (x)

∂ xi

∂ xi

∂ q j

.

Consequently3N∑

i=1

mi xi

∂ xi

∂ q j

=d

d t

∂ T (q, q, t )

∂ q j

− ∂ T (q, q, t )

∂ q j

,

and therefore (6.16) takes the form

d

d t

∂ T (q, q, t )

∂ q j

− ∂ T (q, q, t )

∂ q j

=Q j , j = 1,2, . . . , n. (6.17)

These n second order differential equations in the n unknowns q1, q2, . . . , qn are calledLagrange’s equation of motion for a holonomic systems.

In particular, if the external forcesQ are derived from a potential, that is, if there existsa scalar function function V (q, t ) so that Q j = −∂ V /∂ q j , then Lagrange’s equation of

motion take on the form

d

d t

∂ T (q, q, t )

∂ q j

− ∂ T (q, q, t )

∂ q j

=−∂ V (q, t )

∂ q j

, j = 1,2, . . . , n.

Exercises 55

aa

x

ϕ

m

m

O

Figure 6.3: The massless hoop rolls against a horizontal line while remaining in a ver-tical plane. A point of mass m is affixed to the hoop. The angle ϕ or thedistance x may be used as generalized coordinates in the configuration space(Exercises 2 and 3).

In that case we define the system’s Lagrangian as

L(q, q, t ) = T (q, q, t )−V (q, t )

and observe that since V does not depend on q, we have ∂ L/∂ q j = ∂ T /∂ q , therefore

the equations of motion collapse to

d

d t

∂ L(q, q, t )

∂ q j

− ∂ L(q, q, t )

∂ q j

= 0, j = 1,2, . . . , n. (6.18)

Exercises

6.1. Find the gneralized forces Qϕ and Qθ in the spherical pendulum of Figure 1.4(page 7).

6.2. A massless hoop of radius a rolls without slipping on a horizontal line, while remi-aning in a vertical plane. A particle of mass m is firmly attached to the hoop, asseen in Figure 6.3. Use the angle ϕ of the mass’s radius relative to the vertical asgeneralized coordinate. Find the generalized force Qϕ .

6.3. In the previous problem use the distance x travelled by the contact point (see thefigure) as generalized coordinate. Find the generalized force Qx .

6.4. Suppose the external forces fi in (6.14) are derived from a potential, that is, there ex-

ists a scalar-valued function V (x, t ) such that fi =−∂ V (x, t )/∂ xi for i = 1,2, . . . , 3N .

Let V (q, t ) = V (x, t ) be the representation of the potential as a function of gen-eralized coordinates. Show that the generalized forces Q j are derived from the

potential V , that is, Q j = ∂ V (q, t )/∂ q j , for j = 1,2, . . . , n.

Chapter 7

Angular velocity

When you hurl a rock in the air, or toss a Frisbee, the object spins in general as it moves.Associated with the motion is a vector ω(t ), called the object’s angular velocity vector,(or just angular velocity for short,) whose direction and magnitude at any instant of timet indicate orientation and the rate of rotation. It is the goal of this section to make thedefinition of the angular velocity precise.

Toward that end, let b1,b2,b3 be an orthonormal set of vectors which is firmly af-fixed to the spinning object, therefore moves with it. The choice of the letter b is thisnotation is to remind us that we are dealing with a body coordinate system.

We apply the general formula (8.7) (page 61) to express the rate of change, bi , of thevector bi in the basis b1,b2,b3:

bi =3∑

j=1

(bi ·b j )b j .

Let ai j = bi ·b j be the matrix of the coefficients. We claim that the matrix, let’s call it A, is

skew-symmetric. Indeed, we have bi · b j = δi j , where δi j is the Kronecker delta defined

in (8.6) (page 61). Upon differentiating this we get bi ·b j +b j ·bi = 0, whence ai j +a j i = 0,

which proves A is skew-symmetric.Since the general form of a 3× 3 skew-symmetric matrix is

A=

0 ω3 −ω2

−ω3 0 ω1

ω2 −ω1 0

, (7.1)

therefore

b1 =ω3b2−ω2b3, (7.2a)

b2 =ω1b3−ω3b1, (7.2b)

b3 =ω2b1−ω1b2. (7.2c)

The coefficients ω1, ω2, ω3 measure the rate of change of the triad b1,b2,b3, andconsequently, the rate of change of orientation of the body to which the triad is attached.The angular velocity vector, defined as

ω =ω1 b1+ω2 b2+ω3 b3 (7.3)

57

58 Chapter 7. Angular velocity

encapsulates that rate of change.

Remark 7.1. Actually calling ω a vector is premature. A vector, as defined in Section 8.1is a physical object in the sense that it is independent of any coordinate system whichmay be used in defining it. Here ω has been defined in terms of its components on theb1,b2,b3 triad. But the triad is not an integral part of the rotating body. Does ω surviveif the triad goes away? Does ω remain the same if that triad is replaced by another?

It turns out that the answer to both of those question is in the positive. Indeed, ωis not an artifact of the arbitrarily chosen triad. It is an intrinsic property of the body’smotion. The presence of a coordinate triad affixed to the body is immaterial. The proofof this claim is not too hard but it takes us deeper into tensor analysis, so I will skip it fornow.

Remark 7.2. We leave it as an exercise to show that differentiating (7.3) and applying theequations (7.2), resutls in

ω = ω1 b1+ ω2 b2+ ω3 b3. (7.4)

Remark 7.3. On the one hand, in (7.1) we have a23 =ω1. On the other hand, we have

the ai j = bi ·b j by definition. It follows thatω1 = b2 ·b3. Similar expressions are obtained

for ω2 andω3. Let’s make a record of this:

ω1 = b2 ·b3, ω2 = b3 ·b1, ω3 = b1 ·b2. (7.5)

Remark 7.4. From the definition of ω in (7.3) it follows that

ω× b1 = (ω1 b1+ω2 b2+ω3 b3)× b1 =−ω2 b3+ω3 b2.

Therefore the equations (7.2) may be written as

b1 =ω× b1, b2 =ω× b2, b3 =ω× b3. (7.6)

Remark 7.5. Consider points O and P in the body, and let r =−→OP . Then r = α1b1 +

α2b2+α3b3, where α1, α2, α3 are constants. The velocity v of P relative to O is given byv = r. Let us observe that

v = r = α1b1+α2b2+α3b3

= α1(ω× b1)+α2(ω× b2)+α3(ω× b3) =ω× (α1b1+α2b2+α3b3) =ω×r.

The formulav =ω×r (7.7)

is used throughout the rest of this book.

Exercises

7.1. Derive (7.4).

Chapter 8

The moment of inertiatensor

8.1 A brief introduction to tensor algebra

In much of our working and thinking, we tend to blur the distinction between a vectoras a pointed arrow in space ր, and an n-tuple of numbers ⟨x1, x2, . . . , xn⟩. Yet, the twoconcepts are drastically different. The former would have been quite a natural object toEuclid in 300 BC, but the latter would been very foreign to him—for at the time they laytwo millennia in the future.

The blurring between the two ways of looking at vectors is harmless much of thetime, but there are places where a strict distinction between the two views is cruciallyimportant. Tensor algebra makes a bridge between the two and goes substantially beyond.Tensor algebra, along with tensor analysis, are indispensable tools in differential geometry,continuum mechanics, and general relativity, and perhaps other disciplines as well. For ageneral exposition of tensor algebra and tensor analysis, see [7].

Tensor algebra is not a sine qua non of rigid body mechanics, therefore only rarely itis brought into play. There is, however, the ubiquitous use of the term moment of inertiatensor, which hints tantalizingly to a connection to tensors behind the scenes, howeverthe connection is only rarely brought out. It is the purpose of this section and the nextto introduce the minimal tensor algebra which explains the “tensor” in the “moment ofinertia tensor”.

8.1.1 Tensor algebra

Vectors, dot product, cross product. In the three-dimensional Euclidean space fixa point called the origin, and consider the set V of all vectors (in the sense of pointedarrows) whose tails are attached to the origin. We define the sum x+y of two vectors xand y in V through the parallelogram rule, that is, we form a parallelogram based on thevectors, and consider the parallelogram’s diagonal as their sum as illustrated in Figure 8.1.Multiplying a vector by a number stretches/shrinks the vector’s length by the magnitudeof that number. If the number is negative, the resulting vectors flips, that is, it points inthe opposite of the original’s direction. See Figure 8.1 for an illustration.

The length of the vector x is written ‖x‖. The dot product x ·y of a pair of vectors xand y in V is a number defined by

x ·y = ‖x‖‖y‖cosθ,

where θ is the angle between the vectors; see Figure 8.2. Let us note that (a) the dot

59

60 Chapter 8. The moment of inertia tensor

x

yx+y

o

x

−0.7x 1.2xo

Figure 8.1: On the left: The sum x + y of two vectors x and y is formed throughthe parallelogram rule. On the right: Multiplication by a number stretch-es/shrinks/flips a vector. The vector 1.2x is drawn in a slightly displacedposition relative to the origin o for the sake of visualization.

x

y

θo

x

y

θ

x×y

n

o

Figure 8.2: On the left: The dot product of the vectors x and y is the number x ·y =‖x‖‖y‖cosθ. On the right: The cross product of the vectors x and y isthe vector x × y =

‖x‖‖y‖ sinθ

n, where n is a unit vector which isperpendicular to the plane of the vectors x and y, and is oriented accordingto the right-hand rule.

product is commutative, that is x · y = y · x; (b) if the vectors are orthogonal to eachother, that is, θ = π/2, then x ·y = 0; and (c) a vector’s length may be expressed as a dotproduct: ‖x‖2 = x ·x.

The cross product of x×y of a pair of vectors x and y in V is the vector

x×y =

‖x‖‖y‖ sinθ

n,

where 0≤ θ≤π is the angle between the vectors, and n is a unit vector which is perpen-dicular to the plane of the vectors x and y, and is oriented according to the right-handrule. The latter means that if you align your right hand’s thumb with n, then a rotationby an angle θ in the direction pointed at by your fingers will take the vector x to y. Itfollows that the cross product is anticommutative, which means that x×y =−y×x.

Tensors. A function L : V →V is said to be linear if

L(x+y) = L(x)+ L(y) for all x,y ∈ V ,

L(αx) = αL(x) for all x ∈ V , α ∈R.

A linear function from V to V is called a tensor on V , or just tensor, for short. It iscustomary to write Lx instead of L(x) when L is a tensor, as we will do from now on.The identity tensor, I , is the tensor with the property

Ix= x for all x ∈ V . (8.1)

The sum L1+ L2 of tensors L1 and L2 is a tensor defined by

(L1+ L2)x= L1x+ L2x for all x ∈ V . (8.2)

8.1. A brief introduction to tensor algebra 61

The product αL of a tensor L and a number α ∈R is a tensor defined by

(αL)x = α(Lx) for all x ∈ V . (8.3)

The set of tensors on V , equipped with the operations defined in (8.2) and (8.3), is avector space in the sense of an abstract vector space (not to be confused with vectors of thepointed-arrow kind).

The dyadic product. The dyadic product a⊗ b of vectors a and b in V is a tensor onV defined by

(a⊗ b)x= (b ·x)a for all x ∈ V . (8.4)

Lemma 8.1. For any pair of vectors a,b ∈ V we have

‖a× b‖2 = a ·

‖b‖2I − b⊗ b

a, (8.5)

where I is the identity tensor.

Proof. Let θ be the angle between the vectors a and b. We have:

‖a× b‖2 = ‖a‖2‖b‖2 sin2θ

= ‖a‖2‖b‖2(1− cos2θ)

= ‖a‖2‖b‖2−‖a‖2‖b‖2 cos2θ

= ‖b‖2(a ·a)− (a ·b)2

= a ·

‖b‖2a− (a ·b)b

= a ·

‖b‖2a− (b⊗ b)a

= a · (‖b‖2I − b⊗ b)a.

8.1.2 Connection with R3 and 3× 3 matrices

Let e1,e2,e3 be an orthonormal set inV . This means that the three vectors are mutuallyperpendicular, and each is of length one. Thus

ei ·e j = δi j =

¨

0 if i 6= j ,

1 if i = j .(8.6)

The symbol δi j defined above is called the Kronecker delta.

In Exercise 1 you will show that an orthonormal set is linearly independent. If followsthat the orthonormal set e1,e2,e3 forms a basis for the three-dimensional vector spaceV . Consequently, any x ∈ V may be expressed as a linear combination of the form

x= c1e1+ c2e2+ c3e3.

Dot-multiplying through by e1 and taking into account the orthonormality of basis vec-tors, we get x · e1 = c1. Repeating with e2 and e3 we conclude that ci = x · ei , i = 1,2,3.Thus we have established the identity

x= (x ·e1)e1+(x ·e2)e2+(x ·e3)e3, for all x ∈ V . (8.7)


For each i , the coefficient x ·ei is called the component of the vector x along ei .

Remark 8.1. Let us state emphatically that the components x · ei are not properties ofthe vector x at all! If you replace one basis with another which is rotated in an arbitrarymanner relative to the first, then the components of x will be different in general, while xhas not been touched. Even if you remove the basis altogether, the vector x will happilycontinue to exist. Having said all that, it is sometimes useful to work with ⟨c1, c2, c3⟩ ∈R3

as sort of an “avatar” of x ∈ V , as long you remain cognizant of what it is.

Remark 8.2. Applying the defintion of the diatic product (8.4), the identity (8.7) maybe written in the equivalent form

x= (e1⊗e1)x+(e2⊗e2)x+(e3⊗e3)x,

which has at least two implications. First, upon factoring the x on the right-hand side,we see that

I = e1⊗e1+e2⊗e2+e3⊗e3.

Second, ei ⊗ei acting on any vector x produces the projection of x in the direction ei .

We noted earlier that the set of tensors on V , equipped with the operations definedin (8.2) and (8.3), is a vector space of its own. The following lemma shows how to con-struct a basis for that vector space.

Lemma 8.2. Let e1,e2,e3 be an orthonormal basis in V . Then ei ⊗e j 3i , j=1 is a basis for

the space of tensors on V .

Proof. We will show that any tensor L is a linear combination of the nine dyadic productsei ⊗e j 3i , j=1. Toward that end, pick an arbitrary x ∈ V ,

x=3∑

i=1

(x ·ei )ei ,

and let y = Lx. By the linearity of L we have

y = Lx=3∑

i=1

(x ·ei )Lei .

it follows that

y ·e j =3∑

i=1

(x ·ei )(Lei ·ej) =3∑

i=1

(Lei ·ej)(x ·ei ),

and consequently

y =3∑

j=1

(y ·e j )e j

=3∑

j=1

3∑

i=1

(Lei ·ej)(x ·ei )e j

=3∑

j=1

3∑

i=1

(Lei ·ej)(ei ⊗e j )x,

8.1. A brief introduction to tensor algebra 63

where we have made use of the definition (8.4). Since y = Lx, this tells us that

Lx=3∑

i=1

3∑

j=1

(Lei ·ej)(ei ⊗e j )x for all x ∈ V ,

therefore

L=3∑

i=1

3∑

j=1

(Lei ·ej )(ei ⊗e j ), (8.8)

which shows that L is a linear combination of the nine dyadic products ei ⊗e j 3i , j=1. In

other words, the set of vectors ei ⊗ e j 3i , j=1 spans the set of all tensors. To show that

the set is a basis, it remains to show that it is linearly independent. You will do that inExercise 2.

Remark 8.3. The coefficients in (8.8) are called the components of L in the basis ei ⊗e j 3i , j=1. Letting ℓi j = Lei ·ej , we may write (8.8) in the abbreviated form

L=3∑

i=1

3∑

j=1

ℓi jei ⊗e j .

At times it is useful to identify the tensor L with the 3× 3 matrix with components ℓi j

but the caveats of Remark 8.1 apply equally well here. The components ℓi j are artifacts

of the choice of the basis vectors. The tensor is an intrinsic property of the system andwill continue to exist even when the basis vectors are obliterated.

Remark 8.4. In view of the one-to-one correspondence between 3× 3 matrices and ten-sors on V noted above, every property or theorem in matrix algebra finds a counterpartin tensor analysis. For instance, a nonzero vector x is called and eigenvector of the tensorL if Lx = λx for some λ ∈ R.7 The coefficient λ is the eigenvalue corresponding to thateigenvector.

8.1.3 Symmetric tensors

A tensor L is said to be symmetric if

a · (Lb) = (La) ·b for all a,b ∈ V .

It is easy to show that if L is symmetric, then the matrix ℓi j of its components (see Re-

mark 8.3) is symmetric, that is ℓi j = ℓ j i for i , j = 1,2,3.

We know from matrix analysis that the eigenvalues of a symmetric matrix are real,and its eigenvector may selected as an orthonormal set. This, along with what is knownas diagonalization of matrices in linear algebra, lead to the spectral decomposition theoremin tensor algebra:

Theorem 8.3 (Spectral decomposition, cf. [7, p. 137]). The eigenvalues λ1, λ2, λ3 of asymmetric tensor L are real, and the corresponding eigenvectors, g1, g2, g3 may be selected asan orthonormal set. Furthermore,

L= λ1g1⊗g1+λ2g2⊗g2+λ3g3⊗g3. (8.9)

7say something about complex fields


vimi

ω ri

o

Figure 8.3: whatever

Remark 8.5. According to (8.8), a tensor is a linear combination of the nine basis ele-ments ei ⊗ e j 3i , j=1 in general. In contrast, (8.9) presents L as a linear combination of

only three special elements g j ⊗g j 3j=1 constructed from L’s eigenvectors.

8.2 The moment of inertia tensor

Consider a set of particles of masses mi , i = 1,2, . . . ,N , which are thoroughly intercon-nected through massless rigid rods, so that the entire assembly forms a rigid, object. Sup-pose that the object rotates with angular velocity ω about a fixed axis passing throughthe origin. Figure 8.3 depicts the vector ω, and a representative particle of mass mi andposition vector Let ri . Then the particle’s velocity is vi = ω× ri , therefore the kineticenergy of the N -particle system is given by

T =N∑

i=1

1

2mi‖vi‖2 =

N∑

i=1

1

2mi‖ω×ri‖2,

which according to Lemma 8.1 is equivalent to

T =N∑

i=1

1

2miω ·

‖ri‖2I −ri ⊗ri

ω

=1

2ω ·

N∑

i=1

mi

‖ri‖2I −ri ⊗ri )

ω.

We introduce the tensor

I =N∑

i=1

mi

‖ri‖2I −ri ⊗ri

, (8.10)

whereby the kinetic energy takes the simple form

T =1

2ω · Iω. (8.11)

The tensor I is called the moment of inertia tensor of the N -particle system. Let usnote that I is independent of ω, so the orientation of the axis about which the systemrotates, or the speed of rotation, is immaterial. However,I does depend on the choice ofthe origin of the vectors—changing the origin will affect the position vectors ri , therefore

8.3. Translation of the origin 65

the the tensor I . The change, however, obeys a simple translation rule which we willdevelop in the next section. For now let us observe another aspect of (8.10). The fact thatthe system under consideration consists of N rigidly connected point masses is hardlyof particular importance. A general rigid solid may be approximated by a union a largenumber of tiny parts, as one does in the theory of integration, and then pass to the limitas the number of the parts goes to infinity, and the sizes of the individual parts go to zero,while maintaining a fix mass for the aggregate.

To be specific, letB be the solid object, d m be the differential mass of the “part” ofBindicated by the position vector r relative to some origin o. Then the obvious extensionof (8.10) takes the following form for the solid’s moment of inertia:

I =∫

B

‖r‖2I −r⊗r

d m. (8.12)

If ρ(r) is the density of the body at the position r, then d m = ρ dV , where V is thevolume element, and the formula above takes the form

I =∫

Bρ(r)

‖r‖2I −r⊗r

dV . (8.13)

8.3 Translation of the origin

As noted above, the moment of inertia tensor of a rigid body depends on the choice ofthe origin of the vectors. To see how a translation of the origin affects the tensor, let Ioand Io′ be the moment of inertia tensors of a rigid bodyB relative to two origins o ando′, respectively. According to (8.12) we have:

Io =∫

B

‖r‖2I −r⊗r

d m, Io′ =∫

B

‖r′‖2I −r′⊗r′

d m,

where r and r′ are the position vectors of a generic point p ∈ B relative to o and o′, asseen in Figure 8.4.

Theorem 8.4. Let c ∈ B the the body’s center of mass, and let us write r′c for the positionvector of c relative to o′. Furthermore, let τ = o′−o. Then we have:

Io =Io′ +mh

2r′c ·τ + ‖τ‖2

I −r′c⊗ τ − τ ⊗r′c− τ ⊗ τi

, (8.14)

where m is the body’s mass.

Proof. Referring to Figure 8.4 we have r = r′+τ, therefore

Io =∫

B

‖r‖2I −r⊗r

d m

=

∫

B

‖r′+ τ‖2I − (r′+ τ )⊗ (r′+ τ )

d m

=

∫

B

h

‖r′‖2+ 2r′ ·τ + ‖τ‖2

I −r′⊗r′−r′⊗ τ − τ ⊗r′− τ ⊗ τi

d m

=

∫

B

‖r′‖2I −r′⊗r′

d m+

∫

B

h

2r′ ·τ + ‖τ‖2

I −r′⊗ τ − τ ⊗r′− τ ⊗ τi

d m.


o o′

c

p rc

r

r′c

r′

τ

B

Figure 8.4: The moment of inertia tensor of a rigid solidB depends on the choice ofthe origin. The origins o′ and o′ in this illustration are related througho′−o= τ . The solid’s center of mass is c.

The first intergral on the right-hand side equalsIo′ . The second integral may be simplifiedby noting that

m =

∫

Bd m, r′c =

1

m

∫

Br′ d m,

and consequently∫

B r′ d m =mr′c,

Corollary 8.5. Let c ∈ B be the body’s center of mass as before, and let Ic be the momentof inertia tensor relative to c. Then the moment of inertia tensor Io ofB relative to anyorigin o is given by

Io =Ic+m

‖rc‖2I −rc⊗rc

. (8.15)

Proof. Apply (8.14) with o′ set to c. Then r′c = 0, and the formula reduces to

Io =Ic+m

‖τ‖2

I − τ ⊗ τ

.

Then the observation that τ = o′−o= c−o= rc completes the proof.

Remark 8.6. The moment of intertia tensor of a rigid system of N point masses is givenin (8.10). In particualr, the moment of intertia tensor of a single point of mass m at aposition r relative to an origin o is

m

‖r‖2I −r⊗r

.

Therefore the translation formula (8.15) may be interpreted as saying that the moment ofinteria of a body relative of any point o equals the moment of inertia relative to its centerof mass, plus the moment of inertia relative to o of a fictitious point of mass m situatedat the center of mass.

Remark 8.7. The moment of inertia tensor Ic of a ridig body relative to its center ofmass is an intrinsic property of the body, just like its total mass or its center of mass are.

8.4. The principal moments of inertia 67

The total mass is a scalar, the center of mass is expressed through its position vector, andthe moment of inertia Ic is a tensor.

8.4 The principal moments of inertia

In the exercises you will verity that a moment of inertia tensor is a symmetric tensor, thatis:

m · In=n · Im, for all m,n ∈ V .

Then according to the Spectral Decomposition Theorem (page 63)I admits a representa-tion of the form (8.9). In that context, the eigenvalues λ1, λ2, λ3 ofI are called the object’sprincipal moments of inertia, and the eigenvectors g1, g2, g3 are called the principal axes ofthe moment of inertia.

Tables of the principal moments of inertia of many common geometric solids areavailable in books and websites. Wikipedia has a page for it at:

http://en.wikipedia.org/wiki/List_of_moments_of_inertia

Example 8.6. Consider a solid circular cylinder of radius r , length ℓ, and total mass m.Assume that the mass is distributed uniformly, that is, the density is constant. The prin-cipal axes of the moment of inertia tensor Ic relative to the center of mass are: g3 alongthe cylinder’s axis; g1 and g2 arbitrary unit vectors so that g1,g2,g3 forms an orthonor-

mal set. The corresponding principal moments of inertia are λ1 = λ2 =112 m(3r 2 + ℓ2),

λ3 =12 mr 2. Therefore

Ic =1

12m(3r 2+ ℓ2)g1⊗g1+

1

12m(3r 2+ ℓ2)g2⊗g2+

1

2mr 2g3⊗g3. (8.16)

The limiting case of r = 0 is interesting in its own right. Such an object is called aslender rod of length ℓ and mass m. In that case we have:

Ic =1

12mℓ2g1⊗g1 +

1

12mℓ2g2⊗g2. (8.17)

The principal moments of inertia I of a slender rod relative to one of its endpoints maybe calculated from the translation formula (8.15).

Exercises

8.1. Show that an orthonormal set of vectors is linearly independent.

8.2. Complete the proof of Lemma 8.2 by showing that the set ei ⊗e j 3i , j=1 is linearly

independent. Hint: It suffices to show that∑3

i=1

∑3j=1 ci j (ei⊗e j ) = 0 implies that

every ci j is zero. Begin by applying each side of that equality to ek .

8.3. Show that if the tensor L is symmetric, then the matrix ℓi j of its components (see

Remark 8.3) is symmetric, that is ℓi j = ℓ j i for i , j = 1,2,3.

8.4. Use (8.17) in conjunction with the translation formula (8.15) to calculate the mo-ment of inertia tensor of a slender rod relative to one of its endpoints.

8.5. Look up the principal moments of inertia of a circular hoop in Wikipedia relativeto the hoop’s center. Then apply the translation formula (8.15) to calculate themoment of inertia tensor of the hoop relative to a point on its rim.

Chapter 9

Constraint reactions

In Chapter 6 we considered a mechanical system consisting of N point masses subject toM holonomic constraints (6.6), and derived the equation of motion (6.17), where the gen-eralized forces Q j nj=1 are related to the externally applied forces fi 3N

i=1 through (6.13).

Here n = 3N −M .In the derivation of the equations we assumed that the generalized coordinates were

independent of each other. This was used in the derivation of equation (6.15) where we as-sumed thqat the virtual displacements δq were arbitrary. Among other this, this resultedin the elimination of the internal reaction forces f ′i from the equations of motion.

Suppose, however, that we are interested in finding out the reaction forces. It is thegoal of this chapter to explain how. The key idea is to forgo the assumption of indepen-dence of the generalized coordinates.

Specifically, we assume that the n generalized coordinates q1, q2, . . . , qn are greater thanthe minimum necessary to specify the system’s configuration. This implies that one ormore relationships exists among the q j ’s. Suppose that they are m such relationships:

a11 d q1+ a12 d q2+ · · ·a1n d qn + a1t d t = 0,

a21 d q1+ a22 d q2+ · · ·a2n d qn + a2t d t = 0,

· · ·am1 d q1+ am2 d q2+ · · ·amn d qn + amt d t = 0,

(9.1)

where each ai j and ai t can be a given function of q and t . For convenience, we write these

set of constraints in the compact form

n∑

k=1

al k (q , t )d qk + al t (q , t )d t = 0, l = 1,2, . . . , m, (9.2)

Remark 9.1. By dividing (9.2) through by d t , we see that

n∑

k=1

al k (q , t )qk + al t (q , t ) = 0, l = 1,2, . . . , m, (9.3)

that is, the equations impose restrictions on the system’s velocities.

Repeating (TODO) the calculations of Chapter 6, we arrive at the following equations

69

70 Chapter 9. Constraint reactions

of motion:

d

d t

∂ L(q, q, t )

∂ q j

− ∂ L(q, q, t )

∂ q j

=m∑

k=1

ak jλk , j = 1,2, . . . , n. (9.4)

These, together with (9.3), form a system of n + m equations in the m + n unknownsq1, q2, . . . , qn , λ1,λ2, . . . ,λm . The summation that appeart on the right-hand side gives thegeneralized reaction forces

Q ′j =m∑

k=1

ak jλk . (9.5)

Once the coefficients λk have been computed, we may use (9.5) in conjunction with

Q ′j =3N∑

i=1

f ′i∂ xi

∂ q j

(9.6)

to determine the raction force compnents f ′i .

Example 9.1. Let us revisit the simple pendulum of Figure 1.1 in page 2, and calculatethe force within its connecting rod.

In Section 1.3 we derived the equation of motion (1.5) in terms of the angle ϕ whichserved as the generalized coordinate. The position vector of the pendulum’s bob relativeto the suspension point was expressed as r = ℓer in that context, where ℓ is the length ofthe pendulum’s rod.

Here we change the setting of the problem as follows. We express the configurationof pendulum in terms of not one, but two genealized coordinates: ϕ, which is the pendu-lum’s angle as before; and ρ which is the rod’s length and which is viewed as a variable.We impose the constraint ρ= ℓ retroactively to recover the physical model.

Referring to Figure 1.1 we have r = ρer , therefore the bob’s velocity is given by

v = r = ρer +ρer = ρer −ρϕeϕ,

where we have made use of (1.1). The kinetic and potential energies are

T =1

2m(ρ2+ρ2ϕ2), V =−m gρ cosϕ,

which leads to the lagrangian

L= T −V =1

2m(ρ2+ρ2ϕ2)+m gρ cosϕ.

We calculate

∂ L

∂ ρ= mρ,

∂ L

∂ ρ= mρϕ2+m g cosϕ

∂ L

∂ ϕ= mρ2ϕ,

∂ L

∂ ϕ=−m gρ sinϕ.

The constraint of inextensibility ρ = ℓ implies that dρ = 0, which we write as (1)dρ+(0)dϕ+(0)d t = 0 to conform to the genral template (9.1). It follows that a11 = 1, a12 = 0,a1t = 0, therefore the equations (9.4) take the form

d

d t

∂ L

∂ ρ

− ∂ L

∂ ρ= a11λ1

d

d t

∂ L

∂ ϕ

− ∂ L

∂ ϕ= a12λ1,

71

that is

(mρ)· − (mρϕ2+m g cosϕ) = λ1,

(mρ2ϕ)·+m gρ sinϕ = 0,

or in expanded form:

mρ− (mρϕ2+m g cosϕ) = λ1,

mρ2ϕ+ 2mρρϕ+m gρ sinϕ = 0.

Applying the constraint ρ= ℓ reduces these to

−mℓϕ2−m g cosϕ = λ1,

mℓ2ϕ+m gℓ sinϕ = 0.

The second equation is the usual equation of motion of a simple pendulum. In principle,we may plug its solution, ϕ(t ), into the first equation to find λ1, but we don’t do it thatway. Instead, we compute the generalized reaction forces Q ′r and Q ′ϕ from (9.5):

Q ′r = (1)λ1 =−mℓϕ2−m g cosϕ, Q ′ϕ = (0)λ1 = 0.

Then, we apply (9.6) to translate these into the physical components of the forces. Towardthat end, let us observe that r = ρer , therefore

∂ r

∂ ρ= er ,

∂ r

∂ ϕ= ρ

∂ er

∂ ϕ= ρeϕ,

therefore (9.6) reads

Q ′r = f ′ · ∂ r∂ ρ= f ′ ·er , Q ′ϕ = f ′ · ∂ r

∂ ϕ= f ′ ·ρeϕ ,

whence

f ′ ·er =−(mℓϕ2+m g cosϕ) f ′ ·eϕ = 0.

Since f ′ = (f ′ ·er )er = (f′ ·eϕ)eϕ , we conclude that

f ′ =−(mℓϕ2+m g cosϕ)er .

This tells us that the constraint reaction force f ′ lies in the direction of the pendulum’srod, and its magnitude equals the sum of the centrifugal force mℓϕ2 and the compomentm g cosϕ of the bob’s weight along the rod.

Example 9.2. Consider a the rolling hoop of Exercise (2) of Chapter 6. Find the force atcontact point between the hoop and the plane.

Referring to Figure 6.3, we use the hoop’s rotation angle ϕ and the horizontal trans-lation x of its center as a generalized coordinates. The no-slip condition imposes the con-straint x = aϕ. We will use ϕ and x as overdetermined generalized coordinates subject tox = aϕ, that is d x − a dϕ = 0, thefore accoring to the template (9.1), a11 = 1, a12 =−a.


The position vector of the mass m is r = ⟨x+a sinϕ,a+a cosϕ⟩, therefore the velocityis v = r = ⟨x + aϕ cosϕ,−aϕ sinϕ⟩. It follows that ‖v‖2 = x2 + 2axϕ cosϕ + a2ϕ2,therefore the kinetic and potential enerigies are

T =1

2m(x2 + 2axϕ cosϕ+ a2ϕ2), V = m ga(1+ cosϕ).

Therefore

L=1

2m(x2 + 2axϕ cosϕ+ a2ϕ2)−m ga(1+ cosϕ).

and

∂ L

∂ x= m(x + aϕ cosϕ),

∂ L

∂ x= 0,

∂ L

∂ ϕ= m(ax cosϕ+ a2ϕ)

∂ L

∂ ϕ= m(−axϕ sinϕ+ ga sinϕ).

Then (9.4) takes the form

m(x + aϕ cosϕ)· = a11λ1 = λ1,

m(ax cosϕ+ a2ϕ)· −m(−ax ϕ sinϕ+ ga sinϕ) = a12λ1 =−aλ1.

Now substitute x = aϕ and simplify:

−ma(1+ cosϕ)ϕ+maϕ2 sinϕ = λ1,

ma2(1+ cosϕ)ϕ+m ga sinϕ =−aλ1.(9.7)

To eliminate λ1, divide the second equation through by a and add the result to thefirst equation. We get

2ma(1+ cosϕ)ϕ−ma g

a+ ϕ2

sinϕ = 0.

This is the hoop’s equation of motion. To compute the constraint force, solve this for ϕ:

ϕ =a g

a+ ϕ2

sinϕ

2a(1+ cosϕ)

and substitute the result in the first of (9.7). We get:

λ1 =1

2ma

ϕ2− g

a

sinϕ. (9.8)

Now we compute the generalized forces Q ′x and Q ′ϕ from (9.5):

Q ′x = a11λ1 = λ1, Q ′ϕ = a12λ1 =−aλ1.

Then, we apply (9.6) to translate these into the physical components of the forces. Towardthat end, let us observe that

∂ r

∂ x= ⟨1,0⟩, ∂ r

∂ ϕ= ⟨a cosϕ,−a sinϕ⟩,

Exercises 73

mϕ

p

f ′ f ′

f ′

a(1+

cosϕ)

a sinϕf ′


therefore, letting f = ⟨ fx , fy ⟩, we get

Q ′x = f · ∂ r∂ x= fx Q ′ϕ = f · ∂ r

∂ ϕ= a fx cosϕ− a fy sinϕ.

It follows thatfx = λ1, a fx cosϕ− a fy sinϕ =−aλ1.

We solve this as a system of two equations in the two unknowns fx and fy :

fx = λ1, fy =1+ cosϕ

sinϕλ1.

Upon substitution for λ1 from (9.8) we conclude that

fx =1

2ma

ϕ2− g

a

sinϕ,

fy =1

2ma

ϕ2− g

a

(1+ cosϕ),

whence

f =1

2ma

ϕ2− g

a

⟨sinϕ, 1+ cosϕ⟩.

This result has a significant mechanical/geometric interpretation. Refer to Figure 9.1.The vector a⟨sinϕ, 1+ cosϕ⟩ extends from the contact point p to the mass m, thereforethe constraint force f ′ is parallel to that vector.

Exercises

9.1. The hoop of Example 9.2 rolls, without slipping, down an incline which makes anangle α with respect to the horizontal. See Figure 9.2. Use the angle ϕ as the gen-eralized coordinate. Derive the equation of motion. Hint: (a) Express the vectorsi′ and j′ in terms of i, j, and the angle α; (b) Express the position vector r of themass m in terms of the basis vectors i′ and j′; (c) Apply the results of (a) and (b) toexpress r in terms of i and j.

9.2. In the the previous exercise, find the contact force between the hoop and incline.


i

j

α

i′

j′

m

αϕ

r

Figure 9.2: Hoop rolling on an incline. The hoop’s initial position is shown in gray(Exercise 1).

Chapter 10

The Gibbs-Appellformulation of dynamics

The goal of this chapter is to derive the Gibbs-Appell equations of motion and then showa few applications. For now, I have two versions of the proof here, one based on thepresentation in Lurie [9] and the other based on the presentation in Gantmacher [10].The part based on Lurie is not quite finished; the one based on Gantmacher is prettycomplete.

Due to the reliance on two different presentations, the notation in the illustrations/ex-amples is very inconsistent. At one point I will rewrite this chapter by merging the twopresentations into one, and introduce a consistent notation.

Some of the material from the previous chapters is repeated for the sake of makingthis chapter somewhat self-contained.

10.1 Gibbs-Appell according to Lurie [9]

The goal of this (lengthy) section is to derive the Gibbs-Appell equations of motion (10.27).

10.1.1 Acceleration in generalized coordinates

Consider the dynamics of N point masses, whose position vectors relative to an origin o

are ri , i = 1, . . . ,N .Suppose that the system’s configuration may be specified through n independent gen-

eralized coordinates q = ⟨q1, . . . , qn⟩. Thus, ri = ri (q, t ), and the velocities are givenby

vi = ri =n∑

s=1

∂ ri

∂ qs

qs +∂ ri

∂ t, i = 1,2, . . . ,N . (10.1)

We differentiate the velocities to calculate the accelerations wi :

wi = vi =n∑

s=1

∂ ri

∂ qs

qs +n∑

k=1

n∑

s=1

∂ 2ri

∂ qk∂ qs

qk qs + 2n∑

s=1

∂ 2ri

∂ qs∂ tqs +

∂ 2ri

∂ t 2. (10.2)

10.1.2 Ideal constraints and the fundamental equation of dynamics

Continuing the previous subsection’s analysis of the motion of N particles, let Fi+Ri bethe totality of forces acting on the particle i , where Fi is the resultant of the (known) ex-ternally applied forces and Ri is the resultant of the (unknown) internal forces/reactions.

75

76 Chapter 10. The Gibbs-Appell formulation of dynamics

Newton’s law of motion states that

miwi =Fi +Ri , i = 1, . . . ,N . (10.3)

We assume that the system’s constraints are ideal, that is, they do not perform workin any virtual displacement. (See page 271 of Lurie’s book for a discussion.) Thus, for allvirtual displacements δri we have

N∑

i=1

Ri ·δri = 0. (10.4)

As a consequence, multiplying (10.3) by δri and summing over i eliminates the the reac-tion forces Ri :

N∑

i=1

miwi ·δri =N∑

i=1

Fi ·δri . (10.5)

Lagrange called this the fundamental equation of dynamics.

10.1.3 Virtual work and generalized force

The virtual displacement δri of particle i is related to the virtual displacement δq of thegeneralized coordinates through

δri =n∑

s=1

∂ ri

∂ qs

δqs .

The expression on the right-hand side of (10.5) is the virtual work performed by allexternal forces Fi under the virtual displacements δri . It may be expressed as

N∑

i=1

Fi ·δri =N∑

i=1

Fi ·

n∑

s=1

∂ ri

∂ qs

δqs

=n∑

s=1

N∑

i=1

Fi ·∂ ri

∂ qs

δqs =n∑

s=1

Qs δqs ,

where we have let

Qs =N∑

i=1

Fi ·∂ ri

∂ qs

. (10.6)

Qs is called the generalized force corresponding to the generalized coordinate qs .

TODO: It can be shown that

∂ ri

∂ qs

=∂ ri

∂ qs

=∂ ri

∂ qs

.

In formulating the Gibbs-Appell equations, it works better if we replace (10.6) with

Qs =N∑

i=1

Fi ·∂ ri

∂ qs

. (10.7)

10.1. Gibbs-Appell according to Lurie [9] 77

10.1.4 Constraints

Suppose that the n generalized coordinates are related through l (generally nonholo-nomic) constraints

n∑

s=1

ak s (q, t )qs + ak (q, t ) = 0, k = 1, . . . , l , (10.8)

or equivalently,

n∑

s=1

ak s (q, t )d qs + ak (q, t )d t = 0. k = 1, . . . , l . (10.9)

Consequently, virtual displacements δq satisfy

n∑

s=1

ak s (q, t )δqs = 0, k = 1, . . . , l . (10.10)

We assume that the l × n coefficient matrix ak s is full-rank, therefore (10.8) may besolved for l of the generalized velocities in terms of the rest. Thus:

qr =n−l∑

s=1

br,l+s (q, t )ql+s + br (q, t ), r = 1, . . . , l . (10.11)

Similarly, (10.10) may be solved for l of the virtual displacements in terms of the rest:

δqr =n−l∑

s=1

br,l+s (q, t )δql+s , r = 1, . . . , l . (10.12)

Differentiating (10.11) we obtain an expression for the accelerations:

qr =n−l∑

s=1

br,l+s (q, t )ql+s +n−l∑

s=1

br,l+s (q, t )ql+s +n−l∑

s=1

br (q, t ), r = 1, . . . , l .

For reasons which will become clear shortly, the terms which involve no generalized ac-celerations in our calculations are irrelevant, therefore, to simplify the notation, we writethe above as

qr =n−l∑

s=1

br,l+s (q, t )ql+s + · · · . (10.13)

From here on, the ellipsis “· · · ” in an equation indicates additive terms that involve nogeneralized accelerations.

Now we apply (10.13) to eliminate the accelerations qr , r = 1, . . . , l from (10.2). Wehave:

wi =n∑

s=1

∂ ri

∂ qs

qs + · · ·

=l∑

r=1

∂ ri

∂ qr

qr +n−l∑

s=1

∂ ri

∂ ql+s

ql+s + · · ·

=l∑

r=1

∂ ri

∂ qr

n−l∑

s=1

br,l+s (q, t )ql+s + · · ·

+n−l∑

s=1

∂ ri

∂ ql+s

ql+s + · · ·

=n−l∑

s=1

∂ ri

∂ ql+s

+l∑

r=1

∂ ri

∂ qr

br,l+s (q, t )

ql+s + · · · .


Upon introducing the notation

ci ,l+s =∂ ri

∂ ql+s

+l∑

r=1

∂ ri

∂ qr

br,l+s (q, t ), i = 1, . . . ,N , s = 1, . . . , n− l , (10.14)

the acceleration takes the form

wi =n−l∑

s=1

ci ,l+s ql+s + · · · , i = 1, . . . ,N .

Since the terms hidden under the ellipsis do not involve the generalized accelerations, weconclude that

∂ wi

∂ ql+s

= ci ,l+s . (10.15)

10.1.5 Virtual displacements

According to (10.12), a virtual displacement δri is given by

δri =n∑

k=1

∂ ri

∂ qk

δqk =l∑

r=1

∂ ri

∂ qr

δqr +n−l∑

s=1

∂ ri

∂ ql+s

δql+s (10.16)

=l∑

r=1

∂ ri

∂ qr

n−l∑

s=1

br,l+s (q, t )δql+s

+n−l∑

s=1

∂ ri

∂ ql+s

δql+s

=n−l∑

s=1

∂ ri

∂ ql+s

+n−l∑

s=1

∂ ri

∂ qr

br,l+s (q, t )

δql+s

=n−l∑

s=1

ci ,l+s δql+s ,

where ci ,l+s is as in (10.14). Then, in view of (10.15) we conclude that

δri =n−l∑

s=1

∂ wi

∂ ql+s

δql+s , i = 1, . . . ,N . (10.17)

10.1.6 Back to the fundamental equation: Part 1

We use (10.17) to reformulate the fundamental equation of dynamics (10.5):

N∑

i=1

miwi ·δri =N∑

i=1

miwi ·n−l∑

s=1

∂ wi

∂ ql+s

δql+s ,

=n−l∑

s=1

N∑

i=1

miwi ·∂ wi

∂ ql+s

δql+s .

This motivates the introduction of a quantityG known as the Gibbs function or the energyof acceleration:

G=1

2

N∑

i=1

miwi ·wi ,

10.1. Gibbs-Appell according to Lurie [9] 79

We observe that∂ G

∂ ql+s

=N∑

i=1

miwi ·∂ wi

∂ ql+s

,

whenceN∑

i=1

miwi ·δri =n−l∑

s=1

∂ G

∂ ql+s

δql+s . (10.18)

Let us note that we consider G as a function of the form G = G(q, ql+1, . . . , qn , t ),therefore the nonholonomic constraints (10.8) are automatically accounted for.

10.1.7 Back to the fundamental equation: Part 2

In the previous section we reformulated the left-hand side of the fundamental equation ofdynamics (10.5). In this section we reformulate its right-hand side. According to (10.16)we have:

N∑

i=1

Fi ·δri =N∑

i=1

Fi ·n−l∑

s=1

ci ,l+s δql+s =n−l∑

s=1

N∑

i=1

Fi · ci ,l+s

δql+s =n−l∑

s=1

Ql+s δql+s ,

(10.19)where we have let

Ql+s =N∑

i=1

Fi · ci ,l+s , s = 1, . . . , n− l .

Upon substituting for ci ,l+s from its definition in (10.14), we see that

Ql+s =N∑

i=1

Fi ·

∂ ri

∂ ql+s

+l∑

r=1

∂ ri

∂ qr

br,l+s (q, t ),

=N∑

i=1

Fi ·∂ ri

∂ ql+s

+l∑

r=1

N∑

i=1

Fi ·∂ ri

∂ qr

br,l+s (q, t ),

which, according to (10.6) reduces to

Ql+s =Ql+s +l∑

r=1

Qr br,l+s (q, t ), s = 1, . . . , n− l .

10.1.8 The Gibbs-Appell equations of motion

Substituting (10.18) and (10.19) in the fundamental equation of dynamics (10.5), we arriveat

n−l∑

s=1

∂ G

∂ ql+s

δql+s =n−l∑

s=1

Ql+s δql+s ,

that is,n−l∑

s=1

∂ G

∂ ql+s

δql+s − Ql+s

δql+s .

Since the variations δql+s are independent, we conclude that

∂ G

∂ ql+s

= Ql+s , s = 1, . . . , n− l . (10.20)


These are the Gibbs-Appell equation of motion.The n− l second order differential equations (10.20), along with the l constraint equa-

tions (10.8) form a system of n differential equation in the n unknowns q1(t ), . . . , qn(t ).

10.1.9 Quasi-velocities

In a mechanical system with generalized coordinates q = ⟨q1, . . . , qn⟩, expressions of thetype

ωs =n∑

k=1

ask (q, t )qk + as (q, t ), s = 1, . . . , n (10.21)

are called quasi-velocities. At times it is simpler to model a system in terms of suitablydefined quasi-velocities ω1, . . . ,ωn rather than generalized coordinates q1, . . . , qn . If thenumber of naturally occurring quasi-velocities is n′ < n, then we extend their number ton by defining the rest throughωs = qs , s = n′+ 1, . . . , n.

We assume that the quasi-velocities are defined so that the coefficient matrix ask isnonsingular. It follows that (10.21) may be solved for q’s in terms of ω’s:

qr =n∑

s=1

br s (q, t )ωs + br (q, t ), s = 1, . . . , n.

Additionally, with each equation in (10.21) we associate a differential as follows

dπs =n∑

k=1

ask (q, t )d qk + as (q, t )d t , s = 1, . . . , n. (10.22)

Comparing (10.21) and (10.22) we see that

dπs =ωs d t . (10.23)

Remark: There is no reason to expect the expression on the right to be an exact dif-ferential, therefore although dπs makes sense as defined, it should not be assumed that itis the differential of a function πs .

For any arbitrary function ϕ(q, t ) we have:

dϕ =n∑

r=1

∂ ϕ

∂ qr

d qr +∂ ϕ

∂ td t

=n∑

r=1

∂ ϕ

∂ qr

qr d t +∂ ϕ

∂ td t

=n∑

r=1

∂ ϕ

∂ qr

n∑

s=1

br s (q, t )ωs + br (q, t )

d t +∂ ϕ

∂ td t

=n∑

s=1

n∑

r=1

br s (q, t )∂ ϕ

∂ qr

dπs +n∑

r=1

br (q, t )∂ ϕ

∂ qr

d t +∂ ϕ

∂ td t .

We introduce the purely symbolic notation

∂ ϕ

∂ πs

=n∑

r=1

br s (q, t )∂ ϕ

∂ qr

, (10.24)

10.2. Gibbs-Appell according to Gantmacher [10] 81

whereupon the previous calculation leads to

dϕ =n∑

s=1

∂ ϕ

∂ πs

dπs +

n∑

r=1

br (q, t )∂ ϕ

∂ qr

+∂ ϕ

∂ t

d t . (10.25)

Applying (10.24) to ri we get:

∂ ri

∂ πs

=n∑

r=1

br s (q, t )∂ ri

∂ qr

,

and therefore we obtain an expression for the velocity vector in terms of the quasi-coordinates:

vi = ri = TODO .. .

10.1.10 Appell’s equations of motion in terms of quasi-velocities

The main ingredients of Appell’s equations of motion (10.20) are the generalized accel-erations qs . Those equations, however, are rarely used in that form. We will find thatexpressing the equations in terms of quasi-velocities, defined below, leads to major simpli-fication.

Thus, given the l constraints (10.8), define the quasi-velocities

ωk =n∑

s=1

ak s (q, t )qs + ak (q, t ), k = 1, . . . , l , (10.26)

therefore the constraint equations take the form

ωk = 0, k = 1, . . . , l .

We solve (10.26) for the q in terms of the quasi-velocities. Since the first l of the quasi-velocities are zero, each qr is a function of ωl+s , s = 1, . . . , n − l . Then we express theGibbs function G as a function of

q1, . . . , qn , ωl+1, . . . ,ωn ωl+1, . . . ,ωn .

The rest of the derivation is horrendous, so we skip forward. . .

Then, it turns out that the Gibbs-Appell equation of motion take the form

∂ G

∂ ωl+s

=Ql+s , s = 1, . . . , l . (10.27)

Despite the complexity of the derivation, equations (10.27) are much easier to apply thanthe original (10.20). In particular, these result in first order differential equations in quasi-velocities, as apposed to the second order differential equations which we were gettingbefore.

10.2 Gibbs-Appell according to Gantmacher [10]

The goal of this (lengthy) section is to derive the Gibbs-Appell equations of motion (10.47)


10.2.1 Pseudocoordinates

Consider the motion of N particles of masses mν and position vectors rν subject to dalgebraic and g differential constraints. Utilizing the d algebraic constraints, we inroducem = 3N−d generalized idependent coordinates q = ⟨q1, . . . , qm⟩, and express the system’sconfigures in them as

rν = rν (q, t ), ν = 1, . . . ,N . (10.28)

From this it follows that

rν =m∑

i=1

∂ rν∂ qi

qi +∂ rν∂ t

, ν = 1, . . . ,N , (10.29)

and

δrν =m∑

i=1

∂ rν∂ qi

δqi , ν = 1, . . . ,N . (10.30)

The vectors rν and rν should satisfy the g differential constraints:

m∑

ν=1

lβν(rν , t ) · rν +Dβ(rν , t ) = 0, β= 1, . . . , g . (10.31)

Substituting for rν and rν from (10.28) and (10.29), the constraint equations take theform

m∑

i=1

Aβi (q, t )qi +Aβ(q, t ) = 0, β= 1, . . . , g . (10.32)

Thus, the generalized coordinates q1, . . . , qm can take on arbitrary values, but the gen-eralized vecocities q1, . . . , qm are constrained by (10.32).

Assuming that the g constraints in (10.32) are indepdendent, that is, the matrix Aβi

has full rank, we may solve for g of the velocities q1, . . . , qm in terms of the remainingn = m − g = 3N − d − g , the number n being the system’s degrees of freedom. Thus,the velocities q1, . . . , qn may take on arbitrary values, while the remaining g velocities aredetermined from (10.32).

In practice, instead of the n generalized velocities noted above, it is often preferableto use n linear combination of the generalized velocities, such as

πs =m∑

i=1

fs i qi , i = 1, . . . , n. (10.33)

The quantities πs defined here are called pseudovelocities. The notation πs is entirely proforma—there is no requirement that the right-hand side of (10.33) be and exact derivative,therefore although the notation πs is well-defined, there is no function πs which it is thederivative of. Nevertheless, we will have occasions to refer to the un-dotted symbol πs

which has no predefind meaning, but we will define it in an appropriate and constent way.The symbol πs is called a pseudocoordinate.

We impose an invertibility requirement on the definitions (10.33) as follows. Con-sider the system of g + n = m equation obtained as the union of the equations (10.32)and (10.33), and view it as a system of m linear equaitons in the m unknowns q1, . . . , qm .We require that system to be invertible. Thus, the m generalized velocities may be ex-pressed in terms of the n pseudovelocities:

qi =n∑

s=1

hi s (q, t )πs + hi (q, t ), i = 1, . . . , m. (10.34)

10.2. Gibbs-Appell according to Gantmacher [10] 83

Thus, any set of m generalized velocities that satify the motion constraints (10.32) definea corresponding set of pseudovelocities πs , and conversely, any set of n arbitrary pseu-dovelocities define a set of generalized velocities that satify the motion constraints. Theimportant point here is that there is no constraint on the pseudovelocities.

Let us note that due to the constrain (10.32) on generalized velocities, the generalizeddisplacement are subject to the constraints

m∑

i=1

Aβi δqi = 0, β= 1, . . . , g . (10.35)

Then in view of the equations (10.33) also introduce the notation

δπs =m∑

i=1

Aβi fs i δqi . s = 1, . . . , n. (10.36)

From what it has been said, the union of the equations (10.35) and (10.36) is invertivleand the inverse has the form

δqi =n∑

s=1

hi s δπs , i = 1, . . . , m. (10.37)

Thus, as argumed before, the expressions δπs may take on arbitrary values. The corre-sponding qi obtained from (10.37) will automatically satisfy the constraints (10.35).

10.2.2 Work and generalized forces

Let us compute the work done by the external forces in a virtual displacement:

δW =N∑

ν=1

Fν ·δrν =N∑

ν=1

Fν ·m∑

i=1

∂ rν∂ qi

δqi =m∑

i=1

N∑

ν=1

Fν ·∂ rν∂ qi

δqi =m∑

i=1

Qi δqi , (10.38)

where

Qi =N∑

ν=1

Fν ·∂ rν∂ qi

, i = 1, . . . , m. (10.39)

Thus, we have obtained an expression for the virtual work in terms of the virtualdisplacements δqi . Utilizing (10.37), we may express the virtual work in terms of theδπs :

δW =m∑

i=1

Qi

n∑

s=1

hi s δπs ,=n∑

s=1

m∑

i=1

hi s Qi

δπs ,=n∑

s=1

Πs δπs , (10.40)

where we have let

Πs =m∑

i=1

hi s Qi , s = 1, . . . , n. (10.41)

TheΠs defined above are called the generalized forces corresponding to the pseudocoordianteπs , s = 1, . . . , n.


10.2.3 Newton’s equations in pseudocoordinates

Equations (10.29) express the particle velocities in terms of the generalized velocities. Sub-stituting for the latter from (10.34) we obtain the particle velocities in terms of the pseu-dovelocities:

rν =n∑

s=1

eν s (q, t )πs +eν (q, t ), ν = 1, . . . ,N . (10.42)

Then it follows that

δrν =n∑

s=1

eν s (q, t )δπs , ν = 1, . . . ,N , (10.43)

and

rν =n∑

s=1

eν s (q, t )πs + · · · , ν = 1, . . . ,N .

where the ellipsis indicate terms that are free of the pseudoaccelerations πs , s = 1, . . . , n.For future reference, let us note that

∂ rν∂ πs

= eν s . (10.44)

Consider the fundamental equations of dynamics:

δW −N∑

ν=1

mν rν ·δrν = 0.

Substituting from (10.40) and (10.43) this takes the form

n∑

s=1

Πs δπs −N∑

ν=1

mν rν · n∑

s=1

eν s (q, t )δπs

= 0,

that isn∑

s=1

Πs δπs −n∑

s=1

N∑

ν=1

mν rν ·eν s (q, t )

δπs = 0,

and finallyn∑

s=1

Πs −n∑

s=1

N∑

ν=1

mν rν ·eν s (q, t )

δπs = 0,

Since δπs are unconstrained, we conclude that

N∑

ν=1

mν rν ·eν s (q, t ) =Πs , s = 1, . . . , n. (10.45)

10.2.4 The energy of the acceleration

We introduce the “energy of acceleration” which is defined as

U = U (q, π, π) =1

2

N∑

ν=1

mν‖rν‖2. (10.46)

10.3. A modification noted by Desloge 85

Then, in view of (10.44), we see that

∂ U

∂ πs

=N∑

ν=1

mν rν ·∂ rν∂ πs

=N∑

ν=1

mν rν ·eν s , s = 1, . . . , n,

which, in conjunction with (10.45) leads to

∂ U

∂ πs

=Πs , s = 1, . . . , n, (10.47)

This are Appell’s equations of motion.Upon a close examination, equaions (10.47) are a system of n first order differential

equations in the n unknowns πs , s = 1, . . . , n. Note that there are no such things and πs ,so the equations are indeed first order in πs .

The equations are not complete, however, since they also involve the generalized co-ordinates qi , i = 1, . . . , n. To complete the system, we append to it the g equations (10.32)and (10.33), as in:

∂ U

∂ πs

=Πs , s = 1, . . . , n,

m∑

i=1

Aβi (q, t )qi +Aβ(q, t ) = 0, β= 1, . . . , g ,

πs =m∑

i=1

fs i qi , i = 1, . . . , n.

This combined set consists of 2n+ g equations. Let us note that n+ g = m, theefore wehave a system of m + n first order differential equations in the m + n unknwns qi mi=1and πsns=1.

10.3 A modification noted by Desloge

The generalized forces Πs in the Gibbs-Appel equations of motion (10.47) are computedfrom (10.41), where, in turn, the Qi are computed from (10.39). Desloge [2,11] notes thatif we define

R=N∑

ν=1

Fν · rν , (10.48)

and express the result in terms of q, π, and π, then Πs may be computed more easilythrough the formula

Πs =∂ R

∂ πs

, s = 1, . . . , n. (10.49)

Consequently, (10.47) takes on the form

∂ U

∂ πs

=∂ R

∂ πs

s = 1, . . . , n,

which motivates the introduction of the Gibbs function

G=U −R, (10.50)


whereby the equations (10.47) are expressed as

∂ G

∂ πs

= 0, s = 1, . . . , n. (10.51)

I haven’t checked the details of the reasoning here, but in a couple of applicationswhich I calculated, (10.47) and (10.51) produce identical results.

10.4 The simple pendulum via Gibbs-Appell

Here we will apply the Gibbs-Appell’s equations (10.20) to derive the familiar equationof motion of a simple pendulum. This is certainly an overkill; deriving the equationsof motion of a holonomic system such as a simple pendulum is done much more easilythrough the Lagrangian approach. The power of the Gibbs-Appell approach manifestsitself when applied to nonholonomic systems, as we will see in the next section.

But for now, let’s look at the simple pendulum as illustrated in Figure 1.1 on page 2.We take the angle ϕ as the problem’s generalized coordinate, express the position vectorr of the pendulum’s bob in terms of ϕ, then calculate the velocity and the acceleration:

r = ℓ⟨sinϕ, cosϕ⟩,v = r = ℓ⟨ϕ cosϕ,−ϕ sinϕ⟩,w = v = ℓ⟨ϕ cosϕ− ϕ2 sinϕ,−ϕ sinϕ− ϕ2 cosϕ⟩,

whence ‖w‖2 = ℓ2(ϕ2 + ϕ4). Considering that the force vector is f = ⟨0, m g ⟩, we arelead to the Gibbs function

G=1

2m‖w‖2−f ·w

=1

2ℓ2(ϕ2+ ϕ4)+m gℓ(ϕ sinϕ+ ϕ2 cosϕ).

Since the Gibbs-Appell equations of motion (10.20) reduce to ∂ G/∂ ϕ = 0 in our case,we conclude that

ℓ2ϕ+m gℓ sinϕ = 0,

which is the familiar equation of motion of a simple pendulum.

10.5 The Caplygin sleigh

The Caplygin sleigh consists of two point masses m1 and m2 connected through a rigidmassless rod of length ℓ. The two masses slide on a horizontal surface. The mass m2 canslide freely on the surface with no resistance at all. Mass m1 rides on a sharp blade, as inan ice hockey skate, which allows motion only in the direction of the rod. We wish todescribe the dynamics of the sleigh. Figure 10.1 depicts the sleigh. The position vectorsr1 and r2 of the masses m1 and m2 may be specified through the generalized coordinatesx(t ), y(t ), and ϕ(t ), shown on the figure. We have:

r1 = ⟨x, y⟩, r2 = r1+ ⟨ℓcosϕ,ℓ sinϕ⟩.Since the mass m1 can slide only along the direction of the rod, the velocity r1 = ⟨x , y⟩ isparallel to the vector ⟨cosϕ, sinϕ⟩, therefore x/cosϕ = y/ sinϕ, that is,

x sinϕ− y cosϕ = 0. (10.52)

10.5. The Caplygin sleigh 87

ℓ

ϕm1

m2

(x, y)

Figure 10.1: The Caplygin sleigh’s configuration may be described in terms of the gen-eralized coordinates x, y , and ϕ.

This is the concrete version of the general constraint (10.8).To calculate the Gibbs function, with begin with

v1 = r1 = ⟨x , y⟩, v2 = r2 = ⟨x + ℓϕ sinϕ, y + ℓϕ cosϕ⟩.

and then

w1 = v1 = ⟨x , y⟩,w2 = v2 = ⟨x − ℓϕ2 cosϕ− ℓϕ sinϕ, y − ℓϕ2 sinϕ+ ℓϕ cosϕ⟩.

Thus, we arrive at

G=1

2(m1+m2)(x

2+ y2)+1

2m2ℓ

2ϕ2

−m2ℓ(ϕ sinϕ+ ϕ2 cosϕ)x+m2ℓ(ϕ cosϕ− ϕ2 sinϕ)y +1

2m2ℓ

2ϕ4.

Now, following the idea in (10.11) of eliminating redundant generalized velocities, wesolve (10.52) for y, and eliminate it from the equations. Substituting y = x tanϕ in theabove expression for G, we obtain

G=m1+m2

2cos2ϕx2+

1

cos3ϕ

(m1+m2)x sinϕ−m2ℓϕ cos2ϕ

ϕ x

+1

2m2ℓϕ

2+1

cosϕm2ℓxϕϕ+ · · · , (10.53)

where, as before, the ellipsis stands for terms that involve no accelerations.There are no externally applied forces on the sleigh, therefore the equations of mo-

tion (10.20) reduce to∂ G

∂ x= 0,

∂ G

∂ ϕ= 0,

that is,

m1+m2

cos2ϕx +

1

cos3ϕ


ϕ = 0, (10.54a)

m2ℓϕ+1

cosϕm2ℓxϕ = 0. (10.54b)

We may solve this system of two second order differential equations for the two un-knowns x(t ) and ϕ(t ), subject to initial conditions x(0) = x0, x(0) = x0, ϕ(0) = ϕ0,


and ϕ(0) = ϕ0. We may compute y retroactively from the constraint equation y(t ) =x(t ) tanϕ(t ), and then integrate it to find y(t ) subject to the initial condition y(0) = y0.

In practice, when solving the system on a computer, it is easier to adjoin the constraintequation (10.52) to the two equations, as in

m1+m2

cos2ϕx +

1

cos3ϕ


ϕ = 0,

m2ℓϕ+1

cosϕm2ℓxϕ = 0,

x sinϕ− y cosϕ = 0,

and solve the whole thing in one fell swoop for the three unknowns x(t ), y(t ), and ϕ(t )by applying the initial conditions

x(0) = x0, x(0) = x0, ϕ(0) = ϕ0, ϕ(0) = ϕ0, y(0) = y0.

Note that there is no initial condition on y(0) since the x0 and ϕ0 determine y(0) throughthe constraint equation (10.52).

10.6 The Caplygin sleigh revisited

In Section 10.5 we obtained the differential equations of motion of the Caplygin sleighby applying the Gibbs-Appell equations (10.20). Here we solve the same problem byapplying the equations (10.27). Toward that end, let us introduce the generalized velocitiesv andω defined through

v =x

cosϕ, ω = ϕ.

Thus,ω is the rod’s angular velocity. To understand v, note that x = v cosϕ, therefore vis the speed (not velocity!) of the mass m1.

Now, substitute ϕ =ω and x = v cosϕ in (10.53) and simplify:

G=1

2(m1+m2)v

2+1

2m2ℓ

2ω2+m2ℓvωω−m2ℓω2v

+1

2

(m1+m2)v2+m2ℓ

2ω2

ω2.

Equations (10.27) in this case take the form

∂ G

∂ v= 0,

∂ G

∂ ω= 0,

and thus,

(m1+m2)v −m2ℓω2 = 0, (10.55a)

m2ℓ2ω+m2ℓvω = 0. (10.55b)

These are much more pleasant-looking equations compared to (10.54) which we hadobtained before. In fact, they are quite amenable to solving by hand. To wit, multiply thefist equation by v, the second by ω, and add. We get

(m1+m2)vv +m2ℓ2ωω = 0,

10.7. The problem from page 63 of Gantmacher 89

that is 1

2(m1+m2)v

2+1

2m2ℓ

2ω2·= 0,

whence

E =1

2(m1+m2)v

2+1

2m2ℓ

2ω2 (10.56)

is a constant of the motion. (In fact, it is the sleigh’s kinetic energy.) The value of E isdetermined by the initial conditions v(0) andω(0).

Now, multiplying (10.55a) by ℓ/2 and adding to (10.56), we get

1

2(m1+m2)ℓv +

1

2(m1+m2)v

2 = E ,

that is,

ℓv + v2 = α2, where α2 =2E

m1+m2

.

We solve this separable equation for v subject to the initial condition v(0) = v0, and obtain

v(t ) = α tanhhα

ℓt + tanh−1 v0

α

i

. (10.57)

We see that limt→∞ v(t ) = α, that is, the sleigh’s speed approaches α in the long run.Now that we have v(t ), the angular velocityω(t )may be computed easily. From the

differential equation ℓv + v2 = α2 we get v = 1ℓ (α

2 − v2). We substitute this expression

for v in (10.55a), then solve forω2:

ω(t )2 =m1+m2

m2ℓ2

α2 − v(t )2

, (10.58)

where v(t ) is given in (10.57). Note, in particular, that v(t )→ α implies ω(t )→ 0, thatis, the sleigh’s spin slows down to zero in the long run.

Remark 10.1. Let’s observe that (a) From the definition x = v cosϕ of the speed v wesee that a positive v implies motion in which m1 trails m2; and (b) Since tanh is positivewhen its argument is positive, the solution (10.57) indicates that regardless of the initialconditions, v(t ) will be positive for sufficiently large t . Putting these two observationstogether, we conclude that regardless of the sleigh’s initial conditions, in the long run itwill orient itself so that m1 trails m2 in its motion.

10.7 The problem from page 63 of Gantmacher

A “dumbbell” consists of a rigid weighless rod of length ℓ with point masses of m eachattached to its ends. The rod is free to move in a vertical plane, other than the requirementthat the velocity of its center should point along the rod itself. Find the equaitons ofmotion.

We introduce a Cartesian coordiante system xy, where x is horizontal and y pointsup. Let x(t ) and y(t ) be the coordinates of the rod’s center, and let ϕ(t ) be the rod’s anglerelative to the x axis. We use x, y, and ϕ as the problem’s generalized coordiantes. Thenrc = ⟨x, y⟩ is the position vector of the rod’s midpoint. The position vectors of the twomasses are

r1 = rc −ℓ

2⟨cosϕ, sinϕ⟩, r2 = rc +

ℓ

2⟨cosϕ, sinϕ⟩.


Now we calculate the velocities of the masses:

r1 = rc −ℓ

2ϕ⟨− sinϕ, cosϕ⟩, r2 = rc +

ℓ

2ϕ⟨− sinϕ, cosϕ⟩, (10.59)

and their accelerations:

r1 = rc −ℓ

2ϕ⟨− sinϕ, cosϕ⟩− ℓ

2ϕ2⟨−cosϕ,− sinϕ⟩,

r2 = rc +ℓ

2ϕ⟨− sinϕ, cosϕ⟩+ ℓ

2ϕ2⟨−cosϕ,− sinϕ⟩.

It follows that

S =1

2m‖r1‖2 +

1

2m‖r2‖2 = m‖rc‖2+

mℓ

2(ϕ2+ ϕ4) = m(x2 + y2) +

mℓ

2(ϕ2+ ϕ4)

(10.60)

The formulation above indicates that we have made the tacit choice of using x, y andϕas generalized coordiantes for this problem. Although these coordiantes are independentof each other algebraically, they are related to each other through their derivatives, sinceaccording to the problem’s statement, the velocity of the rod’s midpoint is constraintedto lie along the rod’s direction. This says that the velocity vector rc = ⟨x , y⟩ makes anangle of of ϕ with the x axis, as the rod does, therefore y/x = tanϕ, or equivalently,

x sinϕ = y cosϕ. (10.61)

This is the problem’s nonholonomic constraint.To relate this to the general theorey developed in the previous sections, the problem

has m = 3 generalized coordinates x, y and ϕ; and it has g = 1 nonholonomic con-straint (10.61). Therefore its degrees of freedom are n = m− g = 2. Again, acoording tothe preceding general theory, we should express the generalized velocities x, y and ϕ, andthe constraint (10.61), in terms of n = 2 independent and unconstrained pseudovelocitiesπ1 and π1.

There is no unique choice of the pseudovelocities π1 and π1. All is needed the require-ment that the pseudovelocities (10.33) togehter with the constraint equations (10.32) besolvable for the generalized veclocities (10.34). Table 10.1 shows a few possible choices forthe problem at hand. In Choice (1) we take pseudovelocities π1 and π1 to be the actualgeneralized velocities x and ϕ. The generalized velocity y then is determiend through theconstraint (10.61).

Choice (2) defines the pseudovelocity π1 to be the left-hand side of the constraintequation (10.61) which is a reasonable choice to make.

Choice (3) sets π1 to the seemingly odd expression x/cosϕ. The resulting expressionsfor x and y are quite simple, in the sense that they are easy to differentiate; we are goingto need x and y to calculate the energy of acceleration. The derivatives of x and y will bemessier expressions with the Choices (1) and (2).

On a closer scrutinty, the setting of π1 = x/cosϕ is not that odd after all since it hasa pleasant physical interpretation. Let v be the (scalar) speed of the rod’s midpoint. Sincethe velocity points along the rod’s direction, then it should be clear that x = v cosϕ andy = v sinϕ. Comparing the x and y values in Choice (3), we see that π1 is the speed v.

We adopt the pseudovelocities of Choice (3) for the rest of this section. To ease thenotational burden, however, we replace the ageneric symbols π1 and π2 with the moremeaningful notation v = π1 andω = π2. In particular, according to Choice (3) we have

x = v cosϕ, y = v sinϕ. (10.62)

10.7. The problem from page 63 of Gantmacher 91

Choice (1)

π1 = x

π2 = ϕ

x sinϕ = y cosϕ

⇐⇒

x = π1

y = π1 tanϕ

ϕ = π2

Choice (2)

π1 = x sinϕ

π2 = ϕ

x sinϕ = y cosϕ

⇐⇒

x = π1/ sinϕ

y = π1/cosϕ

ϕ = π2

Choice (3)

π1 = x/cosϕ

π2 = ϕ

x sinϕ = y cosϕ

⇐⇒

x = π1 cosϕ

y = π1 sinϕ

ϕ = π2

Table 10.1: There is no unique choice of pseudovelocities for a given problem. Thechoices given here are three out of infinite such possibilities for the dumbbellproblem.

To express the energy of acceleration S in (10.60) in terms of the quasivelocities, we dif-ferentiate (10.62):

x = v cosϕ− vϕ sinϕ = v cosϕ− vω sinϕ, (10.63a)

y = v sinϕ+ vϕ cosϕ = v sinϕ+ vω cosϕ, (10.63b)

whence x2+ y2 = v2+ v2ω2, and (10.60) changes to

S =m(v2+ v2ω2)+mℓ

2(ω2+ω4) = mv2+

mℓ

2ω2+ · · · ,

where the ellipses indicates terms which are free of the acceleration terms v and ω.A force of F = ⟨0,−m g ⟩ acts on each of the two masses. Therefore

F · r1+F · r2 =F · (r1+ r2) = 2F · rc

= 2⟨0,−m g ⟩ · ⟨x , y⟩=−2m g y =−2m g (v sinϕ+ vω cosϕ).

In the last step we have substituted for y from (10.63b).We conclude that the Gibbs function is

G =mv2 +mℓ

2ω2+ 2m g v sinϕ+ · · · ,

where, as usual, the ellipses indicates terms which are free of the accelerations.Then the Appell equations ∂ G/∂ v = 0 and ∂ G/∂ ω = 0 lead to

v + g sinϕ = 0, ω = 0.

To complete the system, we append the constraint equations (10.62). Therefore the com-plete set of differential equations of motion are

x = v cosϕ,

y = v sinϕ.

ϕ =ω,

v + g sinϕ = 0,

ω = 0.


These form a set of five first order differential equations in the five unknowns x, y, ϕ,ω,and v.

This system is solvable in terms of elementary functions, as demostrated in Gant-macher. Plugging the raw system into Maple does not produce a good result immediately,but we can maneuver it toward the right solution as follows.

The equations above impliy thatω is a constant, and thereforeϕ =ωt+ϕ0. Thereforethe system shrinks to

x = v cosϕ,

y = v sinϕ.

v + g sinϕ = 0,

where ϕ =ωt+ϕ0. In solving this reduced system, Maple distinguishes between the caseswhere ω is zero and nonzero. If ω is nonzero, we get something like Gantmacher’s. Inparticular, y consists of a linear term in t plus trigonometric function. If, however, ω iszero and ϕ0 =π/2, we get a −1/2g t 2 term, as expected.

Note: In the above, I calclated the Gibbs function a la Desloge. We may calculate it a laGantmacher if we wish. Here is how.

What I have written above as the quasivelocity v, Gantmacher calls it π. The equationy = v sinϕ then takes the form y = π sinϕ, whereby we introduce the virtual displace-ment of the quasicoordinate π through δy = (sinϕ)δπ.

In accordance with (10.59), the virtual displacements of the masses are

δr1 = δrc −ℓ

2⟨− sinϕ, cosϕ⟩δϕ, δr2 =δrc +

ℓ

2⟨− sinϕ, cosϕ⟩δϕ,

A force of F = ⟨0,−m g ⟩ acts on each of the two masses therefore their virtual workis

δW =F ·δr1+F ·δr2 =F · (δr1+δr2) = 2F ·δrc

= 2⟨0,−m g ⟩ · ⟨δx,δy⟩ =−2m g δy =−2m g sinϕδπ. (10.64)

However δW = Πδπ+Φδϕ, where Π and Φ are the generalized forces correspondingto the quasicoordinates π and ϕ. We conclude that Π = −2m g sinϕ and Φ = 0. Theequations of motion are

∂ S

∂ π=Π,

∂ S

∂ ϕ= Φ,

which agree with what we obtained earlier.

10.8 Rigid body dynamics

The previous sections have focused on the dynamics of point masses. Extending the re-sults to rigid bodies is a matter of replacing the summations with integrals, which is easyin principle, but provides some challenges in practice. It is possible, with some work, tosystematize the treatment and arrive at a general formula for the Gibbs function whichapplies to all rigid bodies. Lurie [9] and Desloge [2], for instance, have such formulas.Lurie’s formula is packaged quite nicely. Desloge’s result is a little more general but itis not packaged as nicely. Furthermore, Desloge’s formula is actually incorrect(!) due to

10.8. Rigid body dynamics 93

a false assumption made in its derivation. Desloge [11] points out his error and offers acorrected formula.

In the following sections I will offer a blending of Lurie’s and Desloge’s approacheswhich combines the nice packaging of the former with the generality of the latter.

10.8.1 Three frames of reference

It turns out that to derive the equation of motion of a rigid body, it is quite convenientto use not one, not two, but three(!) special purpose frames of reference simultaneously.In most applications a frame of reference takes the form of a right-handed orthonormaltriad and a point called the frame’s origin.

The stationary frame is defined by a triad i,j,k and the associated origin O. As thename implies, the stationary frame is fixed (not moving). Typically the i and j

vectors lie in a horizontal plane and the vector k points upward, but that’s not arequirement.

With the stationary frame we associate a Cartesian coordinate system whose x, y,and z axes are aligned with the i, j and k vectors, respectively.

Ultimately, the purpose of rigid body dynamics is to express a body’s motion rela-tive to the stationary frame.

The body frame of reference is defined by a triad b1,b2,b3 and an associated origin o.The triad is firmly attached to the body, and therefore moves with it. There areno restrictions on the choice of the origin, nor on the triad’s orientation. When-ever possible, however, we set the origin at the body’s center of mass, and orientthe triad along the body’s principal axes of inertia, since that results in significantsimplifications.

Since the triad b1,b2,b3 is affixed to the body, the triad’s angular velocity is exactlythe body’s angular velocity ω.

The intermediate frame of reference is defined by a triad e1,e2,e3 and an associatedorigin O ′. The choice of this frame is entirely up to you. You choose it to fit thespecific application at hand. The triad may rotate and the origin may move as youwish.

Remark 10.2. What’s the purpose of the intermediate frame of reference? Informulating the equations of motion, some quantities are more easily expressed inonce coordinate system than the other. For instance, the position of the body’scenter of mass is best expressed in terms of the stationary frame, while its spin isbest express in terms of the body reference frame. Some other quantities may bedifficult to express in either of those, but may be easy in terms of a special purposeintermediate frame of reference. All three frames are related through orthogonaltransformations, therefore we may readily translate the information from one frameto another, as needed.

10.8.2 The energy of acceleration for a rigid body

Consider a rigid bodyB of mass m equipped with a body reference frame with an origino and a orthonormal triad b1,b2,b3, as in Section 10.8.1. Let ro be the position vector of

the point o relative to the stationary frame, that is ro =−→Oo, and letIo be body’s moment


of inertia relative to o. Furthermore, let c be the body’s center of mass, and let ρc =−→oc .

Then we have8

∫

B

1

2‖r‖2 d m =

1

2m‖ro‖2+m(ro × ω) ·ρc +m(ro ×ω) · (ω×ρc )

+1

2ω · Ioω+(ω×ω) · Ioω+ · · · , (10.65)

where the ellipsis indicates terms that do not involve accelerations, and are, therefore,immaterial to the Gibbs-Appell equations of motion.

If the origin o of the body frame is chosen to coincide with the body’s center of mass,then ρc = 0 and (10.65) reduces to

∫

B

1

2‖r‖2 d m =

1

2m‖rc‖2+

1

2ω · Ic ω+(ω×ω) · Icω+ · · · , (10.66)

where rc is the position vector, relative to the stationary frame, of the body’s center ofmass, and Ic is the moment of inertia tensor relative to the center of mass.

10.9 The rolling coin

Here we analyze the somewhat challenging dynamics of a coin, modeled as a thin homo-geneous disk of radius a, rolling without slipping on a horizontal floor; see Figure 10.2.

10.9.1 The three frames

Figure 10.2 depicts the coin along with the three frames of reference discussed in Sec-tion 10.8.1. Specifically, the i,j,k triad of the stationary frame has its origin on thefloor and the k vector points upward; the b1,b2,b3 triad of the body frame has its ori-gin at the disk’s center and the vector b3 is perpendicular to the disk.

We choose an intermediate frame with its origin at the disk’s center, and an associatedorthonormal triad e1,e2,e3 oriented as follows. The vector e3 is perpendicular to thedisk, and therefore coincides with b3. The vector e1 is horizontal. Then we set e2 =e3×e1.

The body frame is fixed to the coin, by definition, and rotates with it. The position ofthe body frame relative to the intermediate frame is described by a single parameter, theangle ψ, between the vectors b1 and e1 as indicated on the figure.

The coin’s position relative to the stationary frame is specified through five generalizedcoordinates x, y, θ, ϕ, and ψ, where θ is the angle between the coin’s plane and the floor;ϕ is the angle between the vectors i and e1 (it measures how the coin’s plane is rotatedrelative to the k vector); and (x, y,a sinθ) are the coordinates of the coin’s center.

10.9.2 The angular velocity

From the geometry of Figure 10.2 it is evident that e1 = cosϕ i+ sinϕ j. The unit vectore2 makes an angle θ with the floor and its horizontal projection is perpendicular to e1,therefore its horizontal and vertical projections are (− sinϕ i+ cosϕ j)cosθ and sinθk.We conclude that e2 = (− sinϕ i+ cosϕ j)cosθ+ sinθk. The vector e3 may be found

8This is a combination of equation (4.11.8) in Lurie [9] and (22) in Desloge [11]. I haven’t personally verifiedthem yet.

10.9. The rolling coin 95

x

y

z

ij

k

ϕ

ae1

e2

e3,b3

b1

b2

ψ

θ

(x, y,a sinθ)

Figure 10.2: The rolling coin’s configuration is specified through five generalized coor-dinates x, y , θ, ϕ, and ψ, as shown.. The no-slip conditions at the contactpoint with the floor imposes two nonholonomic constraints.

with a similar geometric reasoning, or just by computing e3 = e1×e2. Here is a summary:

e1 = cosϕ i+ sinϕ j, (10.67a)

e2 =− sinϕ cosθ i+ cosϕ cosθ j+ sinθk, (10.67b)

e3 = sinϕ sinθ i− cosϕ sinθ j+ cosθk. (10.67c)

We will need the time derivatives of these vectors shortly, so let’s compute them rightnow. We have

e1 =−ϕ sinϕ i+ ϕ cosϕ j.

It follows that

e1 ·e1 = 0, e1 ·e2 = ϕ cosθ, e1 ·e3 =−ϕ sinθ,

therefore, according to (8.7), we have e1 = ϕ cosθe2− ϕ sinθe3. In a similar manner wecompute e2 and e2 and express them in terms of the basis e1,e2,e3. Here is what weget:

e1 = ϕ cosθe2− ϕ sinθe3, (10.68a)

e2 =−ϕ cosθe1+ θe3, (10.68b)

e3 = ϕ sinθe1− θe2. (10.68c)


Now let us examine the body frame. Referring to Figure 10.2 we have:

b1 = cosψe1+ sinψe2, (10.69a)

b2 =− sinψe1+ cosψe2, (10.69b)

b3 = e3. (10.69c)

We differentiate these with respect to t , and substitute from (10.68) for the derivatives ofei , and obtain

b1 =−(ϕ cosθ+ ψ) sinψe1+(ϕ cosθ+ ψ)cosψe2+(−ϕ sinθ cosψ+ θ sinψ)e3,(10.70a)

b2 =−(ϕ cosθ+ ψ)cosψe1− (ϕ cosθ+ ψ) sinψe2+(ϕ sinθ sinψ+ θ cosψ)e3,(10.70b)

b3 = ϕ sinθe1− θe2. (10.70c)

Remark 10.3. The expressions for b3 in (10.70c) and e3 in (10.68c) agree since b3 = e3.

Equations (10.70) may be used in conjunction with (7.5) to calculate the body’s angularvelocity vector. Toward that end we compute

ωb1 = b2 ·b3 =−(ϕ cosθ+ ψ)cosψ (e1 ·b3)− (ϕ cosθ+ ψ) sinψ (e2 ·b3)

+ (ϕ sinθ sinψ+ θ cosψ) (e3 ·b3),

where the superscript b is to remind us that ωb1 is the component of the vector ω in the

the body frame. On the right-hand side we substitute for b3 from (10.69), simplify the

result, and arrive atωb1 = ϕ sinθ sinψ+ θ cosψ. Computing theωb

2 andωb3 components

in the same way, we arrive at:

ωb1 = b2 ·b3 = ϕ sinθ sinψ+ θ cosψ,

ωb2 = b3 ·b1 = ϕ sinθ cosψ− θ sinψ,

ωb3 = b1 ·b2 = ϕ cosθ+ ψ.

Then from the definition (7.3) of the angular velocity vector we conclude that

ω = (ϕ sinθ sinψ+ θ cosψ)b1+(ϕ sinθ cosψ− θ sinψ)b2+(ϕ cosθ+ ψ)b3. (10.71)

This is the coin’s angular velocity vector expressed in the b1,b2,b3 basis. By substitutingfor b1,b2,b3 from (10.69), we obtain and expression for ω in the e1,e2,e3 basis:

ω = θe1+ ϕ sinθe2+(ϕ cosθ+ ψ)e3. (10.72)

We see that the components of ω along the intermediate basis are particularly simple.Therein lies the significance of the choice of our intermediate basis. Those componentsplay a central role in what comes next, therefore we name them simplyωi ,

9

ω1 = θ, ω2 = ϕ sinθ, ω3 = ϕ cosθ+ ψ, (10.73)

9To be consistent with the b superscript introduced earlier for the components of ω in the b1,b2,b3 basis,we should have named its e1,e2,e3 components with superscripts e . We refrain from doing that, however, toreduce clutter.


Note that each ωi is a linear combination of the generalized velocities θ, ϕ, and ψ, andtherefore each ωi is a quasi-velocity as defined in Section 10.1.9. Well, actually ω1 is not

a true quasi-velocity since it is the derivative of the generalized velocity θ, but the othertwo are honest quasi-velocities since they are not derivatives of anything.

Solving (10.73) as a linear system of three equations for the three unknowns θ, ϕ, and

ψ, we get:

θ=ω1, ϕ =1

sinθω2, ψ=ω3−ω2 cotθ. (10.74)

Equations (10.73) and (10.74) establish a one-to-one correspondence between the gen-

eralized velocities θ, ϕ, ψ, and the quasi-velocitiesω1,ω2,ω3. We may formulate the restof the analysis in terms of one or the other set of variables. We will do it in terms of theω’s since the derivations are easier that way.

For future reference, let us make a note of the following formula whose derivation isleft as an exercise:

ω = [ω1+ω2ω3−ω22 cotθ]e1+[ω2−ω3ω1+ω1ω2 cotθ]e2+ ω3 e3. (10.75)

10.9.3 The no-slip constraint

The position vector of the coin’s center in the stationary frame is

rc = x i+ y j + a sinθk. (10.76)

The vector ρ = −ae2 extends from the coin’s center to its contact point with the floor.Therefore, according to (7.7), the velocity, relative to the coin’s center, of a point on thecoin’s rim at the contact point is given by ω×ρ. Consequently, the velocity of that pointrelative to the stationary frame is rc + ω × ρ. Since the velocity of the ground at thecontact point is zero, then to prevent slippage, we need

rc +ω×ρ= 0. (10.77)

Upon substituting for rc from (10.76), for ω from (10.72), for the ei vectors from (10.67),and simplifying the result, and we get

x−aθ sinϕ sinθ+(ϕ cosθ+ψ)cosϕ

i+

y+aθ cosϕ sinθ+(ϕ cosθ+ψ)cosϕ

j = 0.

Then we apply (10.74) to change over to quasi-velocities and arrive at

x + aω3 cosϕ− aω1 sinϕ sinθ

i+

y + aω3 sinϕ+ aω1 cosϕ sinθ

j = 0.

This leads us to a pair of scalar equations

x =−a(ω3 cosϕ−ω1 sinϕ sinθ), y =−a(ω3 sinϕ+ω1 cosϕ sinθ) (10.78)

which express the rolling coin’s nonholonomic constraints. We use these to eliminate xand y from the rest of the computations.

10.9.4 The acceleration of the coin’s center

From (10.76) we have rc = x i+ y j+ aθ cosθk.. We substitute for x and y from (10.78),then we apply (10.74) and arrive at

rc =−a(ω3 cosϕ−ω1 sinϕ sinθ) i− a(ω3 sinϕ+ω1 cosϕ sinθ) i+ aω1 cosθk.


Then we differentiate this once again to find the acceleration:

rc = a

ω1 sinϕ sinθ− ω3 cosϕ+ω21 sinϕ cosθ+ω1ω2 cosϕ+ω2ω3 sinϕ/ sinθ

i

+ a

− ω1 cosϕ sinθ− ω3 sinϕ−ω21 cosϕ cosθ+ω1ω2 sinϕ−ω2ω3 cosϕ/ sinθ

j

+ a

ω1 cosθ−ω21 sinθ

k. (10.79)

Here we have applied (10.74) to eliminate the generalized velocities θ, ϕ, ψ in favor ofthe quasi-velocities ω1, ω2, ω3. Finally, we compute ‖rc‖2 which forms a part of theproblem’s Gibbs function:

‖rc‖2 = a2[ω21 + ω

23 + 2ω1ω2ω3− 2ω1ω2ω3]+ · · · , (10.80)

where the ellipsis indicate terms that involve no acceleration terms ω1, ω2, ω3.

10.9.5 The rotational acceleration

The body frame b1,b2,b3 is lined up with the coin’s principal moment of inertia axes,therefore the moment of inertia tensor relative to the coin’s center is

I = αb1⊗ b1+αb2 ⊗ b2+βb3 ⊗ b3.

where

α=1

4ma2, β=

1

2ma2. (10.81)

Therefore we have:

Iω = α(b1 ·ω)b1+α(b2 ·ω)b2+β(b3 ·ω)b3,

I ω = α(b1 · ω)b1+α(b2 · ω)b2+β(b3 · ω)b3.

Inserting for ω from (10.75) we get:

ω · I ω = α(ω1+ω2ω3−ω22 cotθ)2+α(ω2−ω3ω1+ω1ω2 cotθ)2+ ω2

3

= αω21 +αω

22 +βω

23 − 2α(ω1ω2−ω1ω2)(ω2 cotθ−ω3)

+α(ω21 +ω

22)(ω2 cotθ−ω3)

2. (10.82)

The final term is free of accelerations, therefore it may be dropped when forming theGibbs function.

In a similar manner we also calculate

(ω×ω) · Iω = (β−α)(ω1ω2−ω1ω2)ω3

− (β−α)(ω21 +ω

22)(ω2 cotθ−ω3)ω3. (10.83)

Again, the final term is free of accelerations, therefore it may be dropped when formingthe Gibbs function.

According to (10.66) and (10.7), The Gibbs function is

G=

∫

B

1

2‖r‖2 d m−

∫

BF · r d m

=1

2m‖rc‖2+

1

2ω · I ω+(ω×ω) · Iω− (−m gk) · rc ,


which may be evaluated by putting together (10.79), (10.80), (10.82), and (10.83):

G=1

2ma2[ω2

1 + ω23 + 2ω1ω2ω3− 2ω1ω2ω3]+

1

2αω2

1 +1

2αω2

2 +1

2βω2

3

− (ω1ω2−ω1ω2)(αω2 cotθ−βω3)+m gaω1 cosθ+ · · · ,

where, as always, we have dropped terms which do not depend on the accelerations. Fi-nally, substituting for α and β from (10.81), we arrive at

G= ma2h5

8ω2

1 +1

8ω2

2 +3

4ω2

3 −1

4ω1ω2(ω2 cotθ− 6ω3)

+1

4ω1ω2(ω2 cotθ− 2ω3)−ω1ω2ω3

i

+m gaω1 cosθ.

Then from the equations ∂ G/∂ ω1 = 0, ∂ G/∂ ω2 = 0, ∂ G/∂ ω3 = 0 we get

5

4ω1−

1

4ω2(ω2 cotθ− 6ω3)+

g

acosθ = 0,

1

4ω2+

1

4ω1(ω2 cotθ− 2ω3) = 0,

3

2ω3−ω1ω2 = 0.

Clearly this set of differential equations is under-determined, since it depend on θ which

is also an unknown. However, from (10.74) we see that θ = ω1. Adjoining this to theabove gives us a system of four first order differential equations in the four unknownsω1,ω2,ω3, and θ.

In practice, we extend the system by adjoining all three of the equations from (10.74),and the two equations from (10.78). Thus, we obtain a system of eight first order differ-ential equations in the eight unknowns ω1, ω2, ω3, θ, ϕ, ψ, x, and y, whose solutioncompletely determines the coin’s motion. Here is the system in its full glory:

ω1−1

5ω2(ω2 cotθ− 6ω3)+

4g

5acosθ = 0,

ω2+ω1(ω2 cotθ− 2ω3) = 0,

ω3−2

3ω1ω2 = 0,

θ=ω1,

ϕ =1

sinθω2,

ψ=ω3−ω2 cotθ.

x =−a(ω3 cosϕ−ω1 sinϕ sinθ),

y =−a(ω3 sinϕ+ω1 cosϕ sinθ).

(10.84)

Remark 10.4. We need eight initial conditions to go with these equations. The initialconditions on x, y, θ, ϕ, and ψ determine the coin’s initial position relative to the sta-tionary axes, so they may be specified in the obvious way. The initial conditions on ω1,ω2, and ω3 require a more careful consideration. You may recall from (10.73) that theω’s were defined as the components of the coin’s angular velocity along the intermediate


frame e1,e2,e3. Since the intermediate frame has no immediate physical manifesta-tion, it is not easy to make up meaningful initial values for theω’s. We do note, however,that (10.73) defines the ω’s in terms of the generalized coordinates and their velocities,

which are easier to grasp. Therefore in practice you will make up initial values for θ, ϕ,

and ψ as desired, then use (10.73) to determine the initial values forω1, ω2, andω3.

Exercises

10.1. Derive (10.75).

10.2. Give the details of the computation which leads from (10.77) to (10.78).

10.3. Derive (10.83).

10.4. Equip a rigid body B with a body reference frame b1,b2,b3 whose origin co-incides with the body’s center of mass. Additionaly, suppose that the b1,b2,b3is aligned with the body’s moment of inertia axes, so that the moment of inertiatensor takes the form

I = I1b1⊗ b1+ I2b1⊗ b2+ I3b1⊗ b3.

Letω1,ω2,ω3 be the components of the body’s angular velocity along the b1 ,b2,b3vectors, that is

ω =ω1b1+ω2b2+ω3b3.

Show that:

1. ω · I ω = I1ω12+ I2ω2

2+ I3ω32.

2. (ω×ω) · Iω = I1(ω2ω3− ω3ω2)ω1

+I2(ω3ω1− ω1ω3)ω2+ I3(ω1ω2− ω2ω1)ω3.

Conclude that the equations of motion of a freely spinning rigid body are

I1ω1+(I3− I2)ω2ω3 = 0,

I2ω2+(I1− I3)ω3ω1 = 0,

I3ω3+(I2− I1)ω1ω2 = 0.

10.5. Solve the system of differential equations (10.84) and produce an animation of thecoin’s motion.

Chapter 11

Quaternions

11.1 The quaternion algebra

A quaternion, as a mathematical object, is a pair [a,u ], where a ∈R andu ∈ E3. Multipli-cation by a scalar cp, the sum p+q, and the product p q of the quaternions p= [a, u ]and q= [ b , v ] are defined according to

cp= c[a, u ]= [ ca, cu ], (c ∈R) (11.1)

p+ q= [a, u ]+ [ b , v ]= [a+ b , u+v ], (11.2)

p q= [a, u ] [ b , v ]= [ab −u ·v, av+ bu+u×v ]. (11.3)

Note that addition is commutative but multiplication, due to the presence of the u× v

term, is not.As a special case of (11.3), we have

[ c , 0 ] [a, u ]= c[a, u ] and [a, u ] [ c , 0 ]= c[a, u ],

therefore the quaternion [ c , 0 ] acts exactly like the scalar c under quaternion multiplica-tion. In particular, the quaternion [1, 0 ] is the multiplicative identity in the quaternionalgebra.

In view of the observation above, we adopt the convention of writing c instead ofof [ c , 0 ] when there is no chance of confusion. This is analogous to writing α, insteadof α+ 0i when dealing with complex numbers. By the same token, we write u insteadof [0, u ]. The quaternion c = [ c , 0 ] is called a scalar quaternion or just a scalar if themeaning is clear from the context. Similarly, the quaternionu= [0,u ] is called a vectorialquaternion, or just a vector, for short.10

As a consequence of the convention adopted above, the notation u v, where u andv are vectors, makes sense, and evaluates to

u v = [0, u ] [0, v ]= [ −u ·v, u×v ]. (11.4)

You may want to think of u v as the “quaternion product” of the vectors u and v,but note that the product is a quaternion, not a vector, unless u · v = 0, in which caseu v =u×v.

10The terminology is not quite standardized. What we have called a scalar quaternion is also called a realquaternion, and what we have called a vectorial quaternion is also called a pure quaternion or imaginary quater-nion.

101

102 Chapter 11. Quaternions

The conjugate p∗ of the quaternion p= [a, u ] is defined as p∗ = [a,−u ]. Let us notethat

p∗ p= [a2+ ‖u‖2, 0 ]= (a2+ ‖u‖2)[1, 0 ]= a2+ ‖u‖2 = |p|2, (11.5)

where we have defined|p|= (a2+ ‖u‖2)1/2.

The quantity |p| is called the norm (or modulus) of the quaternion p. A quaternion psuch that |p|= 1 is called a unit quaternion.

If |p| is nonzero, then we may rearrange (11.5) into

1

|p|2 p∗

p= 1,

which shows that the parenthesized expression is the left multiplicative inverse of p. Re-peating the calculation, beginning with p p∗, we see that the parenthesized expressionis also the right multiplicative inverse of p. We conclude that if |p| is nonzero, then p hasa multiplicative inverse, p−1, given by

p−1 =1

|p|2 p∗, (|p| 6= 0).

The special case unit criterion arises quite frequently:

p−1 = p∗, (|p|= 1).

Remark 11.1. This section has touched on just a few basic concepts of the quaternionalgebra which will be needed in what follows. For an in-depth study of quaternions seeAltmann [12]. For a leisurely introduction to quaternions, with applications to computergraphics, see Hanson [13].

11.2 The geometry of the quaternions

The goal of this section is to elucidate the geometric interpretations of a few featuresrelated to quaternion which play significant roles in the dynamics of rigid bodies.

11.2.1 The reflection operator

As we noted in connection with (11.4), the quaternion product of two vectors is not avector in general. Thus, it may come as a surprise that the triple product v u v isalways a vector:

v u v = ‖v‖2u− 2(v ·u)v, for all u,v ∈ E3. (11.6)

The proof of this identity is left as an exercise.In particular, if v is a unit vector, let’s call it n, then we have

n u n= u− 2(n ·u)n, if ‖n‖= 1. (11.7)

The right-hand side has a well-known geometric interpretation. Suppose the tails of thevectors n and u are attached to a common point o, and let P be a plane through o andperpendicular to n, as illustrated in Figure 11.1, and let Q be the plane of the vectors n

11.2. The geometry of the quaternions 103

u− 2(n ·u)n

n

u

P

Q

o


and u. Then it is simple exercise to show that u−2(n ·u)n is the mirror reflection of thevector u through the plane P . Thus, we have established the following:

Theorem 11.1. The reflection operator Rn into a plane with a unit normal n is given by

Rnu=n u n, u ∈ E3. (11.8)

11.2.2 The rotation operator

Consider two planes, P1 and P2, which have unit normals n1 and n2 and make a dihedralangle ϕ/2 with each other. In the sketch shown in Figure 11.2a, the planes are shownedgewise, therefore they appear as lines. The line of intersection of the two planes appearsas a point in that figure since we are looking at it head on. The unit vector n = (n1 ×n2)/‖n1×n2‖ lies in the direction of that line of intersection.

Let Rn1and Rn2

be the reflection operators, as in the previous subsection, into the

planes P1 and P2. Pick an arbitrary vector u, apply Rn1to it, and then apply Rn2

the the

result. Figure 11.2b shows the effect.To gain insight into the structure of this geometric construction, it helps to introduce

the plane P ′1 which is the reflection of P1 into P2. The plane P ′1, seen edgewise, is shownas a dashed line in Figure 11.2c.

In Figure 11.2d we have added the vector Rn2u. It should be evident from the sym-

metries of the diagram that Rn2u coincides with the reflection of Rn2

Rn1u into P ′1.

As a rotation by the angle ϕ about the vector n takes the plane P1 to the plane P ′1, it isevident that the same rotation takes the vector u to the vector Rn2

Rn1u. Thus, we have

established the following:

Lemma 11.2. Suppose the planes P1 and P2, with unit normals n1 and n2, make a dihedralangle ofϕ/2 with each other. Let Rn1

and Rn2be the reflection operators into the planes P1 and

P2, and let Rn,ϕ be the rotation operator by angleϕ about the vectorn= (n1×n2)/‖n1×n2‖.Then Rn,ϕ = Rn2

Rn1.

In the previous section we characterized a reflection operator as a triple quaternionproduct. That, along with the lemma above enables us to express a rotation operator interms of quaternions. This is the subject of the following:


P1

n1

P2

n2

ϕ

2

n

(a) Planes P1 and P2, shown edge-wise, have unit normals n1 andn2, and make a dihedral angleof ϕ/2. The unit vector n =(n1×n2)/‖n1×n2‖ lies alongtheir intersection, and pointsout of the picture, toward you,therefore it is not visible.

P1

n1

P2

n2

ϕ

2u

Rn1u

Rn2Rn1

u

(b) Reflecting the arbitrary vector u into theplane P1 produces R

n1u. Reflecting the lat-

ter into the plane P2 results in Rn2

Rn1

u.

P1

n1

P2

n2

ϕ

2u

Rn1u

Rn2Rn1

u

P ′1

ϕ

2

(c) We introduce the auxiliary plane P ′1 (shown as adashed line) which is the reflection of the plane P1into the plane P2, therefore it makes a dihedral an-gle of ϕ/2 with P2.

P1

n1

P2

n2

ϕ

2u

Rn1u

Rn2Rn1

u

Rn2u

P ′1

ϕ

2

n

(d) The vector Rn2

produced by reflecting u into P2 coin-

cides with the reflection of Rn2

Rn1

u into P ′1. From

the symmetries of the diagram it is evident that a rota-tion by an angle ϕ about the vector n takes the planeplane P1 to the plane P ′1. It follows that the same ro-tation takes the vector u to the vector R

n2Rn1

u, thus

proving that Rn,ϕ = R

n2Rn1

.

Figure 11.2: The sequence of diagrams is in effect a “proof without words” ofLemma 11.2. It shows clearly that the composition of the two reflectionsR

n1and R

n2into the planes P1 and P2 is equivalent to a rotation by an

angle ϕ about the vector n which lies in the intersection of those planes.

11.2. The geometry of the quaternions 105

Theorem 11.3. The operator Rn,ϕ of rotation by angle ϕ about a unit vector n has an

associated unit quaternion

q= [ cosϕ

2, n sin

ϕ

2] (11.9)

so that

Rn,ϕu= q u q∗, for all u ∈ E3. (11.10)

Proof. Pick any two planes P1 and P2 which intersect along a line parallel to n, and whosedihedral angle is ϕ/2. Then according to Theorem 11.1 we have

Rn1u=n1 u n1. Rn2

u=n2 u n2,

and according to Lemma 11.2 we have Rn,ϕ = Rn2Rn1

. Thus, we compute

Rn,ϕu= Rn2Rn1

u

= Rn2(n1 u n1)

=n2 (n1 u n1) n2

= (n2 n1) u (n1 n2).

Let q=n2 n1. We leave it as an exercise for you to show that q∗ =n1 n2. We concludethat Rn,ϕ = q u q∗, which proves (11.10). To show that q has the form given in (11.9),

observe that the dihedral angle between the planes P1 and P2 is ϕ/2, therefore the vectorsn1 and n2, which are perpendicular to those planes, also form an angle of ϕ/2. Thenapplying (11.4) we see that

q=n2 n1 = [ −n2 ·n1, n2×n1 ]= [ − cosϕ

2, −n sin

ϕ

2]=−[ cos

ϕ

2, n sin

ϕ

2].

The negative sign in the final expression is immaterial since q appears twice in the rotationformula (11.10). Finally, the assertion that q is a unit quaternion follows immediately fromthe form of (11.9).

Remark 11.2. The planes P1 and P2 and their associated unit normals n1 and n2 whichenter the proof above are conveniences for the proof. They do not appear in the theorem’sfinal result.

Remark 11.3. If we split the quaternion q in (11.10) into components, as in q= [ q0, q ],and expand the triple quaternion product, we get

Rn,ϕu= (q20 −‖q‖2)u+ 2(q ·u)q+ 2q0(q×u). (11.11)

We may be more explicit by setting q0 = cosϕ/2 and q =n sinϕ/2 and simplifying (11.10)to get

Rn,ϕu= (n ·u)n+

u− (n ·u)n

cosϕ+(n×u) sinϕ

which has an obvious geometric interpretation if you look at it closely.


11.3 Angular velocity

Theorem 11.3 shows that a rotation operator in E3 may be expressed in terms of an as-sociated unit quaternion q defined in (11.9). A rotation that varies with the time t thenmay be expressed in terms of a time-dependent unit quaternion q(t ) of that form. Anarbitrarily fixed vector r0 ∈ E3 then rotates to the position r(t ) according

r(t ) = q(t ) r0 q∗(t ). (11.12)

Suppose that the instantaneous angular velocity vector under this rotation at time t isω(t ). Then, according to (7.7), we have:

r(t ) =ω(t )×r(t ). (11.13)

Comparing (11.12) and (11.13) we expect a relationship between q(t ) and ω(t ). This isstated in the following:

Theorem 11.4. Let q(t ) be any time-varying unit quaternion, and let ω(t ) the angularvelocity of the rotation (11.12) engendered by q(t ). Then we have

ω = 2q q∗. (11.14)

Proof. Since q(t ) is a unit quaternion, we have |q(t )|2 = q(t ) q∗(t ) = 1, and thereforeq q∗+q q∗ = 0. But since q q∗ = (q q∗)∗, it follows that q q∗+(q q∗)∗ = 0, whichindicates that the scalar part of the quaternion q q∗ is zero, that is, q q∗ is a vectorialquaternion. This justifies defining the vector ω through the quaternion product (11.14).It remains to show that ω thus defined is indeed the motion’s angular velocity.

Toward that end, let us compute r by differentiating (11.12):

r = q r0 q∗+ q r0 q∗.

We note q−1 = q∗ by virtue of q being a unit quaternion, therefore r = q r0 q∗ impliesthat

r0 q∗ = q∗ r and q r0 = r q.

Therefore, the expression for r takes the form

r = q q∗ r+r q q∗

= q q∗ r+r (q q∗)∗

=1

2ω

r+r 1

2ω

∗=

1

2(ω r+r ω∗).

We leave it as an easy exercise to show that evaluating the right-hand side with the help ofthe identity (11.4) leads to

r =ω×r (11.15)

which shows that ω defined in (11.14) is indeed the sought-after angular velocity.

11.4. A differential equation for the quaternion rotation 107

11.4 A differential equation for the quaternion rotation

Recall the definitions of the three frames of reference in Section 10.8.1 in connection withthe motion of a rigid body. The stationary frame is equipped with a non-moving orthonor-mal triad i,j,k. The body frame is equipped with an orthonormal triad b1,b3,b3which is fixed to the body and movies with it.

At any time t , the body triad b1,b3,b3 may be viewed as the rotated version of thestationary triad i,j,k. Let q(t ) be the unit quaternion associated with that rotation.Thus,

b1(t ) = q(t ) i q∗(t ), b2(t ) = q(t ) j q∗(t ), b3(t ) = q(t ) k q∗(t ), (11.16)

which may be expanded with the help of (11.11), if desired.Let ω be the body’s angular velocity vector, and ω1, ω2, and ω3 be ω’s components

in the body frame:

ω =ω1b1+ω1b2+ω1b3. (11.17)

Substituting from (11.16) we get

ω =ω1q i q∗+ω2q j q∗+ω3q k q∗

= q (ω1i+ω2j+ω3k) q∗.

Then from (11.14) it follows that

2q q∗ = q (ω1i+ω2j+ω3k) q∗,

whence

q=1

2q (ω1i+ω2j+ω3k). (11.18)

This system of first order differential equations expresses the evolution of q in time. To-gether with the equations in (11.16), these determine the body’s orientation in space. Wewill adjoin them to the differential equations of dynamics to obtain a complete system ofdifferential equations that describes the body’s motion.

The differential equation’s initial condition, q(0), which specifies the body’s initialorientation, is a unit quaternion, as any quaternion associated with a rotation ought to be.A question arises whether the solution q(t ) of the differential equation is a unit quaternionfor all t . The answer is yes, as it follows from Theorem 11.5 below.

Remark 11.4. Equation (11.17) defines the components ω1, ω2, ω3 of the angularvelocity vectorω along the body triad. Beware that the sumω1i+ω2j+ω3k that appearsin (11.18) does not add up to ω.

Theorem 11.5. The differential equation

q(t ) = q(t ) a(t ), q(0) = q0

for the quaternion q(t ), where a(t ) is an arbitrary time-dependent vector, preserves the norm,that is |q(t )| = |q0| for all t . In particular, if q0 is a unit quaternion, then q(t ) is a unitquaternion for all t .

Proof. We calculate the derivative of |q(t )|2 and substitute for q from the differential


equation:

d

d t|q|2 = d

d t(q q∗) = q q∗+ q q∗

= (q a) q∗+ q (q a)∗ = (q a) q∗+ q (a∗ q∗).

In the last step we have used the quaternion conjugation property (p q)∗ = q∗ p∗; seeExercise 5. Now note that for any vector a we have a∗ = [0, a ]∗ = [0, −a ]=−a. Then

it follows that dd t|q|2 = 0, therefore the norm of q remains constant.

For computing purposes, we express q in components, as in

q= [ q0, q1i+ q2j+ q3k ],

whereby (11.18) expands to the system of differential equations

d

d t

q0q1

q2

q3

=

1

2

0 −ω1 −ω2 −ω3ω1 0 ω3 −ω2

ω2 −ω3 0 ω1

ω3 ω2 −ω1 0

q0q1

q2

q3

. (11.19)

11.5 Unbalanced ball rolling on a horizontal plane

Consider a solid ball of radius R and uniformly distributed total mass of M . We knowthat the ball’s moment of inertia tensor relative to its center is 2/5M R2 times the identitytensor.

We embed a point of mass m, let’s call it a “slug”, within that ball at a distance a ≤ Raway from its center. We wish to study the motion of the composite object when it rollswithout slipping on a horizontal plane.

We set up the usual i,j,k orthonormal triad as a stationary frame, with the originon the horizontal plane and k pointing upward. We set up the body frame with its originat the ball’s center, and the body triad b1,b2,b3 oriented so that the slug is at ab3 relativeto the ball’s center, and we let q(t ) be the unit quaternion associated with the rotation thattakes i,j,k to b1,b2,b3.

Let rc = x i+y j+Rk be the position vector of the ball’s center relative to the station-ary frame. The body’s position at any time is given by the vector rc and the quaternionq.

We express the ball’s angular velocity ω in terms of its components along the bodytriad as in (11.17), and note that the derivative ω is given by (7.4) on page 58.

11.5.1 The no-slip condition

The velocity of the point on the ball which is at contact with the horizontal plane isrc +ω× (−Rk), which should be zero if there is to be no slippage at the contact point.Therefore we have rc = Rω×k, that is, x i+ y j = Rω×k. It follows that x = Rω×k ·i=Rk× i ·ω = Rj ·ω. Similarly, y =−R i ·ω. The conditions

x = Rj ·ω and y =−R i ·ω (11.20)

are this problem’s nonholonomic constraints. We will use these to eliminate x and y infavor of ω.

11.5. Unbalanced ball rolling on a horizontal plane 109

Additionally, we use these to eliminate the second derivatives, x and y from the Gibbsfunction. Let us make a note here:

x = Rj · ω, y =−R i · ω. (11.21)

11.5.2 The Gibbs function and the equations of motion

The Gibbs function of the system is the sum of the Gibbs functions corresponding to thehomogeneous ball and to the slug. We calculate them separately, and then add them up.

The Gibbs function of the homogeneous ball is

G1 =1

2

M (x2+ y2)+2

5M R2(ω2

1 + ω22 + ω

23)

.

Let us note that due to (11.21) we have

‖rc‖2 = x2+ y2 = R2

(j · ω)2+(i · ω)2

= R2

‖ω‖2− (k · ω)2

= R2

ω21 + ω

22 + ω

23 − (k · ω)2

, (11.22)

therefore

G1 =1

2M R2

7

5(ω2

1 + ω22 + ω

23)− (k · ω)2

.

To compute the slug’s Gibbs function, let us write r for its position vector. We have:

r = rc + ab3.

Then we calculate the slug’s velocity

r = rc + ab3 = rc + aω× b3,

and its acceleration

r = rc + a

ω× b3+ω× b3

= rc + a

ω× b3+ω× (ω× b3)

= rc + a

ω× b3+(ω ·b3)ω− (ω ·ω)b3

.

Noting that ω ·b3 =ω3 and

ω× b3 = (ω1b1+ ω2b2+ ω3b3)× b3 =−ω1b2+ ω2b1,

the acceleration simplifies to

r = rc + a

− ω1b2+ ω2b1+ω3ω−‖ω‖2b3

= rc + a

− ω1b2+ ω2b1+ω3(ω1b1+ω2b2+ω3b3)− (ω21 +ω

22 +ω

23)b3

,

which we group as

r = rc + a

(ω2+ω3ω1)b1− (ω1−ω2ω3)b2− (ω21 +ω

22)b3

, (11.23)

then for rc substitute from (11.22), and finally we compute

G2 =1

2m‖r‖2− (−m g k) · r.


The fully expanded expression of G2 is horrendously large. For all practical purposes itis impossible to compute it with bare hands. Doing it with a symbolic computationalsoftware, such as MAPLE or MATHEMATICA, however, is not a problem.

The composite ball’s Gibbs function os G=G1+G2. The equations of motion are

∂ G

∂ ω1

= 0,∂ G

∂ ω2

= 0,∂ G

∂ ω3

= 0,

These along with the two equations (11.20) and the four equations (11.19) form a systemof nine first order differential equations in the nine unknowns x, y, ω1, ω2, ω3, q0, q1,q2, q3.

The initial conditions for the first five have immediate physical meanings and theirspecifications are intuitive and easy. The x and y place the ball’s center at a desired loca-tion, and ω1, ω2, ω3 impart it an initial velocity. (Remember that are the componentsof the angular velocity in the body triad. The ball’s initial orientation is specified throughq0, q1, q2, q3 as follows.

Begin with the ball oriented so that the body triad b1,b2,b3 is lined up with thestationary i,j,k triad. Then apply a rotation Rn,ϕ about a unit vector n and rotation

angle ϕ to orient the ball as desired. Any orientation in space may be achieved throughthe appropriate choices of n and ϕ. The quaternion associated with that rotation is givenin (11.9). Therefore

q0(0) = cosϕ

2, q1(0) = n1 sin

ϕ

2, q2(0) = n2 sin

ϕ

2, q3(0) = n3 sin

ϕ

2. (11.24)

Don’t forget that n= n1 i+ n2 j + n3 k must be a unit vector.

Remark 11.5. To make an animation of the rolling ball, it is suffices to design the ballgraphics in a reference position, say where the body axes is aligned with the stationaryaxes. Then, after having solved the differential equations and computed x(t ), y(t ), q0(t ),q1(t ), q2(t ), q3(t ), rotate the reference ball about the vector q1(t ) i+ q2(t )j+ q3(t )k bythe angle ϕ(t ) obtained from q0(t ) = cosϕ(t )/2, that is, ϕ(t ) = 2cos−1 q0(t ). Then trans-late the rotated ball’s center to the location x(t ) i+ y(t )j+ Rk. MAPLE’s plottoolspackage provides the commands rotate() and translate() for rotating and trans-lating graphics objects. Figure 11.3 shows the ball in its reference position (on the left),and a frame from an animation sequence (on the right).

Exercises

11.1. Prove the identity (11.6).

11.2. In Figure 11.1, n is a unit vector to the plane P , and u is an arbitrary vector. Whyis u− 2(n ·u)n the mirror reflection of u through the plane P?

11.3. Let q=n2 n1, where n1 and n2 are any two vectors. Show that q∗ =n1 n2.

11.4. Verify that the quaternion multiplication is associative, that is

p (q r) = (p q) r.

11.5. Show that (p q)∗ = q∗ p∗ for all quaternion p and q.

11.6. Show that |p q|= |p| |q| for all quaternions p and q.

Exercises 111

Figure 11.3: On the left is a sample graphics of the unbalanced ball in the referenceposition. On the right is a frame from an animation sequence which in-cludes the trace of the ball’s contact point with the floor. This particularanimation was produced with the parameters R = 1, a =−3/5. M = 100,m = 25, g = 1. The initial orientation was set through n = i, ϕ = π/2;see (11.24). The remaining initial conditions were x(0), y(0),ω1(0),ω3(0)all zeros, andω2(0) = 0.1.

11.7. Show that n n=−1 for all unit vectors n. Note the parallel with the imaginarynumbers: i2 =−1.

11.8. Supply the details that lead to (11.15).

11.9. Produce an animation of the unbalanced ball of Section 11.5.

Appendix A

Maple basics

This appendix provides a very terse summary of the basic features of MAPLE that you willneed to solve this book’s problems. For details you should consult MAPLE’s User Manual(about 350 pages) and Programming Guide (about 650 pages) which you may downloadfrom

http://www.maplesoft.com/documentation_center/

That page provides documentation for the latest version of MAPLE. You will find thedocumentation of earlier versions in

http://www.maplesoft.com/products/maple/

history/documentation.aspx

I have broken the long URL into two lines so that it fits between the margins of this book.You will enter those two lines as a single line in your browser.

A.1 Configuring Maple

Maple’s user interface is quite customizable. Unfortunately its default settings are lessthan desirable. In fact, in my opinion they are so awful as to make Maple almost unusable.Fortunately you may adjust those settings by following the simple instructions in:

http://userpages.umbc.edu/~rostamia/2009-01-math481/

configuring-maple.html

(Again, I have broken the long URL into two lines to fit). The screenshots there wereproduced a few years ago, using an early version of MAPLE, therefore they may not cor-respond exactly to what you will see on your version. Nevertheless, you should have notrouble in interpreting what needs to be done.

A.2 The execution group

The MAPLE prompt which looks like > on the screen, starts what is called an executiongroup. You may enter one or more MAPLE commands at the execution group, endingeach command with a semicolon, then hit Enter to execute those commands at once.Here is an example of a single command in an execution group:

113

114 Appendix A. Maple basics

> 3 + 4;

7 (1)

and here is an example of two commands in an execution group:

> expand((x+1)^2); int(%, x=0..1);

x2+ 2x + 1

7

3(2)

The % character seen above, called a ditto operator, picks up the result of the previouscommand.

As we see in the examples shown above, Maple attaches a label to the final computedresult in execution groups. You may refer to that result through its label. See the descrip-tion of the Ctrl–L key in the next section.

A.3 Maple key bindings

Ctrl–L pops up a dialog window in which you enter the label produced in a prior execu-tion group. The label entered here stands for the result of that execution group.

Ctrl–K and Ctrl–J insert a fresh execution group above or below, respectively, of thecurrent execution group.

Ctrl–Delete deletes the current execution group.

Shift–Enter breaks a line, enabling you to format your input, as in

> expand((x^2+1)^2);

1/%;

int(%, x=-1..1);

x4+ 2x2+ 1 (3)

1

x4+ 2x2+ 1(4)

1

2+π

4(5)

A.4 Expression sequences, lists, and sets

An expression sequence is a comma-separated collection of MAPLE objects. For instance,each of the following commands produces an expression sequence containing three ob-jects:

> 1, 2, 3;

1,2,3 (6)

The following expression sequence consists of two equations and one fraction:

> x = a + b, y = cos(t), 3/4;

x = a+ b , y = cos(t ),3

4(7)

A.5. Selecting and removing subsets 115

A list is an expression sequence enclosed in square brackets. For instance, [a,b,c] isthe list associated with the expression sequence a,b,c. To see the distinction between thetwo, consider a function f of three variables, and a function g of one variable. These maybe invoked as f (a, b , c) and g (p). Note, however, that g (a, b , c) is illegal, since g cannotbe invoked with more than one argument. On the other hand, g

[a, b , c]

is legal, sincethe brackets encapsulate the expression sequence a, b , c into a single object, that is, the list[a, b , c].

The order of the objects in a list is essential. The lists [a, b ] and [b ,a] are not thesame. To wit:

> is([a,b] = [b,a]);

false (8)

A set is an expression sequence enclosed in curly braces. The order of the entries ina set is immaterial. Moreover, in accordance with the mathematical definition of a set,b ,a,a, b is the same as a, b:> is(b,a,a,b = a,b);

true (9)

Sets may be combined through the union and intersection operations, as in

> a, b, c union c, d, e ;

a, b , c , d , e (10)

> a, b, c intersect c, d, e ;

c (11)

To pick element i of a list or a set A, we use the selection operation A[i]. For instance:

> A := [ a, b, c ];

> B := a, b, c ;

> A[2];

b (12)

> B[2];

b (13)

Specifying an empty bracket in a selection operation produces the expression sequencethat makes up the list or the set. Thus, with A and B as above, we get:

> A[];

a, b , c (14)

> B[];

a, b , c (15)

A.5 Selecting and removing subsets

Consider the set

> S := a, b, x = a^2 + b^2, y = a^2 - b^2 ;

S := a, b , x = a2+ b 2, y = a2− b 2 (16)

Two elements of S are equations, the other two are not. Suppose we wish to pick thesubset S consisting of equations. We do

> select(type, S, ’equation’);

x = a2+ b 2, y = a2− b 2 (17)


To select the subset of S consisting of everything other than equations, we do

> remove(type, S, ’equation’);

a, b (18)

A.6 Solving equations symbolically

The solve() function solves algebraic equations or systems of equations. It attempts toobtain a solution in symbolic form and gives up if no symbolic solution can be found. Inits most basic form, solve() takes two arguments. The first argument is an equation,or a set, or a list of equations. The second argument is the unknown, or a set, or a list ofunknowns. For example:

> solve(a*x^2 + b*x + c = 0, x);

1

2

−b +p−4ac + b 2

a, −1

2

b +p−4ac + b 2

a(19)

> solve( a[1]*x + b[1]*y = c[1],

a[2]*x + b[2]*y = c[2], x, y);n

x =− b1c2− b2c1

a1b2− a2 b1

, y =a1c2− a2c1

a1b2− a2 b1

o

(20)

Remark A.1. If the right-hand side of an equation is zero, then the “= 0” is optional.Thus, the first of two commands entered above may have been entered as

> solve(a*x^2 + b*x + c, x);

1

2

−b +p−4ac + b 2

a, −1

2

b +p−4ac + b 2

a(21)

If the equation is entered by hand, then there is no great savings in omitting the zero. Thevalue of this convention becomes apparent when solving machine-generated equations,or systems of equations, where only the left-hand sides are present, and the zeros of theright-hand sides are implicitly assumed.

A.7 Solving equations numerically

We are often faced with equations which do not have explicit symbolic solutions. In thosecases we turn to numerical approximation of their solutions. For instance, the transcen-dental equation

3(x + 1)e−x − 1= 0

has two solutions, as is evident in its graph:

> plot(3*(x+1)*exp(-x)-1, x=-1..4, view=[-1.5..4, -1..2.5]);

A.8. The eval() function 117

The function fsolve() is MAPLE’s numeric equation solver. It is used just likesolve():

> fsolve(3*(x+1)*exp(-x)-1, x);

−0.8587727590 (22)

We see that it picked one of the two solutions in this case. To pick the other one, wespecify a range in which to search for it:

> fsolve(3*(x+1)*exp(-x)-1, x=2..3);

2.289281415 (23)

Remark A.2. In the two invocations of fsolve() above, I have omitted the “= 0 fromthe equation’s right-hand side, as this is permissible according to Remark A.1 in the pre-viou section.

A.8 The eval() function

The eval() function plays several diverse roles in MAPLE. The one feature of eval()that we need for our purposes is to evaluate an expression with given values. For instance

> eval(x^2 + x + 1, [x = 1]);

3 (24)

This is equivalent to the usual mathematical notation (x2+ x + 1)

x=1.

Multiple substitutions are allowed. E.g.:

> eval(ax^2 + bx + c, [a=1, b=2, c=1]);

x2+ 2x + 1 (25)

A very common use of eval() is the following. Consider the solution of the linearsystem given under MAPLE output (20) in Section A.6. To extract the solution’s y value,we do:

> Y := eval(y, (20));

Y :=a1c2− a2c1

a1b2− a2 b1

(26)

Remark A.3. The second argument of eval() may be given as a list [...] or as a set.... They have similar effects, but in the case of a list, the arguments are substitutedin order, from left to right, while in the case of a set, the order of substitution may vary.

A.9 Expressions and functions

In MAPLE, as in customary mathematics, we distinguish between a function and the resultof the application of that function. In calculus we write, for instance, f : R→R indicatesa function f , while f (x) indicates the result of applying f to x. In casual mathematicaldiscussion we mix the two concepts, by referring, for instance, to “the function x2” whichis a short way of saying “the function whose application to x produces x2.” A computeralgebra system such as MAPLE does no permit such a loose talk. As far as MAPLE isconcerned, if f : R→ R defined through f (x) = x2, then f is a function while x2 is anexpression, and the two shall not to be confused.

In MAPLE a function is defined like this:


> f := x -> x^2;

f := x→ x2(27)

Then it may be used as

> f(5);

25 (28)

> f(a+b);

(a+ b )2 (29)

A.10 Vectors and Matrices

A row vector with entries a, b , c is produced by

> < a | b | c >;

[a b c] (30)

while a column vector is produced by

> < a, b, c >;

abc

(31)

A matrix may be constructed as column vector whose entries are row vectors, as in

> < < a[1,1] | a[1,2] | a[1,3] >,

a[2,1] | a[2,2] | a[2,3] > >;

a11 a12 a13

a21 a22 a23

(32)

Equivalently, that matrix may be constructed as row vector whose entries are columnvectors, as in

> < < a[1,1], a[2,1] > | < a[1,2], a[2,2] >

| < a[1,3], a[2,3] > >;

a11 a12 a13

a21 a22 a23

(33)

The transpose of a vector or a matrix is constructed through the %T superscript, as in

> u := < a | b >;

u := [a b ] (34)

> u^%T;

ab

(35)

Multiplication between two vectors, or two matrices, or a vector and a matrix, is donethrough the dot operator. For instance,

> A := < < a[1,1] | a[1,2] >, < a[2,1] | a[2,2] > >;

a11 a12

a21 a22

(36)

> X := < x[1] , x[2] >;

x1x2

(37)

A.11. Differentiation 119

> X^%T . X;

x21 + x2

2 (38)

> A . X;

a11x1+ a12 x2

a21x1+ a22 x2

(39)

Example A.1. The position vector of a particle is a vector that extends from a point of ref-erence to the particle. The motion of the particle is tracked by a time-dependent positionvector. For instance, the position of a simple pendulum’s bob may be given as

> r := phi -> < sin(phi), cos(phi) >;

r := ϕ→⟨sin(ϕ), cos(ϕ)⟩ (40)

where phi is the angle that the pendulum makes with the downward pointing vertical.The bob’s position at any time is given by

> r(phi(t));

sin

ϕ(t )

cos

ϕ(t )

(41)

A.11 Differentiation

Differentiation, ordinary or partial, is performed through the diff() function:

> diff(a*x^2 + b*x + c, x);

2ax + b (42)

> diff(x^2 + y^2, x);

2x (43)

Higher order derivatives are produced by extending diff()’s arguments, as seen below:

> diff(x^7, x, x);

42x5(44)

> diff(x^7, x, x, x);

210x4(45)

> diff(g(x,y), x, x);

∂ 2

∂ x2g (x, y) (46)

> diff(g(x,y), x, y);

∂ 2

∂ y∂ xg (x, y) (47)

The D operator offers an alternative syntax for computing derivatives. For instance,if f (x) = x2, then

> D(f)(x);

2x (48)

> D(f)(5);

10 (49)

A frequent use of the D operator is in specifying initial conditions for differential equa-tions of second order and higher. For instance, the initial condition x(0) = a is entered asD(x)(0)=a. See subsection A.12.1 for an example.


Remark A.4. The diff() function is designed to differentiate scalar quantities. Todifferentiate a vector, we use diff~(). For instance, the velocity vector of Example A.1’sposition vector r(phi(t)) is obtained by

> v := diff~(r(phi(t)), t);

cos

ϕ(t )

d

d tϕ(t )

− sin

ϕ(t )

d

d tϕ(t )

(50)

Remark A.5. If the particle’s mass in the previous remark is m, then its kinetic energy is12 m‖v‖2:

> 1/2 * m * v^%T . v;

1

2m cos

ϕ(t )2

d

d tϕ(t )

2

+1

2m sin

ϕ(t )2

d

d tϕ(t )

2

(51)

The result may be simplified through the application of MAPLE’s simplify() com-mand:

> simplify((51));

1

2m

d

d tϕ(t )

2

(52)

A.12 Solving differential equations

The dsolve() function solves differential equations or systems of differential equations.The equations may be linear or nonlinear. By default, dsolve() attempts a symbolicsolution. Numeric solutions may be obtained by supplying extra arguments.

A.12.1 Solving differential equations symbolically

Here are a couple of example of dsolve() in symbolic mode:

> de1 := diff(y(t),t) + a*y(t) = 0;

de1 :=d

d ty(t )+ ay(t ) = 0 (53)

> dsolve(de1, y(t));

y(t ) = _C 1 e−at(54)

> de2 := diff(y(t),t,t) + omega^2*y(t) = 0;

de2 :=d 2

d t 2y(t )+ω2y(t ) = 0 (55)

> dsolve(de2, y(t));

y(t ) = _C 1 sin(ωt )+_C 2 cos(ωt ) (56)

> de_sys := diff(x(t),t) + x(t) = y(t),

diff(y(t),t) + y(t) = x(t);

de_sys :=d

d tx(t )+ x(t ) = y(t ),

d

d ty(t )+ y(t ) = x(t ) (57)

> dsolve(de_sys, x(t),y(t));

x(t ) = _C 1+_C 2e−2t , y(t ) = _C 1−_C 2e−2t(58)

A.13. Plotting 121

Arbitrary constants that enter general solutions of differential equations appear as _C 1,_C 2, etc. Underscores guard against a potential conflict between user-defined versusMAPLE-supplied symbols.

To solve initial value problems, include the initial values in dsolve()’s first argu-ment:

> dsolve(de1, y(0)=b, y(t));

y(t ) = b e−at(59)

> dsolve(de2, y(0)=a, D(y)(0)=b, y(t));

y(t ) =b sin(ωt )

ω+ a cos(ωt ) (60)

> dsolve(de_sys, x(0)=5, y(0)=7, x(t), y(t));

x(t ) = 6− e−2t , y(t ) = 6+ e−2t(61)

A.12.2 Solving differential equations numerically

MAPLE offers a plethora of methods for calculating and displaying numerical solutionof differential equations. Here I present the one method which is most useful for ourpurposes, through solving the differential equation, and plotting the solution of, the Vander Pol oscillator. Note the use of eval() from Section A.8 to extract the parts x(t ) andd x/d t from the list returned by dsolve().

> vdp := diff(x(t),t,t) - (1-x(t)^2)*diff(x(t),t) + x(t) = 0;

d 2

d t 2x(t )−

1− x(t )2 d

d tx(t )

+ x(t ) = 0 (62)

> tmax := 10;

tmax := 10 (63)

> dsol := dsolve(vdp, x(0)=2, D(x)(0)=-2,

numeric, output=listprocedure, range=0..10);

dsol :=h

t = proc(t ) . . . endproc, x(t ) = proc(t ) . . . endproc,

d

d tx(t ) = proc(t ) . . . endproc

i

(64)

> my_x := eval(x(t), dsol);

my_x := proc(t ) . . . endproc (65)

> my_dx := eval(diff(x(t),t), dsol);

my_dx := proc(t ) . . . endproc (66)

> plot(my_x(t), t=0..tmax, labels=[t,x(t)]);

> plot(my_dx(t), t=0..tmax, labels=[t,dx(t)/dt]);

> plot([my_x(t), my_dx(t), t=0..tmax], labels=[x(t), dx(t)/dt]);

The plot(...) commands in the final three lines produce the graphs of x versus t ,d x/d t versus t , and d x/d t versus x, shown side by side in Figure A.1. The details ofMAPLE’s plotting functions are given in Section A.13.

A.13 Plotting

A.13.1 Plotting a single function

The plot() function provides the basic two-dimensional plotting services.

> plot(x^2, x=-1..1, labels=[x, y]);


Figure A.1: The graphs of x versus t , d x/d t versus t , and d x/d t versus x, of a solutionof the Van der Pol oscillator.

A.13.2 Plotting multiple function together

Ifplot()’s first argument is a list of expressions, then their graphs are plotted all togetherin one coordinate system, and in varying colors:

> plot([x^2, 1-x^2, 0.4*exp(x)], x=-1..1, labels=[x,y]);

A.13.3 Parametric plot

A parametric plot, such as

x(t ), y(t )

, a ≤ t ≤ b is specified as[x(t), y(t), t=a..b]. The following example plots a cycloid:

> plot([t - sin(t), 1 - cos(t), t=0..2*Pi],

scaling=constrained, labels=[x,y], color=red,

tickmarks=[piticks,default], view=[0..7, 0..2.5]);

The scaling=constrained option seen above, requests a graph with a 1:1 aspectratio, that is, the unit lengths are equal along the horizontal and vertical axes. Addi-

A.13. Plotting 123

tionally, the command above shows the effects of the (self-explanatory) tickmarks andview options.

A.13.4 Plotting points and more

More specialized plotting constructs are delegated to the plots package which you willload with with(plots) in order to access them. The pointplot() function in thatpackage is designed to plot points on the plane, and optionally, connect them with straightlines. Here is an example of its usage, along with a few options which modify its behavior:

> with(plots):

> pointplot([ [0,0], [1,3], [2,1], [3,2], [4,2] ],

symbol=solidcircle, symbolsize=30, color=blue,

labels=[x,y]);

p1 := %:

The last execution group consists of two commands. The first produces the plot. Thesecond assigns the name p1 to that plot. We may access the plot later on in the worksheetby referring to its name. If you don’t expect to be referring to the plot, then there is noneed to name it.

The connect=true option connects the consecutive points with straight line seg-ments, and thus produces a piecewise linear curve. The point symbols themselves are notplotted in that case.

> pointplot([ [0,0], [1,3], [2,1], [3,2], [4,2] ],

symbol=solidcircle, symbolsize=30, color=red,

labels=[x,y], connect=true);

p2 := %:

A.13.5 Overlaying multiple plots

To superpose the two or more graphs, use the display() function, which is defined intheplots package. For instance, the plots p1 and p2 produced above may be superposed


to produce a composite graph:

> display([p1,p2]);

A.13.6 Reflecting a plot

The plottools package provides several additional plotting tools. Among these arefunctions to plot ellipses, polygons, as well as three-dimensional solids. It also providesfunctions for transforming plots, such as scaling, translating, rotating, and reflecting.

To illustrate some of the plottools functionality, let us make an ellipse:

> myellipse := ellipse([0,2],1,2, filled=true, color=pink):

This produces a plot structure of an ellipse; it does not plot the ellipse by itself. To see theellipse, apply display() to the plot structure:

> display(myellipse, scaling=constrained);

Now, suppose that we wish to reflect the ellipse about the line that connects the points(0,5) and (3,0). The following sequence of commands plots the points, the line that con-nects them, the ellipse, the reflected ellipse, and then applies display() to display allfour objects together in one diagram:

> display([

pointplot([[0,5], [3,0]], symbol=solidcircle,

symbolsize=20, color=blue),

pointplot([[0,5], [3,0]], connect=true, color=gray),

myellipse,

reflect(myellipse, [[0,5], [3,0]])

], scaling=constrained);

A.14. The Euler–Lagrange equations 125

A.14 The Euler–Lagrange equations

MAPLE’sEulerLagrange() function, which is defined in theVariationalCalculuspackage, produces the Euler–Lagrange equations the functional

J (x) =

∫ b

a

L(t , x(t ), x ′(t ))d t

subject to x(a) = A and x(b ) = B . Here x may be a scalar- or vector-valued function.Here is how it is done. First, we load the package:

> with(VariationalCalculus):

If x is scalar-valued, the corresponding Euler–Lagrange equation is obtained through

> EulerLagrange(L, t, x(t));

If x is vector-valued, with components x1(t ), x2(t ), . . . , xn(t ), we do

> EulerLagrange(L, t, [x[1](t),x[2](t),...,x[n](t)]);

In either case, EulerLagrange() returns a set consisting of

• n second order differential equations in the n unknowns x1(t ), x2(t ), . . . , xn (t ).

• zero or more first order differential equations of the type

Fi

x1(t ), x2(t ), . . . , xn(t ), x1(t ), x2(t ), . . . , xn (t )

, i = 1, . . . ,

each of which represents a first integral of the problem, and where Ki are arbitraryconstants. A first integral is the result of integrating, if possible, a combination ofthe previously noted second order differential equations.

In our application we won’t be interested in the first integrals, therefore we removethem from the set by applying the remove() function described in Section A.5, and thusleaving only the second order differential equations.

A.15 The animation of a simple pendulum

In this section we will present a complete and self-contained MAPLE worksheet in whichwe derive the equation of motion of a simple pendulum through the Lagrangian formula-tion, solve the resulting differential equation, and produce an animation that depicts thependulum’s motion. All MAPLE functions used in this worksheet have been describedearlier in this chapter.


The worksheet begins in Listing A.1 and continues into Listing A.2. The purposeof the restart command which appears at the top of the worksheet is to unassign allassigned variables and free the memory that MAPLE has allocated for its work. The com-mand has no effect in a freshly started worksheet but its inclusion at the top of a worksheetis a good practice. As you experiment with a worksheet’s contents, you may find it usefulto execute restart to reset MAPLE to its initial state and then start a new set of exper-iments. Without the restart you will have to exit and restart MAPLE to achieve thesame effect.

Next we load three packages which supply some of the special-purpose functions usedin the rest of the worksheet. Specifically, the VariationalCalculus package sup-plies the EulerLagrange() function; the plots package supplies the pointplot()and display() functions; and the plottools package supplies the reflect() func-tions.

Now I will continue with commenting on the rest of the worksheet. A “Line num-bers” in this context refers to the numerical labels produced by MAPLE which appearalong the right edge of the listing.

Line (1). We define the position vector r(ϕ) of the pendulum’s bob as a function of thedeflection angle ϕ. The components are in accordance with the Cartesian coordi-nate system shown in Figure 1.1. In the worksheet I have used “a” for the lengthof the pendulum’s rod instead of “ℓ” shown in the figure because the symbol “ℓ” isnot available in a MAPLE worksheet.

Lines (2)–(5). We compute the velocity vector v, the kinetic energy T , the potential en-ergy V , and finally, the Lagrangian L. Note the “[2]” in computing V ; it extractsthe second, that is the y, component of the position vector r.

Lines (6)–(8). We apply the EulerLagrange() function to produce the differentialequations of motion. The result is a set consisting of a differential equation, anda first integral. (See section A.14 for explanation.) We apply remove() (see Sec-tion A.5) to remove the first integral, and thus are left with a set consisting of asingle equation. Finally, we apply the empty selector [] to the set (see Section A.4)to produce its contents, and assign it to the variable de.

Lines (9) and (10). We select a set of numerical values for the pendulum’s parameters,substitute them in the differential equation, and name the result myde.

Lines (11) and (12). We define the motion’s initial conditions ϕ(0) and ϕ(0), and a vari-able tmax which sets the upper bound on the time range of interest.

Lines (13) and (14). We apply dsolve() to solve (numerically) the differential equa-tion myde along with the initial conditions ic. Then we extract the ϕ(t ) compo-nent of the solution and name it myphi.

Continuing on to listing A.2, we plot the function myphi(t ) which was computed in thepreceding listing. As expected, the graph periodic oscillations of the angle.

Next, on Line 15, we evaluate the vector r(ϕ) with ϕ set to myphi(t ). Applying theeval() is necessary; otherwise the result will contain the undefined symbols g and a init.

The next longish block of code beginning with frames := ... produces a se-quence of frames which when played quickly in succession, will produce and animationof the pendulums motion. The definition of frames may look somewhat complex in

A.15. The animation of a simple pendulum 127

Listing A.1: A complete MAPLE session, demonstrating the derivation of the equationof a simple pendulum, and its numerical solution. The final commandproduces a plot (not shown) of the pendulum’s angleϕ(t ) versus the time t .

> restart;

> with(VariationalCalculus): with(plots): with(plottools):

> r := phi -> < a*sin(phi), a*cos(phi) >;

r := ϕ→ ⟨sin(ϕ), cos(ϕ)⟩ (1)

> v := diff~(r(phi(t)), t);

cos

ϕ(t )

d

d tϕ(t )

− sin

ϕ(t )

d

d tϕ(t )

(2)

> T := simplify( 1/2 * m * v^%T . v );

T :=1

2m

d

d tϕ(t )

2

(3)

> V := -m*g*r(phi(t))[2];

V :=−m ga cos

ϕ(t )

(4)

> L := T - V;

L :=1

2m

d

d tϕ(t )

2

+m ga cos

ϕ(t )

(5)

> EulerLagrange(L, t, phi(t));§

−m ga sin

ϕ(t )

−m

d 2

d t 2ϕ(t )

a2,1

2m

d

d tϕ(t )

2

a2+m ga cos

ϕ(t )

= K1

ª

(6)

> remove(type, (6), ’equation’);§

−m ga sin

ϕ(t )

−m

d 2

d t 2ϕ(t )

a2ª

(7)

> de := (7)[];

de :=−m ga sin

ϕ(t )

−m

d 2

d t 2ϕ(t )

a2(8)

> params := [ m = 1, a = 1, g = 1 ];

params := [m = 1, a = 1, g = 1] (9)

> myde := eval(de, params);

myde :=− sin

ϕ(t )

− d 2

d t 2ϕ(t ) (10)

> ic := phi(0)=Pi/6, D(phi)(0)=0;

i c := ϕ(0) =1

6π, D(ϕ)(0) = 0 (11)

> tmax := 30;

tmax := 30 (12)

> dsol := dsolve(myde, ic, numeric,

output=listprocedure, range=0..tmax);

dsol :=h

t = proc(t ) . . . endproc, ϕ(t ) = proc(t ) . . . endproc,

d

d tϕ(t ) = proc(t ) . . . endproc

i

(13)

> myphi := eval(phi(t), dsol);

myphi := proc(t ) . . . endproc (14)

...continued in Listing A.2...


Listing A.2: Continued from Listing A.1.

> plot(myphi(t), t=0..tmax, tickmarks=[default,piticks],

view=[0..tmax,-Pi/4..Pi/4], labels=[t,phi(t)]);

> myr := eval(r(myphi(t)), params);

myr :=

sin

myphi(t )cos

myphi(t )

(15)

> frames :=

seq(

display([

pointplot([ <0,0>, myr ], connect=true, thickness=2),

pointplot(<0,0>, symbol=soliddiamond, symbolsize=30,

color=black),

pointplot(myr, symbol=solidcircle, symbolsize=50,

olor=red)

]),

t=0..tmax, tmax/250):

> reflect(

display([frames], insequence=true, scaling=constrained,

tickmarks=[0,0]),

[[0,0],[1,0]]);

first encounter, but fortunately it is not as complex as it looks. What it really says is thatframes consists of a sequence of 251 plots:

frames := seq( p(t), t=0..tmax, tmax/250):

where each p(t ) represents a drawing of the pendulum at some time t . Each plot p(t )is a composite, consisting of three subplots overlaid with the help of the display()command; see subsection A.13.5. The three subplots consist of (a) the pendulum’s “rod”drawn with pointplot() connect=true; (b) a black diamond placed at the origin

A.15. The animation of a simple pendulum 129

as an indicator of the pendulum’s support; and (c) a red solid circle that represents thependulum’s bob.

The final command, beginning with reflect(... plays the animation. To under-stand what it does, ignore the reflect part for the moment and look at its argument,which, in principle, says

display([frames], insequence=true, other options);

This displays the 251 individual frames stored in frames, one at a time in quick succes-sion, and thus “plays” a movie.

The purpose of applying the reflect() function to frames is to flip the plots rel-ative to the horizontal axis. This is because our pendulum model is derived in a Cartesiancoordinate system where the vertical axis points downward; see Figure 1.1. MAPLE, how-ever, is not aware of this, so without the reflect() it will show the vertical axis in thestandard way, that is pointing upward, and consequently the animation will come outinverted. See subsection A.13.6 for a description of MAPLE’s reflect() function.

Remark A.6. Upon executing the reflect(display[... command, MAPLE willdisplay the first frame of the animation. To set the animation into motion, click on thedisplayed frame. You will observe that animation control buttons, similar to

appears in the worksheet’s toolbar area. Experiment with the various parts to find outwhat they do.

Bibliography

[1] Edward A. Desloge, Classical Mechanics, Vol. 1, John Wiley, 1982.

[2] Edward A. Desloge, Classical Mechanics, Vol. 2, John Wiley, 1982.

[3] Stuart S. Antman, The simple pendulum is not so simple, SIAM Rev. 40 (1998), no. 4, 927–930.

[4] James Stewart, Multivariable Calculus, Brooks/Cole, 2008.

[5] I. M. Gelfand and S. V. Fomin, Calculus of Variations, Prentice-Hall, 1963.

[6] Heinrich Hertz, The Principles of Mechanics Presented in a New Form (English Translation), translated by D.E. Jones and J. T. Walley, Macmillan and Co., New York, 1899.

[7] Roy Bowen M. and C.-C. Wang, Introduction to vectors and tensors, Vol. 1, Plenum Press, New York, 1976.

[8] A. I. Lur’e, Analiticheskaya mekhanika, Gosudarstv. Izdat. Fiz.-Mat. Lit., Moscow, 1961 (Russian).

[9] A. I. Lurie, Analytical mechanics, Foundations of Engineering Mechanics, Springer-Verlag, Berlin, 2002.Translated from the Russian by A. Belyaev.

[10] F. Gantmacher, Lectures on Analytical Mechanics, Foundations of Engineering Mechanics, Mir Publishers,Moscow, 1975. Translated from the Russian by George Yankovsky.

[11] Edward A. Desloge, The Gibbs-Appell equations of motion, American Journal of Physics 56 (1988), no. 9,841–846.

[12] Simon L. Altmann, Rotations, Quaternions, and Double Groups, Clarendon Press, Oxford, 1986.

[13] Andrew J. Hanson, Visualizing Quaternions, Morgan Kaufmann Publishers, San Franciso, 2006.

131

Analytical Mechanics - Personal Web Space - UMBC

Documents

Transcript of Analytical Mechanics - Personal Web Space - UMBC