Einstein Derivation

A Simple Derivation of Einstein's Field Equations

Written by: Joshua Pilipovsky. Editor in Chief: Karol Woloszyn. Presenter in Chief: Siddique Shafi. Midwood High School


This is an introduction to, and a simple derivation of, Einstein's Field Equations. Geared toward readers with a relatively novice math background, this derivation will be neither rigorous nor ambiguous. It aims to be a clear and concise way to appreciate the genius Einstein showed when he came up with his theory of relativity.

Before we begin, I must outline the two fundamental ideas that you need before we can derive these equations: the principle of equivalence and curved spacetime. Imagine that you are in a closed box far from any outside forces, and that the box is accelerating upward with an acceleration of g. According to the principle of equivalence, this is exactly the same as standing still on Earth's surface, subject to a downward gravitational acceleration of g. You will feel absolutely no difference in your reference frame between these two scenarios. This becomes important, as you will see. Imagine you are again in the accelerating box, and a beam of light is traveling across it toward the left wall. Since you are moving up with an acceleration of g, the light will, according to you, drift downward, logically, because you are going up. If you consider three instants and draw the positions of the light in one diagram, you will notice, surprisingly, that the light bends along a parabola. This is a very surprising result from the classical point of view, because light should always follow a straight-line path. That is not quite the case: light actually does bend in an accelerating reference frame. Since this bending occurs in the accelerating frame, by the principle of equivalence light must also bend when you are stationary and subject to the gravitational force. We cannot see this with our eyes, however, because light travels far too fast for us to perceive this minimal bending.
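To get a feel for how minimal this bending is, here is a small sketch of the accelerating-box picture. It assumes the beam crosses the box in a time t = width / c while the floor rises by (1/2) g t^2; the function name `light_drop` and the sample widths are only illustrative choices, not part of the derivation.

```python
# Sideways-moving light beam seen from a box accelerating upward at g.
G_ACCEL = 9.81            # m/s^2, acceleration of the box (assumed value)
C_LIGHT = 299_792_458.0   # m/s, speed of light

def light_drop(width_m, g=G_ACCEL, c=C_LIGHT):
    """Apparent drop of the beam after crossing a box of the given width."""
    t = width_m / c            # time for the light to cross the box
    return 0.5 * g * t ** 2    # the floor rises by (1/2) g t^2 in that time

for width in (1.0, 10.0, 1000.0):
    print(f"width {width:7.1f} m -> drop {light_drop(width):.3e} m")
```

The drop grows with the square of the width, which is the parabola described above, and for a one-meter box it is far smaller than the size of an atom.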

This is quite a result, but how can we account for it? One might say it has to do with gravitational attraction, so let's begin there. The gravitational attraction between two masses is

$$F = \frac{GMm}{r^2}$$

However, one immediately runs into a problem: light, which according to quantum theory is made of quanta called photons, has a mass of 0. So this whole expression reduces to 0, which is obviously trivial. Einstein knew this, and he came up with a new approach. He said that all motion takes place in curved spacetime. Now what exactly does this mean? He postulated that light bends not because of a gravitational attraction between it and the Earth, but because the Earth curves the spacetime around it and light just follows this curve along the straightest path available. Think about it: let's say you walk onto a curved surface. You are just following a straight-line path, but the surface itself is curved, so the path you trace out is curved too.

This explanation of how light travels also accounts for the Newtonian gravitational attraction, a.k.a. the $GMm/r^2$. Consider a bead on a trampoline. It would only make a little kink in it. However, if I am standing on that trampoline, there will be a large bend in it, much greater than that of the bead. Therefore, due to my mass, the bead will start rolling towards me, or towards the dip in the trampoline. This analogy accounts for the gravitational attraction between two masses. They attract each other because of the dip in spacetime, following the path of least energy. Analogously, Earth also follows this principle of curved spacetime. When Earth orbits the Sun, it is actually going in a straight line; however, the curve in spacetime due to the Sun forces the Earth to follow a curved path. If this curve were flattened, Earth would just be going in a straight line after all, but the curved spacetime is what makes for the curved path of the Earth, which, I repeat, is just following a straight path.

As a little digression, this pivotal fact of curved spacetime has some useful applications in the large-scale world of astrophysics, namely gravitational lensing. It is indeed possible to see galaxies that lie behind other galaxies, which, if you think about it, is completely remarkable. This fact, together with the ingenious methods of scientists and astrophysicists, makes it possible to measure the distance between the two galaxies, and even to detect supernovae that went off in the galaxy behind. Imagine this in our world: we look at someone, and we can see the thing that is behind them. An absolutely mind-blowing fact. And, as I said, this is all due to curved spacetime, which allows light to bend, letting us view through our telescopes the planets and galaxies that lie behind others.

Now, you may be asking, what exactly is spacetime? Spacetime is, intuitively, the combination of our 3-D space and time into a 4-dimensional model of the universe. This combination of space and time constitutes a space called Minkowski space, which is very handy for physicists to work with, but we won't get into that for our purposes. Minkowski space is similar to Euclidean space in that transformations and rotations work in much the same way (although in Minkowski space it is the spacetime interval that is invariant, meaning independent of frame of reference), the obvious difference being that Minkowski space has 3+1 dimensions while Euclidean space has 3. Great! Now that we are done with the basics, we can finally start deriving the Einstein Field Equations, which read:

$$R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R + g_{\mu\nu}\Lambda = \frac{8\pi G}{c^4}\,T_{\mu\nu}$$

Let us first ask ourselves: what do these equations actually mean? All these hieroglyphics (to some) can be extremely confusing. Starting from the far left, we encounter $R_{\mu\nu}$, which is called the Ricci tensor. The Ricci tensor is essentially a measure of the curvature of a space, in layperson's terms. More technically, it measures how geodesics in a Riemannian manifold (which is a curved space) deviate from those in a standard Euclidean n-space, which is just the usual rectangular system of n coordinate axes. Next, we bump into $g_{\mu\nu}$, which is known as the metric tensor. This tensor is a correction to the Pythagorean theorem in curved spacetime. If you imagine a triangle in a curved space, say on a sphere, the hypotenuse will most definitely not be a straight line as in regular Euclidean space; instead it will be curved, and the metric tensor accounts for this so that you can still use Pythagoras in curved space. Right next to this, we have $R$, which is the Ricci scalar. This is basically the Ricci tensor contracted down to a scalar (more about this later; a scalar is actually a tensor of rank 0). Lastly, on the far right, we have $T_{\mu\nu}$, which is the stress-energy-momentum tensor. This tensor packs all the energy and mass in the universe into a nice compact matrix, which is how any rank-2 tensor can be written out.

Now we come to the fundamental conceptual goal of this paper, answering the question: what does this equation really mean? The terms on the left-hand side (LHS) of the equation all represent space and time, while the terms on the right-hand side (RHS) all represent mass and energy. What this is fundamentally saying is that mass tells spacetime how to curve, and curved spacetime tells mass how to move. This is the essence of Einstein's Field Equations. If you don't understand anything from here on out, this should be your main take-away message. With that understood, we can continue on to derive this equation.


To start our derivation, we will look at the very basics of differential geometry, and how they lead us to define the metric tensor. Consider a field $\phi$ in our, let's call it, 3-dimensional space. Let's say that we want to find the height $\phi$ at any point in this field (call it a cow field), and that we also want to see how our height will change if we move in the x or y directions. Suppose we are standing on top of a little bump on this cow field, where there is essentially a ridge. A ridge is defined to be a maximum of this field along one direction, meaning that if we move along the x direction, our height won't change because we are already at the maximum, but if we move along the y direction, we take a big dip down, so we will have a negative change in height. Now consider a gradient in this field. A gradient, in our case, is the ratio between the change in height and the distance traveled, whether in the x or y direction. For example, consider a gradient of 1:10, meaning that for every 1 meter of height, we go 10 meters in (for example) the x direction. To show this mathematically over an infinitesimal length, we say that the change in height is modeled by:

$$d\phi = \frac{d\phi}{dx}\,dx$$

This equation means that the change in height $d\phi$ is equal to the gradient¹ $\frac{d\phi}{dx}$ multiplied by the distance traveled $dx$. So, as an example, let's say that we moved 5 meters horizontally on this gradient. What would be our change in height? Well, $d\phi = \frac{1}{10}(5) = 0.5$, so we have moved half a meter up our field.

This equation is, if you think about it mathematically, just the chain rule, where $\phi$ is a function of either x or y (in our case both, as we will see later), and therefore when we take the derivative of it, we then have to take the derivative of x as well, due to the chain rule. However, there is one little nuance that we missed: direction is extremely important here. If we take the change in height on our ridge in the x direction, it would be 0, while in the y direction it would be a negative number. Therefore, we will have two separate equations for the change in height in the x and y directions of our field.

$$d\phi_x = \frac{d\phi}{dx}\,dx \qquad d\phi_y = \frac{d\phi}{dy}\,dy$$

These two are clearly not the same, as the ridge example shows. Now that we understand gradients and changes in height, let us move on to the Pythagorean theorem. Everyone knows the Pythagorean theorem, but let's apply it to infinitesimal lengths, so that we use differentials. If we have lengths dx and dy, then the third side forming the right triangle, namely ds, will, according to Pythagoras, be governed by this formula:

$$dx^2 + dy^2 = ds^2$$

If, however, we treat these lengths (which are scalars) as vectors (which have magnitude and direction), then we will just have

$$d\vec{s} = d\vec{x} + d\vec{y},$$

as simple as that, because of basic vector addition, where we use the tip-to-tail method of adding two vectors. Now $d\vec{s}$ is pretty general, but for our purposes we just want the change in $\phi$ of our field, so that equation will turn into

$$d\phi_s = d\phi_x + d\phi_y$$

¹ This is a very watered-down version of the gradient, for mathematical simplicity. The gradient is actually a term from multivariable calculus, and is defined as such:

$$\nabla f = \langle f_x, f_y, f_z \rangle$$

This means that the gradient is actually made of the partial derivatives (which I will explain later) of our function f in every dimension that we are working with, in this case x, y, and z. Also, the gradient is a vector, so it should be written with angle brackets noting its components.

Recall that we previously found equations for $d\phi_x$ and $d\phi_y$, so we can just plug them in:

$$d\phi_s = \frac{d\phi}{dx}\,dx + \frac{d\phi}{dy}\,dy,$$

and if you think about it, this is just the chain rule applied two times, once for each of our two variables x and y. However, to be proper, we should be writing partial derivatives instead of regular ones, for the following reason: we are working in a three-dimensional space, so when we are changing our height $\phi$, we can be moving in either the x or the y direction. For this reason, we must use partial derivatives, because the partial symbolically means keeping the other variable constant; it's a partial, not a total, derivative. For example, if we only want to see our rate of change in the x direction, we take the partial with respect to x, keeping y constant, so we are not even considering y. In single-variable calculus, there was only one direction you could move in to affect the function, so there was no need to use partial derivatives. Thus, our equation should properly be written as:

$$d\phi_s = \frac{\partial\phi}{\partial x}\,dx + \frac{\partial\phi}{\partial y}\,dy.$$
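As a quick sanity check of this formula, the sketch below compares the predicted change in height with the actual change for a small step. The ridge-shaped field `phi`, the sample point, and the step sizes are all made up for illustration, and the partial derivatives are estimated numerically rather than worked out by hand.

```python
def phi(x, y):
    """Hypothetical ridge: flat along x, falling away along y."""
    return 10.0 - y ** 2

def partial(f, x, y, wrt, h=1e-6):
    """Central-difference partial derivative, holding the other variable fixed."""
    if wrt == "x":
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 2.0, 1.0      # a point just off the crest of the ridge
dx, dy = 1e-3, 2e-3    # a small step in each direction

# Change in height predicted by the sum of the two partial-derivative terms.
predicted = partial(phi, x0, y0, "x") * dx + partial(phi, x0, y0, "y") * dy
# Change in height obtained by actually evaluating the field at both points.
actual = phi(x0 + dx, y0 + dy) - phi(x0, y0)
print(predicted, actual)
```

The two numbers agree up to terms that are second order in the step, and the x term contributes nothing, just as on the ridge in our cow field.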

Here we stop for a much-needed change of nomenclature. In the context of our field, we are only working in three dimensions, but in the context of general relativity, we must generalize to n dimensions. If we keep writing x and y and z, we will eventually run out of letters of the alphabet, so from here on out we will be writing $x = x^1$, $y = x^2$, $z = x^3$, and so on (the superscripts are labels, not powers), so that when we generalize to n dimensions we will have consistent notation. As a result, our previously corrected equation for the change in height of a field now reads:

$$d\phi_s = \frac{\partial\phi}{\partial x^1}\,dx^1 + \frac{\partial\phi}{\partial x^2}\,dx^2 + \dots$$

If we have more dimensions, the terms simply keep adding together, so we can easily generalize this by saying:

$$d\phi_s = \sum_n \frac{\partial\phi}{\partial x^n}\,dx^n \qquad (1)$$

However, we have a very big problem on the horizon. The calculations we have just made all stem from the coordinate axes that we set on the field in the beginning. What if this coordinate system were rotated by 90 degrees, or 40 degrees? How would our results change? We want our rules to be invariant of reference frame, meaning that it doesn't matter what coordinate system we have; the results will always be the same. This is extremely important because later we will have tensors, which are defined to be invariant of frame as well, so we need to be consistent with our methodology. Thus, we need to repeat this process for another coordinate system, say $y^1$ and $y^2$, and see whether the gradient in this coordinate system is the same as in the x coordinate system or different, and if it is different, how the two are related.

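So let's begin. For our y coordinate system, consisting of the $y^1$ and $y^2$ coordinates, we can use the chain rule as we did previously to find the gradient in the $y^1$ direction. So,

$$\frac{\partial\phi}{\partial y^1} = \frac{\partial\phi}{\partial x^1}\frac{\partial x^1}{\partial y^1} + \frac{\partial\phi}{\partial x^2}\frac{\partial x^2}{\partial y^1}$$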

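You can easily see that for the gradient of one y coordinate, you need to know the gradients of all x coordinates. Expanding this equation to n dimensions, we get

$$\frac{\partial\phi}{\partial y^1} = \frac{\partial\phi}{\partial x^1}\frac{\partial x^1}{\partial y^1} + \frac{\partial\phi}{\partial x^2}\frac{\partial x^2}{\partial y^1} + \dots$$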

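or more generally

$$\frac{\partial\phi}{\partial y^n} = \sum_m \frac{\partial\phi}{\partial x^m}\frac{\partial x^m}{\partial y^n}, \qquad (2)$$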

for any n that we would like to choose. Just to reiterate, WE choose the n value, while ALL of the m values are summed up.
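Equation (2) can be checked numerically for the rotated axes mentioned earlier. The sketch below assumes a 40 degree rotation between the x and y axes and a made-up height field; the names `x_from_y`, `phi_in_x`, and `deriv` are illustrative helpers, and all derivatives are central-difference estimates.

```python
import math

ANGLE = math.radians(40.0)  # assumed rotation between the x and y axes

def x_from_y(y):
    """x coordinates of the point whose rotated coordinates are y."""
    c, s = math.cos(ANGLE), math.sin(ANGLE)
    return [c * y[0] - s * y[1], s * y[0] + c * y[1]]

def phi_in_x(x):
    """Hypothetical height field written in the x coordinates."""
    return 10.0 + 0.5 * x[0] - x[1] ** 2

def phi_in_y(y):
    """The same field, evaluated from the y coordinates."""
    return phi_in_x(x_from_y(y))

def deriv(func, point, n, h=1e-6):
    """Central-difference derivative of func along coordinate n."""
    up = list(point)
    up[n] += h
    down = list(point)
    down[n] -= h
    return (func(up) - func(down)) / (2 * h)

y0 = [1.5, -0.7]
x0 = x_from_y(y0)

results = []
for n in range(2):
    direct = deriv(phi_in_y, y0, n)  # left-hand side of equation (2)
    # Right-hand side: sum over m of (dphi/dx^m) * (dx^m/dy^n).
    summed = sum(
        deriv(phi_in_x, x0, m) * deriv(lambda y, m=m: x_from_y(y)[m], y0, n)
        for m in range(2)
    )
    results.append((direct, summed))
    print(f"n = {n + 1}: direct {direct:.6f}, chain rule {summed:.6f}")
```

For each chosen n, the directly computed gradient and the sum over m agree, which is all equation (2) is claiming.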

This equation is very important in that it represents the change in the height of the field in the y frame of reference in terms of the x frame of reference, and as you can see, the two are only separated by the term $\frac{\partial x^m}{\partial y^n}$. Now that we have that finished, we can move on to the instrumental topic of tensors. So, this is how the story goes: consider a scalar, something that has only magnitude, no direction. A scalar is called a tensor of rank 0. Now consider a vector, which has a magnitude and a direction. A vector is a tensor of rank 1. If we keep following this pattern, we get to rank 2 and beyond, and these are the objects formally known as tensors.

Definition:

1. A combination of vectors that has a fixed relationship among themselves.

2. If a tensor is 0 in one reference frame, then it is 0 in all reference frames. (Invariant)

The latter fact is extremely important, and therefore we say that a tensor is invariant under coordinate transformations. So let us recap for a moment what we have done so far. We first found the change in height of a field. Next, we found how the height of a field transforms under coordinate transformations, from the x coordinate system to the y. Naturally, we would also like to know how tensors transform between coordinate systems, because, as I repeatedly stress, we want everything to be invariant under transformations. However, let's first start off by asking ourselves: how does a vector transform?

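This is a very easy task, as you shall see. Consider a vector in the x frame of reference, $V^m_x$. We would like to see how this vector transforms into the y frame of reference, $V^n_y$, and establish a relationship between them. Well, recall equation (1):

$$d\phi_s = \sum_n \frac{\partial\phi}{\partial x^n}\,dx^n$$

Now, we just replace $d\phi_s$ with $V^n_y$, $\partial\phi$ with $\partial y^n$, and $dx^n$ with $V^m_x$ (relabeling the summed index n as m), and we've got our equation:

$$V^n_y = \sum_m \frac{\partial y^n}{\partial x^m}\,V^m_x \qquad (3)$$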

This is how vectors transform between two coordinate systems, with their relationship being the term $\frac{\partial y^n}{\partial x^m}$. As I remind you again of this rather confusing but important concept: we pick the n value (there is only 1), but all m values are summed over², as in our previous equation (equation 2).

So what does a tensor actually look like mathematically? We defined a tensor qualitatively before, so now let's take a quantitative look:

$$T^{mn} \equiv A^m B^n$$

Notice that the tensor has 2 indices, one from each of the vectors that make it up. Also notice that the number of components is the product of the ranges of m and n: for example, if m and n both range from 0 to 3, then each index takes 4 values, so the tensor has 4×4 = 16 components contained inside it. From this definition, you can easily see how a tensor transforms, because we now just have to plug equation (3) into this tensor formula and see what happens from there:

$$A^m_y B^n_y = \sum_r \frac{\partial y^m}{\partial x^r}\,A^r_x \sum_s \frac{\partial y^n}{\partial x^s}\,B^s_x,$$

where r and s are the so-called dummy variables, because they are just indices that are being summed over and have no significant purpose in this equation; basically they act as placeholders, because the indices m and n were already taken. Therefore,

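$$T^{mn}_y = \sum_r \sum_s \frac{\partial y^m}{\partial x^r}\frac{\partial y^n}{\partial x^s}\,A^r_x B^s_x$$

$$T^{mn}_y = \sum_r \sum_s \frac{\partial y^m}{\partial x^r}\frac{\partial y^n}{\partial x^s}\,T^{rs}_x \qquad (4)$$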

Now this is a real tensor-calculus-like equation!³ But what is it actually saying? Our whole goal was to see how tensors transform under a change of coordinate system, specifically in our case from the x to the y coordinate system, and this formula shows the relationship between them, explaining mathematically how they transform. In technical terms, this transformation, where the indices are upstairs (on the top of the tensor), is called a contravariant transformation. There is an equivalent form of this equation called the covariant transformation, where the indices are downstairs and all of the derivative terms are flipped, and we will be needing that too, so it is useful to have it:

$$T^y_{mn} = \sum_r \sum_s \frac{\partial x^r}{\partial y^m}\frac{\partial x^s}{\partial y^n}\,T^x_{rs} \qquad (5)$$

² In general, if you have an equation with repeated indices, they are always summed over, no matter what. This is why Einstein, when deriving these equations, actually dropped the summation sign, a habit which later came to be called the Einstein Summation Convention, because it is automatically assumed that repeated indices are always summed over. For an example of repeated indices, consider this:

$$g_{\mu\nu}\,dx^\mu dx^\nu$$

The repeated indices are the $\mu$ and $\nu$, so they are automatically summed over, and we technically do not need to write the summation there.

³ As you are beginning to see, we will soon need to follow the Einstein Summation Convention because there will be wa-a-a-a-ay too many summations to write explicitly: all repeated indices are summed over!
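To see equations (3) and (4) at work with actual numbers, here is a small sketch. It assumes a linear change of coordinates, so that the derivatives $\partial y^m / \partial x^r$ are just a constant matrix; the matrix `JAC` and the two vectors are invented for the example.

```python
# Hypothetical constant matrix of derivatives dy^m/dx^r for a linear
# change of coordinates in two dimensions.
JAC = [[2.0, 1.0],
       [0.0, 3.0]]

def transform_vector(v):
    """Equation (3): V_y^n = sum_m (dy^n/dx^m) V_x^m."""
    return [sum(JAC[n][m] * v[m] for m in range(2)) for n in range(2)]

def outer(a, b):
    """T^{mn} = A^m B^n."""
    return [[a[m] * b[n] for n in range(2)] for m in range(2)]

def transform_tensor(t):
    """Equation (4): T_y^{mn} = sum_r sum_s (dy^m/dx^r)(dy^n/dx^s) T_x^{rs}."""
    return [[sum(JAC[m][r] * JAC[n][s] * t[r][s]
                 for r in range(2) for s in range(2))
             for n in range(2)] for m in range(2)]

a_x, b_x = [1.0, 2.0], [3.0, -1.0]
route_1 = outer(transform_vector(a_x), transform_vector(b_x))  # transform, then multiply
route_2 = transform_tensor(outer(a_x, b_x))                    # multiply, then transform
print(route_1)
print(route_2)
```

Both routes give the same four components, which is the statement that the product of two vectors transforms the way equation (4) says a tensor should.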

In essence, all of this was preparation for deriving the metric tensor, as we said we would do at the beginning of this section. So, without further ado, let's begin. Consider again the Pythagorean theorem, no vectors this time, just magnitudes.

$$\begin{aligned} ds^2 &= (dx^1)^2 + (dx^2)^2 + (dx^3)^2 + \dots \\ &= \sum_m dx^m dx^m \\ &= \sum_m \sum_n dx^m dx^n\,\delta_{mn} \end{aligned}$$

Whoa... what happened there? Why is there this really weird delta term? This $\delta_{mn}$ is called the Kronecker delta. The Kronecker delta is 1 if the indices m and n are equal to each other and 0 if they are not, so

$$\delta_{mn} \equiv \begin{cases} 1 & \text{if } m = n \\ 0 & \text{if } m \neq n \end{cases}$$
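It may help to write the double sum out once by hand. Taking just two dimensions as an illustration, so that m and n each run over 1 and 2, the four terms are:

```latex
\begin{aligned}
ds^2 &= \delta_{11}\,dx^1 dx^1 + \delta_{12}\,dx^1 dx^2
      + \delta_{21}\,dx^2 dx^1 + \delta_{22}\,dx^2 dx^2 \\
     &= (1)\,(dx^1)^2 + (0)\,dx^1 dx^2 + (0)\,dx^2 dx^1 + (1)\,(dx^2)^2 \\
     &= (dx^1)^2 + (dx^2)^2
\end{aligned}
```

The two cross terms are switched off by the delta, and the ordinary two-dimensional Pythagorean theorem is what remains.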

So, for all the terms where m is equal to n (the only terms that are needed in the Pythagorean theorem), the Kronecker delta is 1, and we are just left with $(dx^m)^2$, which is what we want. This unusually complicated way of writing the Pythagorean theorem is needed, as you shall see⁴. So, if you recall equation (1)⁵, we can rewrite it for the coordinates themselves and plug the corresponding $dx^m$ and $dx^n$ into the unusually complicated Pythagorean theorem to achieve our result:

$$d\phi_s = \sum_n \frac{\partial\phi}{\partial x^n}\,dx^n$$

$$dx^m = \frac{\partial x^m}{\partial y^r}\,dy^r \qquad dx^n = \frac{\partial x^n}{\partial y^s}\,dy^s$$

$$ds^2 = \delta_{mn}\,\frac{\partial x^m}{\partial y^r}\,dy^r\,\frac{\partial x^n}{\partial y^s}\,dy^s$$

$$ds^2 = \delta_{mn}\,\frac{\partial x^m}{\partial y^r}\frac{\partial x^n}{\partial y^s}\,dy^r dy^s$$

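and finally, the term with the partial derivatives (which are summed over by the Einstein Convention) and the Kronecker delta is defined to be the metric tensor!

$$\delta_{mn}\,\frac{\partial x^m}{\partial y^r}\frac{\partial x^n}{\partial y^s} \equiv g_{rs} \qquad (6)$$

Only r and s are left free after the sums over m and n, so those are the indices the metric carries; the letters themselves are just labels, and below we will often write $g_{mn}$. Notice that the metric tensor reduces to the Kronecker delta in flat, Euclidean space, but when we are in curved space, we need this metric tensor. Thus, our equation, with the metric tensor, reads:

$$ds^2 = g_{rs}\,dy^r dy^s$$

⁴ From now on, we are going to assume the Einstein Summation Convention.

⁵ Or, if you prefer, just think of it as the chain rule from multivariable calculus extended to n dimensions.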
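Equation (6) can be tried out on a case where the answer is well known. The sketch below takes the y coordinates to be polar coordinates (r, theta) on an ordinary flat plane, which is my choice of example and not something from the derivation above; the plane is still flat, only the coordinates are curvy. The partial derivatives are estimated by central differences, and the helper names are illustrative.

```python
import math

def x_from_y(y):
    """Rectangular (x^1, x^2) from polar y = (r, theta)."""
    r, theta = y
    return [r * math.cos(theta), r * math.sin(theta)]

def jacobian(y, h=1e-6):
    """jac[m][r] = dx^m/dy^r, estimated by central differences."""
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for r in range(2):
        up = list(y)
        up[r] += h
        down = list(y)
        down[r] -= h
        x_up, x_down = x_from_y(up), x_from_y(down)
        for m in range(2):
            jac[m][r] = (x_up[m] - x_down[m]) / (2 * h)
    return jac

def metric(y):
    """Equation (6): g_rs = sum_m sum_n delta_mn (dx^m/dy^r)(dx^n/dy^s)."""
    jac = jacobian(y)
    delta = [[1.0, 0.0], [0.0, 1.0]]
    return [[sum(delta[m][n] * jac[m][r] * jac[n][s]
                 for m in range(2) for n in range(2))
             for s in range(2)] for r in range(2)]

g = metric([2.0, 0.3])   # the point r = 2, theta = 0.3
print(g)
```

The result is close to a diagonal matrix with entries 1 and r^2 = 4, which gives the familiar polar line element ds^2 = dr^2 + r^2 dtheta^2.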

Think of a curved space, say a sphere. Imagine a right triangle on this sphere: the hypotenuse (as I said in the intro to this paper) will be curved, so the metric tensor accounts for this curvature with the two terms of partial derivatives. So there we have it, the metric tensor of Einstein's Field Equations, finished. Let's move on to the arcane Christoffel symbols, which we will need ever so much when we derive the Ricci tensor.

Recall the definition of a tensor: a tensor is something that is invariant under coordinate transformations. Therefore, if we have

$$W^x_{nm} = V^x_{nm}$$

in the x frame of reference, then this statement would be true in all frames of reference. For example, relating to our cow field, if the height at a point is 2 in one frame of reference, then it is 2 in all frames of reference, regardless of the placement of the reference point. Now, we just found how the Pythagorean theorem transforms in curved space. Sticking with this idea, let's see how tensors, specifically the derivatives of tensors, transform in different coordinate systems. The result will be truly interesting, as you shall see. Say we have a tensor $T^x_{mn}$ such that it is the derivative of some vector $V^x_m$.

$$T^x_{mn} = \frac{\partial V^x_m}{\partial x^n}$$

Now consider a tensor $T^y_{mn}$, which is this same tensor, but now in the y frame of reference. Our fundamental question is: is this tensor in the y frame of reference equal to the derivative of the vector in the y frame of reference? Or, mathematically,

$$T^y_{mn} \stackrel{?}{=} \frac{\partial V^y_m}{\partial y^n}$$

The answer to this is no, for reasons that we will see. To show it, we essentially need a counterexample, which means carrying out the calculation and showing that the equality does not hold. Consider equation (5), which shows how a tensor transforms between reference frames (using the covariant form):

$$T^y_{mn} = \frac{\partial x^r}{\partial y^m}\frac{\partial x^s}{\partial y^n}\,T^x_{rs}$$

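Rewriting this, replacing the tensor on the right-hand side with the partial derivative of the vector, we get

$$T^y_{mn} = \frac{\partial x^r}{\partial y^m}\frac{\partial x^s}{\partial y^n}\frac{\partial V^x_r}{\partial x^s}$$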

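Notice that the second and third factors together are just the chain rule run in reverse, $\frac{\partial x^s}{\partial y^n}\frac{\partial V^x_r}{\partial x^s} = \frac{\partial V^x_r}{\partial y^n}$, so we can contract them into one term:

$$T^y_{mn} = \frac{\partial x^r}{\partial y^m}\frac{\partial V^x_r}{\partial y^n}$$

Our question is whether this is equal to $\frac{\partial V^y_m}{\partial y^n}$. Summed all in one,

$$T^y_{mn} = \frac{\partial x^r}{\partial y^m}\frac{\partial V^x_r}{\partial y^n} \stackrel{?}{=} \frac{\partial V^y_m}{\partial y^n}$$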

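Well, if we rewrite equation (5) for a single index, $V^y_m = \frac{\partial x^r}{\partial y^m} V^x_r$, and differentiate, we get:

$$\frac{\partial V^y_m}{\partial y^n} = \frac{\partial}{\partial y^n}\left(\frac{\partial x^r}{\partial y^m}\,V^x_r\right)$$

The only thing we did was drop the s index from that equation, contracting, in a way, the tensor to a vector. Now we have a derivative of a product, so we use the good ol' product rule!

$$\frac{\partial V^y_m}{\partial y^n} = \frac{\partial x^r}{\partial y^m}\frac{\partial V^x_r}{\partial y^n} + V^x_r\,\frac{\partial}{\partial y^n}\frac{\partial x^r}{\partial y^m}$$

The first term on the right is exactly the $T^y_{mn}$ we found above, so

$$\frac{\partial V^y_m}{\partial y^n} = T^y_{mn} + V^x_r\,\frac{\partial}{\partial y^n}\frac{\partial x^r}{\partial y^m}$$

or, solving for the tensor,

$$T^y_{mn} = \frac{\partial V^y_m}{\partial y^n} - V^x_r\,\frac{\partial}{\partial y^n}\frac{\partial x^r}{\partial y^m}$$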

You can see that we have our result, but with an additional term, and this combination of derivatives we call a Christoffel symbol, denoted as such:

$$\Gamma^r_{nm} = \frac{\partial}{\partial y^n}\frac{\partial x^r}{\partial y^m}$$

So, we see that tensors which are derivatives of vectors do not truly transform invariantly under coordinate transformations, and that in doing this, we get this extra term, which is the Christoffel symbol. Again, just as $g_{mn}$ was a correction of Pythagoras in curved space, $\Gamma^r_{mn}$ is a correction for the transformation of derivatives of tensors in different coordinate systems! So, in conclusion, $T^y_{mn} \neq \frac{\partial V^y_m}{\partial y^n}$.

Here, we introduce some more notation, namely the notation of the covariant derivative. The covariant derivative is a very useful notation to adopt when studying general relativity, because when you take derivatives of tensors, you just saw that they don't transform invariantly. On the contrary, they transform with this gamma correction term stuck in the equation. To alleviate our major stresses with this correction term, we use the covariant derivative, which, when applied to a vector, does transform properly in any reference frame (yay!). The notation is as follows:

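$$T^y_{mn} = \nabla_n V^y_m,$$

and this $\nabla_n$ is the covariant derivative. See, in this case, we don't need to add that correction term by hand, because it is already contained inside the covariant derivative. So, summing this all up in a nice neat formula (the minus sign comes from moving the extra product-rule term to the other side):

$$T^y_{mn} = \nabla_n V^y_m = \frac{\partial V^y_m}{\partial y^n} - \Gamma^r_{nm}\,V^x_r \qquad (7)$$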

So what is this Christoffel Symbol? Simply put, it is the compensation so that thederivative of a vector as part of a tensor transforms invariantly.


OK, this is great, but this is only how derivatives of vectors transform; we want to know how derivatives of tensors transform! Luckily, this is very easy: all we have to do is add one more gamma term for the extra index:

$$\nabla_p T_{mn} = \frac{\partial T_{mn}}{\partial y^p} - \Gamma^r_{pm}\,T_{rn} - \Gamma^r_{pn}\,T_{mr} \qquad (8)$$

With this knowledge, let's try to understand conceptually the answer to the following: what is $\nabla_r g^x_{mn}$, where $g_{mn}$ is of course the metric tensor and $\nabla_r$ is the covariant derivative with a dummy index r? Let's take two cases, flat and curved space. We have already shown that the metric tensor is equal to the Kronecker delta in flat space: since the space is not curved, there need not be any correction, so the tensor reduces to just this constant delta term,

$$g^x_{mn} \equiv \delta^x_{mn} = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{if } m \neq n \end{cases}$$

And we all know (well, I hope we do) that the derivative of any constant, whether covariant, contravariant, partial, or regular, is always 0; therefore in flat space the covariant derivative of the metric is 0. In addition, if we know that this derivative is 0 in the x frame of reference that we were working with, we know that it is 0 in all frames of reference because of the definition of a tensor! Thus, from equation (8):

$$\nabla_p g_{mn} = \frac{\partial g_{mn}}{\partial y^p} - \Gamma^r_{pm}\,g_{rn} - \Gamma^r_{pn}\,g_{mr} \overset{!}{=} 0$$

What have we just done, you may ask? This may look like just an obvious consequence of taking the covariant derivative of a constant, but it is more than that. We now have an equation with the metric, the derivative of the metric, and the gamma correction term. What if we want to solve for gamma in terms of the metric and its derivatives? For this, we (as did Einstein) go to our handy mathematicians so they can solve it for us! The result (brace yourself) gives us:

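$$\Gamma^a_{bc} = \frac{1}{2}\,g^{ad}\left(\frac{\partial g_{dc}}{\partial x^b} + \frac{\partial g_{db}}{\partial x^c} - \frac{\partial g_{bc}}{\partial x^d}\right), \qquad (9)$$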

where a, b, c, and d are just indices, with the repeated d being summed over, I remind you, by the Einstein Summation Convention. A few remarks: you now have an equation for $\Gamma$ in terms of $g_{mn}$ and $g^{mn}$ (the inverse of the metric). Also, $\Gamma$ itself is not a tensor, but rather a correction term that is made of partial derivatives. In addition, $\Gamma = 0$ always in flat space with rectangular coordinates, because there the metric is constant, so every derivative of the metric inside $\Gamma$ is 0. This will become important, because the Ricci tensor, the fundamental tensor for curvature, has Christoffel symbols buried inside it. Up next: the Ricci tensor!
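To see equation (9) produce actual numbers, the sketch below feeds it the polar-coordinate metric of a flat plane, g = diag(1, r^2). This choice of metric is my own illustration and not part of the derivation; the plane is flat, but polar coordinates are not rectangular, so the Christoffel symbols need not vanish there, which is one way to see that they are not tensors. Derivatives of the metric are central-difference estimates, and the function names are illustrative.

```python
def metric(y):
    """Polar-coordinate metric on the plane: g = diag(1, r^2), with y = (r, theta)."""
    r = y[0]
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_metric(y):
    """Inverse of the diagonal metric above, g^{ad}."""
    r = y[0]
    return [[1.0, 0.0], [0.0, 1.0 / (r * r)]]

def d_metric(y, i, j, k, h=1e-6):
    """Central-difference estimate of d g_ij / d y^k."""
    up = list(y)
    up[k] += h
    down = list(y)
    down[k] -= h
    return (metric(up)[i][j] - metric(down)[i][j]) / (2 * h)

def christoffel(y, a, b, c):
    """Equation (9): Gamma^a_bc = 1/2 g^ad (d_b g_dc + d_c g_db - d_d g_bc)."""
    g_inv = inverse_metric(y)
    return 0.5 * sum(
        g_inv[a][d] * (d_metric(y, d, c, b)
                       + d_metric(y, d, b, c)
                       - d_metric(y, b, c, d))
        for d in range(2)
    )

point = [2.0, 0.3]                         # r = 2, theta = 0.3
gamma_r_tt = christoffel(point, 0, 1, 1)   # index 0 is r, index 1 is theta
gamma_t_rt = christoffel(point, 1, 0, 1)
print(gamma_r_tt, gamma_t_rt)
```

At r = 2 the two values come out close to -2 and 0.5, matching the textbook results of -r and 1/r for these two symbols.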

Consider a curved space (if you want to think of a cone, be my guest). Take a vector in this curved space and parallel transport it around the circumference of the curved space. Parallel transport is the act of moving a geometrical object (such as a vector, in our case) along a smooth manifold while keeping it pointing in the same direction as far as the space allows. In flat space, if we carry a vector around a closed loop so that the starting point A and the end point B are the same point, the vector comes back with the same magnitude and direction. Now consider a curved space, like a cone. If we cut the cone and unroll it into a flat sheet, A and B become different points. If we parallel transport the vector starting at A, it will end at point B, but its direction will be quite different because of the geometry of the cone. The deviation from the direction the vector would have had with no such discontinuity is an angle, and this angle measures the curvature of the geometrical object. Now we must digress to explain a little more notation.

Definition: The commutator is $[A, B] \equiv AB - BA$. In the classical sense, the commutator is always equal to 0, but quantum mechanically, for example when we consider spin operators, this is not the case. Thus, we say that spin operators do not commute. Now let us try

$$\left[\frac{\partial}{\partial x}, f(x)\right]$$

and see the result. We will find that it does not actually equal 0 because we areworking with partial derivatives.

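$$\left[\frac{\partial}{\partial x}, f(x)\right] V = \frac{\partial}{\partial x}\big(f(x)\,V\big) - f(x)\,\frac{\partial V}{\partial x}$$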

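For the former, we can use the product rule for differentiation:

$$= V\,\frac{\partial f(x)}{\partial x} + f(x)\,\frac{\partial V}{\partial x} - f(x)\,\frac{\partial V}{\partial x}$$

$$= V\,\frac{\partial f(x)}{\partial x}$$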

Therefore,

$$\left[\frac{\partial}{\partial x}, f(x)\right] = \frac{\partial f(x)}{\partial x}$$
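This result is easy to test numerically. The sketch below picks f(x) = sin(x) and a test function V(x) = x^2 + 1, both arbitrary choices made only for illustration, applies the commutator to V using central-difference derivatives, and compares the outcome with (df/dx) V.

```python
import math

def ddx(func, x, h=1e-5):
    """Central-difference derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(x):
    """Hypothetical choice of f(x)."""
    return math.sin(x)

def V(x):
    """Hypothetical test function the commutator acts on."""
    return x ** 2 + 1.0

def commutator_on_V(x):
    """[d/dx, f] V = d/dx (f V) - f dV/dx, evaluated at x."""
    first = ddx(lambda t: f(t) * V(t), x)
    second = f(x) * ddx(V, x)
    return first - second

x0 = 0.8
lhs = commutator_on_V(x0)
rhs = ddx(f, x0) * V(x0)   # (df/dx) V, the claimed result
print(lhs, rhs)
```

The two printed numbers agree to within the accuracy of the finite differences, for this f and V at least, so the commutator really does leave behind just the derivative of f.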

