UNDERDETERMINATION AND INDIRECT MEASUREMENT
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF PHILOSOPHY
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Teru Miyake
June 2011
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/cs884mb1574
© 2011 by Teru Miyake. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Michael Friedman, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Helen Longino
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Patrick Suppes
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
George Smith
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
We have been astonishingly successful in gathering knowledge about certain
objects or systems to which we seemingly have extremely limited access. Perhaps the
most difficult problem in the investigation of such systems is that our theories of them
are radically underdetermined by the available observations. What are the methods
through which these cases of underdetermination are resolved?
I argue in Chapter 1 that these methods are best understood by thinking of what
scientists are doing as gaining access to the previously inaccessible parts of these
systems through a series of indirect measurements. I then discuss two central problems
with such indirect measurements, theory mediation and the combining of effects, and
ways in which these difficulties can be dealt with.
In chapter 2, I examine the indirect measurement of planetary distances in the
solar system in the sixteenth and seventeenth centuries by Copernicus and Kepler. In
this case, there was an underdetermination between three different theories about the
motions of the planets, which could be partly resolved by the measurement of distances
between the planets. The measurement of these distances was enabled by making
certain assumptions about the motions of the planets. I argue that part of the
justification for making these assumptions comes from decompositional success in
playing off measurements of the earth's orbit and the orbit of Mars against each other.
In chapter 3, I examine the indirect measurement of mechanical properties such
as mass and forces in the solar system by Newton. In this case, there were two
underdeterminations, the first an underdetermination between two theories about the true
motion of the sun and the earth, and the second an underdetermination between various
theories for calculating planetary orbits. Newton resolved these two problems of
underdetermination through a research program in which the various sources of force
were identified and accounted for. This program crucially required the third law of
motion to apply between celestial objects, an assumption for which Newton was criticized by his
contemporaries. I examine the justification for the application of the third law of motion
through its successful use for decomposition of forces in the solar system in a long-term
research program. I further discuss comments by Kant on the role of the third law of
motion for Newton, in which Kant recognizes its indispensability for a long-term
program for determining the center of mass of the solar system and thus defining a
reference point relative to which forces can be identified.
Chapter 4 covers the indirect measurement of density in the earth's interior using
observations of seismic waves. One of the difficult problems in this case is that we can
think of the interior density of the earth as a continuous function of radius—in order to
determine this density function, you are in effect making a measurement of an infinite
number of points. The natural question to ask here is how much resolution the
observations give you. I will focus on the work of geophysicists who were concerned
with this problem, out of which a standard model for the earth's density was developed.
Acknowledgments
I am incredibly lucky to have been able to take two extraordinary seminars in
which the seeds for the ideas set forth in this dissertation were sown. The first is a
seminar on Newton's Principia that George Smith taught at Tufts University that I took
when I was an MA student. George's unwavering attention to the details that make a
difference, his way of identifying and trying to answer truly deep and interesting
questions about science, and above all his kindness and dedication to his students, all
made a deep impression on me. I sat in on this seminar again when George taught a
version of it when he visited Stanford University a few years later. I would like to sit in
on it many more times if I could—I'm sure I would get more out of it every time.
The one other seminar that made a similarly deep impression on me was Michael
Friedman's seminar on Kant's Metaphysical Foundations of Natural Science that I took
at Stanford. I found Michael to be a thinker of a completely different sort from George,
but I also saw a very similar uncompromising attitude with regard to the study of Kant
and the sciences of his time, and Michael‘s warm personality made it easy for me to
work with him as my advisor at Stanford. George and Michael are a pair of mentors
who, each in his own unique way, sets the highest standards in his area of research. I
only hope my own work could approach those standards someday.
The rest of the dissertation committee is no less distinguished. Pat Suppes is, of
course, in a league of his own. When I first talked to Pat, I have to admit that it was
with a mixture of awe and apprehension, but I grew to really enjoy walking out to visit
him at Ventura Hall. Helen Longino was always very helpful and encouraging, even
during a very busy stint as department chair. Tom Ryckman was not an official member
of the committee, but he was certainly a committee member in my eyes. I have had
countless discussions with him about the topics covered in this dissertation, and he was
the most dependable source of advice and support during my years at Stanford.
As I have already mentioned, I got my MA in philosophy at Tufts University,
and besides George I would like to thank Dan Dennett, Jody Azzouni, Kathrin Koslicki,
David Denby, and the members of my cohort. At Stanford, I would like to thank the
following faculty: Brian Skyrms, David Hills, Krista Lawlor, Lanier Anderson, Chris
Bobonich, Mark Crimmins, Nadeem Hussain, Marc Pauly, John Perry, and Dagfinn
Follesdal. Grad students and visiting scholars who have contributed to the development
of the ideas in this dissertation include Quayshawn Spencer, Angela Potochnik, Joel
Velasco, Alistair Isaac, Johanna Wolff, Tomohiro Hoshi, Sally Riordan, Ben Wolfson,
Dan Halliday, Danny Elstein, Shawn Burns, Micah Lewin, and Samuel Kahn. Part of
this dissertation was given as a talk at the UC Irvine LPS department, and I thank the
audience for their comments, and Jeff Barrett and Kyle Stanford in particular for their
hospitality.
I wrote much of this dissertation at the Max Planck Institute for the History of
Science in Berlin, where I was a Predoctoral Fellow. The Max Planck Institute provided
a perfect environment for writing this dissertation, and I would especially like to thank
Raine Daston and the scholars in Department II. Financial support for the years during
which I was working on this dissertation was provided by the Whiting Foundation and
the Ric Weiland Fellowship. In addition, I am proud to say that I was the very first Pat
Suppes Fellow at Stanford, for which I would like to thank Pat a second time.
I could not have had better preparation for the work I had to do for this
dissertation than my undergraduate experience at Caltech. I want to thank all of my
friends throughout those four very tough but ultimately rewarding years.
Finally, all the members of my family know that the roots of my philosophical
education began with long arguments over pretty much anything with my twin brother
Kay. I would like to thank Dad, Mom, Yochan, June, and Kay for their support.
Table of Contents
Chapter 1: Underdetermination and Indirect Measurement
Chapter 2: Copernicus, Kepler, and Decomposition
Chapter 3: Newton and Kant on the Third Law of Motion
Chapter 4: Underdetermination in the Indirect Measurement of the Density Distribution of the Earth's Interior
Epilogue
Bibliography
Chapter 1: Underdetermination and Indirect Measurement
1 Prelude
Suppose one day archeologists unearth a mysterious artifact—a perfect black
cube, 10 centimeters on a side, cool to the touch, made of what looks like the blackest
possible steel. They decide, rather unimaginatively, to call the artifact "Cube". It's a
mere curiosity at first, but scientists soon find that it has some mystifying features. The
material it is made out of is incredibly hard—it cannot be broken, cut, pierced, drilled, or
dynamited. It cannot even be scraped in order to take samples of the material. All
attempts to take CAT scans or MRI images of the inside of Cube have failed. On one
face are several white dots that look as if they are projected onto the face from within
Cube. The dots move across the face of Cube, tracing out trajectories over time.
Now suppose we are scientists trying to figure out what is going on inside Cube.
We will find, unfortunately, that our options are severely limited, since we have found
no way of accessing the interior of Cube. What do we do? Perhaps the only thing to do
is simply to assume that there are certain lawful connections between the internal and
external states of Cube, that is, the dynamics of the external states depends somehow
upon the dynamics of the internal states. We then make hypotheses about (a) the
dynamics of the internal states of Cube, and (b) the laws that connect the internal to the
external states of Cube. From these hypotheses, we deduce predictions about the
dynamics of the external states. If those predictions match our observations of the
external states, we say that those hypotheses have been confirmed. This method, the
hypothetico-deductive method, was described by Pierre Duhem in The Aim and Structure
of Physical Theory (1954) as being the method of physics, and it has been widely
adopted by philosophers, most notably Quine.
There is a problem with this method, though, as Duhem recognized. Since we
have no antecedent knowledge whatsoever about the internal states of Cube, there is
enormous leeway in the hypotheses we can come up with. For any given dynamics of
the external states of Cube, there will be many different sets of hypotheses that are
consistent with those dynamics. In philosophical parlance, our theory of the internal
states of Cube is massively underdetermined by our observation of the external states.
Because of this underdetermination, the mere agreement of predictions about the
dynamics of the external states of Cube with actual observations gives us little reason to
think that the hypotheses from which those predictions were deduced have, in any way,
characterized the true internal states of Cube. Faced with this predicament, we might
give up on the idea that we can gain any knowledge at all about the internal states of
Cube, and instead become instrumentalists. We change our aim to simply predicting the
dynamics of the external states of Cube without making any claim to having any
knowledge about the internal states.
2 Resolving underdetermination
According to one way of thinking about the methodology of planetary astronomy
in the sixteenth century, planetary astronomers were in a position very much like that
of the scientists studying Cube. All of our knowledge about the solar system
came from the observation of the motions of the planets as they moved across the night
sky. More specifically, we can think of ourselves as being located inside an immense,
hollow, black sphere, on the inner surface of which the constellations are painted. We
can then determine the positions of the planets on this sphere, as seen from the earth, and
thus record their apparent motions over time. We cannot, however, know how far away
a planet is from us merely by looking at it. So we are, in effect, looking at the two-
dimensional projection, onto the celestial sphere, of the actual three-dimensional
motions of the planets through space. Moreover, although we did not know for sure in
the sixteenth century, we are observing these motions from a platform, the earth, that is
itself moving.
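To see concretely how distance information is lost in this projection, here is a minimal sketch (the coordinates are invented for illustration): two planets lying in the same direction from the observer, but at different distances, produce exactly the same apparent position on the celestial sphere.

```python
import math

def apparent_direction(observer, planet):
    # Project a 3-D position onto the celestial sphere as seen from the
    # observer: keep only the direction (a unit vector) and discard the
    # distance, which is all naked-eye observation of a planet gives us.
    dx = [p - o for p, o in zip(planet, observer)]
    r = math.sqrt(sum(d * d for d in dx))
    return tuple(d / r for d in dx)

observer = (0.0, 0.0, 0.0)
near = (1.0, 2.0, 0.5)   # hypothetical planet
far = (3.0, 6.0, 1.5)    # same direction, three times as distant

d1 = apparent_direction(observer, near)
d2 = apparent_direction(observer, far)
# The two apparent positions agree to within rounding error, so the
# observations alone cannot distinguish the two configurations:
same = max(abs(a - b) for a, b in zip(d1, d2)) < 1e-12
```

Radically different three-dimensional arrangements are thus compatible with identical two-dimensional observations, which is the underdetermination in miniature.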
Drawing out the analogy with the story of Cube, we can think of the apparent
motions of the planets as corresponding to the external states of Cube, while the actual
three-dimensional motions correspond to the internal states. Like the scientists studying
Cube, astronomers in the sixteenth century faced a problem of radical
underdetermination. Famously, the apparent motions of the planets across the night sky
were compatible with three different theories of the actual motions of the planets—the
Ptolemaic, the Copernican, and the Tychonic theories1—in which the actual three-
dimensional motions of the planets are radically different from each other. This is a
1 I will describe these theories in more detail in chapter 2.
classic situation of underdetermination. There were three radically different theories that
could all be made to fit the observations then available to about the same degree of
precision. At the end of the sixteenth century, some astronomers, such as a
contemporary of Kepler's called Ursus, came to conclusions similar to those I discussed
above about Cube.2 They decided that the aim of planetary astronomy should not be
about acquiring knowledge about the actual motions of the planets at all. Instead, the
aim of planetary astronomy should simply be to provide a convenient way of calculating
the apparent motions of the planets.
How was this state of underdetermination eventually resolved? Well, suppose
the method of astronomy is, like for Cube, hypothetico-deductive. You make
hypotheses, deduce the observable consequences of these hypotheses, and then you
compare these consequences with actual observations. Since the problem is that there
were three theories that could fit the observations to the same degree of precision, we
might think that one way of resolving the underdetermination is through increasing the
precision in the actual observations. As we shall see in chapter 2, however, Johannes
Kepler shows in the Astronomia Nova that, with minor modifications, the Ptolemaic,
Copernican, and Tychonic systems can be made to give exactly the same predictions for
the apparent two-dimensional motions of the planets—they can be made empirically
equivalent. Thus, a mere increase in precision of the observations of the apparent
motions could not resolve the underdetermination. What actually happened is that
Galileo turned his telescope to the skies in 1619 and observed that Venus has phases,
just like the moon. This situation is inconsistent with the Ptolemaic theory, so it was
2 I will discuss Ursus in chapter 2.
eliminated from contention.3 A new kind of technology, the telescope, allowed us to
bring a new kind of evidence to bear on the question of what the actual motions of the
planets are.
I think, however, that there is a third way in which the underdetermination could
have gotten resolved. In fact, Kepler had a good argument, prior to 1619, that the
Ptolemaic theory is not the correct theory of planetary motion. I just got done saying
that Kepler showed all three theories of the planetary motions could be made empirically
equivalent to each other, and so could not be distinguished on the basis of observations
of the apparent two-dimensional motions of the planets. We might note, however, that
the three theories predict very different motions for the planets through three-
dimensional space. If we could somehow measure the actual distances between the
planets with confidence, we could eliminate one or more of the theories. As I said, we
cannot get planetary distances simply by direct observation of the two-dimensional
motions, but they can be inferred from these two-dimensional motions by indirect
measurement.
3 Indirect measurement
So we might be able to resolve underdetermination in some cases by using
indirect measurement. As we shall see, however, there is a problem. In order to carry
out indirect measurement, you have to presuppose certain facts about the system you are
investigating. The central question of this dissertation will be: How can we know with
confidence that indirect measurements are correct or approximately correct, given that
3 It was not until Newton that the Tychonic theory was conclusively laid to rest, as we will see in
chapter 3.
we must presuppose certain facts about the system? Let me sharpen this question
further by explaining what I mean by an indirect measurement, and giving some idea of
what the assumptions are that you have to make about the system.
Suppose there is a complicated, partially inaccessible system that I want to
acquire knowledge about. A complicated system is one that consists of many parts,
those parts having various properties and relations with each other. I say an object is
partially inaccessible if we can only confidently measure a proper subset of the properties
of, and relations between, the parts of that object. I call the properties that we can
confidently measure the accessible properties. I will also sometimes speak of accessible
parts, by which I simply mean the parts of the system that have properties that we can
confidently measure. In order to determine the properties and relations of the
inaccessible parts, we must make inferences based upon what we know about the
accessible parts. Indirect measurement, then, is the measurement of inaccessible
properties or relations of a complicated, partially inaccessible system, through inference
based upon observations of the accessible properties.
We can think of the solar system, as viewed by astronomers in the sixteenth
century, as a complicated, partially inaccessible system. It is complicated because it
consists of many parts, namely the planets, the sun, and the moon, each having
properties such as mass and size, and distance relations between them. It is partially
inaccessible because we have access to the two-dimensional motions of the planets, but
we do not have access to distances in three-dimensional space. So the measurement of
planetary distances based upon observations of the apparent two-dimensional motions of
the planets is indirect measurement.
Now let us go back to the question I asked a few paragraphs back. Could we
have used the observations of the apparent two-dimensional motions of the planets to
break out of the state of underdetermination prior to 1619? The answer to this question
depends on whether we could have made indirect measurements of planetary distances
with confidence prior to 1619. I think we could, as I will argue in chapter 2. But here, I
simply want to examine what might make us lack confidence about indirect
measurements.
Before I go on with my discussion of indirect measurement, I want to distinguish
indirect measurement from a somewhat similar kind of problem. Suppose there is a
system that is partially inaccessible but not complicated. For example, say we have
found a huge underground lake, and we want to know the mineral content in the various
parts of the lake, but we only have access to parts of it. We might then take samples of
the water from the parts we can access, measure the mineral content in these samples,
and then extrapolate to the entire lake. We are making the assumption here, of course,
that the mineral content in the parts of the lake that are inaccessible to us is going to be
similar to the mineral content in the parts that are accessible. If this assumption turns
out to be wrong, we will be wrong about the mineral content in the inaccessible parts.
There can be interesting epistemological problems with this kind of extrapolation, but it
will not be a central topic of this dissertation. I will stick to complicated systems, for
which I believe there are particular problems and ways of dealing with these problems.
I will now explain what I take to be the central problems with indirect
measurement. First, note that we can be very confident about the results of some
indirect measurements. I do not have direct access to the amount of electric current
flowing through a wire, but I can have great confidence in the value I measure using a
galvanometer. At least part of the reason for this confidence has to do with what I call
antecedent familiarity. If an object is of a type that is familiar to me, I can safely assume
certain facts about that object. I know that if I drop a shot put from a height of 10 meters,
it will reliably hit the ground in approximately 1.4 seconds, barring any extraordinary
circumstances. I know this because I know that objects like shot puts fall with a uniform
acceleration of approximately 9.8 m/s² at the surface of the Earth. There have been some
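The 1.4-second figure follows directly from the constant-acceleration law; a quick check, as a sketch:

```python
import math

g = 9.8   # m/s^2, gravitational acceleration at the Earth's surface
h = 10.0  # m, drop height

# For uniform acceleration from rest, h = (1/2) g t^2, so:
t = math.sqrt(2 * h / g)
# t comes out to roughly 1.4 seconds
```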
cases in the history of science, however, where we have wanted to know facts about an
object that is utterly unlike anything else we knew of at the time. The solar system is a
good example of such an object. For all astronomers knew in the sixteenth century, the
solar system could have been radically different from anything else we knew of, so it
was hard to know what a reasonable assumption to make about the solar system was.
I think there are two main difficulties when carrying out an indirect measurement
that would make us lack confidence in such a measurement, particularly if the system we
are making the measurement on is antecedently unfamiliar. The first difficulty is theory-
mediation. You have to make measurements of the inaccessible properties, based upon
observations of the properties that are accessible. In order to make such measurements,
you need to presuppose that a particular relation applies between the accessible
properties and the inaccessible properties. If the relation you use to make the
measurement is not known antecedently, then the question naturally arises as to how you
can know that the measurement is correct.
The second difficulty is the combining of effects. Again, the root of this
difficulty is that you have to make measurements of inaccessible properties based upon
observation of accessible properties. Suppose the system you are making a
measurement upon is complicated. If so, there could be more than one part of the
system that has an effect on the accessible parts. If you want to measure a property of
one of those parts, you might have to separate out, or decompose, the effects of the
various parts on the accessible part. If you do not antecedently know the composition of
the system, however, you might not know exactly how to carry out such a
decomposition. If so, you might not be confident that the measurement you make using
such a decomposition is correct.
I will discuss these difficulties in more detail in the following sections of this
chapter, but now let me return to the notion of underdetermination. Suppose that there is
a system that we are interested in acquiring knowledge about, but there are two or more
theories that can account for all observations equally well. As I mentioned, there are a
couple of ways in which we can think we could resolve this situation of
underdetermination. One way is simply to improve on the observations we already have,
by increasing the precision of these observations. The other way is to come up with an
entirely new set of observations, like Galileo observing the phases of Venus.
What I am arguing in this dissertation is that there is a third way to resolve the
underdetermination. This is to make indirect measurements by inference from the
observations that are available to us. In order to make these indirect measurements,
however, we must make certain assumptions about what the system is like. Because of
the problem of theory-mediation, you have to make assumptions about the relation
between the inaccessible properties and the accessible properties of the system. Because
of the problem of combining of effects, you have to make assumptions about the
composition of the system, that is, the relation between the parts of the system. Since
these assumptions enable indirect measurements to be made, I will sometimes refer to
them as enabling assumptions.
So the now sharpened-up central question of this dissertation is the following:
Given that, in order to carry out an indirect measurement, you must make inferences
from the accessible properties of a system to the inaccessible properties, and that in
order to make these inferences, you need to make the assumptions that (1) certain
relations between accessible and inaccessible properties apply, and (2) effects from
various inaccessible parts on the accessible parts can be decomposed in a certain way,
how do you ensure that the indirect measurement you made is correct, or approximately
correct? I will lay out a preliminary answer to this question in the rest of this chapter.
4 Theory mediation
If I want to find out how wide my window is, I simply take out a tape measure
and measure it. Sometimes, however, I do not have the right kind of access to an object
on which I want to make a measurement. As I write this, the Tokyo Skytree, which will
become the tallest freestanding structure in Japan when completed, is being built.
Suppose I want to figure out how tall it is at this point during its construction. I could
not very well take out a tape measure to measure its height. Instead, I might improvise a
device with which I measure the angle from the horizon to the top of the Skytree. I then
find out the distance from my position to the Skytree construction site. Simple geometry
tells me that the height of the Skytree should then be approximately this distance times
the tangent of the angle I measured (for a small angle, the sine gives nearly the same
result). With the help of geometry, I have made a measurement of something that is
physically inaccessible to me.
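The triangulation just described can be sketched as follows; the distance and angle here are invented numbers, not actual measurements of the Skytree:

```python
import math

distance = 5000.0  # m, hypothetical horizontal distance to the construction site
angle = 4.0        # degrees, hypothetical measured elevation of the top

# Height from simple geometry: horizontal distance times the tangent
# of the elevation angle.
height = distance * math.tan(math.radians(angle))

# For a small angle, the sine gives nearly the same answer:
height_small_angle = distance * math.sin(math.radians(angle))
```

With these made-up inputs the two estimates differ by less than a meter, which is why the small-angle approximation is harmless here.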
In a particularly philosophical moment, I might realize that I have made the
assumption here that the Skytree is the kind of thing to which Euclidean geometry
applies. We would never call this assumption into question in our day-to-day dealings.
But what if, instead of the Skytree, I were trying to calculate distances to something that
is utterly unfamiliar to me? Astronomers in the sixteenth century, for example, used
geometry in determining the orbits of the planets. If they had known of other geometries,
they might well have raised the question of whether Euclidean geometry really applies to
the planets. After all, those planets were known to be unimaginably distant, and nobody
had the faintest clue what kind of material they could be made out of. Why should we
believe Euclidean geometry applies to them?
For almost all practical purposes, when we make such a measurement, we are on
safe ground assuming that mathematics and geometry will apply to the objects that we
are investigating. But sometimes, in order to make a measurement, we need to assume
more than mathematics and geometry. Sometimes we have to assume that a system on
which we are trying to make a measurement has certain physical properties, and behaves
in accordance with certain mathematical relations. Because I make use of a bit of
physical theory in order to make this kind of measurement, I say that such measurements
are theory-mediated.
Now, when we make measurements using bits of physical theory, the way in
which the theory is used in the measurement can be surprisingly complicated. For
example, consider the problem of trying to measure the muzzle velocity of a cannon.
One way we might make this measurement is to fire the cannon and measure how far the
cannonballs fly. The following equation allows you to calculate, given the angle θ at
which a cannon is fired, the muzzle velocity v, and the gravitational acceleration g at the
surface of the Earth, the horizontal distance D at which a cannonball lands:
D = 2v²(cos θ sin θ)/g. (1)
This equation assumes no air resistance, a perfectly flat Earth, and a constant
acceleration due to gravity. In order to calculate the distance D, all you need to do is
plug in the values of the muzzle velocity and the angle of the cannon.
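A direct calculation in this forward direction might look like this, as a sketch under the same idealizations (the numerical values are invented):

```python
import math

def landing_distance(v, theta_deg, g=9.8):
    # Horizontal range of a projectile, equation (1): no air resistance,
    # flat Earth, constant gravitational acceleration.
    theta = math.radians(theta_deg)
    return 2 * v**2 * math.cos(theta) * math.sin(theta) / g

# A muzzle velocity of 100 m/s fired at 45 degrees lands about 1020 m away.
D = landing_distance(100.0, 45.0)
```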
Now, suppose we want to determine the muzzle velocity of a particular cannon,
but we do not have any means of directly measuring the velocity of the cannonballs as
they come shooting out of the muzzle. There is a way of using the equation given above
for making a measurement of this muzzle velocity. We can think of this method as a
way of measuring a property of something that is not directly accessible, much like our
determination of the height of the Tokyo Skytree.
First, we fire the cannon several times, at a predetermined angle, and measure the
distances at which cannonballs land. We then might guess various values of v, for which
we calculate the distances D at which we predict the cannonball ought to land. We take
the value of v that gives us a predicted value for D that is the nearest to the actually
observed values. Then we might refine our value of v further by taking a cluster of
values around this best value for v, and calculating the distances at which we predict the
cannonball ought to land given these values for v. We then compare these distances
with the distances we have actually measured, and take the value of v that is closest to
these distances. We can keep repeating this until we home in on a value for v. Using
this procedure, we hopefully will have measured the muzzle velocity.
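The homing procedure just described can be sketched as a successively refined grid search. The forward model is repeated so the sketch is self-contained, and the initial search bounds are arbitrary choices:

```python
import math

def landing_distance(v, theta_deg, g=9.8):
    # Forward model, equation (1): no air resistance, flat Earth.
    theta = math.radians(theta_deg)
    return 2 * v**2 * math.cos(theta) * math.sin(theta) / g

def estimate_muzzle_velocity(observed_D, theta_deg, v_lo=1.0, v_hi=2000.0):
    # Guess a spread of candidate values of v, keep the one whose predicted
    # distance is nearest the observed distance, then repeat the search on a
    # tighter cluster of values around that best guess.
    for _ in range(20):  # each pass shrinks the search window fivefold
        step = (v_hi - v_lo) / 10
        candidates = [v_lo + i * step for i in range(11)]
        best = min(candidates,
                   key=lambda v: abs(landing_distance(v, theta_deg) - observed_D))
        v_lo, v_hi = max(best - step, 0.0), best + step
    return best

# If cannonballs fired at 45 degrees land about 1020.4 m away, the
# procedure homes in on a muzzle velocity of about 100 m/s.
v_est = estimate_muzzle_velocity(1020.408, 45.0)
```

The refinement works here because D varies smoothly, indeed monotonically, with v at a fixed angle; as the text goes on to note, such homing would fail if the dependent variable were sensitive to small fluctuations in the independent one.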
Note that this procedure involves using a mathematical equation where v and θ are
independent variables, and D is a dependent variable. If the aim were to determine
D given values for v and θ, one could simply plug in the values and use the equation to
calculate D. In this case, however, we are using measured values of D in order to try to
determine the value of v—that is, we are trying to determine the value of an independent
variable using measured values for the dependent variable. The way in which we do this
is to vary the value of v until we find one that fits the value of D that we have observed.
Often, the independent variables such as v are called parameters, and this kind of
problem is called a parameter estimation problem, or a bit more colloquially, curve-
fitting. This kind of problem is also often called an inverse problem, particularly in
cases where instead of trying to estimate discrete parameters, you are trying to estimate a
continuous function.
Suppose we take the mathematical equation to be correct, and that θ is known.
Then the logical relation between v and D is in the form of an if-statement: if v has
such-and-such a value, then D has such-and-such a value. Note that this relation does
not uniquely determine v given D. What we would really need to guarantee uniqueness
for the value of v is a logical relation in the form of an if-and-only-if statement.
There is also a further problem having to do with the logic. We used a kind of homing
procedure to find the value of v, where we first guess a value and then adjust v until we
get a value for D that best fits our measured value. Note that this homing procedure
works because we know that D is going to be smooth over small variations in v. But if
the equation we were using were such that the dependent variable is sensitive to small
fluctuations in the independent variables, we would not be able to do such a homing
procedure. For the homing procedure to work, the logic has to be of the form if v has
very nearly such-and-such a value, then D has very nearly such-and-such a value. In
some cases of indirect measurement, the use of this very nearly relation is crucial, as we
shall see in chapter 3.
In some cases, due to the mathematical relation between the independent and
dependent variables, there are problems having to do with the nonuniqueness of
solutions. Methods for addressing these nonuniqueness problems have recently become
important in geophysics, computer imaging, and other fields, under the rubric of
"inverse problem theory". I will postpone discussion of this problem until chapter 4.
5 When a measurement is theory-mediated, how do we know it’s correct?
As we did with the measurement of the height of Tokyo Skytree, we might think
about the assumptions we are making when we carry out this measurement. How do we
know that these assumptions will result in correct measurements? For example, what
needs to be the case in order for us to come up with the correct value for v, the muzzle
velocity of the cannon?
Our initial impulse might be to say that the equation we are using, and the
assumptions we are making about this system, must be true of the system. But we
should immediately realize that the equation we are using, and the assumptions we are
making about the system, such as no air resistance and a constant acceleration due to
gravity, are, strictly speaking, false with respect to this system. Now, one might think
that we ought to try to make the measurement procedure as realistic as possible, by
including as many details as we can. We could, for example, try to include air resistance,
include known details of the terrain, even allow for things like wind and atmospheric
pressure. The problem is that, in many cases, adding too many details to the
measurement procedure complicates the procedure enormously, and in some cases
makes the determination of a value impossible.
On the other hand, we would be in trouble if the assumptions we make are too
unrealistic. In that case, we could perhaps carry out the measurement procedure and
determine values for the properties of the system, but the values we calculate
would give us properties of some imaginary cannon, not the real cannon we are
interested in. There is a tradeoff here. If the
assumptions we use are too unrealistic, then we would get the wrong answer for our
measurement. But if we are too realistic, then we won't be able to carry out the
measurement procedure. The trick is to find assumptions that are realistic enough so
that they will let us calculate a value for the muzzle velocity that is close enough, for our
purposes, to the correct value for the real cannon.
How, then, do we know that we are making the right assumptions, and using the
right equation, to calculate the correct value for v? With regard to the cannon example,
the answer to this question is ultimately going to be an appeal to our everyday
experience, and our experience with cannons in particular (hopefully, we are
experienced artillery engineers). Our familiarity with the type of thing that cannons are,
and the conditions under which they are fired, allows us to justify the assumptions we
make about the system.
There was also the further problem of the logic of the relation between v and D.
Even if I have found a value for v that is consistent with the value for D that I have
measured, the logic does not guarantee that the value for v that I found is unique. Here
again, though, we make the assumption that the value for v is unique because of our
familiarity with the situation. We know that, given a constant value for θ, and the
conditions under which the cannon is fired, Equation 1 ought to apply at least
approximately, and there should be a unique positive value of v for each value of D.
In this example, there is a part of the system to which we do not have direct
access—we cannot directly measure the muzzle velocity of the cannon. In order to
make an indirect measurement of this muzzle velocity, we must make a large number of
assumptions about the cannon. Fortunately, the cannon is a type of system that is
familiar to us, so we can have confidence in the assumptions we make. We might say
that the muzzle velocity of the cannon is an inaccessible property of a familiar system,
and our familiarity with systems of this type allows us to set up a procedure through
which we can measure this inaccessible property.
What do we do, though, if we want to measure inaccessible properties of
unfamiliar systems? Let us hold that thought until after I discuss the second of the two
main difficulties of indirect measurement, the combining of effects.
6 Representing partially inaccessible systems
Before I discuss the combining of effects, however, I first want to introduce the
following way of representing partially inaccessible systems. This will facilitate the
discussion by giving us an intuitive grasp of what is going on in cases of the combining
of effects.
Figure 1
We might represent our measurement of the muzzle velocity of the cannon as in
Figure 1. This diagram is in the form of a directed graph.4 The reason it is a directed
graph should become clear in the next few sections, but let's just take a look at the figure
first. There are two nodes, labeled X and Y. There is an arrow, labeled a, going from X
to Y. Here is how to interpret this picture. Y stands for the distance the cannonballs
travel, X stands for the muzzle velocity of the cannon, and the arrow a stands for the
relation between X and Y, namely Equation 1 given above. The relation a uniquely
determines Y, given X. That is, as I have mentioned, it is a logical relation of the form if
X = v, then Y = w. We have access to Y, that is, we have the means for confidently
measuring its value. What we want is to find the value of X. Ignoring, for now, the
difficulties I mentioned involving nonuniqueness, we can say that the value of X can be
determined if we know the value of Y, because we know the relation a.

4 I should say that some inspiration for these diagrams comes from Jim Woodward's work on
causation. The idea of these diagrams, however, is not to try to infer causes from observation.
In fact, it is almost the opposite—this sort of structure is assumed in order to enable
measurements of properties. I was also greatly influenced by George Smith's work, particularly
his paper "Closing the Loop", encapsulated in the idea of trying to find the "details that make a
difference, and the differences they make".
We can think of the arrow a, in this case, as representing a causal relation. But
in other cases, the arrow could stand for other relations. For example, the measurement
of the height of the Tokyo Skytree can also be represented by Figure 1. Think of X as
standing for the height of the Skytree, and Y as standing for the angle from the top of the
Skytree to the horizon and the distance from my position to the Skytree site. We are
now interpreting Y as standing for two variables. The arrow a now stands for a
geometrical relation between X and Y, which uniquely determines Y, given X. As in the
cannon example, we can determine the value of X, given Y, because we know the
relation a.
Note, though, that these graphs should not be taken to be faithful representations
of these systems. For example, as I mentioned with regard to the cannon example, the
relation represented by a, Equation 1, is not actually true with regard to the system. We
might further take issue with the structure of the diagram itself. There are factors, such
as the wind, that will influence the distance that the cannonball travels. Shouldn‘t there,
then, be other arrows that point towards the node Y? In fact, if we wanted to come up
with a complete picture of what is happening with the cannon, we would have to have a
very complicated graph, with nodes standing, say, for the wind, details of the terrain,
variations in the gravitational constant, Coriolis forces, and so on. As experienced
artillery engineers, we might decide not to consider any of those things. We
assume that those other things will not have much of an effect on the outcome, and we
feel justified in this assumption in virtue of our experience as artillery engineers.
very simple picture involving just X, Y, and a is sufficient for us to get a reasonably
accurate value for X, which is what we wanted. We assume that the relation a holds for
this system well enough for us to make this measurement.
One further remark: the diagram looks like the kind of thing that is often called a
model, both by scientists and philosophers. Because the word is used for many different
kinds of things in the philosophical literature, however, I have thought it best to avoid it.
The role of these diagrams is simply to represent the elements that are necessary for the
measurement to be carried out, and their relation with each other. I am using them as
conceptual tools for thinking about particular cases of measurement, and to facilitate
discussion about what is going on in such measurements. It should not be assumed that
a scientist carrying out a measurement has such a diagram explicitly in mind.
7 Combining of effects and decomposition
Now the discussion in the previous section raises an obvious question. What if
the system I am investigating is more complicated, having various different parts that
have significant effects on the accessible parts? This is the problem I discussed earlier
in this chapter as the problem of the combining of effects.
Figure 2
We can now discuss this problem using the diagrams I have just introduced.
What if I can't reduce a system to a very simple one like Figure 1, but it is more like
Figure 2? In Figure 2, there are now three nodes, X, Y, and Z, and two arrows—one
from X to Z, labeled a, and the other from Y to Z, labeled b. We can take a to be a
relation that licenses an inference of the following form: given that there are no other
factors affecting Z, then if X = v, then Z = w. Similarly, we can take b to be a relation
that licenses an inference of the form given that there are no other factors affecting Z,
then if Y = v, then Z = w. Now, suppose we have access to Z, and we want to measure
either Y or X. Since Z is affected by both Y and X, we need some way of separating out
their effects on Z. If we could somehow successfully separate out their effects, we
would be able to measure X or Y.
Let me illustrate this situation with the cannon example again. Let Z be the
distance the cannonball travels, and let X be the muzzle velocity of the cannon. The
arrow a going from X to Z again represents Equation 1. But now we have another factor,
represented by Y, that has an effect on the distance the cannonball travels. Say Y is the
speed of the headwind or tailwind in the direction the cannonball is shot. Then in order
to measure X given observations of Z, we would somehow have to compensate for the
effect of Y on Z.
How do we compensate? Perhaps the easiest way to do it is to wait to fire the
cannon at times when there is no wind. Since at such times there will be no effect of Y
on Z, we can effectively reduce Figure 2 to Figure 1. In this case, we are isolating the
effect of X on Z from the effect of Y on Z, in order to measure X. Now, it just happens
that in this example, this sort of measurement using isolation can be done. But what if
there is never a time when the wind dies down, and there is always a headwind, for
example?
More generally, what do you do in a situation like in Figure 2, where you have
access to Z, and you want to measure X, but there is always a significant effect of Y on
Z? You would have to find some way to separate out the effects of X and Y on Z. I call
the process of separating out the effects decomposition. How might you carry out this
decomposition? One way to do it would be to somehow try to model what the effect of
Y on Z would be, and then subtract that out in order to measure X. Of course, we are
making the assumption here that the effects of X and Y on Z will add linearly, which will
not always be the case. At this point, however, I don't want to make things too
complicated. Let me simply note that we are indeed making this
assumption about how the effects add together.
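Under that linear-additivity assumption, the compensation can be given a minimal sketch. The wind model here (a fixed number of extra meters of range per unit of wind speed) and all of the numbers are illustrative assumptions, not a real ballistic model; as before, I assume the idealized vacuum range formula for the effect of the muzzle velocity.

```python
import math

G = 9.81  # m/s^2

def vacuum_range(v, theta):
    """Idealized (no air resistance) range at muzzle velocity v and angle theta."""
    return v ** 2 * math.sin(2 * theta) / G

def estimate_muzzle_velocity(observed_D, theta, wind_speed, meters_per_unit_wind):
    # Model the wind's (Y's) contribution to the landing distance, subtract it
    # out, and invert the vacuum range formula for what remains (X's effect).
    wind_effect = wind_speed * meters_per_unit_wind
    D_from_v = observed_D - wind_effect
    return math.sqrt(D_from_v * G / math.sin(2 * theta))

# A 5 m/s tailwind, assumed to add 12 m of range per m/s of wind:
v_est = estimate_muzzle_velocity(2608.0, math.pi / 4, 5.0, 12.0)
```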
Figure 3
Now, there are other arrangements we can think of as well. In Figure 3, we have
an arrow going from X to Y, and an arrow from Y to Z. Suppose we have access to Z,
and we want to determine the value of X. In this case, X has a causal effect on Y, and Y
has a causal effect on Z, and we need somehow to measure X via its effect on Y.
Another possible arrangement is in Figure 4, where now in addition there are arrows
going between X and Y. We might think of this as a case where there is now some kind
of causal interaction. I call the various different ways in which we can arrange the
arrows and the nodes the relational structure. If all the relations are causal relations,
then we can think of it as a kind of causal structure. In all of these cases, if we want to
measure X or Y based on our observations of Z, we must somehow separate out the
effects of X and Y on Z—that is, we must carry out a decomposition.
Figure 4
All of this might seem complicated, but when we are trying to measure
properties of something that is familiar to us, isolating and decomposing the various
effects comes rather naturally. For example, suppose I am in a moving car and I have a
radar gun with me. I want to measure the speed of an oncoming car. I point the radar
gun at the car, then I look at the speed that the radar gun gives, and then I compensate
for my own speed by looking at my speedometer and subtracting my own estimated
speed. This is a form of decomposition that comes naturally because this is a system
that is made up of parts that are familiar to us. Of course, as the directed graph gets
more complicated and you have to decompose more effects, measurement can become
immensely more difficult.
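As a minimal sketch of this everyday decomposition (the function name and the numbers are mine, and the head-on geometry is assumed):

```python
def oncoming_speed(radar_reading, my_speed):
    # The radar gun reads the closing speed; for head-on motion this is
    # (approximately) the sum of my speed and the oncoming car's speed,
    # so subtracting my own speed recovers the other car's speed.
    return radar_reading - my_speed

# Radar reads 150 km/h while my speedometer shows 60 km/h:
other = oncoming_speed(150.0, 60.0)
```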
I think decomposition is an aspect of indirect measurement, and of scientific
methodology in general, which has been overlooked by philosophers. In individual
cases, scientists are certainly aware of the difficulties involved with separating out
various effects when carrying out measurements. But there has been very little
philosophical literature on the problems of decomposition.5
8 Antecedently unfamiliar systems
Up to this point, we have been talking about the measurement of inaccessible
properties of familiar systems. What if, instead of a familiar system such as a cannon, I
were trying to measure an inaccessible property of a system that is
antecedently unfamiliar to me? Is this even possible? Don't we have to know certain
things about the system antecedently in order to measure such inaccessible properties?
In the case of the cannon, we have to know the laws of physics, facts about the
environment of the cannon such as properties of the air and terrain, and less quantifiable
facts about cannons in general—how they are manufactured, how they are fired, and so
on. How could we possibly set up a measurement of an inaccessible property of an
antecedently unfamiliar system?
History seems to show, however, that successful measurements have been made
of inaccessible properties of antecedently unfamiliar systems. Consider planetary
astronomy again. The solar system—not to be confused with the traces we observe of
planets across the night sky, but the planetary system itself—was surely about as
inaccessible and antecedently unfamiliar as a system could be. We might now laugh at the
idea that the planets are carried around the heavens in crystalline spheres, but the
solar system was utterly unlike anything astronomers at the time knew about. There was
simply no way to know in advance what a reasonable assumption about the planets would be.

5 A few philosophers who have addressed this problem or related problems are George Smith
(2002a, 2002b), Hasok Chang (2004), and William Wimsatt (2007).
Yet, as I show in chapters 2 and 3, the work of Kepler and Newton provides examples of
how measurements of antecedently unfamiliar systems can be carried out successfully.
Let us think carefully about what makes the measurement of inaccessible
properties of antecedently unfamiliar systems difficult. As I have discussed earlier,
there are two basic problems—theory-mediation and the combining of effects. First, to
illustrate the problem of theory-mediation, let us return to the cannonball example.
Recall that the diagram for that example is given in Figure 1. There are two nodes, X
and Y, with an arrow, a, pointing from X to Y. Now, suppose we didn't know the laws of
physics, so we couldn't derive the relation, Equation 1, which relates X to Y and thereby
allows us to measure X by observing Y. If we only have access to Y, we would not be
able to measure X without knowing this equation. Let me represent this situation in
Figure 5. I have X and Y, but now only a dotted arrow from X to Y, with a question mark
next to it. This is an indication that we think we know that there is a relation between X
and Y, but we don‘t know exactly what it is.
Figure 5
How would we measure X in this case? One thing we might think of doing is
simply guessing the relation. But how could we be at all sure that we have measured X
correctly, using a guessed relation? If I were really a cannon maker, here's what I would
think of doing. I would try to build something like the cannon that launches a heavy
object like a cannonball but for which I know the initial velocity—perhaps a catapult of
some kind. By launching the object at different velocities, I might find some kind of
relation between the initial velocity and the distance traveled. Then, by induction, I
assume that the same relation holds for the cannon. Since I now have a relation between
X and Y, I can make the measurement. So in this case I do not have to derive something
like Equation 1 from fundamental theory—I can determine it empirically. Still, one
might ask whether the inductive move is justified—how do I know that the relation I
found from the catapult applies to cannons as well? Let us hold this thought for a while.
Figure 6
Now let us think about the problem of combining of effects. Think once again
about the cannonball example. Suppose we do know Equation 1, so we know of the
relation a relating X to Y. But perhaps we are inexperienced as artillery engineers. We
don't know whether there could be other influences on the distance traveled, such as the
wind. Without knowing whether there could be such other influences, we would not be
able to measure X with confidence. Let me represent this situation in Figure 6. I have X
and Y, and an arrow going from X to Y as in Figure 1, but now I have a couple of dotted
arrows going towards Y with question marks beside them, indicating possible effects on
Y. Now, again, if we were really cannon makers, there would be ways of determining
whether, say, wind is a factor. We could, for example, fire the cannon using the same
amount of powder under various conditions of wind to make sure that the distance the
cannonball travels is not affected too much by the wind. But, of course, there could be
further unforeseen conditions that affect the distance the cannonball flies. Without being
able to anticipate such unforeseen conditions, we have no way of correcting for them.
9 Indirect measurement and evidence
Let me now return to what I said is the central question of this dissertation: Given
that, in order to carry out an indirect measurement, you must make inferences from the
accessible properties of a system to the inaccessible properties, and that in order to
make these inferences, you need to make assumptions that (1) certain relations between
accessible and inaccessible properties apply, and (2) effects from various inaccessible
parts on the accessible parts can be decomposed in a certain way, how do you ensure
that the indirect measurement that you made is correct, or approximately correct?
If the system we are making an indirect measurement on is antecedently familiar,
we can often give plausibility arguments for assumptions (1) and (2). For example,
going back to the cannon example again, we take it as given that the laws of physics
apply to cannonballs, and that under the right conditions, Equation 1 will apply. And I
can give an argument based on past experience to say that the actual conditions are
indeed close enough to those conditions for us to be able to apply Equation 1 to this
particular situation—that, say, the wind is not going to be a factor. But what do we do if
the system is antecedently unfamiliar?
If we look at cases from the history of science, there is not a simple answer,
because the situations tend to be very complicated. Even in cases where the system you
are investigating is antecedently unfamiliar, you can give plausibility arguments for the
assumptions. For example, as we shall see in chapter 3, Newton referred to experiments
done in his laboratory to justify the applicability of the laws of motion in the Principia.
This is a reasonable assumption to make as a working hypothesis, but it could not have
been known at the time that the laws are in fact applicable to celestial objects.
Plausibility arguments are much weaker without the weight of experience behind them.
I think there is a different way of gaining confidence that an indirect
measurement is correct or approximately correct, which does not involve trying to come
up with a straight justification for the assumptions (1) and (2): let the indirect
measurements themselves be evidence that the assumptions were correct.
Figure 7
I think there are at least two strategies through which this can be done. The first
strategy is converging measurement.6 Suppose there is some system that we can
represent by Figure 7. There is a node X with two arrows out from it, arrow a to node Y,
and arrow b to node Z. Suppose both Y and Z are accessible properties, that is, we have
a way of measuring their values confidently. Suppose we don't have too much
confidence in the relations a and b. In this situation, there are two different ways of
measuring X, through observation of Y using relation a, and through observation of Z
using relation b. If we carry out both measurements and we get approximately the same
result, that is, they converge, then this is good reason to think that the
measurement of X is correct. We can, of course, have more than two
such converging measurements. The more the results converge, the better reason we
have to believe that the measurement of X is indeed correct. Note, however, that we can
get converging results even if the relations a and b are not strictly true of the system—it
could be the case that, say, relation a simply holds to a good approximation under the
circumstances of the measurement. It turns out that we have more reason to believe that
the measurement itself is correct than the assumptions we made in order to make the
measurement.

6 The term, and the idea, are George Smith's. See (Smith 2002a) and his unpublished
manuscript "Closing the Loop".
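A converging measurement can be given a minimal numerical sketch. The two relations a and b below, the observed values, and the tolerance are invented purely for illustration:

```python
def x_via_y(y):
    # Relation a (illustrative): Y = 2 * X, so X = Y / 2.
    return y / 2.0

def x_via_z(z):
    # Relation b (illustrative): Z = X ** 2, so X = sqrt(Z).
    return z ** 0.5

def converging(x1, x2, rel_tol=0.01):
    """Do two independent measurements of X agree to within rel_tol?"""
    return abs(x1 - x2) / max(abs(x1), abs(x2)) < rel_tol

x1 = x_via_y(10.1)   # measurement of X through observation of Y
x2 = x_via_z(25.2)   # measurement of X through observation of Z
agreement = converging(x1, x2)
```

The point is that the agreement of x1 and x2 is evidence about X even if neither relation, taken alone, is trusted.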
This has implications, by the way, for the way in which we view the "flow" of
evidence in science. In Figure 7, I have confidence in my measurements of Y. I have
low confidence in the relation a. Since I am using the relation a to measure X, one
might think that I should have low confidence in my measurement of X. This would
indeed be the case if I only measured X one way, but if I also measure X through the
other relation b, and they converge, then this will increase my confidence in X even if I
have low confidence in b. In fact, this might be reason to raise my confidence in the
applicability of the relations a and b. To put it in a loose but picturesque way, evidential
power does not flow monotonically from Y and Z towards X. Rather, under certain
circumstances such as converging measurements, X can be a new source of evidence,
and the evidential power can actually "flow outward" from X. Of course, we have to be
careful about what such converging measurements actually show about the relations a
and b. The conclusion we can draw from such convergent measurement is that the
relations a and b are applicable under the conditions of the measurements, but we would
not know whether they would be applicable in other conditions.
There are other strategies besides converging measurement by which we can let the
indirect measurements themselves serve as evidence that the assumptions were correct.
They involve more complicated relational structures. The following strategy is what I
call decompositional success. For example, take a look again at Figure 2. Here, the
accessible property Z is affected by both the inaccessible property X, via the relation a,
and the inaccessible property Y, via the relation b. Suppose we don't have too much
confidence in the relations a or b, and we want to measure X. We might first try
guessing the effect of Y on Z, subtracting that effect out, and then measuring X using the
relation a. We now have a way of modeling the effect of X on Z using the relation a.
Now subtracting that effect out, we measure the value of Y using the relation b. Using
this new value of Y, we model the effect of Y on Z. We subtract out that effect and
measure a more refined value for X. Using this new, refined value for X, we model the
effect on Z, and we now come up with a new, refined measurement for Y.
If my measurements of X and Y seem to be converging on certain values, then
this is good evidence that this relational structure is approximately correct and the
relations a and b are also at least approximately applicable. Why? Suppose the relation
a is not approximately applicable. Then when we model the effect of X on Z and
subtract out this modeled effect in order to measure Y, we do not expect to get a good
value when we measure Y. Then, when we model the effect of Y and subtract it out to
measure X, we should expect this measurement not to give a good value for X, and thus
it should not agree with the previous value for X. Thus, if the sequence of values for X is
converging, this is evidence that the values for X and Y are correct. To put it loosely, we
are "playing the measurements of X and Y off of each other"—the measurement of X
presupposes that the measurement of Y is approximately correct, and the measurement of
Y presupposes that the measurement of X is approximately correct. If either one is not
approximately correct, then in all probability the procedure will not work.
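The successive refinement just described can be sketched as an alternating iteration. The two linear relations assumed below (Z1 = X + 0.2·Y and Z2 = 0.3·X + Y) and the starting guess are invented for illustration; the point is only the structure of the procedure: model one effect, subtract it out, re-measure the other, and repeat.

```python
def refine(z1, z2, y_guess=0.0, passes=25):
    """Alternately measure X and Y, each time modeling and subtracting out the
    other's effect, until the values settle down.
    Assumed (illustrative) relations: Z1 = X + 0.2*Y and Z2 = 0.3*X + Y."""
    x = 0.0
    for _ in range(passes):
        x = z1 - 0.2 * y_guess      # subtract modeled effect of Y; measure X
        y_guess = z2 - 0.3 * x      # subtract modeled effect of X; measure Y
    return x, y_guess

# True values X = 5 and Y = 2 give Z1 = 5.4 and Z2 = 3.5:
x_est, y_est = refine(5.4, 3.5)
```

Here the sequence of values for X and Y settles down to the true values even though the starting guess for Y was badly wrong; had either assumed relation been badly off, the sequence would not have converged on values consistent with both observations.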
In actuality, these relational structures often turn out to be even more
complicated. But it is the very fact that these structures are so complicated that they can,
in some cases, confer very high confidence that some indirect measurements are correct.
The more complicated a structure, the more ways in which one can play measurements
off of one another, or try to measure one property in more than one way.
Now I want to discuss some limitations of these methods. First, as we shall see
when we start looking at actual cases of indirect measurement, most indirect
measurements are far from easy to do, especially when they involve systems that are
partially inaccessible. They often involve observations that are limited and hard to get,
and the calculations themselves can often be laborious, especially when we consider
sciences such as planetary astronomy in the sixteenth and seventeenth centuries. Thus,
indirect measurements will often be made in the hope that the assumptions made in
carrying them out will be shown, down the road, to be true. We will see in chapter 3,
for example, that this is the best way to view what
Newton was doing in the Principia.7
The second limitation also has to do with the temporal dimension. These
methods all involve comparing the results of different indirect measurements. In most
cases, the indirect measurements will be made at different times. If the property you are
measuring changes over time, then you will not be able to get converging measurements.
Thus, a fundamental presupposition in using these methods is that the property you are
measuring will not be changing its value significantly over time—that the value will be
stable. This is an issue that I will discuss in more detail in chapter 3.
7 This is George Smith's view of the methodology of the Principia. This dissertation is largely
the result of trying to understand Smith's views of methodology, particularly as they relate to
the problem of underdetermination.
10 Case studies
Now that I have laid out my general view of indirect measurement, the rest of
this dissertation is devoted to case studies of indirect measurement of complicated,
partially inaccessible systems. Each case initially involves a difficult problem of
underdetermination—the available observations are not good enough to uniquely
determine the inaccessible properties of the system. Indirect
measurement through the use of enabling assumptions will resolve at least part of that
underdetermination. I will, for the most part, focus on understanding the justification for
the enabling assumptions.
In chapter 2, I examine the indirect measurement of planetary distances in the
solar system in the sixteenth and seventeenth centuries by Copernicus and Kepler. In
this case, there was an underdetermination between three different theories about the
motions of the planets, which can be partly resolved by the measurement of distances
between the planets. The measurement of these distances was enabled by making
certain assumptions about the motions of the planets. I argue that part of the
justification for making these assumptions comes from decompositional success in
playing measurements of the earth's orbit and the orbit of Mars off against each other.
In chapter 3, I examine the indirect measurement of mechanical properties such
as mass and forces in the solar system by Newton. In this case, there were two
underdeterminations, the first an underdetermination between two theories about the
relative motion of the sun and the earth, and the second an underdetermination between
various theories for calculating planetary orbits. Newton resolves these two problems of
underdetermination through a research program where the various sources of force are
identified and accounted for. This program crucially requires the third law of motion to
apply between celestial objects, a point on which Newton was criticized. I examine the
justification for the application of the third law of motion through its successful use for
decomposition of forces in the solar system, in a long-term research program. I further
discuss comments by Kant on the role of the third law of motion for Newton, in which
Kant recognizes its indispensability for a long-term program for determining the center
of mass of the solar system and thus defining a reference point relative to which forces
can be identified.
Chapter 4 covers the indirect measurement of density in the earth’s interior using
observations of seismic waves. One of the difficult problems in this case is that we can
think of the interior density of the earth as a continuous function of radius—in order to
determine this density function, you are in effect making a measurement of an infinite
number of points. The natural question to ask here is how much resolution the
observations give you. I will focus on the work of geophysicists who were concerned
with this problem, out of which a standard model for the earth’s density eventually grew.
-2-
Copernicus, Kepler, and Decomposition
1 Planetary Astronomy
The most difficult problem of planetary astronomy in the sixteenth century
was that the observed two-dimensional motions of the planets across the night sky
are consistent with three different theories of the actual three-dimensional motions of
the planets through space—the Ptolemaic theory, the Copernican theory, and the
Tychonic theory. In other words, the theory of the actual motions of the planets was
underdetermined by the available observations. In fact, by making minor
modifications, you could make the theories empirically indistinguishable from each
other, given the kinds of observations that were available at the time. It seemed to
some astronomers in the sixteenth century that this underdetermination was
unresolvable, and that, in fact, trying to determine the actual motions of the planets
should not even be an aim of planetary astronomy.
This problem could be solved, however, if you could find a way of indirectly
measuring the distances between the planets, for the motions of the earth, the sun,
and the planets through space are different for each of these theories. Copernicus
and Kepler both use the method of triangulation to attempt to measure planetary
distances—setting up a triangle with the sun, the earth, and a planet at the corners
and using geometrical relations to determine distances. In order to carry out this
procedure, however, it is very important to know the angles of the triangle accurately.
But as I will explain, in order to determine these angles, you must perform what I
called a decomposition in chapter 1—you have to separate out the effects due to two
different features of the planetary motions. These features are called the first
inequality and the second inequality.
Thinking about this problem in terms of the picture of indirect measurement I
provided in chapter 1, we can take the solar system to be a complicated, partially
inaccessible system, with the apparent motions of the planets being the accessible
properties of the system, and the actual three-dimensional motions of the planets
being the inaccessible properties. In chapter 1, I explained that in order to carry out
an indirect measurement, you need to assume that (1) certain relations between the
accessible and the inaccessible properties apply, and (2) effects from various
inaccessible parts on the accessible parts can be decomposed in a certain way. The
central question was how you ensure that the indirect measurement is correct or
approximately correct in the face of (1) and (2).
With regard to assumption (1), the fundamental theory from which the relations
between the accessible and inaccessible properties, that is, the relations between the
apparent motions of the planets and the actual three-dimensional motions of the
planets, are derived, is Euclidean geometry. That Euclidean geometry is applicable
to the planets was never called into question by astronomers—they could not have
questioned it, of course, since they did not know of any geometry other than that of
Euclid.
Assumption (2), however, involves exactly how you break down the apparent
motions of the planets. All astronomers at the time, following Ptolemy, separated
out two motions, the first inequality and the second inequality. There were
disagreements, however, as to how to characterize each of these motions, and to what
actual motions of the planets the first and second inequality corresponded. Since this
separation of motions had to be done in order to determine planetary distances, how
could an astronomer know whether a measurement of planetary distances involving
decomposition is correct?
2 Planetary Astronomy in the sixteenth century
Although a more thorough treatment of planetary astronomy from the mid-
sixteenth to the early seventeenth century would certainly require a section on Tycho
Brahe, I will focus on the work of Copernicus and Kepler. We will be thinking about
the work of Copernicus and Kepler in terms of the framework I described in Chapter
1. We have access to the two-dimensional motions of the planets across the night
sky, that is, angular distances of the planets relative to the constellations, over time.
What we want to know are the actual motions of the planets in three dimensions, that
is, relative distances and directions of the planets over time. We will find that the
measurement of planetary distances crucially involves separating out two different
features of the motions of the planets—the first inequality and the second inequality.
I will explain what the first and second inequalities are shortly, but let us first
consider the apparent two-dimensional motions of the planets. We can think of the
night sky as a vast, hollow sphere, onto the inner surface of which are painted the
stars that are visible from the earth, some forming the familiar shapes of the
constellations. The sun appears to make one entire circuit around this sphere every
year, and the great circle along which it travels is called the ecliptic. The planets
appear to move roughly along the ecliptic, but their motions are somewhat
complicated. Movement along the direction of the ecliptic is called longitudinal
motion, while movement perpendicular to the ecliptic is called latitudinal motion.
Since it is the longitudinal motions that ultimately yield information about planetary
distances, I will talk almost exclusively of the longitudinal motions in what follows.
Now, let us consider these longitudinal motions. The motions of the planets
are fairly regular, but they exhibit two significant irregularities. One
irregularity is the famous retrograde motion. At some points in their journey
along the ecliptic, the planets will appear to stop and reverse direction for a while,
going the opposite direction along the ecliptic. This irregularity was called the
second inequality (or the second anomaly) by astronomers from the time of Ptolemy
through Kepler. We now know that the second inequality arises because we are
viewing the motions of the planets from a platform that is itself moving, namely the
earth.
The other irregularity is that the planets appear to speed up and slow down at
various points as they travel along the ecliptic. This variation in apparent angular
velocities was called the first inequality (or the first anomaly). We now know that
this variation occurs for two reasons. About half of the maximum variation in the
apparent angular velocity is because the planets actually do speed up and slow down
relative to the sun in accordance with Kepler’s area rule, while the remaining half
comes from the sun not being at the center of the planet’s orbit, but at a focus.
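This even split can be seen in a standard first-order expansion, given here in modern
notation as an illustrative aside (the symbols below are not in the original text). Write
M for the mean anomaly, E for the eccentric anomaly, ν for the true anomaly, and e
for the orbital eccentricity:

```latex
% Area rule: inverting Kepler's equation M = E - e \sin E, to first order in e
E \approx M + e \sin M
% Sun at a focus rather than at the center of the orbit, again to first order
\nu \approx E + e \sin E \approx E + e \sin M
% Combined equation of center: each effect contributes half
\nu \approx M + 2 e \sin M
```

At first order the two contributions of e sin M are equal, which is why roughly half
of the variation comes from the area rule and half from the displaced sun.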
Figure 1
Figure 1 (from Swerdlow and Neugebauer 1984, 615) is a representation of the
Ptolemaic theory. The earth is labeled O, and the position of a planet is labeled P.
In this theory, the second inequality is accounted for by the use of epicycles. The
planet moves in an epicycle, which is a circular orbit, while the center of the epicycle
itself moves in a circular orbit, called the deferent, around the earth. In Figure 1, the
center of the epicycle is labeled C, while the center of the deferent is labeled M. The
first inequality is accounted for by offsetting the earth O from the center M of the
deferent, and having another point called the equant point, labeled E, located on the
opposite side of the center from the earth, at the same distance from the center as the
earth. The planet travels at constant angular velocity as seen from the equant point,
and thus when seen from the earth it will appear to speed up and slow down at
various points along its orbit. Since the equant point does not coincide with the
center of the deferent, the planet’s actual motion along the deferent will not be
uniform circular motion.8
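The equant construction just described can be made concrete in a few lines of code.
The sketch below is schematic, not Ptolemy's actual parameterization: the function
name, the placement of the deferent center and equant on a common line through the
earth, and all numerical values are illustrative assumptions.

```python
import math

def ptolemaic_position(t, R=1.0, r=0.3, e=0.1, w_def=1.0, w_epi=5.0):
    """Position (x, y) of a planet relative to the earth O at the origin.

    Deferent: circle of radius R centered on M = (0, e).
    Equant:   Q = (0, 2*e), symmetric with O about M.
    The epicycle center C moves along the deferent so that its angular
    velocity as seen from the equant Q is the constant w_def; the planet
    then rides an epicycle of radius r about C at angular velocity w_epi.
    (All values here are illustrative, not historical parameters.)
    """
    theta = w_def * t                       # direction of C as seen from Q
    ux, uy = math.cos(theta), math.sin(theta)
    # Intersect the ray from Q along (ux, uy) with the deferent circle:
    # |Q + s*u - M|^2 = R^2  with  Q - M = (0, e)  gives a quadratic in s.
    s = -e * uy + math.sqrt((e * uy) ** 2 - e * e + R * R)
    cx, cy = s * ux, 2.0 * e + s * uy       # epicycle center C
    # Planet on the epicycle around C.
    px = cx + r * math.cos(w_epi * t)
    py = cy + r * math.sin(w_epi * t)
    return px, py
```

Setting r = 0 puts the planet at the epicycle center, which makes it easy to check
that C stays on the deferent even though its angular speed is uniform only about the
equant, not about the deferent's center—exactly the feature the crystalline-sphere
picture could not accommodate.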
Ptolemaic astronomy was enormously successful—since its development in
the second century, it was not superseded in accuracy for over a thousand years, until
the work of Kepler. There were some aspects of Ptolemaic astronomy that were
unsatisfactory, however, if one tried to think about how it could be physically
implemented. Almost all astronomers before the time of Kepler believed the planets
were carried along in their circular orbits by being embedded in rotating crystalline
spheres. As I just mentioned, according to Ptolemaic theory, the speed of the planet
along the deferent is not uniform—thus if it is being carried along by a crystalline
sphere, the sphere must somehow slow down and speed up in such a way that the
planet has constant angular velocity as seen from the equant point. It was difficult to
see how this speeding up and slowing down could be physically implemented. In
response to this difficulty, there was a school of Arabic astronomers connected to the
Maragha observatory in modern-day Iran who, in the thirteenth and fourteenth
centuries, developed planetary models using only uniform circular motion, with
epicycles accounting for the first inequality.
Famously, Copernicus came up with a theory in which the second inequality
is accounted for by putting the sun at the center of the solar system and having the
earth go around the sun.

8 See Evans 1984 for an excellent exposition of the role that the equant plays in Ptolemaic
astronomy, and why this innovation allowed Ptolemaic astronomy to be so empirically
successful.

The second inequality is then seen to be the effect of
observing the planets from a point that is itself moving. Although popular accounts
of Copernicus have him rejecting the Ptolemaic theory because of its epicycles, he
actually objected to it for the same reason as the Maragha astronomers—because it
departed from uniform circular motion (Swerdlow and Neugebauer 1984, 293-294).
In fact, Copernicus accounts for the first inequality using the same principles as the
Maragha astronomers did, with an epicycle.9 In order to get the theory to capture the
motions that Ptolemy could capture using the equant, this epicyclic theory for the first
inequality had to be rather complicated. Figure 2 (from Swerdlow and Neugebauer
1984, 616) shows the Copernican theory for the first inequality.
Figure 2
9 Swerdlow and Neugebauer go so far as to say that Copernicus "can be looked upon as, if
not the last, surely the most noted follower of the Maragha school" (295).
3 Triangulation
Since both Copernicus and Kepler use fundamentally the same method to get
planetary distances, I will first explain the basic method, so that the exposition will
be easier to follow when we look specifically at what each of them does. At root, the
method is very simple. First take a look at Figure 3. It shows the sun, the earth, and
a planet, surrounded by constellations. The constellations are taken to be fixed
permanently in their positions, and thus they provide a stable background against
which observations of the planets can be recorded. From the earth, I can observe the
position of the sun S
and the planet P along the ecliptic. The position along the ecliptic is called the
longitude. The longitude as seen from the earth is called the geocentric longitude.
In Figure 3, it just so happens that the earth, the sun, and the planet are lined
up so that the sun and the planet are on exactly opposite sides of the earth. Notice
here that when I observe the planet from the earth, I see it exactly the way I would
see it from the sun. When the sun, the earth, and a planet are in this configuration,
they are said to be in opposition. Call the longitude as seen from the sun the heliocentric
longitude. Then at opposition, the geocentric longitude and the heliocentric
longitude coincide.
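The coincidence of the two longitudes at opposition is easy to check numerically.
The sketch below uses hypothetical planar coordinates with the sun at the origin;
the function name and the particular positions are illustrative assumptions, not
anything from the text.

```python
import math

def longitude(observer, body):
    """Ecliptic longitude of body as seen from observer, in degrees [0, 360)."""
    dx, dy = body[0] - observer[0], body[1] - observer[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0

sun = (0.0, 0.0)
earth = (1.0, 0.0)
planet = (3.0, 0.0)   # sun, earth, planet collinear: the planet is at opposition

geo = longitude(earth, planet)    # geocentric longitude of the planet
helio = longitude(sun, planet)    # heliocentric longitude of the planet
# At opposition the two longitudes agree; move the planet off the
# sun-earth line and they come apart.
```

Moving the planet off the sun-earth line makes the two longitudes differ, which is
the situation of Figure 4.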
Figure 3
Now suppose the planet, the earth, and the sun are in the configuration shown
in Figure 4. In this configuration, the planet would have a different longitude, that is,
it would appear to be moving through different constellations, depending on whether
I observe it from the earth or from the sun. We can see that when not in opposition,
the geocentric longitude and heliocentric longitude will be different.
Figure 4
Now suppose when the earth, a planet, and the sun are in the configuration of
figure 4, we want to find the distance from the earth to the planet, the distance EP, as
a ratio of the distance from the earth to the sun, the distance ES. Suppose we already
have a theory of the motion of the earth around the sun, so that we know, at any
given time, the longitude of the earth as seen from the sun. And suppose we have in
addition a theory of the motion of the planet P around the sun as well, so we know, at
any given time, the heliocentric longitude of the planet P. The theory of the earth’s
motion will give us the direction of the line ES, while the theory of the motion of P
will give us the direction of the line SP. Making one observation from the earth will
give us the direction of the line EP, thus allowing us to find all the angles in the
triangle EPS. This will then allow us to find, by simple geometry, the ratio of the
length of the line EP to the length of the line ES, which is what we wanted. Thus,
given that this is the actual configuration of the earth, the sun, and the planet, and
that I have the proper theories for the earth’s motion and the planet’s motion, I can
find the distance from the earth to the planet, relative to the size of the earth’s orbit.
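The procedure just described can be written out as a short calculation. This is a
minimal sketch in modern notation, assuming a simple configuration in which every
angle of the triangle is less than 180 degrees; the function name and the sample
configuration are illustrative, not from the text.

```python
import math

def distance_ratio(helio_lon_earth, helio_lon_planet, geo_lon_planet):
    """Ratio EP/ES for the triangle sun S, earth E, planet P.

    helio_lon_earth:  direction of the line S->E (earth's heliocentric longitude)
    helio_lon_planet: direction of the line S->P (planet's heliocentric longitude)
    geo_lon_planet:   direction of the line E->P (one observation from the earth)
    All angles in degrees.
    """
    # Angle at the sun, between the directions S->E and S->P.
    angle_s = abs(helio_lon_planet - helio_lon_earth)
    # The direction E->S is opposite to S->E.
    dir_earth_to_sun = (helio_lon_earth + 180.0) % 360.0
    # Angle at the earth, between the directions E->S and E->P.
    angle_e = abs(dir_earth_to_sun - geo_lon_planet)
    # The angles of a triangle sum to 180 degrees.
    angle_p = 180.0 - angle_s - angle_e
    # Law of sines: EP / sin(S) = ES / sin(P).
    return math.sin(math.radians(angle_s)) / math.sin(math.radians(angle_p))
```

For instance, with the earth at heliocentric longitude 0 degrees at distance 1 from
the sun and the planet at heliocentric longitude 90 degrees at distance 2, the planet's
observed geocentric longitude is about 116.57 degrees, and the function returns an
EP/ES ratio equal to sqrt(5), the true earth-to-planet distance in that configuration.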
4 Copernicus’s measurement of planetary distances
We will now move on to the specific method that Copernicus uses to measure
distances to the planets, which he does in Book 5 of De Revolutionibus. Since the
method he uses is basically the same for all five planets, with minor differences
depending upon whether the planet is an inner planet or an outer planet, I will only
describe his procedure for one of the planets, Saturn. The basic method is
triangulation, just as I described above. We can think of Figure 5 (from Swerdlow
and Neugebauer 1984, 635) as a much more detailed and complicated version of
Figure 4. Saturn is labeled P, the earth is labeled O, and the sun10
is labeled S. The
reason this figure is so much more complicated than figure 4 is that the theory of
Copernicus does not consist of the simple circles I have above. The theory of
motion for Saturn involves an epicycle to account for the first inequality, and there
are further complications because the sun is not located at the center of the orbit of
Saturn. But if we strip away some of these complications, the basic method is the
same. The idea is to determine the angles in the triangle formed by the earth, the sun,
and Saturn.
Figure 5
10 One detail that I will discuss in a later section is that the sun here is the mean sun, not the
true sun.
The first leg of the triangle, the direction of the line from the earth to the sun,
is given by the Copernican solar theory, which is really the theory of the earth’s