Measurement and Uncertainty

In the Parallax lab, you were asked to estimate the uncertainty in your results based on your guesses about the sources of error in the measurement. This lab explores, in a little more detail, how one can express the level of confidence associated with a scientific measurement.


All of our knowledge of the physical world is obtained from our observations of that world, and from extensions to those observations that allow us to predict physical behavior. These extensions are called theory, and theories must be testable by measurements of predicted behavior. So the process is a circle, from measurement to theory to prediction to measurement again, which leads to improvement in the theory, new predictions and tests. Each loop increases our understanding, so it is perhaps useful to think of the process more as a spiral, where every time around we make a little progress.

For example: a simple observation is that the Sun "goes down" sometimes—but, after giving us time for a nap, it "comes up" again on the other side of the house. A theory based on several such observations would be something like "every time the Sun goes down, it will come back up on the other side of the world." This can be tested by watching. Better observations would include measurements of the times between sunset and sunrise, using some sort of clock, and measurements of the time that the sun is "up." The theory could then be improved to provide predictions of when the Sun would rise and set. You can see how this understanding can get more precise, as we make measurements from various locations on Earth, and at various times of "year." Hmm, what is a "year"? This observation, that the days get longer, then shorter, and that this cycle repeats too, can lead to a whole new theory.

Crucial to this process is the understanding that no measurement is exact: there is always some uncertainty in a measurement, and a statement of the result of a measurement is incomplete without a statement of its uncertainty. In fact, as we have already seen, it is usually harder to establish the uncertainty of a measurement than to make the measurement itself. Some parts of the uncertainty have to do with the quality of the measurement tools and our use of those tools; these effects can be explored and quantified by making a set of repeated measurements. Other sources of uncertainty have more to do with our understanding of the measurement technique, and can be very difficult to evaluate.

The words "uncertainty" and "error" are used interchangeably in this context. This is a misuse of the word "error", because we never mean "mistake". Mistakes, like confusing the inches scale with the cm scale, can be avoided and need to be rectified before determining the experimental error.

First, we need a few definitions:

Significant figures

There are two conventions that are used to communicate the confidence level (or lack of confidence, the uncertainty) in a measurement. First, the result of the measurement is always accompanied by an explicitly stated value for the uncertainty. The usual form is 12 +/- 1, where the characters "+/-" are read "plus or minus" and are used here to cope with the limitations of html. Second, the number of digits used to express the result is chosen to properly reflect its uncertainty. If your measurement gives the value 12.33, but you believe the uncertainty is about 0.1, you must write it as 12.3 +/- 0.1—that is, don't include digits whose magnitude is smaller than the uncertainty.

Zeros can cause confusion. Leading zeros are not significant figures: 0.00004 has one significant figure. Trailing zeros without a decimal point may or may not be significant: 400 may have one, two or three significant digits. This can be cleared up by an explicit statement of the uncertainty (e.g. +/- 10), or by putting in a decimal point: 400. is conventionally taken to mean that there are three significant digits. Expressing the number in powers-of-ten notation makes it easier to tell which zeros are significant: 4 x 10^2 has one significant figure; 4.00 x 10^2 has three.

The uncertainty is usually expressed as a single digit (sometimes two), of the same order of magnitude (i.e. same decimal place) as the last significant digit of the value. For example:

8.45 +/- 0.03
10.0 +/- 1.5
5 +/- 2
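
If you want to automate this convention, here is a minimal Python sketch (the helper name format_measurement is ours, not a standard function). It rounds the uncertainty to one significant figure, then rounds the value to the same decimal place:

```python
import math

def format_measurement(value, uncertainty):
    """Express a measurement following the convention above:
    round the uncertainty to one significant figure, then round
    the value to the same decimal place."""
    # decimal place of the leading digit of the uncertainty
    exp = math.floor(math.log10(abs(uncertainty)))
    u = round(uncertainty, -exp)
    v = round(value, -exp)
    decimals = max(0, -exp)  # digits to show after the decimal point
    return f"{v:.{decimals}f} +/- {u:.{decimals}f}"

print(format_measurement(83.45, 0.023815))  # 83.45 +/- 0.02
print(format_measurement(100.0, 2))         # 100 +/- 2
```

This sketch always keeps a single digit of uncertainty; as noted below, two digits are sometimes appropriate.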

The following numbers are all incorrectly written: write the correct expression next to them.

83.45 +/- 0.023815

100.0 +/- 2

5 +/- 0.5

0.00034 +/- 0.0001

When you are combining numbers to get a result, as we did for the parallax measurements, you can keep an extra digit for the computations to avoid rounding errors. Your calculator keeps lots of extra digits, of course. But be sure to trim off the meaningless digits when you express the result. In particular, converting from one unit to another does not change the uncertainty: if you measure a length to be 15.5 +/- 0.5 feet and want to convert it to cm, the value should be written as something like 470 +/- 15 cm (0.5 feet is about 15 cm), even though the calculator says 472.44. Notice that practically every single container in the grocery store gets this wrong when they convert from fluid ounces to ml.
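
As a quick check of the conversion above (assuming the exact definition 1 ft = 30.48 cm), a short Python sketch:

```python
FT_TO_CM = 30.48  # exact definition of the foot in cm

value_ft, err_ft = 15.5, 0.5
value_cm = value_ft * FT_TO_CM   # 472.44, says the calculator
err_cm = err_ft * FT_TO_CM       # 15.24

# Trim the meaningless digits: the uncertainty is about 15 cm,
# so the value should be rounded to the nearest 10 cm.
print(f"{round(value_cm, -1):.0f} +/- {round(err_cm):.0f} cm")  # 470 +/- 15 cm
```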

Systematic Errors

Systematic errors shift the measurements all one way. Incorrect calibration of test equipment would be an example of a source of systematic error. Actual variation of the thing you are measuring would be another: a variable star's brightness cannot be measured accurately without taking into account its variation. Or the length of an object may depend on the ambient temperature or humidity. A goal in any experiment is to reduce the magnitude of systematic errors below the size of the random errors.

Random Errors

Random errors can occur for a variety of reasons, all of which lead to the measurements fluctuating about a mean value. Random errors may be reduced by improving the measurement apparatus (like getting a more precise voltmeter) or the technique (reading the scale with a magnifying glass), but they cannot be eliminated. The size of the random uncertainty may be obtained only by making a set of repeated, independent observations.

Mean and Standard Deviation

The mean Xmean of a set of measured values Xi is simply the sum of the Xi, divided by the number N of measurements. This is the best estimate of the true value, based on this set of measurements.

The variance of a set of measured values is the average of the squared deviations from the mean:
   variance = (sum of (Xi - Xmean)^2) / N
and the standard deviation SD is the square root of the variance.
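
For a set of measurements stored in a list, these formulas can be computed with a few lines of Python. This is only a sketch; note that, following the definition above, it divides by N (the population variance), rather than the N - 1 used in some texts:

```python
def mean(xs):
    # sum of the values, divided by the number of measurements
    return sum(xs) / len(xs)

def variance(xs):
    # average of the squared deviations from the mean
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    # square root of the variance
    return variance(xs) ** 0.5
```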

If the errors are truly random, and a fairly large number of measurements are taken, they will scatter symmetrically about the mean value, with more of them close to the mean and a smaller number farther from the mean. This distribution is called a Gaussian or normal distribution. In a normal distribution, 68% of the measurements will lie within one standard deviation of the mean and 95% of them will be within two standard deviations. This means that if you make one more identical measurement, it has a 68% probability of being within one standard deviation of the previously calculated mean, and a 95% probability of being within two standard deviations.

Calculate the mean and standard deviation of the following set of numbers. Write them below.

74, 75, 79, 77, 74, 65, 64, 78, 75, 74




Standard Deviation of the Mean

When we average a set of measurements, we get a better estimate of the true value than we have from a single measurement. The parameter that expresses this improvement is called the standard deviation of the mean (we'll call it SDM), which is SD divided by the square root of N. So if N is 9, SDM = SD / 3. You can see what this means: if you were to take another set of measurements and calculate their mean, you would expect a 68% likelihood that the second mean value would be within one SDM of the first mean, and a 95% likelihood that it would be within 2 SDM of the first mean. Another way to express this is that we are 95% confident that the true value (always assuming, of course, that we've removed all the sources of systematic error!) is between Xmean - 2 SDM and Xmean + 2 SDM. Assuming the above numbers are a sequence of measured values, write down the 95% confidence interval (lower bound and upper bound) within which the true value is expected to be.
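
A short Python sketch of the SDM and the resulting 95% confidence interval (the helper names are ours, and the SD is the population standard deviation defined above):

```python
def mean(xs):
    return sum(xs) / len(xs)

def std_dev(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def sdm(xs):
    # averaging N measurements improves the estimate by sqrt(N)
    return std_dev(xs) / len(xs) ** 0.5

def confidence_95(xs):
    # 95% confidence interval: mean +/- 2 SDM
    m, s = mean(xs), sdm(xs)
    return (m - 2 * s, m + 2 * s)
```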

Absolute and Fractional Error

Suppose you measure your weight on a spring-type bathroom scale, where the needle sticks a little and the readings are not all exactly the same. You take several measurements and determine that your weight is 105 +/- 2 lbs. The error estimate of 2 lbs is the standard deviation of your set of measurements. So we say the absolute error is 2 lbs—but the fractional error is 2 / 105 or about 0.02 or two parts in 100. If you then weigh your cat (you have an extremely docile cat) on the same scale, you get a mean value of 10.5 lbs. But the readings still vary the same amount, so the absolute error is the same: 2 lbs. However, the fractional error is now 2 / 10.5, or about 0.2. This kind of result is typical: to measure small values precisely you often need better tools.

Combining Measurements

Suppose the result you want depends on more than one measurement, like the parallax measurement where you needed to measure both the baseline and the angle. (In fact that experiment also required the measurement of the length of the cross-staff, though we assumed its length to be known pretty accurately.) How do you combine the individual uncertainties to get the uncertainty in the result?

The rule is that when you add or subtract two or more measured values, the absolute error in the result is the square root of the sum of the squares of the individual absolute errors. And when you multiply or divide two values, you do the same thing but using the fractional errors: the fractional error of the result is the square root of the sum of the squares of the individual fractional errors.
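
These two quadrature rules can be written as a short Python sketch (the helper names sum_err and product_err are ours):

```python
def sum_err(dx, dy):
    """Absolute error of X + Y or X - Y: square root of the
    sum of the squares of the absolute errors."""
    return (dx ** 2 + dy ** 2) ** 0.5

def product_err(x, dx, y, dy):
    """Fractional error of X * Y or X / Y: square root of the
    sum of the squares of the fractional errors."""
    return ((dx / x) ** 2 + (dy / y) ** 2) ** 0.5
```

Multiply the fractional error of a product or quotient by the computed result to recover its absolute error.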

Given two measurements X = 10.0 +/- 0.7 and Y = 3.1 +/- 0.4, what are the uncertainties in computed values A = X + Y and B = X / Y?


Laboratory Measurements

Angle measurement with a cross-staff

Use one of the A110L cross-staffs. Stand at the end of the hall and use the cross-staff to measure the angular width of the door at the far end of the hall. Make ten independent measurements and record all the values. Try very hard to ignore previous values when measuring or recording a subsequent value, because this only works if the measurements are really independent. Move one paper clip before taking each measurement, then adjust the other clip to make the reading. Do each observation as carefully as you can, but pay no attention to what you got for other observations. Compute and record the mean and standard deviation. How does the standard deviation compare with your previous estimate (in the parallax lab) of how well you could read the cross-staff? Compute the standard deviation of the mean. This is your estimate of the uncertainty in the final, averaged result.








Volume of platinum brick

All the resources of the Astronomy 110L program are contained in a platinum brick. We need to measure its volume carefully, so we can determine if we can afford new flashlights for next term. A precision ruler is available. We know that the volume of a rectangular prism is V = L x W x H. To get the best possible measurement, each person will use the precision ruler to measure the three dimensions of the brick and record them. Please do this completely independently and do not share your results. For consistency, use units of cm, but record your measurement to as much precision as you can manage. When all the measurements are complete, we will tabulate them on the board.

Now copy the tabulated measurements, and for each dimension compute the mean and SD, and SDM. Compute the volume of the brick in cubic cm, and the value of the uncertainty in the volume. Which dimension is most important to measure precisely?


















Consider possible sources of systematic error in measuring the volume of the brick. Not dumb mistakes, but real possibilities that could cause the result to be systematically too large or too small. List two of these, and suggest for each a way to evaluate the error, or reduce its impact.

Last modified: April 7, 2005