We are indebted to South Pender and Catalin for developing this post. Since this is a sticky, please confine follow-on posts to those germane to the topic.

Methods of Determining the Accuracy of a Watch
WUS HEQ Members, March, 2010

There are several ways of determining just how much your watch is gaining or losing time against an absolute standard. These methods differ in sophistication and in the accuracy of your results. Let’s consider three methods here. There are undoubtedly more than this, but these three are all we need to get the job done.

Method 1: The Naked-Eye Offset Determination Method

That’s a fancy title we made up for this, the simplest and least precise of the methods we’ll consider.

Needed.

  1. access to a precise representation of the time, based on an atomic clock and referred to below as the reference clock;
  2. your watch.

One source that is readily available as the reference clock is the time display on the website:

http://www.time.gov/timezone.cgi?Pacific/d/-8/java.

The time zone can be changed easily once on this website. This site displays the time in the form Hr:Min:Sec as XX:XX:XX. The accuracy of the time display at any one time is given just below the time as "Accuracy within 0.X seconds." In this, 0.X can be as low as 0.1 and can occasionally be over one minute. Most of the time it is around 0.2. It is best to use this reference clock when it is showing either 0.1 or 0.2 accuracy. This suggestion applies with Method 2 below as well.

Procedure. With the 2-digit seconds (in the display XX:XX:XX) from the reference clock changing on your computer screen, hold your watch close to the display on the screen and, keeping an eye on both the reference clock and your watch, note the minute marker your watch's second hand lands on when the reference clock hits XX:XX:00 (or some other easy reference point). Let's say you are doing this at around 9:30. When the reference clock shows exactly 09:30:00, on what minute marker has the second hand of your watch just landed? Say the second hand has landed at the 9-minute mark (this could be seen as a "second marker" in the present context), while the minute and hour hands are registering 9:30; in that case, your watch is running 9 seconds fast. Or, if your watch's second hand landed at the 54-minute mark when the minute and hour hands were registering 9:29, your watch would be running 6 seconds slow.

A good way to track the time-keeping accuracy of your watch would be to set it to coincide precisely with the reference clock. That is, stop the watch and have it ready to restart at the second the reference clock hits XX:XX:00 (with, of course, the hour and minute correct). At that point, your watch will be absolutely precise, or at the identical time point as the atomic clock from which the reference clock is getting its data. Some time later, you can do the time check described in the above paragraph, and any displacement from what the reference clock is showing will index the gain or loss taking place in your watch over that time period. Thus, say that you set your watch to be exactly on with the atomic-clock reading, and 30 days later check how your watch is doing with respect to the atomic clock. If now (30 days later) it is indicating a time that is 3 seconds ahead of that on the reference clock, your watch is running fast at the rate of 3 seconds in 30 days. We often express watch accuracy in seconds per year (or spy). Our result of 3 seconds in 30 days could be transformed into seconds per year by taking (365/30) x 3.0 or 36.5 spy.
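For those who prefer to let a computer do the proration, here is a minimal Python sketch of that arithmetic (the function name spy_from_drift is just our illustration, not part of any standard tool):

# Prorate an observed gain or loss over a known number of days to seconds per year (spy).
def spy_from_drift(drift_seconds, days):
    return (365.0 / days) * drift_seconds

# The example from the text: a gain of 3 seconds over 30 days.
print(spy_from_drift(3.0, 30))   # 36.5 spy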

Accuracy of the Method. With practice, one can easily determine the offset to within 1 second, and can probably get to .50-second accuracy. By the latter, we mean that you will be able to determine that the watch is more than, say, 2 seconds off and less than 3 seconds off, and so would register it as 2.5 seconds off perfect time. For short time periods between setting the watch to absolutely correct time and checking its accuracy, this represents a relatively poor level of accuracy, but for a longer period—e.g., 6 months—it is adequate to give a sufficiently accurate estimate of seconds-per-year accuracy.

Method 2: The Stopwatch Method

The provenance of this method is likely lost in the mists of time, but we are indebted to WUS Forum Member Oldtimer2 for bringing it to our attention recently. This method produces more accurate results than does the simpler approach in Method 1 above. What follows is what appeared in Oldtimer2’s post, but edited to make it read consistently with the rest of this piece.

Needed.

  1. reference clock as above;
  2. stopwatch (and this can be a very inexpensive sports quartz stopwatch); and
  3. your watch.

Procedure. With the 2-digit seconds (in the display XX:XX:XX) from the reference clock changing on your computer screen, it's easy to judge the accuracy offset in a test watch by eye to within a fraction of a second. First start the stopwatch the instant the reference clock reaches a convenient point, for example, XX:XX:00. Next, glance at your watch and stop the stopwatch at precisely the point the second hand of your watch lands on a minute marker that represents some arbitrary number of seconds ahead of the starting point, for example, XX:XX:10.

Example: We start the stopwatch the precise instant the reference clock shows 10:30:00. Now glancing at our watch, we stop the stopwatch at the exact instant the second hand of the watch lands on the 10-minute (or second) marker with the hour and minute at 10:30. We now read the elapsed time on the stopwatch. If it reads 11.50 seconds, for example, this would indicate that the watch is running 1.50 seconds slow, or behind the reference clock (11.50 – 10.00). On the other hand, if the stopwatch reads 8.25 seconds, then we can conclude that the watch is running 1.75 seconds fast, or ahead of the reference clock (8.25 – 10.00). We can start the stopwatch at any convenient point on the reference clock and stop it at any convenient point on our watch some number of seconds beyond the starting time. For example, we could start the stopwatch when the reference clock is at 10:30:20 and stop it when the watch second hand lands precisely on 10:30:30. Since these times are 10 seconds apart, any deviation from 10.00 seconds on the stopwatch indicates some offset from perfect time (with readings larger than 10.00 indicating that the watch is slow, and readings smaller than 10.00 that the watch is fast).
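To make the sign convention explicit, here is a small illustrative Python sketch of the calculation (the names are ours, not part of Oldtimer2's description):

# Stopwatch Method: offset of the watch relative to the reference clock.
# elapsed = stopwatch reading in seconds; nominal = intended interval (e.g., 10 seconds).
# A positive result means the watch is fast (ahead); a negative result means it is slow.
def stopwatch_offset(elapsed, nominal=10.0):
    return nominal - elapsed

print(stopwatch_offset(11.50))   # -1.50, i.e., 1.50 seconds slow
print(stopwatch_offset(8.25))    # +1.75, i.e., 1.75 seconds fast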

Accuracy of the Method. With this method, you should make several estimates using the same procedure each time, and these can be made in rapid succession taking overall no more than 2 to 3 minutes (for maybe 10 estimates). You would then average these estimates. This averaged result should be within about .10 seconds of exact correctness, but to some extent, of course, this level of accuracy will depend on the number of single estimates averaged. As Oldtimer2 summarized this method: “…any reaction time delay errors in starting/stopping the stopwatch tend to cancel.

Plus timebase errors in the stopwatch are completely negligible over timescales of a few tens of seconds, [true for] even the most awful quartz movement ever made! I find that with practice five readings seems to easily give 0.10-second accuracy or better (and if you doubt this, you can always test the method by starting and stopping the stopwatch against the reference time source alone).”

The note above about time interval between separate timings applies with this method as well. To get a good estimate of the seconds per year accuracy of your watch, you would set it exactly to the reference-clock reading first, and then, some time later, use the above method to determine offset over that time interval. As before, longer intervals will lead to more precise spy values.

Method 3: The Video Method

This method is the most accurate of the three, albeit somewhat more equipment-intensive as well. It appears that the notion of capturing on video the closeness of a watch second hand to the readout of a reference clock has been considered and described, over the years, by several of the more technically-expert members of the WUS HEQ Forum. The specific program described herein, however, is due to forum member Catalin, and this software addition brings this method into the realm of possibility for many watch enthusiasts. What follows is Catalin’s own description of this method.

Needed.

  1. A fast (>1 GHz) PC connected to the Internet on a decent connection and synchronized to one of the major Internet time servers just before the tests (this should keep the error in the computer's time below about 10 milliseconds (ms). I use a freeware program called AboutTime to sync my computer, since the program also tells you the error after each sync, and having two consecutive syncs with very small errors is a good guarantee that your computer is set very, very close to the actual atomic time);
  2. A decent 'watch program' on the computer that will display the time, including some fractions of a second, in a very careful way (so as to always have very constant and small delays). Unfortunately, none of the major operating systems around are even soft-realtime, but I have modified one of my own programs (see below) on Windows, and I believe the errors are in the same 10-20 ms. interval and, more importantly, very constant (so that they are completely eliminated for practical purposes when you calculate the difference between two such similar measurements);
  3. A decent (LCD) display with 60Hz refresh rate or better; the newer 120 Hz (some of which are also '3D ready') are even better;
  4. A decent camera that can do movies at 25/30 (maybe even 50/60 or 100-200 if you have a 120 Hz monitor) frames/sec.; ideally it should also do those movies in 'macro mode';
  5. A program that can display the movie from the above camera 'frame by frame' (VLC, even BSPlayer); and, of course
  6. Your watch.

The program I use can now be found at:

http://caranfil.org/timing/setup_ear...0_122_beta.exe.

You will note that the 'main window' stays normally hidden and can be shown with either a click on it or just 'hovering' with the mouse over it (there is a setting to configure that). Normally the milliseconds are not shown, but if on activation either SHIFT or CTRL is pressed the 'millisecond mode' is activated. If both SHIFT and CTRL are pressed the 'seconds beep' mode is also activated. See below a post with pictures from the mini-movies I am using for timing tests.

Procedure. Once you take a few (2-3) short (10-15 seconds) movies, you go with them to the video player and 'hunt' for the milliseconds interval in which the seconds-hand is advancing—most often around that moment you will see one frame with one time displayed on the monitor and the seconds-hand in the first position, then a second frame with a later time displayed and the seconds-hand already in the final new position. In that scenario, the actual time is the average of the one in the first frame and the one in the second frame, and by looking at a number of such frames you can narrow the interval so as to have statistically better precision. Even better, IMHO, would be to 'catch' a frame where the seconds-hand is actually 'in motion'; in those cases I take the milliseconds time of that frame as the actual time in my measurements.
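As a rough sketch of the frame bookkeeping described above (purely illustrative; the numbers and names are our own):

# Video Method: the hand is still in one frame and has already advanced in the next,
# so take the midpoint of the two on-screen timestamps as the estimated instant of the tick.
def tick_instant(time_before, time_after):
    return (time_before + time_after) / 2.0

# Hypothetical pair of consecutive frames about 1/60 s apart on a 60 Hz monitor:
print(tick_instant(12.733, 12.750))   # 12.7415 s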

At the end of this description, there is a sequence of images like that described above. Note that in these images the milliseconds are at .743, and by actually looking at the full time and the time on the watch we can calculate that the watch is at about +24.257 seconds from the Internet atomic time (ahead).

That time is 'written down' (I placed it initially into a TXT file and now in an Excel file), and some time later (I suggest 1 or 2 weeks for regular longer-term measurements), you will do the next similar measurement; in my case it was +26.744 after two weeks, and from that, the difference was +2.487 over two weeks, which means a rate of about +64.840 seconds over one year (our seconds per year, or spy, value discussed earlier). All those numbers are in one of my more recent PDF files, for instance:

http://caranfil.org/timing/timing_data_20100314.pdf

Accuracy of the Method. With the above equipment and procedure, I believe you can easily measure with a precision clearly better than 100 milliseconds (and even down to the actual time for each frame on the monitor)—of course, probably nothing much better than 5-10 ms., but even at 50-100 ms. the results will be more than 10 times better than what we normally get with a quick single measurement by 'the human eye' (Method 1 above).


Pictures follow (frame captures from the timing mini-movies, posted as attachments).

SOME MORE DETAILED DIRECTIONS ON USING THE
STOPWATCH METHOD FOR WATCH-RATE TIMING AND BEYOND
7 May 2010

After considerable experimentation since the first posting of this sticky thread, I have developed some procedures that may improve the precision with which the Stopwatch Method may be employed.

GENERAL REVIEW OF THE METHOD

Just to briefly review this method, we access a reliable reference clock on our computer screen. I’ve used the NIST reference clock, but there do appear to be more precise alternatives. Still, the precision offered by the NIST site seems sufficient for general timing purposes. I wait until the site indicates “Accurate within 0.1 seconds,” although not a great deal of imprecision attends accuracy postings of 0.2 seconds. I also do all my timing from the same computer, as differences can occur between computers. The NIST reference clock may vary in its accuracy through the day and/or between days (from a minimum of 0.1 seconds to as high as close to 1.0 second), but on my main computer for timing, it is most commonly operating at the 0.1-seconds accuracy level.

Next, we wait until the reference clock shows a convenient number with regard to the seconds—like XX:XX:X0 or XX:XX:X5 (for example, 10:13:00, 10:13:20, 10:13:05, 10:13:25, etc.), and as quickly as possible click on the stopwatch. At some number of seconds later, while looking at the second hand of our watch, we click off the stopwatch. Thus, for example, we might, while looking at the NIST reference clock, click the stopwatch on at, say, 2:24:00, and, now watching the second hand of the watch, click the stopwatch off at the moment the second hand hits 2:24:10. The stopwatch will now be stopped with an elapsed time showing on it. If that time is less than 10 seconds, your watch is running fast; if the time is greater than 10 seconds, the watch is running slow. Say that the time showing on the stopwatch is 9.87 sec. This means that the watch is running .13 seconds fast or +.13 seconds. (Let’s agree to label this discrepancy between the reference-clock time and that on your watch as the watch’s offset—meaning the amount it is “off” the reference clock.) You can now relate this offset from perfect time to the time since the watch was synchronized with the NIST clock and determine the rate of gain (or loss) over that period. Let’s agree to call this discrepancy (gain or loss)—between the present offset time on your watch and the offset reading (if there was any offset) at the time you synchronized the watch with the reference clock—the watch’s drift (referenced to a time period, as, for example, .10 seconds drift over a 4-week period).

SOME ADDITIONAL DETAILS

1. Timing Interval

In my previous coverage of this method, I suggested using a 10-second timing interval. That is, if you were to start the stopwatch when the reference clock showed, say, 12:45:20, you would stop it when your watch’s second hand hit 12:45:30. I have now found that this timing interval, although fine if you wish to use it, is longer than strictly necessary. I now use a 5-second timing interval. For example, I start the stopwatch the instant the reference clock flashes, say, 3:42:10, and stop it the instant my watch’s second hand hits 3:42:15. This is what I mean by Timing Interval. There is absolutely no loss in precision in going to a shorter timing interval, and I find that I can comfortably do 10 timings (from which I compute the average) in less than 2 ½ minutes with the 5-second Timing Interval.

2. Stopwatch Technique

It is important, as noted previously, not to anticipate a time point, but instead to wait until you have actually seen it. This applies, of course, to both the time point on the reference clock and the second marker that your watch’s second hand hits at the end of the timing interval. Once you have seen the time display or second hand hit the second marker 5 (or 10) seconds later than the NIST time display, it is important to instantly snap the stopwatch. Too-fast or delayed clicking of the stopwatch will contribute to slight levels of inaccuracy. After one or two timing sessions, this becomes pretty easy. I have found that the variability among the timing results can be reduced quite substantially by attention to consistent, unvarying technique.

3. Recording Results

This is straightforward. I just quickly write down the elapsed time on the stopwatch, reset the stopwatch to 00:00, and wait for the next X0- or X5-second mark (for example 12:30:10, or 12:30:15, etc.). I do 10 (or 12; see Section 9 below) timings (or timing trials), and, as noted, can do this number of timings and write down the results in less than 2 ½ minutes overall.

4. Establishing a Baseline

Timing tests are done to determine the drift occurring over a known period of time. This means that we have to have an accurate reading of the time offset (if any) at the beginning of that period. To do this:

(a) First synchronize your watch with the reference clock (e.g., the clock on the NIST website). This is straightforward. By whatever means the watch provides, start the watch at the instant the reference clock hits some convenient hour:minute:00 mark—such as 10:15:00. (You have already set the hour and minute hands to 10:15 and the second hand to 00.) Within a certain band of error, your watch will now be in exact agreement with the reference clock.

(b) Perform a Synchronization Check. To gain greater precision, it is a good idea to then do a timing test, by the method discussed above. For example, despite your best efforts to achieve absolute agreement with the reference clock, you may find that your watch is, in fact, running .05 seconds faster (or slower) than the reference clock, i.e., has a +.05 sec. or –.05 sec. offset. This is your baseline for future rate checks. Make a note of this for future reference. Let’s say that, at the synchronization session, you end up (despite your best efforts) with a +.05 sec. offset, and you do your follow-up timing test in one month’s time. If at that time, you get a result that indicates that your watch is .15 seconds faster than the reference clock, its actual drift (gain in this case) over that one-month time period is not .15 seconds, but rather .10 seconds (+.15 – +.05).

5. Processing Your Results

We have discussed elsewhere that individual observations of rate drift are prone to error (sampling error in the language of the statisticians). However, this error can be greatly reduced by using averages of a number of observations. I have alluded above to the use of 10 timings (on which to base the average of the 10). This average will be far closer to the actual time offset than will any single observation no matter how carefully and precisely it is obtained (even by a method like the Video Method).

There are two key indices you need to calculate from your 10 (say) timing trials: (a) the average (or mean) and (b) the standard deviation (I’ll abbreviate this as St. Dev.). Both are easily calculated via any simple engineering, scientific, business, or statistical hand-held calculator ($20 range). You just enter the values you obtained in the timing trials and then press the button for each of these indices. Let’s consider an example (some numbers I just ran for my own watch). I ran 10 timing trials and got the following results, using a 5-second timing interval:

4.91; 4.88; 4.92; 4.89; 4.96; 4.85; 4.95; 4.91; 4.86; 4.90.

The mean is 4.903; the St. Dev. is .0353. This means that the mean offset is .097 seconds (5.00 – 4.903) and that my best estimate of how far my watch is “off” the exact atomic time is .097; that’s what we mean by mean offset. Since my baseline (luckily) was 0.00 seconds, this means that, over the 41 days in the time period since synchronization, the watch has gained .097 seconds. Thus, we understand the drift over that 41-day period as +.097 seconds. If, on the other hand, my baseline had been, say, +.025 seconds (because of not starting my watch at exactly the right time), the drift value would have been +.097 – +.025 = +.072 seconds.
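If you would rather not use a hand-held calculator, the same two indices can be computed with a few lines of Python (statistics.stdev gives the sample standard deviation used here):

import statistics

trials = [4.91, 4.88, 4.92, 4.89, 4.96, 4.85, 4.95, 4.91, 4.86, 4.90]
nominal = 5.0                       # the 5-second Timing Interval

mean = statistics.mean(trials)      # 4.903
st_dev = statistics.stdev(trials)   # 0.0353
mean_offset = nominal - mean        # 0.097 seconds (the watch is running fast)
print(mean, st_dev, mean_offset)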

As noted, the variability of the time values—indexed by the St. Dev.—depends on the consistency with which the stopwatch readings are made. I have by now found almost all of my St. Dev. values to lie between about .025 and .060. The St. Dev. for the 10 time measurements above was .0353, quite typical for me. However, this consistency will depend to some extent on the perceptual-motor precision exercised in the timing trials. This is not really all that crucial (the mean is of greatest importance), but reducing the St. Dev. for a set of timing observations as much as possible is all to the good. We’ll return to this below.

6. Prorating to Seconds per Year (SPY)

Our most common metric for understanding watch accuracy (at least on the HEQ forum) seems to be seconds per year, or SPY. So, if we are seeing AA seconds drift over a period of BB days, our SPY value will be (365/BB) × AA. In our above example, the +.097 drift value was over a period of 41 days, so that my best estimate of the watch’s accuracy in seconds per year is (365/41) × +.097 = +.8635 SPY.

7. Time-Period Issues for the Estimation of Drift

It is necessary to distinguish between (a) what we have referred to above as the Timing Interval—which is approximately the number of seconds between the starting and stopping of the stopwatch (it will be exactly that number if the watch is in perfect synchronization with the reference clock)—and (b) the elapsed time between the synchronization session and the testing (or follow-up) session when drift is being assessed. Let’s refer to this between-session elapsed time as the Time Period employed in the assessment of drift.

Although the precision of each timing session can be easily evaluated by the methods described above, drift estimates really require a Time Period of weeks or, preferably, months if the estimates prorated to one year are to be free of excessive error. Jumping ahead just a little, I can calculate the error-interval magnitude associated with the seconds-per-year estimate above of +.8635 SPY. Based on a Time Period of 41 days (see above), I can state with .95 probability of being correct that the watch’s drift can be captured by an interval running from +.5381 sec. to +1.1889 sec.

However, consider what this error interval would look like if my prorated drift value of +.8635 SPY had been based on a Time Period of 2 days, instead of 41 days. Our interval would run from –5.8070 sec. to +7.5340 sec. That degree of uncertainty (± 6.6705 seconds, as opposed to the ±.3254 seconds for the 41-day Time Period) makes the 2-day-based SPY drift estimate of virtually no value. Had our Time Period been, instead of 41 days, 3 months, on the other hand, our SPY drift estimate would be within ±.1462 seconds error.

The degree of error in the SPY drift estimates does, of course, fall on a continuum paralleling a Time Period continuum ranging from the ridiculous (like several days) to the extremely desirable (like several months). In my opinion, any SPY drift estimate based on a Time Period of less than 3 weeks to 1 month simply reflects wasted time and effort.
---------------------------------------------------------------------------------------------------------------------
For many horological experimenters, the foregoing is likely sufficient to obtain really informative and usable results. However, for those who wish to delve more deeply into this topic, more procedures follow.
---------------------------------------------------------------------------------------------------------------------

8. Setting Confidence Intervals around Our Timing Estimates

This is a little (only a little) more complicated and is not strictly necessary if you are satisfied with a single estimated SPY value. We do, however, have the wherewithal to set limits around our estimate to capture the possible error in this estimate, as alluded to above.

(a) For a Mean Measurement Arising from a Single Timing Session

(i) For 10 Timing Trials (that is, 10 measurements at a session):

Take your St. Dev. and multiply it by .7153. Call this product C. Then the value Mean Offset ± C is a 95% confidence interval for the exact offset. This means that this interval has probability .95 of capturing the exact offset from atomic time. Let’s take an example. In the above illustration of 10 timings, we obtained a mean offset of +.097 seconds. Since our St. Dev. was .0353, we obtain a value of .0253 (.0353 × .7153) for C. Therefore, our 95% confidence interval is .097 sec. ± .0253 sec. or (.0717 sec., .1223 sec.). Interpretively, we can say that the probability is .95 that the true offset is between .0717 seconds and .1223 seconds.

(ii) For 20 Timing Trials (that is, 20 measurements at a session):

Take your St. Dev. and multiply it by .4680. This is your C value for 20 timings. Again, our 95% confidence interval will be Mean Offset ± C. Again using our earlier example, but now pretending that it was based on 20 timings instead of 10, we have C = .0165, and our 95% confidence interval is .097 sec. ± .0165 sec. or (.0805 sec., .1135 sec.). This narrower bandwidth for our 95% confidence interval reflects the greater precision we have with 20, as opposed to 10, timings at the session.
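The multipliers .7153 and .4680 correspond to the two-sided t critical value for n – 1 degrees of freedom divided by the square root of n; a sketch of that calculation (using scipy for the t quantile, with our own function name) is:

import math
from scipy.stats import t

def ci_factor(n, conf=0.95):
    # Two-sided t critical value for n - 1 degrees of freedom, divided by sqrt(n).
    return t.ppf(1 - (1 - conf) / 2, n - 1) / math.sqrt(n)

print(ci_factor(10))   # ~0.7153
print(ci_factor(20))   # ~0.4680

mean_offset, st_dev = 0.097, 0.0353
c = ci_factor(10) * st_dev                # ~0.0253
print(mean_offset - c, mean_offset + c)   # ~(0.0717, 0.1223)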

(b) For an Estimate of Drift Arising from Two Timing Sessions

Here we have to realize that there will be random error associated with mean values obtained at both the initial synchronization session and the second session some weeks or months later used to assess drift over a known time period. This means that we need St. Dev. values from both sessions. A simplifying procedure is to use the average of the two St. Dev. values in what follows, or Avg. St. Dev. If you’ve forgotten the St. Dev. value from the synchronization session, just use the St. Dev. value from the second session in what follows.

(i) For 10 Timing Trials (10 timings in each session):

Take your average St. Dev. value (as noted above, and preferable) or single (Session 2) St. Dev. value (if you’ve forgotten the St. Dev. value from the synchronization session), and multiply it by .9396. Call this D (for drift). Our best estimate of drift is our offset measurement at Time 2 minus our offset estimate at Time 1 (the synchronization session), or, Mean Offset T2 – Mean Offset T1. Call this value our Drift Estimate. This value will increase, of course, as the time lag between T1 and T2 increases, so that we have to understand it as drift over XX days. Since the XX in XX days is arbitrary, let’s agree to express drift in terms of seconds per year (or SPY). To obtain this, we simply multiply our Drift Estimate by (365/XX). For simplicity, let’s call this the SPY Drift Estimate. The corresponding (prorating) factor for the confidence interval, which we will label SPYD, follows:

SPYD = (342.954 × Avg. St. Dev.)/#days of time lag or

SPYD = (342.954 × Time 2 St. Dev.)/#days of time lag.

Thus, our 95% confidence interval for the true, exact yearly drift of our watch is:

SPY Drift Estimate ± SPYD.

Example. In our earlier example, we saw that we prorated our .097 drift measurement—based on a 41-day time lag between Session 1 (synchronization) and Session 2—to +.8635 SPY [.097 × (365/41)]. Thus, our value for SPY Drift Estimate is +.8635 SPY. Further, we saw that our Time 2 St. Dev. value was .0353. At the synchronization session, we obtained a St. Dev. (over the 10 timings at that session) of .0425. Thus, our Avg. St. Dev. value is .0389. With a 41-day time lag between Time 1 and Time 2, our value for SPYD in this case is .3254 sec. [(.0389 × 342.954)/41]. Finally, our 95% confidence interval for the watch’s drift in seconds per year is (+.5381 sec., +1.1889 sec.), that is +.8635 seconds ± .3254 seconds. We would interpret this result as indicating that there is a .95 probability that the interval between +.5381 seconds and +1.1889 seconds contains the true SPY drift value for this watch.

(ii) For 20 Timing Trials (20 timings in each session):

All that changes for this situation is the value of SPYD, which now becomes:

SPYD = (233.663 × Avg. St. Dev.)/#days of time lag or

SPYD = (233.663 × Time 2 St. Dev.)/#days of time lag.

Thus, again, our 95% confidence interval for the true, exact yearly drift of our watch is:

SPY Drift Estimate ± SPYD.

Example. Suppose that, in our earlier example, the SPY Drift Estimate had been based on 20 timings per timing session, instead of 10. So, our SPY value is still +.8635, but our new SPYD value would be .2217 seconds [(.0389 × 233.663)/41]. (Earlier, with 10 timings per session, this value had been .3254, illustrating again the accuracy-enhancing effects of taking more sample timings.) So, in this case our 95% confidence interval is +.8635 seconds ± .2217 seconds or (+.6418 sec., +1.0852 sec.). We would therefore conclude, with .95 probability of being right, that this interval contains the true SPY drift value for this watch.
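Similarly, the constants 342.954 and 233.663 correspond to the t critical value for the difference of two session means (2n – 2 degrees of freedom), times the square root of 2/n, times 365. A sketch under those assumptions (the function name is ours):

import math
from scipy.stats import t

def spyd(avg_st_dev, n_trials, days, conf=0.95):
    # Half-width of the confidence interval for the prorated (seconds-per-year) drift.
    t_crit = t.ppf(1 - (1 - conf) / 2, 2 * n_trials - 2)
    return t_crit * math.sqrt(2.0 / n_trials) * 365.0 * avg_st_dev / days

spy_drift = 0.097 * (365.0 / 41)                  # +0.8635 SPY, as in the example
half_10 = spyd(0.0389, 10, 41)                    # ~0.3254 with 10 timings per session
half_20 = spyd(0.0389, 20, 41)                    # ~0.2217 with 20 timings per session
print(spy_drift - half_10, spy_drift + half_10)   # ~(+0.538, +1.189)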

(c) Confidence Coefficients

We have used the confidence coefficient of .95 in all cases above, and it is undoubtedly true that this is the value employed in the construction of most confidence intervals in all fields that use inferential statistics. Nonetheless, more or less stringent confidence coefficients can be used, the most common alternatives to .95 being .99 and .90. Anyone wishing to probe this further can PM me, and I’ll work out the corresponding SPYD values.

9. A Final Bit of Statistical Esoterica

Well, it’s really not that esoteric. One way to robustify our estimates of means (making them more robust to distributional anomalies) is to use what statisticians have known about for decades—Trimmed Means. This is little more than common sense, and would be obvious to anyone. When you have a set of say 10 observations like offset values, you can calculate their mean as above, or you can trim away the values at each end of the set by omitting them from the calculation of the mean. Such a mean calculated on a trimmed distribution like this is referred to as a Trimmed Mean. It should be obvious immediately that such a mean value will very likely be more accurate than an untrimmed mean value, since any anomalous observations, or outliers, at each end of the distribution have been eliminated from the calculations. In the use of the Stopwatch Method, for example, it is possible to click the stopwatch on just a little too early on a single timing, and doing this will lead to too large a value for the time offset. Similarly, clicking on the stopwatch just a fraction of a second too late will lead to too small a value for the time offset. And, of course, this can work at both ends of the timing process—clicking the stopwatch on in response to the reference clock and clicking it off in response to your watch’s second hand.

We often use a 10% trimming rule with distributions that might contain outliers. With 10 observations, this would mean dropping the largest offset value and the smallest (10% at each end of the distribution). When I do timing experiments, I sometimes take 12 observations and drop the largest and the smallest from the set before calculating the mean. This represents an 8.33% trimming and leaves me with 10 usable observations. Here’s an example. Consider the following timing values obtained with a 5-second Timing Interval:

4.91; 4.82; 4.95; 4.87; 5.08; 4.83; 4.97; 4.89; 4.84; 4.95; 4.74; 4.90.

First, the untrimmed mean and standard deviation: mean = 4.8958333…, so that the Mean Offset is .1042, and the standard deviation = .0872. It seems likely that Observations 5 and 11 may well be outliers arising from choppy stopwatch technique—(a) starting too early and/or stopping too late (5.08) or (b) starting too late and/or stopping too early (4.74).

If we now trim the distribution to the middle 10 observations, our new (trimmed) mean is 4.893 (so that the Mean Offset is .1070), and the standard deviation is .0531. A general principle used by statistical researchers is to consider an observation an outlier if it is one standard deviation or more displaced from the next observation in the distribution. In the above set of timings, the 5.08 is well over one standard deviation (.0872) above the next highest recorded value (4.97), and the 4.74 is almost one full standard deviation below the next lowest observation (4.82). In this case, then, trimming the distribution would be a good idea, with the trimmed mean likely to be a better estimate of the true offset value than the untrimmed mean. I should note that we never trim from only one end of a distribution or use a different trimming percentage at each end.
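A small sketch of the trimming described above (drop the single largest and the single smallest of the 12 readings, then compute the usual indices):

import statistics

trials = [4.91, 4.82, 4.95, 4.87, 5.08, 4.83, 4.97, 4.89, 4.84, 4.95, 4.74, 4.90]
trimmed = sorted(trials)[1:-1]    # discard the smallest (4.74) and the largest (5.08)

print(statistics.mean(trials), statistics.stdev(trials))     # 4.8958..., 0.0872 (untrimmed)
print(statistics.mean(trimmed), statistics.stdev(trimmed))   # 4.893, 0.0531 (trimmed)
print(5.0 - statistics.mean(trimmed))                        # trimmed Mean Offset: ~0.107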

Trimming does not introduce bias since we trim equally at each end of the distribution. What it does do is stabilize the mean estimate, freeing it up from distortions caused by outliers. The other consequence of trimming is, of course, a substantially-reduced standard deviation.

The sampling distribution of trimmed means, unfortunately, is not identical to that of untrimmed means, with the consequence being that the confidence-interval procedures described above cannot be used exactly as they are with untrimmed means. If, however, you use the untrimmed standard deviation in the calculation of the confidence intervals, you will come very close to exact confidence intervals for trimmed means.

10. Summary

There’s quite a bit of material above, and I’ve included this level of detail (particularly Points 8 and 9) for the benefit of those with a little statistical training or those wanting to delve deeply into the topic. However, it is certainly not necessary to go beyond Point 7 in the above in order to obtain very useful and accurate timing information using the Stopwatch Method. All that is necessary is to: (a) try to develop a consistent stopwatch process (which is really dead easy) so that your standard deviations are not too high, (b) use a sufficient number of observations to produce reasonably stable Mean Offset estimates (10 observations being completely satisfactory), and (c) use a reasonable Time Period so that your prorated SPY estimates have reasonably small error bounds (as noted, a minimum of 3–4 weeks is recommended). If you do all these things, you will have extremely precise values from which to judge the accuracy of your watch.

One last point: Although some of the above may seem to some readers like advanced statistics, it really is anything but that. Almost all of the principles described have been known for 100 years and are routinely taught to undergraduates (and high-school students) encountering statistics for the first time.

I am fully prepared to answer questions on this piece and will revise it if necessary—with new information, and needed clarifications and corrections—in the future. Anyone who would like to comment on this material or suggest revisions or additions is more than welcome to PM me.
 

Some extra quick-info on chrono-method vs video-method

1. Ideally you should try both, and only draw conclusions after actually trying them in parallel ;-)


2. Keep in mind that BOTH methods rely on internet time (or a good GPS time) - for internet time there are very simple programs like AboutTime (I sync until 2 consecutive errors are both under 10 ms) and there are more advanced (and more accurate when used properly) programs, like the ntp package - for Linux you probably know where to find out more, and for Windows you can use the one from http://www.meinberg.de/download/ntp/windows/[email protected] for instance with a command-line like:

"c:\Program Files\NTP\bin\ntpdate.exe" -b -p 8 -t 8 time-a.nist.gov wwv.nist.gov time-a.timefreq.bldrdoc.gov time-nw.nist.gov nist1.symmetricom.com

For the chrono method you can also use the Java/browser applet started from http://www.time.gov/ - if the network conditions are very similar and you do 20 readings each time at a two-week interval, the resulting accuracy can be much better than the one displayed by the actual applet (which is a kind of worst-case scenario and very often for me is 400 milliseconds).

3. If for various reasons you can only try the chrono method, you MUST first do AT LEAST ONE 'self-test' - meaning a set of at least 10 readings (I strongly recommend 20 readings) in which you try to measure (with the chrono) precisely 10 seconds on the test-watch - place the resulting 10-20 numbers in an Excel / OpenOffice Calc file and then calculate for them the AVERAGE() and the STDEV(), which should tell you how accurate you can get IN IDEAL CONDITIONS. You will see that most likely the average will not be 10.000 seconds and the standard deviation will only get under 100 milliseconds after some training!!!
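If you want to run the same self-test outside a spreadsheet, a rough Python equivalent of the AVERAGE() and STDEV() check described above would be (the readings below are made up for illustration):

import statistics

# 'Self-test': repeated attempts to time exactly 10.00 s against the reference clock alone.
readings = [10.04, 9.97, 10.02, 9.95, 10.06, 9.99, 10.03, 9.96, 10.01, 9.98]  # hypothetical

print(statistics.mean(readings))    # how far your average is from 10.000 seconds
print(statistics.stdev(readings))   # should drop under ~0.10 s with some practice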
 

there are so many useful tips here on how to determine the accuracy of a watch, such helpful tips! wow, i learn a lot here! thank you!
 

Another thing ...

This one should NOT make a huge difference if you follow some of the other advice carefully, but it might explain some of the numbers that you will see in your videos and, more importantly, will certainly explain why you should do things that way - with an internet time-sync just before starting your videos!

Certain computers have a much larger 'internal time drift' than others, and some are really bad - so it is not impossible to see the COMPUTER time drifting by more than 1 millisecond in about one minute. The more stable computers tend to generate the same millisecond numbers in corresponding frames - so a good watch in two videos 5 minutes apart will have the seconds-hand moving between, let's say, .221 and .252 in both videos, while in the presence of computer drift in the second video it might move between .226 and .257!!! The difference is more obvious if you do video tests one week on one computer and the next week on another, and you see that the videos in one instance were very constant in numbers and in the second instance there was a small drift ...

The above thing will only become a problem if you have one such very bad computer and you do a lot of timing videos in sequence - in that case you just need to repeat the internet time-sync every 10-15 minutes or so!
 

Establishing a Precise Baseline

The Importance of Establishing a Precise Baseline for Timing Measurements

As I was setting up a couple of watches for timing trials on the weekend, it occurred to me just how important it is to establish a precise baseline value for all future timing measurements. When assessing the accuracy of a watch movement, unless we are using sophisticated methods that capture the oscillation rate of the quartz crystal (procedures that are well beyond most watch owners in terms of both the necessary equipment and the expertise), we generally do this by comparing the extent to which the watch is “off” (that is, deviant from some precise time source) at a particular time point. However, in order to know what this means, we need to know how much “off” the watch was at some previous time period. The difference is termed the “drift” of the movement over that time period, which can be transformed into the seconds-per-year (or spy) scale.

Let’s look at an example. You wish to determine the accuracy of your new Yamamoto Super Duper Quartz watch. You proceed to set it as precisely as possible to the atomic clock at Time 0. One month later, you calculate the offset from perfect time and then relate that to the time period of one month in order to get an estimate in spy. Let’s say you get the following values: Time 0, set to the atomic clock (so 0 offset assumed). At Time 1, you find that your watch is .25 seconds ahead of the atomic clock—by one of the methods suggested in the earlier posts in this thread, the Stopwatch Method or the Video Method. You thus conclude that your watch has gained exactly .25 seconds in one month (the watch’s drift), and, therefore, is running at the rate of +3.0 spy (+.25 × 12 months).

The problem with this procedure is that your starting value—what we term the Baseline Value—may not be truly 0 as you had assumed. When you set a watch to be as close as possible to the exact time, you stop the second hand and then release it (by pushing in the crown) as closely as possible to a time signal of XX:XX:00 (such as 10:45:00). However, it would be very unlikely that you actually started the watch at exactly 0 seconds, simply because of human perceptual/motor error. You would be pretty close, but not precisely on the mark. So, using our earlier example, suppose that, unknown to you, you actually push in the crown a little late—say .15 seconds late (entirely possible). This means that your starting value is not 0, as you thought, but rather –.15 seconds. Now, one month later, when you calculate the drift and get the +.25 seconds from your calculation, your actual drift value is not +.25 seconds over that month, but rather +.40 seconds: .25 – (–.15), and your best estimate of drift in seconds per year is +4.8, rather than +3.0.

Or, of course, it could work the other way. Let’s say that, instead of your being slow in starting the watch with reference to the atomic clock, you push in the crown too soon, say, .15 seconds too soon, so that your actual starting offset is +.15 seconds. When you then go to determine drift after one month and get the +.25 value noted above (and the +3.0 spy year estimate), this +.25 value should, in fact, be referenced to the true starting value of +.15. This would mean that the watch had gained only .10 seconds in the one-month timing period, and that your correct estimate of seconds per year would be +1.20 spy (instead of the +3.0 you had calculated).

So the solution is very simple. When you set your watch as close as possible to atomic time at the start of the timing trials, follow this up right away with a timing test. That is, do at Time 0 what you will do later at Time 0 + XX, and estimate the starting deviation from exact atomic time; that will be your baseline value for all future assessments of drift. Going back to the previous examples, if we had done this in, say, the second example, our baseline would have been +.15 seconds. Then all future timing results would be referenced to that value, rather than to 0, and the resulting drift estimates would be relatively precise.
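In other words, every later offset reading is referenced to the baseline rather than to zero; as a minimal sketch:

# Drift over the period = offset at the later check minus the baseline offset.
def drift(offset_now, baseline_offset):
    return offset_now - baseline_offset

# The two cases discussed above (baseline measured right after setting the watch):
print(drift(0.25, -0.15))   # +0.40 s over the month (crown pushed in a little late)
print(drift(0.25, +0.15))   # +0.10 s over the month (crown pushed in a little early)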

In my own timing experimentation, I’ve been able to get closer to 0 in my starting deviation from precise atomic time than + or – .15 seconds, but, in fact, it really doesn’t matter just what your starting deviation—or baseline value—is. This is because you are interested in the change (drift) from that baseline value over time. It is the latter that indexes the accuracy of your watch.
 

Re: Establishing a Precise Baseline

The Importance of Establishing a Precise Baseline for Timing Measurements
...
Yes, that is essential, and it is good that we have now emphasized it.

That also explains why, on DST changes, we need to do two sets of measurements for all watches in our tests that do not have a 1-hour quick adjustment - the first measurement 'closes' the previous interval, then we need to adjust for DST, and then a new baseline has to be taken!!!
 

VIDEO METHOD - HIGH SPEED CAMERA

A decent camera that can do movies at 25/30 (maybe even 50/60 or 100-200 if you have a 120 Hz monitor) frames/sec.; ideally it should also do those movies in 'macro mode';
Artec has kindly sent me his Casio EX-F20 high speed video camera to see if it would help get quicker results with the video method. As a reminder, with a 25fps camera, the resolution is 0.04 seconds, and therefore the accuracy of the method over 24 hours is about 14 spy and over a week about 2 spy.

Tried it immediately but unfortunately the high speed mode requires a lot of light so I wasn't able to get a reading during my usual night-testing.

Tried again outdoors, and while it's pretty fascinating to see the backlash on the seconds hand of my 8F35 at 210fps (480x360 resolution), it's really hard to read the time display on my laptop as it seems to move in increments of 0.02s, i.e. a refresh rate of 50fps. Unfortunately the setting below that is 30-210fps. Can't fine-tune it, probably an auto fps mode, will check the manual to see if it can be forced to 100fps.
 

Artec has kindly sent me his Casio EX-F20 high speed video camera to see if it would help get quicker results with the video method. As a reminder, with a 25fps camera, the resolution is 0.04 seconds, and therefore the accuracy of the method over 24 hours is about 14 spy and over a week about 2 spy.
Interesting. If resolution is .04 seconds, this means that you should generally be within .04 seconds of the exact time shown on the screen. Therefore, if you do a single timing with the Video Method, your error bandwidth will arise from a combination of this "camera error" and the existing clock error, so that your true (full) error bandwidth might be something like ± .06 seconds. This follows directly from elementary theorems on variances of linear combinations.

With the Stopwatch Method, the error bandwidth for a single timing arises from a combination of clock error (as is true for the Video Method) and perceptual error (which replaces camera error with the Video Method). In my trials, perceptual error has an error bandwidth of about .10 seconds (and thus is considerably larger than the camera error encountered with the Video Method), which, when combined with the same clock error as seen with the Video Method, leads to a true (full) error bandwidth of something like ± .11 seconds. Thus, the error associated with a single Stopwatch Method timing result is about twice as large as that with a single Video Method timing result.

However, consider what happens when the results from a number of timing trials are averaged. If we take 10 results with the Stopwatch Method, the error bandwidth associated with the mean of the 10 results becomes about ± .035 - .040 seconds, this mean being more accurate than a single Video Method reading. Of course, the same proportional reduction would apply to an average using the Video Method too (this is all governed by very elementary statistical theory), so that, if, instead of taking a single Video Method result, we took 10 and averaged them, the error bandwidth would be reduced to about ± .02 seconds.

In my timing trials, I've been using 40 trials with the Stopwatch Method. This has reduced my error bandwidths to about ± .02 seconds. Clearly, this is more than adequately precise for our timing experiments.
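For reference, the two rules used in the last few paragraphs are the usual ones: independent error sources combine roughly as the square root of the sum of their squares, and averaging n independent timings shrinks the error bandwidth by a factor of the square root of n. A sketch under those assumptions (the component values below are illustrative, not measured):

import math

def combined_error(*components):
    # Root-sum-of-squares of independent error components.
    return math.sqrt(sum(c * c for c in components))

def averaged_error(single_error, n):
    # Error bandwidth of the mean of n independent timings.
    return single_error / math.sqrt(n)

print(combined_error(0.10, 0.04))   # ~0.11 s if perceptual error is ~0.10 s and clock error ~0.04 s
print(averaged_error(0.11, 10))     # ~0.035 s: mean of 10 stopwatch timings
print(averaged_error(0.11, 40))     # ~0.017 s: mean of 40 stopwatch timings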
 

Artec has kindly sent me his Casio EX-F20 high speed video camera to see if it would help get quicker results with the video method. As a reminder, with a 25fps camera, the resolution is 0.04 seconds, and therefore the accuracy of the method over 24 hours is about 14 spy and over a week about 2 spy.
Not quite true. If the error associated with a single timing at the beginning of the 24-hour period is .04 seconds, as you have stated, it follows that there is also a similar error associated with the single timing at the end of the 24-hour period. The change (or drift) estimate relies on both measurements. Thus, the total error in the assessment of drift--by which you estimate accuracy--is about .06 seconds, or about 22 spy. This is a perfect illustration of why attempting to estimate spy using a one-day timing period will necessarily be far too imprecise.

However, your assertion of an error bandwidth of .04 seconds accounts only for camera error. What about clock error? This must be factored in, as noted in my just-preceding post. Once we do this, we get a single-measurement bandwidth of about ± .06 seconds, and this applies at each measurement point. Since two measurements are required to evaluate drift (that is to get a spy value), the actual error bandwidth of the change, or drift, calculation will be about ± .08 - .09 seconds. Given this fact, any estimate of spy using a single observation at each time point with the Video Method will produce a total error bandwidth of about ± 31 seconds for a one-day assessment of spy, and about ± 4.5 seconds for a spy estimate based on a one-week time period--that is, taking the offset from the atomic clock at Time 1 and then again at Time 1 + 7 days and using the change between the two as an estimate of drift, prorated to an annualized spy value.
 

Artec has kindly sent me his Casio EX-F20 high speed video camera to see if it would help get quicker results with the video method. As a reminder, with a 25fps camera, the resolution is 0.04 seconds, and therefore the accuracy of the method over 24 hours is about 14 spy and over a week about 2 spy.

Tried it immediately but unfortunately the high speed mode requires a lot of light so I wasn't able to get a reading during my usual night-testing.

Tried again outdoors, and while it's pretty fascinating to see the backlash on the seconds hand of my 8F35 at 210fps (480x360 resolution), it's really hard to read the time display on my laptop as it seems to move in increments of 0.02s, i.e. a refresh rate of 50fps. Unfortunately the setting below that is 30-210fps. Can't fine-tune it, probably an auto fps mode, will check the manual to see if it can be forced to 100fps.
Very interesting!!! However, please also note my original reference to the refresh rate of the monitor - since I have an ordinary camera, the way I am trying to use the video method is based more on the frame-rate of the monitor (than on the frame-rate of the camera), and even with my pretty old camera at 30 fps, in 2-3 consecutive videos I can very often get enough cases where I can observe the 60 Hz intervals from the monitor - meaning intervals of around 17 milliseconds.
 

Yes same here with my 25fps camera, except it's 20ms. I was mostly interested in seeing what a high speed camera could do to improve the video method for observations over a shorter period of time but it seems it's not much, if anything. The monitor being the limiting factor and also the requirement for light at these frame rates. Still, these cameras create impressive slow motion videos so it was time well spent ;-)
 

Yes same here with my 25fps camera, except it's 20ms. I was mostly interested in seeing what a high speed camera could do to improve the video method for observations over a shorter period of time but it seems it's not much, if anything. The monitor being the limiting factor and also the requirement for light at these frame rates. Still, these cameras create impressive slow motion videos so it was time well spent ;-)
Yes, those slow-motion videos can indeed be spectacular :-!

But returning to the video method - now I realize that for plenty of people things might not be as easy as in my example - using a 30fps camera and a 60fps monitor could generate a slightly different kind of 'frame hunt' than using a 25fps camera and a 60fps monitor - my old camera can only do 15 or 30 fps so I can't easily test the 25fps results, but that sounds like an interesting test for some day (or possibly a test with a 30fps camera and a 50fps monitor rate) ;-)
 

Catalin, why do you not just take several video readings (say 8-10), and use their average as your data point? The error component is reduced by a factor of √n, where n is the number of readings you take. In other words, if your error from a single observation is 20 ms., your error associated with the average of n = 10 readings would be 6.32 ms. That way, you'd overcome most of the frame-speed limitations. Or perhaps you do average (I can't remember), in which case, how many readings do you take?
 

Catalin, why do you not just take several video readings (say 8-10), and use their average as your data point? The error component is reduced by a factor of √n, where n is the number of readings you take. In other words, if your error from a single observation is 20 ms., your error associated with the average of n = 10 readings would be 6.32 ms. That way, you'd overcome most of the frame-speed limitations. Or perhaps you do average (I can't remember), in which case, how many readings do you take?
I believe the assumption above is ONLY valid when you deal in 'continuous values' where the errors are pretty random and probably smaller than any of the known sources of non-random errors - with the video method (and decent 'hardware') the values are not continuous but instead have certain 'quantified' steps in multiples of about 16.66 ms for the monitor and 33.33 ms for the camera.

I am generally making 2-3 videos of around 10 seconds (which provides about 20-30 pairs of relevant frames) each but not in order to compute clear averages but instead in order to be able to get to the higher degree of accuracy from the monitor steps (which is obviously twice better than the 'default one' from the camera).

It does not make enough sense at this point to try anything more than that - given the fact that the error from the initial Internet time-sync is still in the range of 10-20 ms (maybe closer to 5 ms. now that I use ntpdate with multiple servers) and that the error on the computer clock seems to be in the range of +1 ms/minute ...

Also from what I have seen many of the watches in my tests seem to 'move' the seconds hand in something around 10 ms but I have also seen a few clearly going from one position to the next in an interval higher than 33ms.
 

I believe the assumption above is ONLY valid when you deal in 'continuous values' where the errors are pretty random and probably smaller than any of the known sources of non-random errors - with the video method (and decent 'hardware') the values are not continuous but instead have certain 'quantified' steps in multiples of about 16.66 ms for the monitor and 33.33 ms for the camera.

I am generally making 2-3 videos of around 10 seconds (which provides about 20-30 pairs of relevant frames) each but not in order to compute clear averages but instead in order to be able to get to the higher degree of accuracy from the monitor steps (which is obviously twice better than the 'default one' from the camera).

It does not make enough sense at this point to try anything more than that - given the fact that the error from the initial Internet time-sync is still in the range of 10-20 ms (maybe closer to 5 ms. now that I use ntpdate with multiple servers) and that the error on the computer clock seems to be in the range of +1 ms/minute ...

Also from what I have seen many of the watches in my tests seem to 'move' the seconds hand in something around 10 ms but I have also seen a few clearly going from one position to the next in an interval higher than 33ms.
Good points. However, your point about averages only making sense with continuously-distributed random variables is not quite correct. Most of the data that we analyze in any branch of research is in the form of a discrete (in steps) variable overlaying a continuous variable. The numbers are discrete steps because of the impossibility of capturing the infinite number of observations associated with the underlying continuous random variable. This fact, however, doesn't stop us from computing means, standard deviations, and many other statistics from such discrete variables.
 

Good points. However, your point about averages only making sense with continuously-distributed random variables is not quite correct. Most of the data that we analyze in any branch of research is in the form of a discrete (in steps) variable overlaying a continuous variable. The numbers are discrete steps because of the impossibility of capturing the infinite number of observations associated with the underlying continuous random variable. This fact, however, doesn't stop us from computing means, standard deviations, and many other statistics from such discrete variables.
That is correct in certain conditions but not in all - for instance, if you can only measure the height of a person in integer multiples of, let's say, feet (meters would be a little too extreme :-d) and you calculate the mean over a REALLY large population, the final result might still be surprisingly close to the mean that you get from measurements at 1cm increments; however, the standard deviation could be quite different (and that could also suggest a measurement limitation). On the other hand, if you measure the height of a SINGLE person in increments of 1 foot, no matter how many times you measure/average it, the result will be (for like 99% of people) a very constant number which will also be quite far away from the actual height measured in increments of 1cm!
 

Quite true. All that is required really, with a discrete random variable, is that the intervals be of the same size at all points of the scale. So, are you saying that your interval size is something on the order of 16.66 ms.? If so, would your measurements be like: -33.33, -16.66, 0, +16.66, +33.33? That is pretty coarse all right, but wouldn't a mean of, say, 10 of these produce far better data points than just one observation?
 

Quite true. All that is required really, with a discrete random variable, is that the intervals be of the same size at all points of the scale. So, are you saying that your interval size is something on the order of 16.66 ms.? If so, would your measurements be like: -33.33, -16.66, 0, +16.66, +33.33? That is pretty coarse all right, but wouldn't a mean of, say, 10 of these produce far better data points than just one observation?
At some point, with a very, very large amount of extra data, plus using the extra fact that I can estimate the (huge) drift of the computer clock and can force a large number of NTP syncs, plus possibly knowing the inhibition period, and assuming that we only talk about the calibers where the seconds-hand moves very fast (in under 10 ms.), we MIGHT be able to cut the errors to maybe 3-5 ms - the point however remains that it will be INCREDIBLY time-consuming - don't forget that already (with 17 watches) I was losing over 1 hour, and we are now maybe talking about losing 1 hour for a single watch ... definitely not worth it in the current economy :-d
 