timecounter.ms revision 119760
1.EQ 2delim �� 3.EN 4.\" 5.\" ---------------------------------------------------------------------------- 6.\" "THE BEER-WARE LICENSE" (Revision 42): 7.\" <phk@login.dknet.dk> wrote this file. As long as you retain this notice you 8.\" can do whatever you want with this stuff. If we meet some day, and you think 9.\" this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp 10.\" ---------------------------------------------------------------------------- 11.\" 12.\" $FreeBSD: head/share/doc/papers/timecounter/timecounter.ms 119760 2003-09-05 09:40:10Z jkoshy $ 13.\" 14.if n .ND 15.TI 16Timecounters: Efficient and precise timekeeping in SMP kernels. 17.AA 18.A "Poul-Henning Kamp" "The FreeBSD Project" 19.AB 20.PP 21The FreeBSD timecounters are an architecture-independent implementation 22of a binary timescale using whatever hardware support is at hand 23for tracking time. The binary timescale converts using simple 24multiplication to canonical timescales based on micro- or nano-seconds 25and can interface seamlessly to the NTP PLL/FLL facilities for clock 26synchronisation. Timecounters are implemented using lock-less 27stable-storage based primitives which scale efficiently in SMP 28systems. The math and implementation behind timecounters will 29be detailed as well as the mechanisms used for synchronisation. \** 30.AE 31.FS 32This paper was presented at the EuroBSDcon 2002 conference in Amsterdam. 33.FE 34.SH 35Introduction 36.PP 37Despite digging around for it, I have not been able to positively 38identify the first computer which knew the time of day. 39The feature probably arrived either from the commercial side 40so service centres could bill computer cycles to customers or from 41the technical side so computers could timestamp external events, 42but I have not been able to conclusively nail the first implementation down. 43.LP 44But there is no doubt that it happened very early in the development 45of computers 46and if systems like the ``SAGE'' [SAGE] did not know what time 47it was I would be amazed. 48.LP 49On the other hand, it took a long time for a real time clock to 50become a standard feature: 51.LP 52The ``Apple ]['' computer 53had neither in hardware or software any notion what time it was. 54.LP 55The original ``IBM PC'' did know what time it was, provided you typed 56it in when you booted it, but it forgot when you turned it off. 57.LP 58One of the ``advanced technologies'' in the ``IBM PC/AT'' was a battery 59backed CMOS chip which kept track of time even when the computer 60was powered off. 61.LP 62Today we expect our computers to know the time, and with network 63protocols like NTP we will usually find that they do, give and 64take some milliseconds. 65.LP 66This article is about the code in the FreeBSD kernel which keeps 67track of time. 68.SH 69Time and timescale basics 70.PP 71Despite the fact that time is the physical quantity (or maybe entity 72?) about which we know the least, it is at the same time [sic!] what we 73can measure with the highest precision of all physical quantities. 74.LP 75The current crop of atomic clocks will neither gain nor loose a 76second in the next couple hundred million years, provided we 77stick to the preventative maintenance schedules. This is a feat 78roughly in line with to knowing the circumference of the Earth 79with one micrometer precision, in real time. 80.LP 81While it is possible to measure time by means other than oscillations, 82for instance transport or consumption of a substance at a well-known 83rate, such designs play no practical role in time measurement because 84their performance is significantly inferior to oscillation based 85designs. 86.LP 87In other words, it is pretty fair to say that all relevant 88timekeeping is based on oscillating phenomena: 89.TS 90center; 91l l. 92sun-dial Earths rotation about its axis. 93calendar Ditto + Earths orbit around the sun. 94clockwork Mechanical oscillation of pendulum. 95crystals Mechanical resonance in quartz. 96atomic Quantum-state transitions in atoms. 97.TE 98.LP 99We can therefore with good fidelity define ``a clock'' to be the 100combination of an oscillator and a counting mechanism: 101.LP 102.if t .PSPIC fig3.eps 103.LP 104The standard second is currently defined as 105.QP 106The duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom. 107.LP 108and we have frequency standards which are able to mark a sequence of 109such seconds 110with an error less than �2 cdot 10 sup{-15}� [DMK2001] with commercially 111available products doing better than �1 cdot 10 sup{-14}� [AG2002]. 112.LP 113Unlike other physical units with a conventionally defined origin, 114longitude for instance, the ephemeral nature of time prevents us 115from putting a stake in the ground, so to speak, and measure from 116there. For measuring time we have to rely on ``dead reckoning'', 117just like the navigators before Harrison built his clocks [RGO2002]: 118We have to tally how far we went from our reference point, keeping a 119running total at all times, and use that as our estimated position. 120.LP 121The upshot of this is, that we cannot define a timescale by any 122other means than some other timescale(s). 123.LP 124``Relative time'' is a time interval between two events, and for this 125we only need to agree on the rate of the oscillator. 126.LP 127``Absolute time'' consists of a well defined point in time and the 128time interval since then, this is a bit more tricky. 129.LP 130The Internationally agreed upon TAI and the UTC timescales 131starts at (from a physics point of view) arbitrary points in time 132and progresses in integral intervals of the standard second, with the 133difference being that UTC does tricks to the counting to stay roughly 134in sync with Earths rotation \**. 135.FS 136The first atomic based definition actually operated in a different way: 137each year would have its own value determined for the frequency of the 138caesium resonance, selected so as to match the revolution rate of the 139Earth. This resulted in time-intervals being very unwieldy business, 140and more and more scientists realized that that the caesium resonance 141was many times more stable than the angular momentum of the Earth. 142Eventually the new leap-second method were introduced in 1972. 143It is interesting to note that the autumn leaves falling on the 144northern hemisphere affects the angular momentum enough to change 145the Earths rotational rate measurably. 146.FE 147.LP 148TAI is defined as a sequence of standard seconds (the first timescale), 149counted from January 1st 1958 (the second timescale). 150.LP 151UTC is defined basically the same way, but every so often a leap-second 152is inserted (or theoretically deleted) to keep UTC synchronised 153with Earths rotation. 154.LP 155Both the implementation of these two, and a few others speciality 156timescales are the result of the 157combined efforts of several hundred atomic frequency standards in 158various laboratories and institutions throughout the world, all 159reporting to the BIPM in Paris who calculate the ``paper clock'' which 160TAI and UTC really are using a carefully designed weighting algorithm \**. 161.FS 162The majority of these clocks are model 5071A from Agilent (the test 163and measurement company formerly known as ``Hewlett-Packard'') which 164count for as much as 85% of the combined weight. 165A fact the company deservedly is proud of. 166The majority of the remaining weight is assigned to a handful of big 167custom-design units like the PTB2 and NIST7. 168.FE 169.LP 170Leap seconds are typically announced six to nine months in advance, 171based on precise observations of median transit times of stars and VLBI 172radio astronomy of very distant quasars. 173.LP 174The perceived wisdom of leap-seconds have been gradually decreasing 175in recent years, as devices and products with built-in calendar 176functionality becomes more and more common and people realize that 177user input or software upgrades are necessary to instruct the 178calendar functionality about upcoming leap seconds. 179.SH 180UNIX timescales 181.PP 182UNIX systems use a timescale which pretends to be UTC, but defined 183as the count of standard seconds since 00:00:00 01-01-1970 UTC, 184ignoring the leap-seconds. This definition has never been perceived 185as wise. 186.LP 187Ignoring leap seconds means that unless some trickery is performed 188when a leap second happens on the UTC scale, UNIX clocks would be 189one second off. Another implication is that the length of a 190time interval calculated on UNIX time_t variables, can be up to 22 191(and counting) seconds wrong relative to the same time interval 192measured on the UTC timescale. 193.LP 194Recent efforts have tried to make the NTP protocol make up for this 195deficiency by transmitting the UTC-TAI offset as part of the protocol. 196[MILLS2000A] 197.LP 198Fractional seconds are represented two ways in UNIX, ``timeval'' and 199``timespec''. Both of these formats are two-component structures 200which record the number of seconds, and the number of microseconds 201or nanoseconds respectively. 202.LP 203This unfortunate definition makes arithmetic on these two formats 204quite expensive to perform in terms of computer instructions: 205.DS 206.ps -1 207/* Subtract timeval from timespec */ 208t3.tv_sec = t1.tv_sec - t2.tv_sec; 209t3.tv_nsec = t1.tv_nsec - 210 t2.tv_usec * 1000; 211if (t3.tv_nsec >= 1000000000) { 212 t3.tv_sec++; 213 t3.tv_nsec -= 1000000000; 214} else if (t3.tv_nsec < 0) { 215 t3.tv_sec--; 216 t3.tv_nsec += 1000000000; 217} 218.ps +1 219.DE 220.LP 221While nanoseconds will probably be enough for most timestamping 222tasks faced by UNIX computers for a number of years, it is an 223increasingly uncomfortable situation that CPU clock periods and 224instruction timings are already not representable in the standard 225time formats available on UNIX for consumer grade hardware, 226and the first POSIX mandated API, \fCclock_getres(3)\fP has 227already effectively reached end of life as a result of this. 228.LP 229Hopefully the various standards bodies will address this issue 230better in the future. 231.SH 232Precision, Stability and Resolution 233.PP 234Three very important terms in timekeeping are ``precision'', 235``stability'' and ``resolution''. 236While the three words may seem to describe somewhat the 237same property in most uses, their use in timekeeping covers three 238very distinct and well defined properties of a clock. 239.LP 240Resolution in clocks is simply a matter of the step-size of the 241counter or in other words: the rate at which it steps. 242A counter running on a 1 MHz frequency will have a resolution 243of 1 microsecond. 244.LP 245Precision talks about how close to the intended rate the clock runs, 246stability about how much the rate varies and resolution about the 247size of the smallest timeinterval we can measure. 248.LP 249From a quality point of view, Stability is a much more 250valuable property than precision, this is probably best explained 251using a graphic illustration of the difference between the two 252concepts: 253.LP 254.if t .PSPIC fig1.eps 255.LP 256In the top row we have instability, the bullet holes are spread over 257a large fraction of the target area. 258In the bottom row, the bullets all hit in a very small area. 259.LP 260On the left side, we have lack of precision, the holes obviously are 261not centred on the target, a systematic offset exists. 262In the right side we have precision, the bullets are centred on 263the target \**. 264.FS 265We cannot easily get resolution into this analogy, the obvious 266representation as the diameter of the bullet-hole is not correct, 267it would have to be the grid or other pattern of locations where 268the bullet could possibly penetrate the target material, but this 269gets too quantum-mechanical-oid to serve the instructional purpose. 270.FE 271.LP 272Transposing these four targets to actual clocks, the situation 273could look like the following plots: 274.LP 275.if t .PSPIC fig2.eps 276.LP 277On the x-axis we have time and on the y-axis how wrong the clock 278was at a given point in time. 279.LP 280The reason atomic standards are such a big deal in timekeeping is 281that they are incredibly stable: they are able to generate an oscillation 282where the period varies by roughly a millionth of a billonth of a 283second in long term measurements. 284.LP 285They are in fact not nearly as precise as they are stable, but as 286one can see from the graphic above, a stable clock which is not 287precise can be easily corrected for the offset and thus calibrated 288is as good as any clock. 289.LP 290This lack of precision is not necessarily a flaw in these kinds of 291devices, once you get into the �10 cdot 10 sup{-15}� territory 292things like the blackbody spectrum at the particular absolute 293temperature of the clocks hardware and general relativistic 294effects mostly dependent on the altitude above earths center 295has to be corrected for \**. 296.FS 297This particularly becomes an issue with space-based atomic standards 298as those found on the ``Navstar'' GPS satellites. 299.FE 300.SH 301Design goals of timecounters 302.PP 303After this brief description of the major features of the local 304landscape, we can look at the design goals of timecounters in detail: 305.LP 306.I "Provide timestamps in timeval and timespec formats," 307.IP 308This is obviously the basic task we have to solve, but as was noted 309earlier, this is in no way the performance requirement. 310.LP 311.I "on both the ``uptime'' and the POSIX timescales," 312.IP 313The ``uptime'' timescale is convenient for time intervals which are 314not anchored in UTC time: the run time of processes, the access 315time of disks and similar. 316.IP 317The uptime timescale counts seconds starting from when the system 318is booted. The POSIX/UTC timescale is implemented by adding an 319estimate of the POSIX time when the system booted to the uptime 320timescale. 321.LP 322.I "using whatever hardware we have available at the time," 323.IP 324Which in a subtle way also implies ``be able to switch from one 325piece of hardware to another on the fly'' since we may not know 326right up front what hardware we have access to and which is 327preferable to use. 328.LP 329.I "while supporting time the NTP PLL/FLL discipline code," 330.IP 331The NTP kernel PLL/FLL code allows the local clock and timescale 332to be synchronised or syntonised to an external timescale either 333via network packets or hardware connection. This also implies 334that the rate and phase of the timescale must be manoeuvrable 335with sufficient resolution. 336.LP 337.I "and providing support for the RFC 2783 PPS API," 338.IP 339This is mainly for the benefit of the NTPD daemons communication 340with external clock or frequency hardware, but it has many other 341interesting uses as well [PHK2001]. 342.LP 343.I "in a SMP efficient way." 344.IP 345Timestamps are used many places in the kernel and often at pretty 346high rate so it is important that the timekeeping facility 347does not become a point of CPU or lock contention. 348.SH 349Timecounter timestamp format. 350.PP 351Choosing the fundamental timestamp format for the timecounters is 352mostly a question of the resolution and steer-ability requirements. 353.LP 354There are two basic options on contemporary hardware: use a 32 bit 355integer for the fractional part of seconds, or use a 64 bit which 356is computationally more expensive. 357.LP 358The question therefore reduced to the somewhat simpler: can we get 359away with using only 32 bit ? 360.LP 361Since 32 bits fractional seconds have a resolution of slightly 362better than quarter of a nanosecond (.2328 nsec) it can obviously 363be converted to nanosecond resolution struct timespec timestamps 364with no loss of precision, but unfortunately not with pure 32 bit 365arithmetic as that would result in unacceptable rounding errors. 366.LP 367But timecounters also need to represent the clock period of the 368chosen hardware and this hardware might be the GHz range CPU-clock. 369The list of clock frequencies we could support with 32 bits are: 370.TS 371center; 372l l n l. 373�2 sup{32} / 1� �=� 4.294 GHz 374�2 sup{32} / 2� �=� 2.147 GHz 375�2 sup{32} / 3� �=� 1.432 GHz 376\&... 377�2 sup{32} / (2 sup{32}-1)� �=� 1.000 Hz 378.TE 379We can immediately see that 32 bit is insufficient to faithfully 380represent clock frequencies even in the low GHz area, much less in 381the range of frequencies which have already been vapourwared by 382both IBM, Intel and AMD. 383QED: 32 bit fractions are not enough. 384.LP 385With 64 bit fractions the same table looks like: 386.TS 387center; 388l l r l. 389�2 sup{64} / 1� �=� � 18.45 cdot 10 sup{9}� GHz 390�2 sup{64} / 2� �=� � 9.223 cdot 10 sup{9}� GHz 391\&... 392�2 sup{64} / 2 sup{32}� �=� 4.294 GHz 393\&... 394�2 sup{64} / (2 sup{64}-1)� �=� 1.000 Hz 395.TE 396And the resolution in the 4 GHz frequency range is approximately one Hz. 397.LP 398The following format have therefore been chosen as the basic format 399for timecounters operations: 400.DS 401.ps -1 402struct bintime { 403 time_t sec; 404 uint64_t frac; 405}; 406.ps +1 407.DE 408Notice that the format will adapt to any size of time_t variable, 409keeping timecounters safely out of the ``We SHALL prepare for the 410Y2.038K problem'' war zone. 411.LP 412One beauty of the bintime format, compared to the timeval and 413timespec formats is that it is a binary number, not a pseudo-decimal 414number. If compilers and standards allowed, the representation 415would have been ``int128_t'' or at least ``int96_t'', but since this 416is currently not possible, we have to express the simple concept 417of multiword addition in the C language which has no concept of a 418``carry bit''. 419.LP 420To add two bintime values, the code therefore looks like this \**: 421.FS 422If the reader suspects the '>' is a typo, further study is suggested. 423.FE 424.LP 425.DS 426.ps -1 427uint64_t u; 428 429u = bt1->frac; 430bt3->frac = bt1->frac + bt2->frac; 431bt3->sec = bt1->sec + bt2->sec; 432if (u > bt3->frac) 433 bt3->sec += 1; 434.ps +1 435.DE 436.LP 437An important property of the bintime format is that it can be 438converted to and from timeval and timespec formats with simple 439multiplication and shift operations as shown in these two 440actual code fragments: 441.DS 442.ps -1 443void 444bintime2timespec(struct bintime *bt, 445 struct timespec *ts) 446{ 447 448 ts->tv_sec = bt->sec; 449 ts->tv_nsec = 450 ((uint64_t)1000000000 * 451 (uint32_t)(bt->frac >> 32)) >> 32; 452} 453.ps +1 454.DE 455.DS 456.ps -1 457void 458timespec2bintime(struct timespec *ts, 459 struct bintime *bt) 460{ 461 462 bt->sec = ts->tv_sec; 463 /* 18446744073 = 464 int(2^64 / 1000000000) */ 465 bt->frac = ts->tv_nsec * 466 (uint64_t)18446744073LL; 467} 468.ps +1 469.DE 470.LP 471.SH 472How timecounters work 473.PP 474To produce a current timestamp the timecounter code 475reads the hardware counter, subtracts a reference 476count to find the number of steps the counter has 477progressed since the reference timestamp. 478This number of steps is multiplied with a factor 479derived from the counters frequency, taking into account 480any corrections from the NTP PLL/FLL and this product 481is added to the reference timestamp to get a timestamp. 482.LP 483This timestamp is on the ``uptime'' time scale, so if 484UNIX/UTC time is requested, the estimated time of boot is 485added to the timestamp and finally it is scaled to the 486timeval or timespec if that is the desired format. 487.LP 488A fairly large number of functions are provided to produce 489timestamps, depending on the desired timescale and output 490format: 491.TS 492center; 493l r r. 494Desired uptime UTC/POSIX 495Format timescale timescale 496_ 497bintime binuptime() bintime() 498timespec nanouptime() nanotime() 499timeval microuptime() microtime() 500.TE 501.LP 502Some applications need to timestamp events, but are not 503particular picky about the precision. 504In many cases a precision of tenths or hundreds of 505seconds is sufficient. 506.LP 507A very typical case is UNIX file timestamps: 508There is little point in spending computational resources getting an 509exact nanosecond timestamp, when the data is written to 510a mechanical device which has several milliseconds of unpredictable 511delay before the operation is completed. 512.LP 513Therefore a complementary shadow family of timestamping functions 514with the prefix ``get'' have been added. 515.LP 516These functions return the reference 517timestamp from the current timehands structure without going to the 518hardware to determine how much time has elapsed since then. 519These timestamps are known to be correct to within rate at which 520the periodic update runs, which in practice means 1 to 10 milliseconds. 521.SH 522Timecounter math 523.LP 524The delta-count operation is straightforward subtraction, but we 525need to logically AND the result with a bit-mask with the same number 526(or less) bits as the counter implements, 527to prevent higher order bits from getting set when the counter rolls over: 528.DS 529.ce 530.EQ 531Delta Count = (Count sub{now} - Count sub{ref}) ~ BITAND ~ mask 532.EN 533.DE 534The scaling step is straightforward. 535.DS 536.ce 537.EQ 538T sub{now} = Delta Count cdot R sub{counter} + T sub{ref} 539.EN 540.DE 541The scaling factor �R sub{counter}� will be described below. 542.LP 543At regular intervals, scheduled by \fChardclock()\fP, a housekeeping 544routine is run which does the following: 545.LP 546A timestamp with associated hardware counter reading is elevated 547to be the new reference timecount: 548.DS 549 550.ce 551.EQ 552Delta Count = (Count sub{now} - Count sub{ref}) ~ BITAND ~ mask 553.EN 554 555.ce 556.EQ 557T sub{now} = Delta Count cdot R sub{counter} 558.EN 559 560.ce 561.EQ 562Count sub{ref} = Count sub{now} 563.EN 564 565.ce 566.EQ 567T sub{ref} = T sub{now} 568.EN 569.DE 570.LP 571If a new second has started, the NTP processing routines are called 572and the correction they return and the counters frequency is used 573to calculate the new scaling factor �R sub{counter}�: 574.DS 575.ce 576.EQ 577R sub{counter} = {2 sup{64} over Freq sub{counter}} cdot ( 1 + R sub{NTP} ) 578.EN 579.DE 580Since we only have access to 64 bit arithmetic, dividing something 581into �2 sup{64}� is a problem, so in the name of code clarity 582and efficiency, we sacrifice the low order bit and instead calculate: 583.DS 584.ce 585.EQ 586R sub{counter} = 2 cdot {2 sup{63} over Freq sub{counter}} cdot ( 1 + R sub{NTP} ) 587.EN 588.DE 589The �R sub{NTP}� correct factor arrives as the signed number of 590nanoseconds (with 32 bit binary fractions) to adjust per second. 591This quasi-decimal number is a bit of a square peg in our round binary 592hole, and a conversion factor is needed. 593Ideally we want to multiply this factor by: 594.DS 595.ce 596.EQ 5972 sup {64} over {10 sup{9} cdot 2 sup{32}} = 4.294967296 598.EN 599.DE 600This is not a nice number to work with. 601Fortunately, the precision of this correction is not critical, we are 602within an factor of a million of the �10 sup{-15}� performance level 603of state of the art atomic clocks, so we can use an approximation 604on this term without anybody noticing. 605.LP 606Deciding which fraction to use as approximation needs to carefully 607consider any possible overflows that could happen. 608In this case the correction may be as large as \(+- 5000 PPM which 609leaves us room to multiply with about 850 in a multiply-before-divide 610setting. 611Unfortunately, there are no good fractions which multiply with less 612than 850 and at the same time divide by a power of two, which is 613desirable since it can be implemented as a binary shift instead of 614an expensive full division. 615.LP 616A divide-before-multiply approximation necessarily results in a loss 617of lower order bits, but in this case dividing by 512 and multiplying 618by 2199 gives a good approximation where the lower order bit loss is 619not a concern: 620.DE 621.EQ 6222199 over 512 = 4.294921875 623.EN 624.DE 625The resulting error is an systematic under compensation of 10.6PPM 626of the requested change, or �1.06 cdot 10 sup -14� per nanosecond 627of correction. 628This is perfectly acceptable. 629.LP 630Putting it all together, including the one bit we put on the alter for the 631Goddess of code clarity, the formula looks like this: 632.DS 633.ce 634.EQ 635R sub{counter} = 2 cdot {{2 sup{63} + 2199 cdot {R sub{NTP}} over 1024} over Freq sub{counter}} 636.EN 637.DE 638Presented here in slightly unorthodox format to show the component arithmetic 639operations as they are carried out in the code. 640.SH 641Frequency of the periodic update 642.PP 643The hardware counter should have a long enough 644period, ie, number of distinct counter values divided by 645frequency, to not roll over before our periodic update function 646has had a chance to update the reference timestamp data. 647.LP 648The periodic update function is called from \fChardclock()\fP which 649runs at a rate which is controlled by the kernel parameter 650.I HZ . 651.LP 652By default HZ is 100 which means that only hardware with a period 653longer than 10 msec is usable. 654If HZ is configured higher than 1000, an internal divider is 655activated to keep the timecounter periodic update running 656no more often than 2000 times per second. 657.LP 658Let us take an example: 659At HZ=100 a 16 bit counter can run no faster than: 660.DS 661.ce 662.EQ 6632 sup{16} cdot {100 Hz} = 6.5536 MHz 664.EN 665.DE 666Similarly, if the counter runs at 10MHz, the minimum HZ is 667.DS 668.ce 669.EQ 670{10 MHz} over {2 sup{16}} = 152.6 Hz 671.EN 672.DE 673.LP 674Some amount of margin is of course always advisable, 675and a factor two is considered prudent. 676.LP 677.SH 678Locking, lack of ... 679.PP 680Provided our hardware can be read atomically, that our arithmetic 681has enough bits to not roll over and that our clock frequency is 682perfectly, or at least sufficiently, stable, we could avoid the 683periodic update function, and consequently disregard the entire 684issue of locking. 685We are seldom that lucky in practice. 686.LP 687The straightforward way of dealing with meta data updates is to 688put a lock of some kind on the data and grab hold of that before 689doing anything. 690This would however be a very heavy-handed approach. First of 691all, the updates are infrequent compared to simple references, 692second it is not important which particular state of meta data 693a consumer gets hold of, as long as it is consistent: as long 694as the �Count sub{ref}� and �T sub{ref}� are a matching pair, 695and not old enough to cause an ambiguity with hardware counter 696rollover, a valid timestamp can be derived from them. 697.LP 698A pseudo-stable-storage with generation count method has been 699chosen instead. 700A ring of ten ``timehands'' data structures are used to hold the 701state of the timecounter system, the periodic update function 702updates the next structure with the new reference data and 703scaling factor and makes it the current timehands. 704.LP 705The beauty of this arrangement lies in the fact that even though 706a particular ``timehands'' data structure has been bumped from being 707the ``currents state'' by its successor, it still contains valid data 708for some amount of time into the future. 709.LP 710Therefore, a process which has started the timestamping process but 711suffered an interrupt which resulted in the above periodic processing 712can continue unaware of this afterwards and not suffer corruption 713or miscalculation even though it holds no locks on the shared 714meta-data. 715.if t .PSPIC fig4.eps 716.LP 717This scheme has an inherent risk that a process may be de-scheduled for 718so long time that it will not manage to complete the timestamping 719process before the entire ring of timehands have been recycled. 720This case is covered by each timehand having a private generation number 721which is temporarily set to zero during the periodic processing, to 722mark inconsistent data, and incremented to one more than the 723previous value when the update has finished and the timehands 724is again consistent. 725.LP 726The timestamping code will grab a copy of this generation number and 727compare this copy to the generation in the timehands after completion 728and if they differ it will restart the timestamping calculation. 729.DS 730.ps -1 731do { 732 th = timehands; 733 gen = th->th_generation; 734 /* calculate timestamp */ 735} while (gen == 0 || 736 gen != th->th_generation); 737.ps +1 738.DE 739.LP 740Each hardware device supporting timecounting is represented by a 741small data structure called a timecounter, which documents the 742frequency, the number of bits implemented by the counter and a method 743function to read the counter. 744.LP 745Part of the state in the timehands structure is a pointer to the 746relevant timecounter structure, this makes it possible to change 747to a one piece of hardware to another ``on the fly'' by updating 748the current timehands pointer in a manner similar to the periodic 749update function. 750.LP 751In practice this can be done with sysctl(8): 752.DS 753.ps -1 754sysctl kern.timecounter.hardware=TSC 755.ps +1 756.DE 757.LP 758at any time while the system is running. 759.SH 760Suitable hardware 761.PP 762A closer look on ``suitable hardware'' is warranted 763at this point. 764It is obvious from the above description that the ideal hardware 765for timecounting is a wide binary counter running at a constant 766high frequency 767and atomically readable by all CPUs in the system with a fast 768instruction(-sequence). 769.LP 770When looking at the hardware support on the PC platform, one 771is somewhat tempted to sigh deeply and mutter ``so much for theory'', 772because none of the above parameters seems to have been on the 773drawing board together yet. 774.LP 775All IBM PC derivatives contain a device more or less compatible 776with the venerable Intel i8254 chip. 777This device contains 3 counters of 16 bits each, 778one of which is wired so it can interrupt the CPU when the 779programmable terminal count is reached. 780.LP 781The problem with this device is that it only has 8bit bus-width, 782so reading a 16 bit timestamp takes 3 I/O operations: one to latch 783the count in an internal register, and two to read the high and 784low parts of that register respectively. 785.LP 786Obviously, on multi-CPU systems this cannot be done without some 787kind of locking mechanism preventing the other CPUs from trying 788to do the same thing at the same time. 789.LP 790Less obviously we find it is even worse than that: 791Since a low priority kernel thread 792might be reading a timestamp when an interrupt comes in, and since 793the interrupt thread might also attempt to generate a timestamp, 794we need to totally block interrupts out while doing those three 795I/O instructions. 796.LP 797And just to make life even more complicated, FreeBSD uses the same 798counter to provide the periodic interrupts which schedule the 799\fChardclock()\fP routine, so in addition the code has to deal with the 800fact that the counter does not count down from a power of two and 801that an interrupt is generated right after the reloading of the 802counter when it reaches zero. 803.LP 804Ohh, and did I mention that the interrupt rate for hardclock() will 805be set to a higher frequency if profiling is active ? \** 806.FS 807I will not even mention the fact that it can be set also to ridiculous 808high frequencies in order to be able to use the binary driven ``beep'' 809speaker in the PC in a PCM fashion to output ``real sounds''. 810.FE 811.LP 812It hopefully doesn't ever get more complicated than that, but it 813shows, in its own bizarre and twisted way, just how little help the 814timecounter code needs from the actual hardware. 815.LP 816The next kind of hardware support to materialise was the ``CPU clock 817counter'' called ``TSC'' in official data-sheets. 818This is basically a on-CPU counter, which counts at the rate 819of the CPU clock. 820.LP 821Unfortunately, the electrical power needed to run a CPU is pretty 822precisely proportional with the clock frequency for the 823prevailing CMOS chip technology, so 824the advent of computers powered by batteries prompted technologies 825like APM, ACPI, SpeedStep and others which varies or throttles the 826CPU clock to match computing demand in order to minimise the power 827consumption \**. 828.FS 829This technology also found ways into stationary computers from 830two different vectors. 831The first vector was technical: Cheaper cooling solutions can be used 832for the CPU if they are employed resulting in cheaper commodity 833hardware. 834The second vector was political: For reasons beyond reason, energy 835conservation became an issue with personal computers, despite the fact 836that practically all north American households contains 4 to 5 household 837items which through inefficient designs waste more power than a 838personal computer use. 839.FE 840.LP 841Another wiggle for the TSC is that it is not usable on multi-CPU 842systems because the counter is implemented inside the CPU and 843not readable from other CPUs in the system. 844.LP 845The counters on different CPUs are not guaranteed 846to run syntonously (ie: show the same count at the same time). 847For some architectures like the DEC/alpha architecture they do not even 848run synchronously (ie: at the same rate) because the CPU clock frequency 849is generated by a small SAW device on the chip which is very sensitive 850to temperature changes. 851.LP 852The ACPI specification finally brings some light: 853it postulates the existence of a 24 or 32 bit 854counter running at a standardised constant frequency and 855specifically notes that this is intended to be used for timekeeping. 856.LP 857The frequency chosen, 3.5795454... MHz\** 858.FS 859The reason for this odd-ball frequency has to be sought in the ghastly 860colours offered by the original IBM PC Color Graphics Adapter: It 861delivered NTSC format output and therefore introduced the NTSC colour 862sync frequency into personal computers. 863.FE 864 is not quite as high as one 865could have wished for, but it is certainly a big improvement over 866the i8254 hardware in terms of access path. 867.LP 868But trust it to Murphys Law: The majority of implementations so far 869have failed to provide latching suitable to avoid meta-stability 870problems, and several readings from the counter is necessary to 871get a reliable timestamp. 872In difference from the i8254 mentioned above, we do not need to 873any locking while doing so, since each individual read is atomic. 874.LP 875An initialization routine tries to test if the ACPI counter is properly 876latched by examining the width of a histogram over read delta-values. 877.LP 878Other architectures are similarly equipped with means for timekeeping, 879but generally more carefully thought out compared to the haphazard 880developments of the IBM PC architecture. 881.LP 882One final important wiggle of all this, is that it may not be possible 883to determine which piece of hardware is best suited for clock 884use until well into or even after the bootstrap process. 885.LP 886One example of this is the Loran-C receiver designed by Prof. Dave Mills 887[MILLS1992] 888which is unsuitable as timecounter until the daemon program which 889implements the software-half of the receiver has properly initialised 890and locked onto a Loran-C signal. 891.SH 892Ideal timecounter hardware 893.LP 894As proof of concept, a sort of an existentialist protest against 895the sorry state describe above, the author undertook a project to 896prove that it is possible to do better than that, since none of 897the standard hardware offered a way to fully validate the timecounter 898design. 899.LP 900Using a COTS product, ``HOT1'', from Virtual Computers Corporation 901[VCC2002] containing a FPGA chip on a PCI form factor card, a 26 902bit timecounter running at 100MHz was successfully implemented. 903.LP 904.if t .PSPIC fig5.eps 905.LP 906.LP 907In order to show that timestamping does not necessarily have to 908be done using unpredictable and uncalibratable interrupts, an 909array of latches were implemented as well, which allow up to 10 910external signals to latch the reading of the counter when 911an external PPS signal transitions from logic high to logic 912low or vice versa. 913.LP 914Using this setup, a standard 133 MHz Pentium based PC is able to 915timestamp the PPS output of the Motorola UT+ GPS receiver with 916a precision of \(+- 10 nanoseconds \(+- one count which in practice 917averages out to roughly \(+- 15 nanoseconds\**: 918.FS 919The reason the plot does not show a very distinct 10 nanosecond 920quantization is that the GPS receiver produces the PPS signal from 921a clock with a roughly 55 nanosecond period and then predicts in 922the serial data stream how many nanoseconds this will be offset 923from the ideal time. 924This plot shows the timestamps corrected for this ``negative 925sawtooth correction''. 926.FE 927.LP 928.if t .PSPIC gps.ps 929.LP 930It shold be noted that the author is no hardware wizard and 931a number of issues in the implementation results in less than 932ideal noise performance. 933.LP 934Now compare this to ``ideal'' timecounter to the normal setup 935where the PPS signal is used 936to trigger an interrupt via the DCD pin on a serial port, and 937the interrupt handler calls \fCnanotime()\fP to timestamp 938the external event \**: 939.FS 940In both cases, the computers clock frequency controlled 941with a Rubidium Frequency standard. 942The average quality of crystals used for computers would 943totally obscure the curves due to their temperature coefficient. 944.FE 945.LP 946.if t .PSPIC intr.ps 947.LP 948It is painfully obvious that the interrupt latency is the 949dominant noise factor in PPS timestamping in the second case. 950The asymetric distribution of the noise in the second plot 951also more or less entirely invalidates the design assumption 952in the NTP PLL/FLL kernel code that timestamps are dominated 953by gaussian noise with few spikes. 954.SH 955Status and availability 956.PP 957The timecounter code has been developed and used in FreeBSD 958for a number of years and has now reached maturity. 959The source-code is located almost entirely in the kernel source file 960kern_tc.c, with a few necessary adaptations in code which 961interfaces to it, primarily the NTP PLL/FLL code. 962.LP 963The code runs on all FreeBSD platforms including i386, alpha, 964PC98, sparc64, ia64 and s/390 and contains no wordsize or 965endianess issues not specifically handled in the sourcecode. 966.LP 967The timecounter implementation is distributed under the ``BSD'' 968open source license or the even more free ``Beer-ware'' license. 969.LP 970While the ability to accurately model and compensate for 971inaccuracies typical of atomic frequency standards are not 972catering to the larger userbase, but this ability and precision 973of the code guarntees solid support for the widespread deployment 974of NTP as a time synchronization protocol, without rounding 975or accumulative errors. 976.LP 977Adding support for new hardware and platforms have been 978done several times by other developers without any input from the 979author, so this particular aspect of timecounters design 980seems to work very well. 981.SH 982Future work 983.PP 984At this point in time, no specific plans exist for further 985development of the timecounters code. 986.LP 987Various micro-optimizations, mostly to compensate for inadequate 988compiler optimization could be contemplated, but the author 989resists these on the basis that they significantly decrease 990the readability of the source code. 991.SH 992Acknowledgements 993.PP 994.EQ 995delim �� 996.EN 997The author would like to thank: 998.LP 999Bruce Evans 1000for his invaluable assistance 1001in taming the evil i8254 timecounter, as well as the enthusiastic 1002resistance he has provided throughout. 1003.PP 1004Professor Dave Mills of University of Delaware for his work on 1005NTP, for lending out the neglected twin Loran-C receiver and for 1006picking up the glove when timecounters made it clear 1007that the old ``microkernel'' NTP timekeeping code were not up to snuff 1008[MILLS2000B]. 1009.PP 1010Tom Van Baak for helping out, despite the best efforts of the National 1011Danish Posts center for Customs and Dues to prevent it. 1012.PP 1013Corby Dawson for helping with the care and feeding for caesium standards. 1014.PP 1015The staff at the NELS Loran-C control station in B�, Norway for providing 1016information about step-changes. 1017.PP 1018The staff at NELS Loran-C station Ei�e, Faeroe 1019Islands for permission to tour their installation. 1020.PP 1021The FreeBSD users for putting up with ``micro uptime went backwards''. 1022.SH 1023References 1024.LP 1025[AG2002] 1026Published specifications for Agilent model 5071A Primary Frequency 1027Standard on 1028.br 1029http://www.agilent.com 1030.LP 1031[DMK2001] 1032"Accuracy Evaluation of a Cesium Fountain Primary Frequency Standard at NIST." 1033D. M. Meekhof, S. R. Jefferts, M. Stephanovic, and T. E. Parker 1034IEEE Transactions on instrumentation and measurement, VOL. 50, NO. 2, 1035APRIL 2001. 1036.LP 1037[PHK2001] 1038"Monitoring Natural Gas Usage" 1039Poul-Henning Kamp 1040http://phk.freebsd.dk/Gasdims/ 1041.LP 1042[MILLS1992] 1043"A computer-controlled LORAN-C receiver for precision timekeeping." 1044Mills, D.L. 1045Electrical Engineering Department Report 92-3-1, University of Delaware, March 1992, 63 pp. 1046.LP 1047[MILLS2000A] 1048Levine, J., and D. Mills. "Using the Network Time Protocol to transmit International Atomic Time (TAI)". Proc. Precision Time and Time Interval (PTTI) Applications and Planning Meeting (Reston VA, November 2000), 431-439. 1049.LP 1050[MILLS2000B] 1051"The nanokernel." 1052Mills, D.L., and P.-H. Kamp. 1053Proc. Precision Time and Time Interval (PTTI) Applications and Planning Meeting (Reston VA, November 2000), 423-430. 1054.LP 1055[RGO2002] 1056For an introduction to Harrison and his clocks, see for 1057instance 1058.br 1059http://www.rog.nmm.ac.uk/museum/harrison/ 1060.br 1061or for 1062a more detailed and possibly better researched account: Dava 1063Sobels excellent book, "Longitude: The True Story of a Lone 1064Genius Who Solved the Greatest Scientific Problem of His 1065Time" Penguin USA (Paper); ISBN: 0140258795. 1066.LP 1067[SAGE] 1068This ``gee-wiz'' kind of article in Dr. Jobbs Journal is a good place to 1069start: 1070.br 1071http://www.ddj.com/documents/s=1493/ddj0001hc/0085a.htm 1072.LP 1073[VCC2002] 1074Please consult Virtual Computer Corporations homepage: 1075.br 1076http://www.vcc.com 1077