GPS date fail due to week counter rollover (not a gpsd bug)

Discussion:

f***@free.fr

2010-10-16 11:40:47 UTC

Hello Guys

This may be of interest to this list

I recently used a GPS module as a time and date reference in an embedded project.

As GPS has a Y2K-style bug, the week counter rollover, which happens every 1024 weeks (roughly 20 years), I tried the receiver I use with a test generator capable of sending a GPS signal with an arbitrary timestamp.

And guess what: my module handles this wrongly and jumps back to 1999 if a cold boot is done after April 6, 2019.

Several issues with that:

- the date is invalid, obviously
- UTC time is correct, but a local time calculated from it will sometimes be off by ±1 hr, due to a wrong DST calculation
- some receivers may not get a position (mine was fixing correctly)
- Finally, gpsd (and probably other software) interprets the date "99" in GPRMC as 2099, which overflows a 32-bit Unix time: gpsmon reports a "negative date" error and a date in 1969 (probably a -1 returned by the Unix time library). If I enable the GPZDA sentence on the receiver, it "correctly" reports 1999. A 64-bit system will probably report 2099.

Some suggestions:

- Changing the GPRMC year interpretation to 20xx, except "99", which becomes 1999, would perhaps be a good thing to do. At least gpsd would be in sync with the GPS output, and no overflow would happen on systems with 32-bit time_t. Of course we lose one year before the 2100 bug, but fair enough: that is much further away than 2019.

- Trying to compensate for the 20-year error can be dangerous, because it relies on the machine time, which can also be corrupt and, for some applications, will itself later be set from GPS, etc. But it's feasible if the detection is based on a proper window (e.g. machine_utc - gps_utc = n*1024 weeks ± 1 month). The non-trivial part is calculating offsets of exactly n*1024 weeks in UTC, in seconds...

- Maintaining a list of tested devices and bad guys. Crappy GPS chipsets exist and will probably still be in production in 2019 (and many systems in use will fail due to crappy design). People need to test devices with test transmitters. I volunteer.
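The first two suggestions can be sketched in Python, with times as Unix timestamps in UTC. All names here are hypothetical (not gpsd functions), and reading "± 1 month" as 31 days is an assumption. `fits_in_32bit_time_t` illustrates the overflow described above: midnight on 1 January 2099 is about 4.07 × 10^9 s after the Unix epoch, past the signed 32-bit limit, while 1999 is not. `corrected_gps_utc` further assumes the receiver applied the currently broadcast leap-second offset to a GPS time that is short by whole 1024-week periods; under that assumption adding n × 619,315,200 s back is exact in UTC.

```python
import calendar

INT32_MAX = 2**31 - 1                        # signed 32-bit time_t limit: 2038-01-19 03:14:07 UTC
SECONDS_PER_WEEK = 7 * 24 * 3600
ROLLOVER_SECONDS = 1024 * SECONDS_PER_WEEK   # 619,315,200 s, about 19.6 years
TOLERANCE_SECONDS = 31 * 24 * 3600           # "+- 1 month", assumed here to be 31 days


def expand_rmc_year(yy: int) -> int:
    """First suggestion: two-digit year 99 means 1999, anything else means 20yy."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year (0-99)")
    return 1999 if yy == 99 else 2000 + yy


def fits_in_32bit_time_t(year: int, month: int = 1, day: int = 1) -> bool:
    """True if midnight UTC on that date is representable in a signed 32-bit time_t."""
    return calendar.timegm((year, month, day, 0, 0, 0)) <= INT32_MAX


def detect_rollovers(machine_utc: int, gps_utc: int) -> int:
    """Second suggestion: n >= 1 if machine_utc - gps_utc is n*1024 weeks +- tolerance, else 0.

    Leap seconds inserted between the two instants change the UTC difference
    by at most a few tens of seconds, which the one-month window absorbs.
    """
    diff = machine_utc - gps_utc
    n = round(diff / ROLLOVER_SECONDS)
    if n >= 1 and abs(diff - n * ROLLOVER_SECONDS) <= TOLERANCE_SECONDS:
        return n
    return 0


def corrected_gps_utc(machine_utc: int, gps_utc: int) -> int:
    """Add back detected rollovers (exact if the receiver applied the broadcast leap-second offset)."""
    return gps_utc + detect_rollovers(machine_utc, gps_utc) * ROLLOVER_SECONDS
```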

The first device on the non-compliant craplist:
GPS-320FW, from RF solutions http://www.rfsolutions.co.uk
Sold by RS for 42 Euro
probably manufactured by unitraq as GP-320FW: http://www.unitraq.com/ (TW)
The chipset is: SE4100L (front end) & PL-6313 (Prolific. Yes, the guys who make USB<>RS232 adapter chips with crappy drivers)

doc: http://www.rfsolutions.co.uk/acatalog/DS320-2.pdf
binary protocol: http://www.rfsolutions.co.uk/acatalog/DS-41COM-2.pdf (vendor-specific, as usual; has anyone ever seen one that starts with %%?)
I don't know who writes the firmware. The device sends out "$PLCS,REV,PLN012054S07,070312,145743" at reset (not documented!).

Some logs (various NMEA settings): http://f4eru.free.fr/gpslog/

http://tycho.usno.navy.mil/gps_week.html

http://www.swaviator.com/html/issueJJ99/WNROandGPS.html
http://www.colorado.edu/geography/gcraft/notes/gps/gpseow.htm
...

Sto

f4eru at free.fr

Eric Raymond

2010-12-16 08:44:31 UTC

Eric Raymond

2010-12-17 22:23:32 UTC

Greg Troxel

2010-12-18 01:11:24 UTC

I wouldn't be worrying about this in 2010 except that some of our embedded
deployments might have 9-year lifetimes. What should we do? What
*can* we do?

A few thoughts:

Declare that disambiguating the week field into time is the GPS
receiver's problem. (I replaced a ROM in a Truetime XL-DC timing
receiver, sent to me at no charge, during the W1K kerfuffle.) It's
easy enough to have NVRAM/flash, operator entry, or a ROM that knows
when it was compiled and then works for ~19 years.
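A sketch of the "ROM that knows when it was compiled" idea, in Python for illustration only: `full_gps_week` and `pivot_week` are hypothetical names, and real firmware would implement this in its own code. The pivot could equally be a week number last stored in NVRAM/flash or entered by an operator; any of them works for the 1024 weeks (about 19.6 years) that follow it.

```python
WEEK_MODULUS = 1024  # the legacy GPS week field is 10 bits wide


def full_gps_week(week10: int, pivot_week: int) -> int:
    """Map a 10-bit broadcast week number to a full GPS week.

    Returns the smallest full week >= pivot_week whose low 10 bits equal week10,
    so a receiver whose pivot is its firmware build week resolves dates
    correctly for 1024 weeks after the build.
    """
    if not 0 <= week10 < WEEK_MODULUS:
        raise ValueError("week10 must be in 0..1023")
    # Start from the 1024-week era containing the pivot...
    week = pivot_week - pivot_week % WEEK_MODULUS + week10
    # ...and move to the next era if that lands before the pivot.
    if week < pivot_week:
        week += WEEK_MODULUS
    return week
```

For example, with a pivot of week 1600 (September 2010), a broadcast week of 0 resolves to full week 2048 (April 2019) rather than week 1024 (August 1999).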

If there are GPS receivers that support getting time at powerup to get
the above right, help them.

Have gpsd disambiguate 2-digit years using the local clock. But since
we release often relative to 100-year rollover, that's not a worry and
the current "assume 20XX" seems fine.

Also, while I don't argue with your characterization that 64-bit Linux
has 64-bit time_t and thus can cope and 32-bit Linux has 32-bit time_t
and cannot, I find it a bit odd that this type changes size based on
processor, and suggest we be careful about the difference between time_t
size and processor word size.

In NetBSD, time_t is 32 bits, on all architectures, through NetBSD 5.
NetBSD-current has, and thus the upcoming NetBSD 6 will have, 64-bit
time_t on all architectures. (System calls are versioned so old
binaries will still work.) I am unclear on the status of the other
BSDs, but assume they'll change to 64-bit time_t long before it's
needed.

Is there any expectation that Linux on i386 will change to 64-bit
time_t?

Eric Raymond

2010-12-18 14:32:50 UTC

Post by Greg Troxel
Declare that disambiguating the week field into time is the GPS
receiver's problem.

Well, in an absolute sense of course it is. I'm just trying to come up
with ways to minimize the pain.

Post by Greg Troxel
If there are GPS receivers that support getting time at powerup to get
the above right, help them.
Have gpsd disambiguate 2-digit years using the local clock. But since
we release often relative to 100-year rollover, that's not a worry and
the current "assume 20XX" seems fine.

Combining these: Up to now, we've had a policy of not trusting the system
clock. But maybe we could trust it to a limited extent at gpsd startup.
Rules something like this:

1. If the system clock is zero or negative, emit a warning and disable
all heuristics. (Negative probably means we're on a 32-bit machine
after the 2038 rollover.)

2. If it's positive, use it to set both the century and the time of the
last GPS week rollover. (For the latter, subtract the GPS epoch and divide
by 1024 * SECONDS_PER_WEEK.)

3. If an incoming ZDA conflicts with the year set in step 2, emit a warning.
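Rules 1-3 might look roughly like this in Python. This is a sketch, not gpsd code: `ClockHints`, `startup_hints` and `zda_year_ok` are hypothetical names, rule 3 is read here as "the ZDA year's century differs from the clock's century", and the rollover count works on the Unix time scale, so it ignores leap seconds. As written, rule 1 accepts any positive clock, so a box that booted believing it is 1970 would get a negative rollover count.

```python
import logging
import time
from typing import NamedTuple, Optional

GPS_EPOCH_UNIX = 315964800                 # 1980-01-06 00:00:00 UTC as a Unix timestamp
SECONDS_PER_WEEK = 7 * 24 * 3600
ROLLOVER_SECONDS = 1024 * SECONDS_PER_WEEK


class ClockHints(NamedTuple):
    century: int    # e.g. 2000 for any year 2000-2099
    rollovers: int  # 1024-week rollovers completed since the GPS epoch


def startup_hints(system_time: int) -> Optional[ClockHints]:
    """Rules 1 and 2: derive hints from the system clock at startup."""
    if system_time <= 0:
        # Rule 1: zero or negative clock, so warn and disable all heuristics.
        logging.warning("system clock is %d; rollover heuristics disabled", system_time)
        return None
    # Rule 2: century from the clock's year, era count from the GPS epoch.
    year = time.gmtime(system_time).tm_year
    return ClockHints(century=year - year % 100,
                      rollovers=(system_time - GPS_EPOCH_UNIX) // ROLLOVER_SECONDS)


def zda_year_ok(hints: Optional[ClockHints], zda_year: int) -> bool:
    """Rule 3: warn if a ZDA year falls outside the century set in rule 2."""
    if hints is not None and zda_year - zda_year % 100 != hints.century:
        logging.warning("ZDA year %d conflicts with system-clock century %d",
                        zda_year, hints.century)
        return False
    return True
```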

Post by Greg Troxel
Also, while I don't argue with your characterization that 64-bit Linux
has 64-bit time_t and thus can cope and 32-bit Linux has 32-bit time_t
and cannot, I find it a bit odd that this type changes size based on
processor, and suggest we be careful about the difference between time_t
size and processor word size.

Historically, I believe that time_t size was tied to processor word size
because time_t was typedefed to either int (on 32-bit machines) or long
(on 16-bit machines).

Post by Greg Troxel
Is there any expectation that Linux on i386 will change to 64-bit
time_t?

I don't know.

--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Michael Cook

2010-12-18 14:48:29 UTC

Post by Eric Raymond
Combining these: Up to now, we've had a policy of not trusting the system
clock. But maybe we could trust it to a limited extent at gpsd startup.
1. If the system clock is zero or negative, emit a warning and disable
all heuristics. (Negative probably means we're on a 32-bit machine
after the 2038 rollover.)
2. If it's positive, use it to set both the century and the time of the
last GPS week rollover. (For the latter, subtract the GPS epoch and divide
by 1024 * SECONDS_PER_WEEK.)

Rather than checking only for positive values, it probably would make
sense to check that the system clock is plausible. Often, embedded
systems come up thinking it's early 1970 and the system clock will report
small positive values until the clock is set (e.g., from GPS info). If
the system clock says the year is less than, say, 2010, it probably
should be ignored.

Post by Eric Raymond
3. If an incoming ZDA conflicts with the year set in step 2, emit a warning.

Eric Raymond

2010-12-19 14:56:40 UTC

Post by Michael Cook
Rather than checking only for positive values, it probably would
make sense to check that the system clock is plausible. Often,
embedded systems come up thinking its early 1970 and the system
clock will report small positive values until the clock is set
(e.g., from GPS info). If the system clock says the year is less
than, say, 2010, it probably should be ignored.

Better, I think: use the GPS epoch of 6 Jan 1980 00:00:00 as the
cutoff value. It captures the entire 1970s and ensures that, in the
admittedly unlikely event that someone has back-timed a virtual
machine, the results for all historical logs of GPS data will still be
correct. Besides, I'm going to need this constant in the code for
computing the GPS rollover period.
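A sketch of that cutoff in Python, using `calendar.timegm` to compute the constant; `clock_is_plausible` and `rollover_dates` are hypothetical helper names. Stepping forward from the epoch in 1024-week periods gives the rollover boundaries of 22 August 1999, 7 April 2019 and 21 November 2038.

```python
import calendar
import datetime as dt

# GPS epoch, 1980-01-06 00:00:00 UTC, as a Unix timestamp.
GPS_EPOCH_UNIX = calendar.timegm((1980, 1, 6, 0, 0, 0))
ROLLOVER_SECONDS = 1024 * 7 * 24 * 3600


def clock_is_plausible(system_time: int) -> bool:
    """Trust the system clock only if it is not earlier than the GPS epoch."""
    return system_time >= GPS_EPOCH_UNIX


def rollover_dates(count: int) -> list:
    """Dates of the first `count` 1024-week rollovers.

    Computed on the Unix (UTC) time scale, so the GPS-UTC leap-second offset
    is ignored; the actual UTC instants fall that many seconds earlier
    (e.g. 13 s for the 1999 rollover).
    """
    return [dt.datetime.fromtimestamp(GPS_EPOCH_UNIX + k * ROLLOVER_SECONDS,
                                      tz=dt.timezone.utc).date()
            for k in range(1, count + 1)]
```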

--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Greg Troxel

2010-12-18 14:52:52 UTC

Post by Eric Raymond

Post by Greg Troxel
Declare that disambiguating the week field into time is the GPS
receiver's problem.

Well, in an absolute sense of course it is. I'm just trying to come up
with ways to minimize the pain.

Sure, but it's an important philosophical point.

Post by Eric Raymond

Combining these: Up to now, we've had a policy of not trusting the system
clock. But maybe we could trust it to a limited extent at gpsd startup.

I concur that "not trusting" is, by and large, sensible. But "what century
is it" seems fair.

Post by Eric Raymond
1. If the system clock is zero or negative, emit a warning and disable
all heuristics. (Negative probably means we're on a 32-bit machine
after the 2038 rollover.)

You mean "a machine with 32-bit time_t" :-) There's nothing wrong with
32-bit machines with 64-bit time_t.

I don't think you need to worry about 2038 now. In 2030, make gpsd fail
to compile on machines with 32-bit time_t.

Post by Eric Raymond
2. If it's positive, use it to set both the century and the time of the
last GPS week rollover. (For the latter, subtract the GPS epoch and divide
by 1024 * SECONDS_PER_WEEK.)
3. If an incoming ZDA conflicts with the year set in step 2, emit a warning.

Sure, that sounds fine - plus disable any time sync.

Post by Eric Raymond

Historically, I believe that time_t size was tied to processor word size
because time_t was typedefed to either int (on 32-bit machines) or long
(on 16-bit machines).

I would characterize that as a decision that time_t should be 32 bits
and using int/long to get the equivalent of the not-yet-specified
int32_t.

Diego Berge

2011-01-11 19:56:44 UTC

Post by Eric Raymond

Combining these: Up to now, we've had a policy of not trusting the system
clock. But maybe we could trust it to a limited extent at gpsd startup.

f***@free.fr

2011-01-13 20:17:45 UTC

Hello

In my opinion, all these methods are quite dangerous. As probably only a handful of devices will fail, and the most critical ones are those that are often ON, I think the most reliable is to detect a step back > 15 years, and only if the device type has not changed.
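The step-back test could be sketched like this in Python. `RolloverGuard` is a hypothetical name, "15 years" is taken as 15 × 365.25 days, `device` is any identifier of the receiver type, and the compensation step (adding one 1024-week period after a qualifying jump) is an assumption about what would follow detection. For a cold-boot jump to be caught, the last good time and device type would have to persist across restarts (e.g. on disk); this sketch keeps them in memory.

```python
from typing import Optional

SECONDS_PER_WEEK = 7 * 24 * 3600
ROLLOVER_SECONDS = 1024 * SECONDS_PER_WEEK
FIFTEEN_YEARS = int(15 * 365.25 * 24 * 3600)


class RolloverGuard:
    """Compensate a backward jump of more than 15 years from the same device type."""

    def __init__(self) -> None:
        self.last_time: Optional[int] = None
        self.last_device: Optional[str] = None
        self.offset = 0  # seconds added to every reported time

    def process(self, device: str, reported: int) -> int:
        if device != self.last_device:
            # First fix or device type changed: never compensate across devices.
            self.offset = 0
        elif (self.last_time is not None
              and reported + self.offset < self.last_time - FIFTEEN_YEARS):
            # Same device stepped back > 15 years: assume one missed 1024-week rollover.
            self.offset += ROLLOVER_SECONDS
        corrected = reported + self.offset
        self.last_time = corrected
        self.last_device = device
        return corrected
```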

Another option is to not compensate, and warn people to use validated devices for critical applications.

Regards

F4eru

Christoph Riehl
***@free.fr

----- Original Message -----
From: "Diego Berge" <***@navlost.eu>
To: gpsd-***@lists.berlios.de
Sent: Tuesday, 11 January 2011 20:56:44 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: [Gpsd-dev] GPS date fail due to week counter rollover (not a gpsd bug)

Post by Eric Raymond

Combining these: Up to now, we've had a policy of not trusting the system
clock. But maybe we could trust it to a limited extent at gpsd startup.

As an alternative, if your receiver outputs, and gpsd captures, the
UTC message (page 18.4 of the navigation message), you're in luck. In
such case you can use the leap seconds adjustment (ΔtLS, ΔtLSF) to
either determine (by means of a lookup table) or estimate (by relying on
the fact that the Earth rotation rate is slowing down) the rough year.

I haven't thought this through completely, but I think it should be
close enough to disambiguate adequately for the foreseeable future and
more deterministic than relying on the local clock, assuming you have
one at all. The disadvantage is that probably not all receivers output
the requisite information.

Comments?

Regards,
Diego Berge.
_______________________________________________
Gpsd-dev mailing list
Gpsd-***@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/gpsd-dev

Eric Raymond

2011-01-14 18:28:10 UTC

Post by f***@free.fr
In my opinion, all these methods are quite dangerous. As probably
only a handful of devices will fail, and the most critical ones are
those that are often ON, I think the most reliable is to detect a
step back > 15 years, and only if the device type has not changed.

It's not just a handful of devices that will fail. Any given release
of GPS firmware can be designed to operate in at most two rollover
periods - from when it's issued to the next zero-counter event, and
from then to when the counter is at the second before its issue date
in the previous period. If the vendor is stupid, the GPS will stop
reporting dates sooner than that - after the next zero-counter event.

*All* receivers will fail on their first cold-start outside this span,
because they won't have the information needed to tell that they aren't
starting up during a past week in their rollover period of origin.

Post by f***@free.fr
As an alternative, if your receiver outputs, and gpsd captures, the
UTC message (page 18.4 of the navigation message), you're in luck. In
such case you can use the leap seconds adjustment (ΔtLS, ΔtLSF) to
either determine (by means of a lookup table) or estimate (by relying on
the fact that the Earth rotation rate is slowing down) the rough year.
I haven't thought this through completely, but I think it should be
close enough to disambiguate adequately for the foreseeable future and
more deterministic than relying on the local clock, assuming you have
one at all. The disadvantage is that probably not all receivers output
the requisite information.

It is probably true that we could curve-fit the decline in Earth's
rotation rate to produce a function from leap-second count to a year
estimate. That's a very clever idea. But there's a problem with it.

The problem is that the estimate wouldn't be accurate to within
about two years - as we speak, it's January 2011 and the last leap
second was at the end of December 2008. In the future, that two-year
slop could easily put our estimate on the wrong side of a recent or
near-future week rollover from actual present time.

However, there's a variant of this idea I can and will use. We know
the leap-second offsets for past times; if we have a leap-second
offset, we can at least check whether it's inconsistent with the year
the GPS is reporting. This will detect some rollovers.

So, for example, if the GPS reports it's 1985, but the current leap-second
offset is greater than 5 seconds, we can't say exactly what rollover
period we're in, but we can know for sure it isn't the right one and
log a rollover error.
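A Python sketch of that one-sided check. `LEAP_DATES` and `rollover_detected` are hypothetical names; the table lists the UTC dates on which each GPS-UTC offset (0 through 18 s) took effect, running past the date of this thread through the leap second at the end of 2016, and would need extending whenever a new one is announced. Because a rollover error always moves the reported date backwards, it is enough to test whether the reported date is earlier than the date on which the broadcast offset first applied.

```python
import datetime as dt

# GPS-UTC leap-second offset n took effect on LEAP_DATES[n] (UTC).
LEAP_DATES = [
    dt.date(1980, 1, 6), dt.date(1981, 7, 1), dt.date(1982, 7, 1), dt.date(1983, 7, 1),
    dt.date(1985, 7, 1), dt.date(1988, 1, 1), dt.date(1990, 1, 1), dt.date(1991, 1, 1),
    dt.date(1992, 7, 1), dt.date(1993, 7, 1), dt.date(1994, 7, 1), dt.date(1996, 1, 1),
    dt.date(1997, 7, 1), dt.date(1999, 1, 1), dt.date(2006, 1, 1), dt.date(2009, 1, 1),
    dt.date(2012, 7, 1), dt.date(2015, 7, 1), dt.date(2017, 1, 1),
]


def rollover_detected(reported: dt.date, leap_offset: int) -> bool:
    """True if the receiver's date is earlier than the broadcast offset could exist."""
    if not 0 <= leap_offset < len(LEAP_DATES):
        return False  # offset unknown to this table: cannot judge
    return reported < LEAP_DATES[leap_offset]
```

In the example above, a receiver claiming 1985 while broadcasting an offset of 6 s is flagged, since that offset only took effect on 1 January 1990.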

--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

f***@free.fr

2011-01-15 08:41:38 UTC

Eric,

I don't agree with you.

Most (quality) devices handle this properly.
As every device today has flash memory to store firmware and parameters, manufacturers implement a simple routine that keeps track of the number of rollovers (in fact, the higher bits of the week counter) in their flash memory.

Of course, to stay in sync, the device has to get at least one fix every 10-15 years so that it can detect and store the rollover.
We can assume that, I think.

If I find some time, I will test this on some devices I have at hand (7-year steps + cold boot) to see how far it goes before the end of the world as we know it :)

If I remember correctly, this is even patented by some manufacturer.

Christoph Riehl
***@free.fr

----- Original Message -----
From: "Eric Raymond" <***@thyrsus.com>
To: gpsd-***@lists.berlios.de
Sent: Friday, 14 January 2011 19:28:10 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: [Gpsd-dev] GPS date fail due to week counter rollover (not a gpsd bug)

Post by Eric Raymond
In my opinion, all these methods are quite dangerous. As probably
only a handful of devices will fail, and the most critical ones are
those that are often ON, I think the most reliable is to detect a
step back > 15 years, and only if the device type has not changed.

Post by Eric Raymond
As an alternative, if your receiver outputs, and gpsd captures, the
UTC message (page 18.4 of the navigation message), you're in luck. In
such case you can use the leap seconds adjustment (ΔtLS, ΔtLSF) to
either determine (by means of a lookup table) or estimate (by relying on
the fact that the Earth rotation rate is slowing down) the rough year.
I haven't thought this through completely, but I think it should be
close enough to disambiguate adequately for the foreseeable future and
more deterministic than relying on the local clock, assuming you have
one at all. The disadvantage is that probably not all receivers output
the requisite information.

--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Chris Kuethe

2011-01-16 01:07:45 UTC

Post by f***@free.fr
As every device today has flash memory to store firmware and parameters, manufacturers implement a simple routine that keeps track of the number of rollovers (in fact, the higher bits of the week counter) in their flash memory.

... except when it's not flash. A very popular design is to store the
configuration memory in SRAM backed by a supercapacitor or a battery.
Lose power, lose memory - a handy effect when you're debugging
protocol switching over bluetooth.

Eric Raymond

2011-01-16 23:06:31 UTC

Post by Chris Kuethe

Indeed. I know for a fact that the extremely popular BU-353 is built like
this. I suspect many other inexpensive GPS mice are too.

--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Eric Raymond

2011-01-16 23:05:04 UTC

Post by f***@free.fr
If I find some time, I will test this on some devices I have at hand
(7-year steps + cold boot) to see how far it goes before the end of
the world as we know it :)

Please do. We have encountered one device - the GPS-320FW - that we know
does *not* handle this correctly. It would be nice to have correct functioning
of some other devices verified.

--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Diego Berge

2011-01-16 17:33:38 UTC

Post by Eric Raymond
It is probably true that we could curve-fit the decline in Earth's
rotation rate to produce a function from leap-second count to a year
estimate. That's a very clever idea. But there's a problem with it.
The problem is that the estimate wouldn't be accurate to within
about two years - as we speak, it's January 2011 and the last leap
second was at the end of December 2008. In the future, that two-year
slop could easily put our estimate on the wrong side of a recent or
near-future week rollover from actual present time.
However, there's a variant of this idea I can and will use. We know
the leap-second offsets for past times; if we have a leap-second
offset, we can at least check whether it's inconsistent with the year
the GPS is reporting. This will detect some rollovers.

That's correct. What I would propose would be something as follows:

1. UTC message received
2. Obtain current Δt from UTC message
3. Lookup Δt in precompiled table (Δt, date_firstvalid, date_lastvalid)
4. If current date as reported by GPS is between the values obtained
from the table, all good, stop.
5. If values disagree, a week rollover error must have occurred:
correct the GPS-reported date if unambiguous, or else report an error, and stop.
6. If Δt is not found in the precompiled table, and Δt ≤ ΔtN + ε, then
curve fit a date and compare against currently reported date.
7. If values from previous calculation agree, stop.
8. If values disagree, report an error, stop. Do not attempt to correct.

Where:

* Δt is the same as ΔtUTC as defined in 20.3.3.5.2.4 in IS-GPS-200E
* date_{first,last}valid are the times when a particular Δt first
becomes valid, and just before the next leap second occurs. Note
that this is a logical, not physical, representation (meaning one only
needs to store a list of (Δt, time_of_occurrence) pairs).
* ΔtN is the last Δt value in our precompiled table. This would be
expected to be up to six months in the future, counting from the release
date of a particular gpsd version (leap second
adjustments/non-adjustments are announced about six months in advance).
* ε is a small integer (e.g., 0, ±1 or ±2) such that a curve (or
linear) fit of the last few values in our precompiled table would have a
fighting chance of success. This covers the case where the user has a
non-recent but not too old version of gpsd.

Notes/caveats:

* Keep in mind that leap seconds may be positive or negative (which is why
ε is not defined as a small *positive* integer).
* In the event of negative leap seconds, the precompiled table will
contain ambiguities, which could cause false negatives. It also makes it
inadvisable to correct the rollover error in the presence of an ambiguity.
* Similarly, the application of a polynomial fit for values outside our
table introduces the possibility of false positives (the probability
increasing with time, hence the bound ε), so although we can report
if we think something might be out of whack, it would not be wise to
correct the rollover error, as we might be dealing with a false
positive. As Eric says, in light of this it might not even be worth
bothering with steps 6-8.

Christoph,

I am not entirely clear on the risks involved with this method
(this could always be complemented by the -15 years test that you
suggest). Could you please give your opinion on the above approach?

I do think that this kind of correction might be slightly on the
overkill side of things, but if it's cleanly implemented, I can't see it
introducing an excessive amount of complexity in the code.

If nothing else, it puts gpsd in the "only we can do this" category,
and creates prior art which might deprive someone of a preposterous
patent someday.

Regards,
Diego Berge.