ABIS Infor - 2020-12

1970 - 2020: 50 years since the "Unix Epoch"

Peter Vanroose (ABIS) - 14 January 2020

Summary

1 January 2020: an anniversary, actually a golden jubilee, for the Unix Epoch! The "era" of Unix systems indeed starts counting from 1970 onwards. Here is a sketch of the story behind this.

The Unix Epoch - never heard of?

Unix operating systems (including Linux) internally store points in time, e.g. the creation timestamp of a file, the moment of an event like someone logging on to the system or a shutdown message, or the time indication in an e-mail, by using the so-called Unix Time, also known as the Epoch Time. But what is this precisely?

I'll paraphrase from Wikipedia (https://en.wikipedia.org/wiki/Unix_time) since their formulation is concise and precise:

"Unix Time (also known as Epoch time, POSIX time, or seconds since the Epoch) is a system for describing a point in time. It is the number of seconds that have elapsed since the Unix Epoch, that is the time 00:00:00 UTC on 1 January 1970. Unix Time is an integer which increments every second, without requiring the calculations to determine year, month, day of month, hour and minute required for intelligibility to humans."

So, the Unix Epoch is midnight of 1 January 1970, which means that a few days ago, at midnight of New Year's Day, the "Unix Epoch" celebrated its 50th birthday! At that moment, the Unix Time value was exactly (365*38+366*12) * 24 * 60 * 60 = 1577836800 (seconds).
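
For the diligent reader: that value is easy to verify with a few lines of C. Below is a minimal sketch; it assumes the availability of timegm(), a non-standard but widespread library function (glibc, BSD) which converts a broken-down UTC time into a time_t:

#define _DEFAULT_SOURCE /* expose timegm() on glibc */
#include <time.h>
#include <stdio.h>

int main() {
  /* 1 January 2020, 00:00:00 UTC, as a broken-down time: */
  struct tm newyear = {0};
  newyear.tm_year = 2020 - 1900;  /* tm_year counts from 1900 */
  newyear.tm_mon  = 0;            /* January is month 0 */
  newyear.tm_mday = 1;
  printf("%lld\n", (long long)timegm(&newyear));     /* 1577836800 */
  printf("%lld\n", (365LL*38 + 366*12) * 24*60*60);  /* same value, by hand */
}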

To be a bit more precise, the Epoch is midnight UTC of 1 January 1970. Actually, UTC, or "Coordinated Universal Time", was only introduced in its current form in 1972, but for all practical purposes it's identical to GMT, or "Greenwich Mean Time", also called "Zulu Time" (time zone Zero), which did exist in 1970: it's the (winter) time of Greenwich, or for that matter of Dublin, Reykjavik, Lisbon and Tenerife. For us, in the Benelux, it's one hour behind our local time in winter, and two hours behind our local time in summer.

The Unix Epoch - worth a celebration?

Actually: yes! The ideas of (1) representing the "current time" independently of the local time (zone) of the current computer, (2) always incrementing, that is, never jumping back one hour once a year because of daylight saving time, and (3) using a single counter for both the day and the time within the day, were revolutionary decisions at the time, that is, in the early 70s when Unix was "invented". Before that, and actually also (much) later on other operating systems (e.g. z/OS or MS-Windows), the internal clock of a computer was always set to local time. And thus to daylight saving time during the summer months.

In the current context of world-wide immediate communication and data exchange, this computer-independent definition of "current time" seems evident, but it certainly wasn't in the 70s, when the TCP/IP "internet" did not exist, e-mails could be delivered with several hours of delay, there were no GPS satellites, and computers were essentially stand-alone systems. So it certainly was a visionary decision of the Unix pioneers to not use local time internally for e.g. file timestamps.

There are actually two immediate advantages of a "UTC" choice versus a "local time" choice for file creation or modification timestamps:
(1) when exchanging files between remote computers, not only their data but also their metadata could be communicated and usefully interpreted at the receiving end; and
(2) more recently created or modified files will never have an older timestamp than previously created or modified files, not even between 2 a.m. and 3 a.m. on the night when we switch from summer time to winter time.

And on top of that, there is the evident advantage of using "seconds since a certain epoch" instead of a timestamp in two distinct parts, viz. "date" and "time": the "age" of a file is then just the difference of two integer values (or make that decimal values if you want sub-second precision).
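
A minimal illustration, with two arbitrary (but real) Unix Time values; difftime() is the standard C function for exactly this subtraction:

#include <time.h>
#include <stdio.h>

int main() {
  time_t older = 1561136863;  /* 2019-06-21 17:07:43 UTC */
  time_t newer = 1578999600;  /* 2020-01-14 11:00:00 UTC */
  /* no calendar logic needed: the "age" difference is a plain subtraction */
  printf("%.0f seconds apart\n", difftime(newer, older));
}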

The Unix Epoch - and other epochs

The term "epoch", in general, means "a moment in time used as a reference" (start or zero point) from where to start counting time. Several epochs are in use, or have been in use: e.g. the fact that the current year is 2020 means that the "epoch" of our current calendar must be about 2020 year ago. In astronomy, the epoch is January 2000 at noon. The Julian Date has its epoch about 6733 years ago, on 1 January 4713 BC. And you probably know that some cultures or religions use a different calendar than ours, which always also means: a different epoch, that is, their definition for "year 1".

Although in principle one can "count back from" an epoch, that is, use negative time values, very often the epoch was chosen such that one can, for all practical purposes, restrict oneself to non-negative time values, that is, time instances after the epoch.

More specifically, in the context of computer systems, it's worth noting a few "interesting" epochs, viz.:

  • Software like Excel & Mathematica, and systems like VME and CICS, use (around) the beginning of 1900 as their "reference point". Together with a two-digit year representation, that caused the famous "Y2K" aka "year 2000" problems!
  • VMS and DVB use 17 November 1858 as their epoch. Why? Well, because that day corresponds to Julian Date 2400000 (days): dropping the leading "24" of the day number saves them two digits ... But they'll need a 6th digit (or a redefined epoch) after the year 2131 ...
  • COBOL, MS-Windows (since NT) and NTFS use 1 January 1601 as their epoch. This is the first year of the 400-year cycle of our Gregorian calendar, which spans exactly 400*365 + 97 = 146097 days = 20871 weeks (because it contains 97 leap years: all multiples of 4, except those multiples of 100 which are not multiples of 400). So, a calendar of exactly 400 years ago can be re-used. (See the conversion sketch right after this list.)
  • SAS uses 1 January 1960 as the epoch.
  • DOS, all FAT variants, and the PC BIOS use 1 January 1980. Remarkably, the GPS system uses 6 January 1980 as its epoch: since GPS counts time in weeks (rather than days or seconds), and its weeks start on Sunday, the epoch was chosen to be the first Sunday of 1980.
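
As an illustration of how cheap such conversions are: below is a minimal sketch (the function name is mine, not an existing API) which converts a Unix Time value to the Windows/NTFS "FILETIME" convention mentioned above, counting 100-nanosecond units since 1 January 1601 UTC. The offset between the two epochs is 369 years = 134774 days = 11644473600 seconds.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* seconds between the NT epoch (1601) and the Unix Epoch (1970):
   369 years = 134774 days = 11644473600 seconds */
#define EPOCH_DIFF 11644473600LL

/* Unix Time (seconds) -> Windows FILETIME (100-ns units since 1601) */
int64_t unix_to_filetime(int64_t unix_seconds) {
  return (unix_seconds + EPOCH_DIFF) * 10000000LL;
}

int main() {
  printf("%" PRId64 "\n", unix_to_filetime(1577836800)); /* 1 Jan 2020 */
}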

Time units since the Unix Epoch

The whole point of defining an epoch, in the context of a computer system, is the possibility to represent an unambiguous time instance with a single number, be it an integer or a fractional (decimal or floating-point) value. As may already be clear from the above examples, most systems (be it an operating system, or a file system, or a compiler) count in a combination of years and/or days, while some count in seconds. The Unix Time counts seconds since the Unix Epoch.

Once the choice for the epoch and for the unit of time is made, the next decision is the (internal) representation of the time instance value in these units. Depending on that choice, and more specifically the number of digits or bits of that representation, we'll have a kind of "Y2K" problem sooner or later, viz. when the counter "jumps back to zero".

Initially, Unix chose an integer representation using 32 bits or 4 bytes (data type time_t): using the register size (the data type int, i.e. 16 bits on 16-bit processors) would have been absurd, since a 16-bit counter of seconds already wraps around after about 18 hours (2^16 seconds), still on 1 January 1970 itself! So they went for a 2-register representation on 16-bit computers (and a 1-register representation on 32-bit computers), which seems to move the "Y2K" problem ahead to ... 7 February 2106 (2^32 seconds is about 136 years). So we are safe for a few more decades?

Actually, that's not quite correct, since those 32-bit integer values are to be interpreted as signed integers (time_t is a signed integer data type), hence only 31 bits are available for positive values. This brings the "end date" forward to ... 19 January 2038, when the counter overflows at 03:14:08 UTC (the last representable second being 03:14:07)! So watch out for a new "Y2K" problem in a few years ...
(Some interesting reading on this topic can be found at https://en.wikipedia.org/wiki/Year_2038_problem .)

On the other hand, by using a "signed" interpretation of that 32-bit value, negative values represent times before 1970: actually, back to ... 13 December 1901! Anyhow, some (older) Unix (C) programs indeed have a built-in time bomb: on the morning of 19 January 2038, it will all of a sudden be December 1901 ...
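
On a system where time_t is already 64 bits wide (see further), you can simulate this wrap-around yourself; a minimal sketch:

#include <time.h>
#include <stdio.h>
#include <stdint.h>

/* interpret a 32-bit counter value as a (64-bit) time_t and print it */
void show(int32_t seconds) {
  time_t t = seconds;  /* widen to the native time_t */
  char buf[32];
  strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S UTC", gmtime(&t));
  printf("%11d  =  %s\n", (int)seconds, buf);
}

int main() {
  show(INT32_MAX);  /*  2147483647 = 2038-01-19 03:14:07 UTC */
  show(INT32_MIN);  /* one second later, after the wrap-around:
                       -2147483648 = 1901-12-13 20:45:52 UTC */
}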

[Image: Wikimedia animation illustrating the "Year 2038" wrap-around]

Note that all Unix-based systems are potentially vulnerable to this problem. This includes e.g. Linux, macOS, and all Android variants (nowadays often used in embedded systems).

Actually, the most recent versions of most Unix-derived operating systems have (re)defined time_t to be a 64-bit integer, which means that time instances are (or soon will be) representable up to about 292 billion years from now (2^63 seconds is roughly 9.2*10^18 seconds, or 2.9*10^11 years), which is more than twenty times the age of the universe. No need to scale up anymore, it seems.

Unix Time in practice

If you have access to a Unix or Linux system, you've certainly used the ls command. Or most likely even ls -l, its "long" variant. The output will typically contain multiple lines which look something like this:

-rw-r--r-- 1 peter abis      510 Jun 21  2019 old_test.txt
-rw-r--r-- 1 peter abis      699 Jan 14 11:12 test.txt

On Linux, by adding the "--full-time" option, the output becomes even more detailed:

-rw-r--r-- 1 peter abis      510 2019-06-21 19:07:43.568495200 +0200 old_test.txt
-rw-r--r-- 1 peter abis      699 2020-01-14 11:12:02.195578700 +0100 test.txt

This displays the "full" modification time of the files, as stored inside the file system. Or actually: what is stored is of course just the "number of seconds since the Epoch", and the ls command converts that to a human-readable format, namely the local time at the moment the file was last modified: notice the two different time zone indications, +0200 (summer time) and +0100 (winter time)!
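
What ls does here can be mimicked in a few lines of C; a minimal sketch, with a hard-coded example value:

#include <time.h>
#include <stdio.h>

int main() {
  time_t stamp = 1578999600;  /* as stored in the file system */
  char buf[40];
  /* localtime() applies your local time zone rules, including DST: */
  strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S %z", localtime(&stamp));
  printf("%s\n", buf);        /* e.g. 2020-01-14 12:00:00 +0100 */
}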

A simpler way to get at the "full" modification time of a file, on any Unix-like system, is by using the stat command (here invoked as stat old_test.txt):

  File: old_test.txt
  Size: 510                     Blocks: 8               IO Block: 4096     regular file
Device: c63826bbh/3325568699d   Inode: 971290           Links: 1
Access: (0644/-rw-r--r--)       Uid: ( 48979/   peter)  Gid: (  500/    abis)
Access: 2020-01-14 13:33:51.966689300 +0100
Modify: 2019-06-21 19:07:43.568495200 +0200
Change: 2019-06-21 19:07:43.568495200 +0200

Apparently, on this file system, timestamps are stored with sub-second (here even nanosecond) precision. But still, internally, those timestamps are stored as "seconds since the Epoch". Only, we can't see those internal values, since we have to use Unix commands (like ls and stat) which automatically perform the translation into a human-readable form. But wait ... Linux is an open-source OS, so the source code of a GNU command like stat is available, e.g. from https://ftp.gnu.org/gnu/coreutils/

Examining the source code (written in C), we indeed see the conversion (which is straightforward but tedious, e.g. taking into account leap years etc.), but we also see where the internal timestamp value comes from: both the ls and the stat command implementations perform a so-called system call to the stat kernel function, which returns a struct with several fields, including the field named st_mtime, which is of data type time_t. If you'd like to experiment a bit yourself, and you have access to a C compiler (e.g. gcc), you could write a little program that performs the stat system call on a file of your choice; you can then easily verify that you indeed get back the integer value 1561136863 for the human-readable 2019-06-21 19:07:43 +0200. I'll leave the maths for verifying the conversion as an exercise for the diligent reader.
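
Such a program could look as follows; a minimal sketch (the file name is the example from above):

#include <sys/types.h>
#include <sys/stat.h>  /* for the stat() system call and struct stat */
#include <stdio.h>

int main() {
  struct stat info;
  if (stat("old_test.txt", &info) != 0) {
    perror("stat");  /* e.g. the file does not exist */
    return 1;
  }
  /* st_mtime is of data type time_t: seconds since the Epoch */
  printf("%lld\n", (long long)info.st_mtime);  /* here: 1561136863 */
  return 0;
}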

Another interesting Unix command that gets a "seconds since the Epoch" answer from the kernel and converts it to a human-readable form is the date command: without arguments, it displays the current timestamp. Internally, it uses the system call time. This is an even simpler C program to write, compile and execute yourself, in order to verify all that has been said so far:

#include <time.h>   /* for declarations of time() and time_t */
#include <stdio.h>  /* for declaration of printf() */
int main() {
  time_t t = time(0); /* "now" */
  printf("%ld\n", t);
}
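
Compile and run it, and compare with the output of date +%s, which prints exactly the same counter (the file name now.c is of course just a suggestion):

gcc now.c -o now   #  compile the program above
./now              #  prints e.g. 1579000000
date +%s           #  prints the same value (give or take a second)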

There are more Unix commands that use timestamps in some way. There is e.g. the touch command, which lets you set the modification time of a file: it converts a timestamp in human-readable form, given on the command line, to a Unix Time value (seconds since the Epoch), which it passes to the system call utimes.
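
In essence, what touch does under the hood could be sketched as follows (file name and timestamp are again just examples):

#include <sys/time.h>  /* for utimes() and struct timeval */
#include <stdio.h>

int main() {
  /* 2020-01-14 11:00:00 UTC = 1578999600 seconds since the Epoch;
     utimes() expects two timestamps: access time and modification time */
  struct timeval stamps[2] = { {1578999600, 0}, {1578999600, 0} };
  if (utimes("X.txt", stamps) != 0) { perror("utimes"); return 1; }
  return 0;
}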

There is also the tar command, which is an archiver: it packs a set of files into a single archive file. Interesting in the context of this article is the fact that tar also stores the file metadata together with the file data in that archive: the owner/group, the file permissions, and of course the file modification time. Not surprisingly, it stores that timestamp as the number of seconds since the Epoch. Surprisingly maybe, it does so in a textual, octal form; so search your tar file for an 11-digit number starting with 136, at least if your source file has a timestamp between roughly 23 December 2019 and 3 July 2020:

touch -t 202001141200.00 X.txt  #  create an empty file with timestamp 2020-01-14 12:00
tar cf X.tar X.txt              #  create the tar file X.tar, containing X.txt
strings X.tar                   #  obtain the "text-readable fragments" of that file

On the 6th output line of the last command, you'll see the octal string 13607317460, which is decimal 1578999600 = 1577836800 + 1162800. Since 1577836800 corresponds to 1 January 2020 midnight UTC, and 1162800 = 13*24*60*60 + 11*60*60 = 13 days + 11 hours, this does indeed correspond to 14 January 11:00 UTC, or 12:00 in "our" time zone +01:00.
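
You can verify that octal-to-date conversion with a few lines of C; a minimal sketch, using the standard strtoll() with base 8:

#include <stdlib.h>  /* for strtoll() */
#include <stdio.h>
#include <time.h>

int main() {
  /* interpret the string found in the tar archive as an octal number: */
  time_t t = (time_t)strtoll("13607317460", NULL, 8);
  char buf[32];
  strftime(buf, sizeof buf, "%Y-%m-%d %H:%M UTC", gmtime(&t));
  printf("%lld = %s\n", (long long)t, buf);  /* 1578999600 = 2020-01-14 11:00 UTC */
}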

Conclusion

Fifty years is a long time in IT. A very long time. That makes it a very remarkable fact that a choice made 50 years ago still stands. And it most likely will still stand long after the year 2038 ...

But maybe you shouldn't wait that long before learning more about Unix or Linux: we have several courses in our offering to help you get started!