Time series data with inconsistently formatted dates and times can make your work frustrating. Dates and times are often stored as strings or text data and converting to a consistent, numeric format might seem like a daunting task.
Fortunately, GAUSS includes an easy tool for loading and converting dates and times – the
Using the date keyword
date keyword is used with GAUSS's formula string syntax to indicate to GAUSS that a variable is a date. The rest of the work is done internally and GAUSS automatically converts dates and times from strings to the POSIX format.
Consider the file bike_count.xlsx which contains string dates in a fairly standard
Month/Day/Year HH:MM:SS format:
Two steps must be taken to load the dates using the GAUSS
$in front of the variable name tells GAUSS that the variable is a string.
dateoperator tells GAUSS that the variable contains dates.
data = loadd("bike_count.xlsx", "date($Time) + Count");
The POSIX date format
date keyword converts string formatted dates into the POSIX date and time format. This format represents a date as a scalar number reflecting the seconds that have passed since midnight on January 1st, 1970.
Looking at the data we loaded earlier, the POSIX dates are stored in the first column:
One of the most convenient features of the
date keyword is that it internally detects a wide variety of string date and time formats. For example, dates stored as
can all be automatically detected and converted to POSIX time using the
date keyword. A full list of acceptable date formats is found at the bottom of this blog.
Loading non-standard formats
Though GAUSS can recognize a large number of date and time formats, you still may come across non-standard dates and times outside of the scope of the
Consider a file named
temperature.csv with the following contents:
"day","Phoenix","Seattle" "November-8-2018",81,51 "November-9-2018",79,45 "November-10-2018",79,50 "November-11-2018",24,50 "November-12-2018",66,59 "November-13-2018",65,56 "November-14-2018",70,55
The date format shown above is not one that GAUSS will automatically recognize. In these cases, it is helpful to write a custom GAUSS function which uses the GAUSS string to POSIX conversion function,
strctoposix function uses format specifiers to load custom dates.
Below is a GAUSS procedure which will take dates in the format shown above (i.e.
November-8-2018) and convert it to POSIX time.
proc (1) = loadMyDates(str); local fmt; // %B - full month name // %e - day of month // %Y - 4 digit year fmt = "%B-%e-%Y"; retp(strctoposix(str, fmt)); endp;
This function can now be used in place of the
date keyword to load your dates as shown below.
temp_data = loadd("temperature.csv", "loadMyDates($day) + Phoenix + Seattle");
This will create a GAUSS matrix named
temp_data with the contents shown below:
In this blog you've learned:
- How to load date and time data using the
- What date formats are compatible with the
- What format the
datekeyword converts string dates to.
- How to use a custom function to load non-standard dates.
Recognizable date formats
20181031 31-10-2018 10-31-2018 2018-10-31 10/31/2018 2018/10/31 31 October 2018 201810311830 20181031 1830 31-10-2018 18:30 2018-10-31 18:30 10/31/2018 18:30 2018/10/31 18:30 31 October 2018 18:30 20181031183000 20181031 183000 18:30:00 31-10-2018 18:30:00 2018-10-31 18:30:00 10/31/2018 18:30:00 2018/10/31 18:30:00 10/31/2018 18:30:00 PM 2018/10/31 18:30:00.000 10/31/2018 18:30:00.000 31 October 2018 18:30:00 2018-10-31T18:30 2018-10-31T18:30T 2018-10-31T18:30:00 2018-10-31T18:30:00 2018-10-31T18:30:00.000 2018-10-31T18:30:00.000