A client recently asked me to compare the sort orders of results from calls to their new system and the one it’s replacing.
This was intended as a one-time validation, and I was not expected to find any discrepancies. The responses were both in JSON format, and a colleague suggested that jq would be a good tool to use. I had no idea what that was or how to use it, so of course I took the advice.
The jq app is a command line tool that can read JSON, process the data, and create new files. My plan was to extract the data I needed to compare into files of identical format and then run a diff on them. Simple. In theory.
Unfortunately, data is fickle. All manner of inconsistencies in structure and format made it necessary to do a lot more processing than anyone had suspected. For this article, I’ll just focus on the dates—the eternal bane of developers.
The solution described below is specific to making timestamps from different time zones match, but the basic principles can be applied to any date-related processing in JSON. If you’re processing data to inject into a system, you can easily transform the dates to match what that system expects.
Need to sort entries by date? You can access as much of the timestamp as you need to sort on. It’s a great tool for converting dates to whatever format you require.
How to set a date in JSON
The first sort I was to verify was that the most recent things were at the top. (From here on out, the word “thing” will refer to the objects being compared. This is so you don’t get wind of the client’s secrets. Shh…) Looking at sample responses from both systems, I immediately saw a problem; the old system returned UTC timestamps but the new system returned UTC -4. In order to do a direct comparison, we’d have to normalize the timestamps.
New: 2023-05-13T21:49:44.348-0400
Old: 2023-05-14T01:49:44.000+00:00
My first reaction was to truncate the dates values from both to truly look at just the date. There was a 1 in 6 chance that the dates would be off, but eyeballing the diff would make it clear that they were in the same order, which was all I needed to show.
The same jq snippet worked for both results.
.date | split("T") | .[0]
Here’s what that code does. The initial .
is your current object, which in this case is an individual thing. On its own, .
simply returns the current thing, with all of its data. To access a field of thing, you append the field name to the object reference. So .date
returns the value of the thing’s date field. Pipes work exactly the same way in jq as in command line scripting, so we are grabbing the date and passing it to the next statement.
The split
function takes that date value and returns a 2 member array. The first entry is everything before the T
, which happens to be precisely what we’re after. Piping the full array to the final piece, .[0]
takes the date out of the array, leaving the time information on the metaphorical floor.
New: 2023-05-13
Old: 2023-05-14
Perfect! In 83% of the cases, our dates match, and the others can be seen to be only one day off. It’s the order that matters, not the actual dates.
Or so I thought. What I hadn’t been told was that the client wanted a report on the percentage of difference between the files. That meant that I had to resolve the timezone differences in order to not inflate the error percentage. Back to the drawing board!
Re-zoning the data
To make the dates match, I needed to turn both date strings into jq dates (represented as numbers), adjust the date from the new thing to be in UTC, and convert the dates to a human readable format. (I could have converted the date from the new thing to match the format of the old thing’s date, but I’d rather be able to read the result.)
The format of the date from the old system required less processing, so let’s look at that jq code first. Again, the .
context of this snippet is an individual thing.
.date
| split(".")
| .[0]
| strptime("%Y-%m-%dT%H:%M:%S")
| mktime
| strftime("%F %X")
The first part of the statement is the same as before, extracting the date string and piping it to a split
. This time we want to include the time, so the split is on .
, which separates the seconds from the milliseconds. The resulting array is piped to the next command, which references the first entry to pipe to the next command. So far, it’s similar to what we did in the first pass, but now we need to convert the date and time into the format for comparison.
At this point, the timestamp looks like this.
2023-05-14T01:49:44
A builtin jq function converts date strings into a jq date, but it only works for one timestamp format. This isn’t that format. Fortunately, jq provides hooks into many fairly standard commandline tools. More fortunately, they’re included on the MacBook Pros the client uses. To convert the timestamp into an array of numbers, I used strptime
, passing in the description of the format. That array is passed to mktime
, which converts it to a single number.
Lastly, we use another command line tool to convert the number into a date and a time.
2023-05-14 01:49:44
In retrospect, I could simply have replaced T
with an empty space, but now I knew more about handling dates in jq. That would be necessary for handling the date of the new system’s thing.
The date to be adjusted to UTC needed to be a number so that I could add the 4 hours to it. I also figured that since this one-off script might be dusted off and run while we were 5 hours off of UTC it should dynamically read the timezone adjustment.
.date
| split(".")
| .[0] as $tsString
| (.[1] | split("-") | .[1] | tonumber | ./100) as $tz
| $tsString
| strptime("%Y-%m-%dT%H:%M:%S")
| mktime+($tz*60*60)
| strftime("%F %X")
That’s not fun to read, but some of it is the same code used for the UTC date. The initial difference is after the seconds and milliseconds split. We still reference the first entry, but instead of passing it along we do this: .[0] as $tsString
. This is the syntax for assigning a variable in jq. The value 2023-05-13T21:49:44
is stored in the $tsString
variable. The pipe after this passes on the same context, not the variable, so the next command operates on the array that the split function produced.
If you follow the parentheses, that command is another setting of a variable.
(.[1] | split("-") | .[1] | tonumber | ./100) as $tz
This variable $tz
is set to the result of all the chain of operations within the parentheses. The reference .[1]
returns the milliseconds and timezone fragment of the original string: 348-0400
. We only want the timezone, which we get from splitting on -
and taking the 2nd entry of the result. Perhaps because it starts with a 0
, jq treated the timezone as a string, so a call to tonumber
effectively cast it to a number. Lastly, it divides the number (400) by 100 to convert it to hours.
(If we needed to account for timezones that were ahead of UTC, we would need to do it differently, but that seemed like overkill for our current needs.)
With the date and time string and the timezone modifier both stored in variables, the context of the initial seconds and milliseconds split is piped to the following command. This isn’t something that we need to use anymore, so the $tsString
variable is used to establish that value as the new context. Setting the timezone variable first would allow the flow to continue uninterrupted, but it seemed unintuitive to handle the last piece first.
The rest should be largely familiar from the code for the UTC date. The only difference is that once the timestamp is converted into a number we add a modifier to it. The number is in seconds, so we multiply the hour count in $tz
by 60 to convert to minutes and by 60 again to convert to seconds. Why not just multiply it once by 3600? Because I don’t know what 3600 is.
Final thoughts
I enjoyed using jq. It’s a powerful tool that met every challenge that popped up while I was comparing the systems. I learned quite a lot while working on this task, more than could be comfortably related in a single blog entry. This bit about the dates covered a good range of the base functionality. Further complexities included searching arrays for specific data, performing a secondary sort to remove system inconsistencies, normalizing the significant digits in float values, and removing duplicate data.
These jq commands grew too long to easily read as commandline instructions, so I moved them into jq files and imported them as modules into smaller command line queries. If you’d like to explore more about jq yourself, the jq Manual has a lot of good information: https://jqlang.github.io/jq/manual/
You can run the examples used in this post using the following commands. The tool works on files or strings, so it’s useful to create small JSON snippets to test logic on the command line. I figured out most of the code using this approach.
echo "{\"date\": \"2023-05-13T21:49:44.348-0400\"}" | jq '.date | split(".") | .[0] as $tsString | (.[1] | split("-") | .[1] | tonumber | ./100) as $tz | $tsString | strptime("%Y-%m-%dT%H:%M:%S") | mktime+($tz*60*60) | strftime("%F %X")'
echo "{\"date\": \"2023-05-14T01:49:44.000+00:00\"}" | jq '.date | split(".") | .[0] | strptime("%Y-%m-%dT%H:%M:%S") | mktime | strftime("%F %X")'
Happy querying!