Normalizing datetimes with timezones

Volume 13, Issue 44; 10 Nov 2010

There's an Information Studio plugin for normalizing date formats, but it doesn't handle timezones. Here's an approach that does.

Information Studio includes support for two different kinds of plugins: collectors and transformers. Transformers make it easy to delete or rename elements, normalize dates, perform schema validation, or run arbitrary XQuery or XSLT scripts.

One of the transformers is the Normalize Dates transformer. It can convert a variety of formats, such as “MM/DD/YYYY”, into ISO 8601 dates or datetimes. MarkLogic Server 4.2-1 ships with support for about eight formats, and I added another half-dozen or so a few days ago. But none of them support timezones.

There's enough variety in timezone formats, including some irreconcilable problems like cultural interpretations of timezone abbreviations (is CST US Central Standard Time or China Standard Time?), that it's not clear that there is a good, general solution.

But you probably don't need a completely general solution; you need a solution that solves your problems.

You can do that with an XSLT transform. To get you started, here are a couple of examples.

The numtz.xsl stylesheet transforms datetimes with numeric timezones. For example, it will turn RFC 5322 date-time formats such as “Mon, 05 Jun 2008 08:12:35 +0700” into ISO 8601, “2008-06-05T08:12:35+07:00”.
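If it helps to see the logic outside of XSLT, here's a rough sketch of the same conversion in Python. This is not the stylesheet itself; the regular expression and the function name normalize_numeric_tz are my own assumptions about what such a conversion needs, and the day name is simply ignored.

```python
import re

# Month abbreviations as they appear in RFC 5322 dates.
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Optional day name, then day, month, year, time, and a numeric offset.
# (Assumed pattern for illustration; it requires seconds to be present.)
NUMTZ = re.compile(
    r"^(?:[A-Za-z]{3},\s+)?(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})\s+"
    r"(\d{2}:\d{2}:\d{2})\s+([+-])(\d{2})(\d{2})$")

def normalize_numeric_tz(text):
    """Return an ISO 8601 datetime, or the input unchanged if it doesn't match."""
    m = NUMTZ.match(text.strip())
    if m is None or m.group(2) not in MONTHS:
        return text
    day, mon, year, time, sign, hh, mm = m.groups()
    # ISO 8601 wants a colon inside the offset: +0700 becomes +07:00.
    return f"{year}-{MONTHS[mon]:02d}-{int(day):02d}T{time}{sign}{hh}:{mm}"

print(normalize_numeric_tz("Mon, 05 Jun 2008 08:12:35 +0700"))
```

Anything that doesn't match is passed through untouched, which seems like the safest default for a normalization step.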

The second stylesheet, tzabbr.xsl, handles US timezone abbreviations: “Mon, 05 Jun 2008 08:12:35 EDT” → “2008-06-05T08:12:35-04:00”. There's a table at the top of the stylesheet that you can update to support different timezones.
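The table-driven idea is easy to sketch in Python, too. The table below is a guess at what a US-only mapping might contain, not a copy of the one in tzabbr.xsl, and the names TZ_OFFSETS and normalize_abbr_tz are hypothetical. Note that it resolves CST as US Central, which is exactly the kind of decision you have to make for your own data.

```python
import re

# Hypothetical lookup table; the real one lives at the top of tzabbr.xsl.
# CST is read as US Central here. Change it if your data means otherwise.
TZ_OFFSETS = {
    "EST": "-05:00", "EDT": "-04:00",
    "CST": "-06:00", "CDT": "-05:00",
    "MST": "-07:00", "MDT": "-06:00",
    "PST": "-08:00", "PDT": "-07:00",
}

MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Optional day name, then day, month, year, time, and an abbreviation.
ABBRTZ = re.compile(
    r"^(?:[A-Za-z]{3},\s+)?(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})\s+"
    r"(\d{2}:\d{2}:\d{2})\s+([A-Z]{2,5})$")

def normalize_abbr_tz(text):
    """Return an ISO 8601 datetime, or the input unchanged if it isn't recognized."""
    m = ABBRTZ.match(text.strip())
    if m is None:
        return text
    day, mon, year, time, abbr = m.groups()
    # Unknown months or abbreviations are left alone rather than guessed at.
    if mon not in MONTHS or abbr not in TZ_OFFSETS:
        return text
    return f"{year}-{MONTHS[mon]:02d}-{int(day):02d}T{time}{TZ_OFFSETS[abbr]}"

print(normalize_abbr_tz("Mon, 05 Jun 2008 08:12:35 EDT"))
```

Supporting another region is then just a matter of adding rows to the table.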

Both stylesheets look for an element named “dt”; adjust the match pattern to suit the element names in your own documents.

You can use either of them by just pasting them into the XSLT transform in Information Studio.

Share and enjoy.