CVS, Dates, and Validation
The CVS $Date$ keyword doesn’t validate as an ISO 8601 date/time.
Men are generally idle, and ready to satisfy themselves, and intimidate the industry of others, by calling that impossible which is only difficult.
I’ve been working more and more with DocBook NG. Mostly I’m pretty satisfied, but it’s definitely not without surprises. Handling “universal linking” has turned out, for example, to be tricky. It’d be easier with XSLT 2.0, but I’m not ready to make that commitment yet. I mean literally not ready: I still have extension functions that don’t run in Saxon 7.
The formatting issues are mostly a matter of intellectual effort; the right answer is clear even if the solution that produces it isn’t. Here’s something more interesting: what constraints should be placed on the content of elements that are clearly supposed to hold dates and times?
In “Bourbon”, the content of date and pubdate is constrained to be a date in the following way:
db.pubdate =
  element date {
    date.attlist,
    (xsd:date | xsd:dateTime | xsd:gYearMonth | xsd:gYear)
  }
Works for me, until I discover that I routinely (ab)use date in the following way:
<date>$Date$</date>
That’s clearly a date, but it doesn’t validate. Yet I still want the date element to contain the last modified date as provided by CVS.
So what to do? On one end of the spectrum, we could decide that DocBook will not attempt to constrain the content of the date elements. On the other end, we could enforce the constraint and I could move my CVS keyword-based date off into some other element, perhaps one in my own namespace.
For the moment, since this is all in a customization layer (specifically, the “NG” version of the schema for this weblog), I worked around the problem by providing another pattern for date and allowing it in the info:
cvsDate =
   element date {
      date.attlist,
      xsd:string {
         pattern = "$Date: \p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2} \p{Nd}{2}:\p{Nd}{2}:\p{Nd}{2} $"
      }
   }
Gotta love the flexibility of RELAX NG.
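To see what that pattern accepts outside a validator, here is a rough Python approximation. It is only a sketch: the function names and the sample timestamp are made up for illustration, the `$` signs are escaped because Python treats them as anchors (XSD patterns treat them as literals and are implicitly anchored, hence `fullmatch`), and the ISO check is a crude stand-in for xsd:dateTime rather than a full implementation.

```python
import re
from datetime import datetime

# Approximation of the cvsDate pattern. On str input, Python's \d matches
# Unicode decimal digits, which is what \p{Nd} means in the schema.
CVS_DATE = re.compile(r"\$Date: \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \$")


def is_cvs_date(text: str) -> bool:
    """Return True if text looks like an expanded CVS $Date$ keyword."""
    return CVS_DATE.fullmatch(text) is not None


def is_iso_datetime(text: str) -> bool:
    """Crude stand-in for xsd:dateTime: accept YYYY-MM-DDThh:mm:ss only."""
    try:
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")
        return True
    except ValueError:
        return False


sample = "$Date: 2004/04/07 12:34:56 $"  # hypothetical expanded keyword
print(is_cvs_date(sample))      # True
print(is_iso_datetime(sample))  # False
```

Note that the unexpanded keyword, a bare `$Date$`, does not match this pattern; only the form CVS writes on checkout does.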
Thoughts?
Comments
It would make more sense to convert the CVS date/time to a standard ISO 8601 date/time. The ISO date/time is more API friendly for parsing, converting, etc. whereas the CVS date/time is quite the opposite. ISO is also more consistent with the rest of the date/time formats on your site (DocBook pubdate, Dublin Core dates).
Here's the regex to do it:
Regex: \$Date:\s(?<year>\d{4})/(?<month>\d{2})/(?<day>\d{2})\s(?<hour>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})\s\$
Replace with: ${year}-${month}-${day}T${hour}:${minutes}:${seconds}Z
I agree that it would be better to have the date in the right format, but the question is, who does the replacement? CVS isn't going to, the validator (a standard validator anyway) isn't going to, and it seems awfully heavyweight to add a whole new transformation step into the validation process for this purpose.
Now, if we had a pipeline processing language,...sigh.
By the way, what language is that regex in?
I'd say that's the Perl syntax, with a twist. The twist is that I'd spell (?<name>pattern) as (?P<name>pattern); maybe that's what was intended. (The same RE syntax is used for Python's "re" module.)
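For what it's worth, here is how the suggested replacement might look in Python's `re` syntax, using the `(?P<name>...)` spelling. This is a sketch, not a recommendation about where the step should run; `cvs_to_iso` and the sample timestamp are invented for illustration, and the trailing `Z` is carried over from the replacement string above, which assumes the CVS timestamp is in UTC.

```python
import re

# Same structure as the regex in the comment above, with Python's
# (?P<name>...) named groups.
CVS_DATE = re.compile(
    r"\$Date:\s(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})\s"
    r"(?P<hour>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})\s\$"
)

# Python refers to named groups in a replacement template as \g<name>.
ISO_TEMPLATE = r"\g<year>-\g<month>-\g<day>T\g<hour>:\g<minutes>:\g<seconds>Z"


def cvs_to_iso(text: str) -> str:
    """Rewrite every expanded CVS $Date$ keyword in text as ISO 8601."""
    return CVS_DATE.sub(ISO_TEMPLATE, text)


print(cvs_to_iso("<date>$Date: 2004/04/07 12:34:56 $</date>"))
# <date>2004-04-07T12:34:56Z</date>
```

Text that contains no expanded keyword passes through unchanged, so running this over a whole document is harmless.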
I personally don't like the idea that the <date> element is typed as xs:date. I usually write dates in a human-friendly, localized format. WXS datatypes don't even provide a facility for localized date/time formats. I would prefer date as xs:string, so I can write something like <date>7. dubna 2004</date>, or <date>April 7th 2004</date> in English.
DocBook is a text-based format, and text formats are very loosely typed. I don't think there is much sense in typing element content in DocBook, as there will always be people for whom the rules are too strict. I think it is reasonable for the DocBook schema to assign types only to a few attributes, such as making the cols attribute on tgroup xs:int, or even xs:positiveInteger.
I'm not sure restricting the syntax of dates is a great idea, but if you're going to deal with CVS dates, then it'd probably be worthwhile to support RCS as well. Here's a sample:
$Date: 2004-04-28 09:00:27-07 $
1. Dashes, not slashes, are used as ymd delimiters.
2. It provides timezone support; e.g., the trailing '-07' indicates PDT.
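A pattern that tolerates both forms could look roughly like the Python sketch below. Everything beyond the two samples in this thread is an assumption: the name `keyword_to_iso` is hypothetical, the offset is assumed to be `+hh`/`-hh` with optional minutes, and a keyword with no offset is treated as UTC.

```python
import re
from typing import Optional

# Accepts "2004/04/07 12:34:56" (CVS) and "2004-04-28 09:00:27-07" (RCS).
# The (?P=sep) backreference rejects mixed delimiters such as 2004-04/28.
KEYWORD_DATE = re.compile(
    r"\$Date: (?P<year>\d{4})(?P<sep>[-/])(?P<month>\d{2})(?P=sep)(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2})"
    r"(?P<zone>[+-]\d{2}(?::?\d{2})?)? \$"
)


def keyword_to_iso(text: str) -> Optional[str]:
    """Convert an expanded $Date$ keyword to ISO 8601, or return None."""
    match = KEYWORD_DATE.fullmatch(text)
    if match is None:
        return None
    zone = match.group("zone")
    if zone is None:
        offset = "Z"  # assumption: no explicit offset means UTC
    else:
        digits = zone[1:].replace(":", "")  # "07" or "0530"
        offset = f"{zone[0]}{digits[:2]}:{digits[2:] or '00'}"
    return "{year}-{month}-{day}T{time}".format(**match.groupdict()) + offset


print(keyword_to_iso("$Date: 2004-04-28 09:00:27-07 $"))  # 2004-04-28T09:00:27-07:00
print(keyword_to_iso("$Date: 2004/04/07 12:34:56 $"))     # 2004-04-07T12:34:56Z
```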