Processing iCal data

Volume 15, Issue 31; 20 Nov 2012

Managing iCal “event” data using XQuery turns out to be practical after all.

Time is that quality of nature which keeps events from happening all at once. Lately it doesn't seem to be working.

As my longtime readers will know [you have longtime readers? -ed], ever since the PalmPilot days, I've been converting appointments, contacts, and other “PIM” data to XML so that I could process it with my tools of choice.

Appointments, more generally events, have always been particularly difficult to deal with. The trouble is that a great many of them repeat according to rules that defy simple algorithmic computation. (Quick! Does an event that started on 20 June 2007 and repeats every third Wednesday occur on 8 January 2013?No; 8 January 2013 is not a Wednesday. Does it occur on 9 January 2013? I have no idea.)

Nevertheless, I want to be able to identify the events that occur “today”. I use this to “paint” a copy of today's schedule on my desktop.

Historically, I've addressed this problem with a Python script that takes my collection of events (almost 3,000 of them) and computes which ones occur on which days for a period of several years. So, for example, today I'd have direct access to all the events that occur on any given day from 1 January 2010 to 31 December 2015.

This is highly unsatisfying because 2010 and 2015 are utterly arbitrary dates and because computing all the appointments for 2191 days is time consuming. What I really wanted was the ability to compute the results “on the fly” for any given date.

My compute platform of choice these days is MarkLogic Server. So what I wanted was a way to perform this computation efficiently in XQuery. A few weeks ago, I decided to see how far I could get.

“Efficiently” in MarkLogic server, and more broadly in a database context, boils down to: “how much of the computation can you resolve from indexes?” In practical terms, given a date like 19 December 2012, how many events can you exclude?

Some exclusions are obvious: events that start after the date, events that repeat until some time before the date, events that explicitly exclude the date. But that's not enough. A fair number of events in my calendar start years ago and repeat without an explicit end date.

So then the question is, how many indexable “facts” can you glean about an appointment that you can use in a query that's driven entirely by the indexes.

The answer, I was pleasantly surprised to discover, is quite a few. Birthday's, for example, happen on a particular month and day, events that repeat on the 3rd Wednesday of every month happen on both the 3rd of any particular day of the month and on a Wednesday, events that happen on the 15th of every month happen on the 15th, etc.

Here, for example, is a plausible event for the DocBook Technical Committee Meeting:

<event sequence="0" transparent="OPAQUE">
   <calendar>Calendar</calendar>
   <uid>C75D13B2-7984-48B6-AF7E-BE6530C03CA5</uid>
   <dtend tzid="America/Detroit">2009-09-16T14:30:00</dtend>
   <exdate tzid="America/Detroit">2012-02-15T13:00:00</exdate>
   <exdate tzid="America/Detroit">2012-09-19T13:00:00</exdate>
   <exdate class="recurrence" tzid="America/Detroit">2012-11-21T13:00:00</exdate>
   <rrule freq="MONTHLY" interval="1" byday="3WE"/>
   <summary>OASIS DocBook TC</summary>
   <dtstart tzid="America/Detroit">2009-09-16T13:00:00</dtstart>
   <dtstamp>2012-11-14T18:35:12Z</dtstamp>
   <valarm action="AUDIO" trigger="-PT5M">
      <uid>920337F6-57F3-410A-9C19-F1AA1E84A7DC</uid>
      <sound>file:///System/Library/Sounds/Basso.aiff</sound>
   </valarm>
</event>

And here's what I glean from it (not everything I choose to glean is relevant to the question of whether or not it occurs on a particular day):

<prop:properties xmlns:prop="http://marklogic.com/xdmp/property"
                 xmlns="http://nwalsh.com/xmlns/pim/metadata">
  <start-date>2009-09-16</start-date>
  <end-date>2009-09-16</end-date>
  <end-interval>2009-09-16</end-interval>
  <start-time>13:00:00</start-time>
  <end-time>14:30:00</end-time>
  <dtstart>2009-09-16T13:00:00</dtstart>
  <dtend>2009-09-16T14:30:00</dtend>
  <duration>PT1H30M</duration>
  <tzid>America/Detroit</tzid>
  <repeats>true</repeats>
  <until>9999-12-31</until>
  <frequency>MONTHLY</frequency>
  <interval>1</interval>
  <filter-day>true</filter-day>
  <ord>3</ord>
  <byday>WE</byday>
  <filter-month-day>false</filter-month-day>
  <filter-month>false</filter-month>
  <filter-yearly-date>false</filter-yearly-date>
  <except>2012-02-15</except>
  <except>2012-09-19</except>
  <except>2012-11-21</except>
</prop:properties>

(Observant readers will note that this is a “properties fragment”. Properties are not a panacea. They introduce a second fragment for every document, for example. But for this code (which I'll publish in due time), I decided that there would be value in moving the data out of the document, so you don't have to use exactly the same markup as I do; you only have to glean the same properties.)

I take advantage of these properties by constructing a fairly hairy query:

  let $month        := string(month-from-date($date))
  let $day          := string(day-from-date($date))
  let $dow          := upper-case(format-date($date, "[FNn,*-2]"))

  (: Filter by day :)
  let $f-day-0      := cts:element-value-query(xs:QName("meta:filter-day"), "false")
  let $f-day-1      := cts:and-query(
                           (cts:element-value-query(xs:QName("meta:filter-day"), "true"),
                            cts:element-value-query(xs:QName("meta:byday"), $dow)))
  let $f-day        := cts:or-query(($f-day-0, $f-day-1))

  (: Filter by month day :)
  let $f-monthday-0 := cts:element-value-query(xs:QName("meta:filter-month-day"), "false")
  let $f-monthday-1 := cts:and-query(
                           (cts:element-value-query(xs:QName("meta:filter-month-day"), "true"),
                            cts:element-value-query(xs:QName("meta:bymonthday"), $day)))
  let $f-monthday   := cts:or-query(($f-monthday-0, $f-monthday-1))

  (: Filter by month :)
  let $f-month-0    := cts:element-value-query(xs:QName("meta:filter-month"), "false")
  let $f-month-1    := cts:and-query(
                           (cts:element-value-query(xs:QName("meta:filter-month"), "true"),
                            cts:element-value-query(xs:QName("meta:bymonth"), $month)))
  let $f-month      := cts:or-query(($f-month-0, $f-month-1))

  (: Filter by yearly date :)
  let $f-year-0     := cts:element-value-query(xs:QName("meta:filter-yearly-date"), "false")
  let $f-year-1     := cts:and-query(
                           (cts:element-value-query(xs:QName("meta:filter-yearly-date"), "true"),
                            cts:element-value-query(xs:QName("meta:monthday"),
                                                    substring(string($date),6,5))))
  let $f-year       := cts:or-query(($f-year-0, $f-year-1))

  let $start    := cts:element-range-query(xs:QName("meta:start-date"), "<=", $date)
  let $until    := cts:element-range-query(xs:QName("meta:until"), ">=", $date)
  let $repeats  := cts:element-value-query(xs:QName("meta:repeats"), "true")
  let $single   := cts:element-value-query(xs:QName("meta:repeats"), "false")
  let $thisday  := cts:or-query(
                     ($single,
                      cts:and-query(($repeats, $f-day, $f-monthday, $f-month, $f-year))))

  let $andq     := cts:and-query(($start, $until, $thisday))
  let $notq     := cts:element-value-query(xs:QName("meta:except"), string($date))

  return
    cts:search(collection($pim:event-collection),
               cts:properties-query(cts:and-not-query($andq, $notq)))/*

What this boils down to is that I filter by “or”ing various possibilities (an event can be filtered by the day of the week, in which case we filter by exactly that day, or it can't), “and”ing the subqueries together, and removing the explicitly “not” events.

(There are some edge cases; for example, I convert a “repeats every week 3 times” event into an explicit end date four weeks from the start.)

Let's apply the complex query above to the almost 3,000 events in the database:

<ul>
{
  let $events := xical:find-possible-events(xs:date("2012-12-19"))
  for $event in $events
  return
    <li>{string($event/pim:summary)}</li>
}
</ul>

We get the result:

<ul>
  <li>1515 WED CLUB</li>
  <li>W3C XML Core</li>
  <li>OASIS DocBook TC</li>
</ul>

in 0.0028 seconds. Not too shabby. It turns out that on this particular day, all of the “possible” appointments are actually appointments on that day. We don't always get quite that lucky.

I combine event data with contacts from my address book (both of which have simple annotations to provide ad hoc tagging and such), notes from Evernote, a little reverse geocoding of addresses, etc. to build a really nice, comprehensive linked web of my PIM data.