SQL to XML

Volume 12, Issue 31; 26 Sep 2009; last modified 08 Oct 2010

A number of Mac applications store information in SQLite databases. Step one to do something useful with that data is to get it into XML.

I hate having my data squirreled away in proprietary or quasi-proprietary ways. If I can't get my information back out of an app, I'd rather not use it. When I started using the Mac, I switched to iCal and AddressBook: both can export data in standard text formats which I can easily convert to XML.

But exporting the data is a manual process (though I could probably automate it with some clever AppleScript or something, I've never tried). I build a number of views of my address book and calendar data automatically so manual processes don't fit well into my workflow.

It didn't take too long to figure out where iCal stores my appointments or how to pull them together. Having worked out where iCal stores my appointments, I turned my attention to AddressBook.

Long story short: the address data is saved in a database in ~/Library/Application Support/AddressBook/AddressBook-v22.abcddb. After installing the sqlite3 application from MacPorts, I was able to extract a text dump. So far so good.

Here, for example, is a table definition and the first row of data in that table:

CREATE TABLE ZABCDRECORD ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER,
Z_OPT INTEGER, ZDISPLAYFLAGS INTEGER, ZMODIFICATIONDATEYEAR INTEGER,
ZCREATIONDATEYEAR INTEGER, ZADDRESSBOOKSOURCE INTEGER, ZISALL INTEGER,
ZME INTEGER, Z19_ME INTEGER, ZINFO INTEGER, ZBIRTHDAYYEAR INTEGER,
ZPRIVACYFLAGS INTEGER, ZNOTE INTEGER, ZADDRESSBOOKSOURCE1 INTEGER,
ZCONTACTINDEX INTEGER, ZSOURCEWHERECONTACTISME INTEGER, ZVERSION
INTEGER, ZSYNCCOUNT INTEGER, ZSHARECOUNT INTEGER, ZADDRESSBOOKSOURCE2
INTEGER, ZMODIFICATIONDATE TIMESTAMP, ZCREATIONDATE TIMESTAMP,
ZMODIFICATIONDATEYEARLESS FLOAT, ZCREATIONDATEYEARLESS FLOAT,
ZBIRTHDAY TIMESTAMP, ZBIRTHDAYYEARLESS FLOAT, ZUNIQUEID VARCHAR, ZNAME
VARCHAR, ZNAMENORMALIZED VARCHAR, ZTMPREMOTELOCATION VARCHAR, ZNAME1
VARCHAR, ZREMOTELOCATION VARCHAR, ZSERIALNUMBER VARCHAR, ZSUFFIX
VARCHAR, ZTITLE VARCHAR, ZTMPHOMEPAGE VARCHAR, ZNICKNAME VARCHAR,
ZORGANIZATION VARCHAR, ZMAIDENNAME VARCHAR, ZIDENTITYUNIQUEID VARCHAR,
ZPHONETICFIRSTNAME VARCHAR, ZDEPARTMENT VARCHAR, ZPHONETICLASTNAME
VARCHAR, ZMIDDLENAME VARCHAR, ZFIRSTNAME VARCHAR, ZIMAGEREFERENCE
VARCHAR, ZJOBTITLE VARCHAR, ZPHONETICMIDDLENAME VARCHAR, ZLASTNAME
VARCHAR, ZSORTINGFIRSTNAME VARCHAR, ZSORTINGLASTNAME VARCHAR,
ZCREATEDVERSION VARCHAR, ZLASTDOTMACACCOUNT VARCHAR, ZLASTSAVEDVERSION
VARCHAR, ZSYNCANCHOR VARCHAR, ZSEARCHELEMENTDATA BLOB,
ZMODIFIEDUNIQUEIDSDATA BLOB );
INSERT INTO "ZABCDRECORD" VALUES(1,18,287,NULL,NULL,2008,NULL,1,NULL,NULL,
3,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
239119683.670433,NULL,18281283.670433,NULL,NULL,
'93973926-7EF6-40F0-ADBD-8C7BBFC30FA1:ABSubscriptionRecord',NULL,
NULL,NULL,NULL,'local','B7303AAD-DA79-46E6-BC7D-91DAD82AEFB8',
NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);

Next, I wrote 150 or so lines of Perl to convert the text into XML.

The XML is designed to be a totally straightforward representation of the table structure of the database. From the preceding SQL statements, sqltoxml produces:

<table name='ZABCDRECORD'>
<columns>
  <column name='Z_PK' type='INTEGER'/>
  <column name='Z_ENT' type='INTEGER'/>
  <column name='Z_OPT' type='INTEGER'/>
  <column name='ZDISPLAYFLAGS' type='INTEGER'/>
  <column name='ZMODIFICATIONDATEYEAR' type='INTEGER'/>
  <column name='ZCREATIONDATEYEAR' type='INTEGER'/>
  <column name='ZADDRESSBOOKSOURCE' type='INTEGER'/>
  <column name='ZISALL' type='INTEGER'/>
  <column name='ZME' type='INTEGER'/>
  <column name='Z19_ME' type='INTEGER'/>
  <column name='ZINFO' type='INTEGER'/>
  <column name='ZBIRTHDAYYEAR' type='INTEGER'/>
  <column name='ZPRIVACYFLAGS' type='INTEGER'/>
  <column name='ZNOTE' type='INTEGER'/>
  <column name='ZADDRESSBOOKSOURCE1' type='INTEGER'/>
  <column name='ZCONTACTINDEX' type='INTEGER'/>
  <column name='ZSOURCEWHERECONTACTISME' type='INTEGER'/>
  <column name='ZVERSION' type='INTEGER'/>
  <column name='ZSYNCCOUNT' type='INTEGER'/>
  <column name='ZSHARECOUNT' type='INTEGER'/>
  <column name='ZADDRESSBOOKSOURCE2' type='INTEGER'/>
  <column name='ZMODIFICATIONDATE' type='TIMESTAMP'/>
  <column name='ZCREATIONDATE' type='TIMESTAMP'/>
  <column name='ZMODIFICATIONDATEYEARLESS' type='FLOAT'/>
  <column name='ZCREATIONDATEYEARLESS' type='FLOAT'/>
  <column name='ZBIRTHDAY' type='TIMESTAMP'/>
  <column name='ZBIRTHDAYYEARLESS' type='FLOAT'/>
  <column name='ZUNIQUEID' type='VARCHAR'/>
  <column name='ZNAME' type='VARCHAR'/>
  <column name='ZNAMENORMALIZED' type='VARCHAR'/>
  <column name='ZTMPREMOTELOCATION' type='VARCHAR'/>
  <column name='ZNAME1' type='VARCHAR'/>
  <column name='ZREMOTELOCATION' type='VARCHAR'/>
  <column name='ZSERIALNUMBER' type='VARCHAR'/>
  <column name='ZSUFFIX' type='VARCHAR'/>
  <column name='ZTITLE' type='VARCHAR'/>
  <column name='ZTMPHOMEPAGE' type='VARCHAR'/>
  <column name='ZNICKNAME' type='VARCHAR'/>
  <column name='ZORGANIZATION' type='VARCHAR'/>
  <column name='ZMAIDENNAME' type='VARCHAR'/>
  <column name='ZIDENTITYUNIQUEID' type='VARCHAR'/>
  <column name='ZPHONETICFIRSTNAME' type='VARCHAR'/>
  <column name='ZDEPARTMENT' type='VARCHAR'/>
  <column name='ZPHONETICLASTNAME' type='VARCHAR'/>
  <column name='ZMIDDLENAME' type='VARCHAR'/>
  <column name='ZFIRSTNAME' type='VARCHAR'/>
  <column name='ZIMAGEREFERENCE' type='VARCHAR'/>
  <column name='ZJOBTITLE' type='VARCHAR'/>
  <column name='ZPHONETICMIDDLENAME' type='VARCHAR'/>
  <column name='ZLASTNAME' type='VARCHAR'/>
  <column name='ZSORTINGFIRSTNAME' type='VARCHAR'/>
  <column name='ZSORTINGLASTNAME' type='VARCHAR'/>
  <column name='ZCREATEDVERSION' type='VARCHAR'/>
  <column name='ZLASTDOTMACACCOUNT' type='VARCHAR'/>
  <column name='ZLASTSAVEDVERSION' type='VARCHAR'/>
  <column name='ZSYNCANCHOR' type='VARCHAR'/>
  <column name='ZSEARCHELEMENTDATA' type='BLOB'/>
  <column name='ZMODIFIEDUNIQUEIDSDATA' type='BLOB'/>
</columns>
<rows>
  <row>
    <Z_PK>1</Z_PK>
    <Z_ENT>18</Z_ENT>
    <Z_OPT>287</Z_OPT>
    <ZCREATIONDATEYEAR>2008</ZCREATIONDATEYEAR>
    <ZISALL>1</ZISALL>
    <ZINFO>3</ZINFO>
    <ZCREATIONDATE>239119683.670433</ZCREATIONDATE>
    <ZCREATIONDATEYEARLESS>18281283.670433</ZCREATIONDATEYEARLESS>
    <ZUNIQUEID>93973926-7EF6-40F0-ADBD-8C7BBFC30FA1:ABSubscriptionRecord</ZUNIQUEID>
    <ZREMOTELOCATION>local</ZREMOTELOCATION>
    <ZSERIALNUMBER>B7303AAD-DA79-46E6-BC7D-91DAD82AEFB8</ZSERIALNUMBER>
  </row>
  <!-- ... -->
</rows>
</table>

As you can see, I've made no effort to maintain some aspects of the database (like the primary key), I've simply dropped NULL fields, and I'm relying on the field names to be valid XML NCNames. There are clearly other, equally reasonable, design choices that I could have made.

The resulting XML is the bare minimum needed to switch to XML tools for subsequent downstream processing (turning address book tables into VCards, for example). But it gets the job done.

And maybe it'll come in handy for someone else.

Comments

Basically what you are doing is applying a transformation from an RDBM schema to some specific XML schema. XSLT is very good at transforming XML to XML, or XML to text. Obviously its not so good at the reverse mapping of 'anything' into XML. There's a program called pyana though which allegedly enhances XSLT transforms with python functionality, but there, still, I think the python is constrained to work with a source XML document rather than 'anything'.

For work recently though, I've been trying to make adhoc mappings of data from PostgreSQL and Oracle databases into an XML metadata schema. Basically what I did was take an instance document of the desired target XML schema and pepper it with elements like and containing raw python code or SQL expressions respectively. This template is fed into a python-based XML parser that evaluates the SQL and python elements and replaces those elements with the result of its evaluations. Consequently this is an 'anything' to XML transformer, and its limited purely by python's limitations (well, that and my crap code).

But anyway, to do it properly it also requires, and has, program constructs like and , which you would recognise as XSLT constructs. Probably it would benefit from more XSLT-like functionality like and elements.

It would be cool if something like this was written/designed and spec'ed by experts though. Be nice to use something robust from off-the-shelf ...

—Posted by Doug on 01 Oct 2009 @ 04:17 UTC #

As part of your subsequent downstream processing, would you next transform this database-centric XML to a more concise form?

For example, rather than go straight to VCard from the XML above, make it a two-step process with something like this in the middle...

<Contacts><Contact><Name>Norm Walsh</Name><Address><Line>221B Baker Street</Line><City>London</City><PostalCode>NW1 6XE</PostalCode><Country>England</Country></Address></Contact> ...

I find that 2-step process hides a fairly ugly but not difficult transform in the first part, leaving an ideal input for the final transform (to VCard) output. Makes the whole transform more maintainable.

If only there were an XML transform pipeline language to tie it all together, oh wait... ;)

With this particular XML, you could argue for another initial transform that just throws away the data elements you're not interested in to simplify the input. That could be a pretty common operation with database-centric dumps like this one. You could then choose to drive the data element selection with a common transform and an XML file to make the field selections. And the pipeline document would be common as well among this type of transform.

Are thoughts like these guilty of using an XML toolset hammer? Perhaps. I think your approach is worthwhile and generally useful, therefore these 'extensions' might also be.

I'd be interested in your thoughts, because you wield an exceptionally mighty XML hammer! Thanks.

—Posted by Sam Page on 01 Oct 2009 @ 03:42 UTC #

In this particular case, I have an existing vCard-to-something else toolchain, so I went from the SQL dump to vCard. But as a general rule, I'm in favor of using a series of simple transformations instead of one complex transformation where practical.

—Posted by Norman Walsh on 06 Oct 2009 @ 02:11 UTC #

Norm, well that was back then - now you can do this much simpler - and it works fine on the Mac too. The latest edition of the CAM editor tools now does JDBC - SQL to XML via drag and drop designer tool. You can see this in action in the demonstration video here. Also the examples shown and all the tools are available from the project website. Enjoy!

—Posted by David Webber on 12 Oct 2012 @ 02:01 UTC #