Attention: this essay is no longer current. It has been replaced by

Modeling names and addresses. No, not that old debate, the sort that appear in your address book.

Years ago, when I was using a Palm device for my address book, calendar, etc., I arranged to convert that data into RDF. I described that work in Generalized Metadata in your Palm, a paper that I presented at Extreme Markup Languages in 2002.

When I converted from the Palm to the Sidekick, I temporarily lost the RDF. I had no trouble, thanks to Dan Connolly and the T-Mobile XML/RPC interface, getting XML out, but I wasn't getting RDF. (I could have, Dan does, but I didn't because it was quicker to get my local infrastructure running again just from the straight XML.)

Recently, I decided it was time to get the RDF back. I want to be able to combine the contacts in my address book with other data sources in ways that RDF makes easy and I want to be able to do inference over contacts again. In addition, I now have a tool that will validate my RDF. Validation (does this instance conform to the model I've described?) was one of the first things I asked about when I started using RDF. Only after the publication of OWL does it seem that such tools have actually been widely deployed. (I'm using Pellet at the moment.)

Designing the ontology 

Given that I can now validate my RDF, I'm much more motivated to write a schema for my model. Designing an RDF schema isn't unlike other design exercises; it consists principally of dividing the world into classes, defining properties on those classes, and defining the relationships between classes and properties.

My first instinct was to write my own ontology from scratch, defining a class for contacts and properties on resources of that class: first name, last name, email addresses, phone numbers, postal addresses, etc. In fact, that's just what I did. But there was a significant overlap with the FOAF vocabulary. One of RDF's strengths is the ability to easily aggregate different vocabularies, so I replaced many of my properties with appropriate FOAF properties.

In fact, I might propose to extend FOAF to cover more of this use case since it seems so closely related. Instead of just asserting my extensions, I've compromised and made some of my classes and properties subclasses and subproperties of the FOAF terms.

Classes or properties? 

Phone numbers, email addresses, postal addresses, and even, to some extent, instant messaging addresses have “labels” associated with them. That is, a “work” phone number is distinct from a “home” phone number, etc.

This distinction is significant and has to be preserved in the model. Let's consider phone numbers as a concrete example. Three possibilities occur to me.

  1. Model the label directly: make a phone number a class of resource that has two properties, a label and a phone number.

  2. Use classes: make a phone number a class of resources with subclasses for a work phone number, a home phone number, etc.

  3. Use properties: make a phone number property with subproperties for work phone number, home phone number, etc.

After some thought and some discussion on the #swig channel on irc.freenode.net, I don't think there's a compelling argument in favor of any one solution, except that the first seems less appealing than either of the others. The label isn't open-ended free text; it's a string that identifies the kind of phone number, and both the class and property solutions seem to do that more directly.

My personal inclination is to use classes, but I see that FOAF has already opted for the property approach (homepage, workplaceHomepage, etc.) in several places, so I decided to go that way too.
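
To make the trade-off concrete, here is a minimal Python sketch of the three options, with triples held as plain tuples. Every term under `ex:` is a made-up placeholder for illustration, not a term from FOAF or from the ontology later in this essay.

```python
# Triples are plain (subject, predicate, object) tuples. Every "ex:" name
# is a hypothetical placeholder used only for this comparison.
PERSON = "ex:norm"
NUMBER = "tel:+1-413-303-1382"

# Option 1: a phone resource carrying a label and a number.
label_model = {
    (PERSON, "ex:phone", "_:p1"),
    ("_:p1", "ex:label", "Work"),
    ("_:p1", "ex:number", NUMBER),
}

# Option 2: the number is typed with a subclass of a phone class.
class_model = {
    (PERSON, "ex:phone", NUMBER),
    (NUMBER, "rdf:type", "ex:WorkPhone"),
}

# Option 3: the kind of number is carried by a subproperty.
property_model = {
    (PERSON, "ex:workPhone", NUMBER),
}


def work_numbers_by_label(triples):
    """Option 1: follow ex:phone, filter on the label, then read ex:number."""
    nodes = {o for s, p, o in triples if s == PERSON and p == "ex:phone"}
    work = {s for s, p, o in triples
            if s in nodes and p == "ex:label" and o == "Work"}
    return {o for s, p, o in triples if s in work and p == "ex:number"}


def work_numbers_by_class(triples):
    """Option 2: follow ex:phone, keep the objects typed ex:WorkPhone."""
    numbers = {o for s, p, o in triples if s == PERSON and p == "ex:phone"}
    return {n for n in numbers if (n, "rdf:type", "ex:WorkPhone") in triples}


def work_numbers_by_property(triples):
    """Option 3: a single pattern on the subproperty."""
    return {o for s, p, o in triples if s == PERSON and p == "ex:workPhone"}
```

All three lookups find the same number. The label model needs an extra node and a string comparison to get there, which is roughly why it seems the least appealing.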

Lists or not? 

Another decision that has to be made is whether or not to model the various repeatable fields as lists. Certainly they're ordered in the XML and they appear ordered on the Sidekick display, but lists in RDF more-or-less suck, so I opted not to model them that way. It'll put a little more burden on any software I eventually write to synchronize from the RDF, but that seems better than dealing with the list problems everywhere. And really, the list nature of the properties isn't intrinsically important. If I want to call my friend's work phone number, I don't care if it's listed first or second, do I?

The “final” design 

Taking into account the choices above, and considering that I'm aiming to take advantage of FOAF as much as possible, let's consider how an entry in my address book gets translated to RDF. Here's an entry:

  <contact id="_950">
    <last_modified>2005-11-24T14:10:51Z</last_modified>
    <category>Family</category>
    <firstname>Norman</firstname>
    <middlename>David</middlename>
    <surname>Walsh</surname>
    <company>Sun Microsystems, Inc.</company>
    <title>XML Standards Architect</title>
    <birthday>1967-06-16</birthday>
    <uris>
      <uri label="ID">#norman-walsh</uri>
      <uri label="Blog">http://norman.walsh.name/</uri>
      <uri label="Home">http://nwalsh.com/</uri>
    </uris>
    <emails>
      <email label="Work">Norman.Walsh@Sun.COM</email>
      <email label="Home">ndw@nwalsh.com</email>
    </emails>
    <phones>
      <phone label="Work">+1-413-303-1382</phone>
      <phone label="Work">+1-413-256-xxxx</phone>
      <phone label="Home">+1-413-256-xxxx</phone>
      <phone label="Mobile">+1-413-949-xxxx</phone>
    </phones>
    <addresses>
      <address label="Home">
        <street>XX Xxxx Street</street>
        <city>Belchertown</city>
        <state>MA</state>
        <postcode>01007</postcode>
      </address>
      <address label="Work">
        <street>1 Network Drive, Building #2
  MS UBUR02-201</street>
        <city>Burlington</city>
        <state>MA</state>
        <postcode>01803</postcode>
      </address>
    </addresses>
    <notes>rdf:
  a g:Male
  geo:lat 42.3382
  geo:long -72.45
  foaf:page http://norman.walsh.name/foaf

  AccessLine is x53142</notes>
    <rdf:type rdf:resource="http://nwalsh.com/rdf/genealogy#Male"/>
    <geo:lat>42.3382</geo:lat>
    <geo:long>-72.45</geo:long>
    <foaf:page rdf:resource="http://norman.walsh.name/foaf"/>
  </contact>

And here's the resulting RDF.

  <rdf:Description rdf:about="http://norman.walsh.name/knows/who#norman-walsh">
    <rdf:type rdf:resource="http://nwalsh.com/rdf/contacts#Contact"/>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>

The “ID” URI is used to construct the URI for the resource. All contacts are members of the Contact class and contacts that have a first or last name are foaf:Persons. Contacts that have only company names are foaf:Organizations. I have a third class, c:Place, for geographic locations, but that's probably unique to my metadata collection.
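
As a rough illustration of those rules, the following Python sketch derives the resource URI and the classes from a contact element. The base URI is read off the `rdf:about` value shown above. The function names, and the choice to return `None` when there is no “ID” entry, are assumptions made for the sketch rather than a description of the actual conversion.

```python
import xml.etree.ElementTree as ET

# Base URI inferred from the rdf:about value in the listing above.
BASE = "http://norman.walsh.name/knows/who"
CONTACT = "http://nwalsh.com/rdf/contacts#Contact"
PERSON = "http://xmlns.com/foaf/0.1/Person"
ORGANIZATION = "http://xmlns.com/foaf/0.1/Organization"


def resource_uri(contact):
    """Build the resource URI from the uri element labelled "ID", if any."""
    for uri in contact.iter("uri"):
        if uri.get("label") == "ID" and uri.text:
            return BASE + uri.text.strip()
    return None  # assumption: the essay does not say what happens here


def classes_for(contact):
    """Return the rdf:type URIs implied by the fields that are present."""
    types = [CONTACT]  # every entry is a Contact
    has_name = (contact.find("firstname") is not None
                or contact.find("surname") is not None)
    if has_name:
        types.append(PERSON)
    elif contact.find("company") is not None:
        types.append(ORGANIZATION)  # only a company name
    return types


entry = ET.fromstring(
    '<contact><firstname>Norman</firstname><surname>Walsh</surname>'
    '<uris><uri label="ID">#norman-walsh</uri></uris></contact>'
)
print(resource_uri(entry))
print(classes_for(entry))
```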

    <c:lastModified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-11-24T14:10:51Z</c:lastModified>
    <c:category>Family</c:category>

The last modified date and the category are directly related to the contact data in the address book.

    <foaf:firstName>Norman</foaf:firstName>
    <c:middleName>David</c:middleName>
    <foaf:surname>Walsh</foaf:surname>
    <foaf:name>Norman David Walsh</foaf:name>

First and last names are available in FOAF. At the moment middle names aren't, so I've created a middleName property. Generating the full foaf:name is straightforward, so I do that as well.

    <c:associatedName>Sun Microsystems, Inc.</c:associatedName>
    <c:associatedTitle>XML Standards Architect</c:associatedTitle>

These properties associate a company name and title with a contact. (There's room for some additional modeling complexity in titles as person, company, and title form a three-part relationship, but I've never seen an electronic address book that tried to handle that situation, so let's not worry about it.)

    <c:dateOfBirth rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1967-06-16</c:dateOfBirth>
    <foaf:birthday>06-16</foaf:birthday>

The FOAF birthday property, unfortunately, doesn't support full dates, so I've had to invent a property that does. But I can generate the FOAF version as well.
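
Deriving the FOAF value from the full date is a one-line reduction. A minimal Python sketch follows; the function name is hypothetical.

```python
from datetime import date


def foaf_birthday(date_of_birth):
    """Reduce an ISO 8601 date (YYYY-MM-DD) to the MM-DD form of foaf:birthday."""
    parsed = date.fromisoformat(date_of_birth)  # raises ValueError on bad input
    return parsed.strftime("%m-%d")


print(foaf_birthday("1967-06-16"))  # prints 06-16
```

Parsing the date first, rather than slicing the string, means a malformed birthday in the address book fails loudly instead of producing a bogus foaf:birthday.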

    <foaf:weblog rdf:resource="http://norman.walsh.name/"/>
    <foaf:homepage rdf:resource="http://nwalsh.com/"/>

The URIs convert naturally to FOAF properties. Some of my contacts have other sorts of URIs (entries in the Getty Thesaurus of Geographic Names®, the CIA World Factbook, etc.) for which I've invented additional properties.

    <c:workMbox rdf:resource="mailto:Norman.Walsh@Sun.COM"/>
    <foaf:mbox_sha1sum>9f5c771a25733700b2f96af4f8e6f35c9b0ad327</foaf:mbox_sha1sum>
    <c:personalMbox rdf:resource="mailto:ndw@nwalsh.com"/>
    <foaf:mbox_sha1sum>5ddcd862514c327945dca20446e11cb54ceec68b</foaf:mbox_sha1sum>

I've invented subproperties of foaf:mbox for various kinds of email addresses.
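
The listing also carries a foaf:mbox_sha1sum for each mailbox. FOAF defines that value as the SHA-1 digest, in hex, of the full mailto: URI. Here is a minimal Python sketch of the calculation; the function name is hypothetical, and whether the digests above were computed over exactly the capitalization shown is not something this essay states.

```python
import hashlib


def mbox_sha1sum(mailbox_uri):
    """Hex SHA-1 digest of a mailto: URI, as used for foaf:mbox_sha1sum."""
    if not mailbox_uri.startswith("mailto:"):
        raise ValueError("expected a mailto: URI")
    return hashlib.sha1(mailbox_uri.encode("utf-8")).hexdigest()


print(mbox_sha1sum("mailto:ndw@nwalsh.com"))
```

The digest is sensitive to every character, including case, so two address books only agree on the hash if they agree on the exact spelling of the address.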

    <c:workPhone rdf:resource="tel:+1-413-303-1382"/>
    <c:workPhone rdf:resource="tel:+1-413-256-xxxx"/>
    <c:homePhone rdf:resource="tel:+1-413-256-xxxx"/>
    <c:mobilePhone rdf:resource="tel:+1-413-949-xxxx"/>

Similarly, I've invented subproperties of foaf:phone for various kinds of phone numbers.
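
The mapping from address book labels to these subproperties can be sketched as a pair of lookup tables. The label-to-property pairs below are the ones visible in the example entry. Falling back to the plain FOAF property for an unrecognized label is an assumption of this sketch, as are the function and table names.

```python
# Label-to-property pairs taken from the example entry above.
PHONE_PROPERTIES = {
    "Work": "c:workPhone",
    "Home": "c:homePhone",
    "Mobile": "c:mobilePhone",
}
EMAIL_PROPERTIES = {
    "Work": "c:workMbox",
    "Home": "c:personalMbox",
}


def labelled_statement(label, value, table, fallback, scheme):
    """Return (property, object URI) for one labelled field.

    Unknown labels fall back to the general property (an assumption).
    """
    return table.get(label, fallback), scheme + value


print(labelled_statement("Work", "+1-413-303-1382",
                         PHONE_PROPERTIES, "foaf:phone", "tel:"))
print(labelled_statement("Home", "ndw@nwalsh.com",
                         EMAIL_PROPERTIES, "foaf:mbox", "mailto:"))
```

Because each specific property is a subproperty of the FOAF one, a consumer that only understands foaf:phone or foaf:mbox can still find every number and mailbox once subproperty inference has been applied.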

    <c:homeAddress rdf:parseType="Resource">
      <rdf:type rdf:resource="http://nwalsh.com/rdf/contacts#Address"/>
      <c:street>XX Xxxx Street</c:street>
      <c:city>Belchertown</c:city>
      <c:stateOrProvince>MA</c:stateOrProvince>
      <c:postcode>01007</c:postcode>
    </c:homeAddress>
    <c:workAddress rdf:parseType="Resource">
      <rdf:type rdf:resource="http://nwalsh.com/rdf/contacts#Address"/>
      <c:street>1 Network Drive, Building #2
  MS UBUR02-201</c:street>
      <c:city>Burlington</c:city>
      <c:stateOrProvince>MA</c:stateOrProvince>
      <c:postcode>01803</c:postcode>
    </c:workAddress>

Continuing to follow that pattern, I invented a class for postal addresses, an address property, and subproperties of it for various kinds of addresses.

    <c:notes>rdf:
  a g:Male
  geo:lat 42.3382
  geo:long -72.45
  foaf:page http://norman.walsh.name/foaf

  AccessLine is x53142</c:notes>

Finally, the notes property holds notes about the contact. I parse pseudo-N3 from the notes field to add additional properties to the record.

    <rdf:type rdf:resource="http://nwalsh.com/rdf/genealogy#Male"/>
    <geo:lat>42.3382</geo:lat>
    <geo:long>-72.45</geo:long>
    <foaf:page rdf:resource="http://norman.walsh.name/foaf"/>
  </rdf:Description>
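
The pseudo-N3 parsing of the notes field might look something like the Python sketch below. The rules it applies are guesses based on the single example above: the block starts with a line reading `rdf:`, each following line is a predicate and a value, a blank line ends the block, `a` stands for rdf:type, and values that look like URIs or prefixed names become resources. The real parser may well differ.

```python
import re

# A value such as g:Male is treated as a prefixed name (assumed rule).
QNAME = re.compile(r"^[A-Za-z][\w.-]*:[A-Za-z_][\w.-]*$")


def parse_notes(notes):
    """Split a notes field into pseudo-N3 statements and free text.

    Returns (statements, remaining_text), where statements is a list of
    (predicate, value, kind) tuples and kind is "resource" or "literal".
    """
    lines = notes.splitlines()
    if not lines or lines[0].strip() != "rdf:":
        return [], notes  # no pseudo-N3 block at all
    statements = []
    index = 1
    while index < len(lines) and lines[index].strip():
        predicate, _, value = lines[index].strip().partition(" ")
        value = value.strip()
        if predicate == "a":
            predicate = "rdf:type"
        is_resource = (value.startswith(("http://", "https://"))
                       or bool(QNAME.match(value)))
        kind = "resource" if is_resource else "literal"
        statements.append((predicate, value, kind))
        index += 1
    remaining = "\n".join(lines[index:]).strip()
    return statements, remaining


NOTES = (
    "rdf:\n"
    "a g:Male\n"
    "geo:lat 42.3382\n"
    "geo:long -72.45\n"
    "foaf:page http://norman.walsh.name/foaf\n"
    "\n"
    "AccessLine is x53142"
)
for statement in parse_notes(NOTES)[0]:
    print(statement)
```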

The collected RDF for all my contacts is then augmented by additional inference rules to build a final, combined model for my “personal information manager”. One example of a rule is this one:

  { ?c a foaf:Organization .
    ?c c:associatedName ?t .
    ?p a foaf:Person .
    ?p c:associatedName ?t } => { ?p c:associatedWith ?c } .

This rule says that if there's an organization (for example, an entry in my address book with only a company name) and a person with the same association, then that person is associated with that organization. So, for example, when I format the address book entry for that company, I get pointers to all the people I know who work for that company. I also have a vocabulary for relationships inside Sun (employee numbers, department numbers, reporting structures, etc.) that I can “scrape” from the internal name finder. Rules associated with terms in that vocabulary allow me to generate appropriate cross-references between employees, departments, etc.
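
For readers who don't have an N3 rules engine to hand, the effect of that rule can be approximated in a few lines of Python over a set of triples. This is only an illustration of what the rule derives, not how a reasoner evaluates it; the `ex:` identifiers in the sample data are made up.

```python
def associate_people_with_organizations(triples):
    """Derive (person, c:associatedWith, organization) triples.

    A person is linked to every organization that shares its
    c:associatedName value, mirroring the N3 rule above.
    """
    organizations_by_name = {}
    for subject, predicate, value in triples:
        if (predicate == "c:associatedName"
                and (subject, "rdf:type", "foaf:Organization") in triples):
            organizations_by_name.setdefault(value, set()).add(subject)

    derived = set()
    for subject, predicate, value in triples:
        if (predicate == "c:associatedName"
                and (subject, "rdf:type", "foaf:Person") in triples):
            for organization in organizations_by_name.get(value, ()):
                derived.add((subject, "c:associatedWith", organization))
    return derived


# Hypothetical sample data; the ex: identifiers are placeholders.
data = {
    ("ex:sun", "rdf:type", "foaf:Organization"),
    ("ex:sun", "c:associatedName", "Sun Microsystems, Inc."),
    ("ex:norm", "rdf:type", "foaf:Person"),
    ("ex:norm", "c:associatedName", "Sun Microsystems, Inc."),
    ("ex:other", "rdf:type", "foaf:Person"),
    ("ex:other", "c:associatedName", "Somewhere Else"),
}
print(associate_people_with_organizations(data))
```

The match is on the exact company string, so “Sun Microsystems, Inc.” and “Sun Microsystems” would not be associated; the rule as written has the same property.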

The Ontology 

The resulting ontology is:

  # -*- N3 -*-

  @prefix owl: <http://www.w3.org/2002/07/owl#> .
  @prefix xs: <http://www.w3.org/2001/XMLSchema#> .
  @prefix c: <http://nwalsh.com/rdf/contacts#> .
  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .

  <http://nwalsh.com/rdf/contacts> a owl:Ontology;
      rdfs:comment "Norm's ontology for his address book." .

  # ------------------------------------------------------------

  # A contact in an address book
  c:Contact a owl:Class;
      rdfs:subClassOf
          [
               a owl:Restriction;
               owl:cardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty c:lastModified ] .

  # Timestamp of address book entry
  c:lastModified a owl:DatatypeProperty;
      rdfs:domain c:Contact;
      rdfs:range xs:dateTime .

  # Category in address book
  c:category a owl:DatatypeProperty;
      rdfs:domain c:Contact .

  # A middle name (other name properties come from FOAF)
  c:middleName a owl:DatatypeProperty .

  # Company and title
  c:associatedName a owl:DatatypeProperty .
  c:associatedTitle a owl:DatatypeProperty .

  # Birthday
  c:dateOfBirth a owl:DatatypeProperty;
      rdfs:range xs:date .

  # Email addresses
  c:personalMbox a owl:ObjectProperty;
       rdfs:subPropertyOf foaf:mbox .

  c:workMbox a owl:ObjectProperty;
       rdfs:subPropertyOf foaf:mbox .

  c:pagerMbox a owl:ObjectProperty;
       rdfs:subPropertyOf foaf:mbox .

  c:obsoleteMbox a owl:ObjectProperty;
       rdfs:subPropertyOf foaf:mbox .

  # Phone numbers
  c:dataPhone a owl:ObjectProperty;
       rdfs:subPropertyOf foaf:phone .

  c:fax a owl:ObjectProperty;
       rdfs:subPropertyOf foaf:phone .

  c:homePhone a owl:ObjectProperty;
       rdfs:subPropertyOf foaf:phone .

  c:workPhone a owl:ObjectProperty;
       rdfs:subPropertyOf foaf:phone .

  c:mobilePhone a owl:ObjectProperty;
       rdfs:subPropertyOf foaf:phone .

  c:pagerPhone a owl:ObjectProperty;
       rdfs:subPropertyOf foaf:phone .

  # Notes
  c:notes a owl:DatatypeProperty .

  # Postal address
  c:Address a owl:Class;
      rdfs:subClassOf
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty c:street ],
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty c:city ],
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty c:stateOrProvince ],
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty c:postcode ],
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty c:country ] .

  # Addresses
  c:address a owl:ObjectProperty;
       rdfs:range c:Address .

  c:workAddress a owl:ObjectProperty;
       rdfs:subPropertyOf c:address .

  c:homeAddress a owl:ObjectProperty;
       rdfs:subPropertyOf c:address .

  # Fields of an address
  c:street a owl:DatatypeProperty;
     rdfs:domain c:Address;
     rdfs:range xs:string .

  c:city a owl:DatatypeProperty;
     rdfs:domain c:Address;
     rdfs:range xs:string .

  c:stateOrProvince a owl:DatatypeProperty;
     rdfs:domain c:Address;
     rdfs:range xs:string .

  c:postcode a owl:DatatypeProperty;
     rdfs:domain c:Address;
     rdfs:range xs:string .

  c:country a owl:DatatypeProperty;
     rdfs:domain c:Address;
     rdfs:range xs:string .

I have a few additional constraints that I think are limited to my particular address book:

  # -*- N3 -*-

  @prefix owl: <http://www.w3.org/2002/07/owl#> .
  @prefix xs: <http://www.w3.org/2001/XMLSchema#> .
  @prefix c: <http://nwalsh.com/rdf/contacts#> .
  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .

  c:Contact
      rdfs:subClassOf
          [
               a owl:Restriction;
               owl:cardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty c:category ],
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty foaf:firstName ],
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty foaf:surname ],
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty c:middleName ],
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty c:associatedName ],
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty c:associatedTitle ],
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty c:dateOfBirth ],
          [
               a owl:Restriction;
               owl:maxCardinality "1"^^xs:nonNegativeInteger;
               owl:onProperty foaf:birthday ] .

  foaf:Organization
      rdfs:subClassOf
          [
               a owl:Restriction;
               owl:cardinality "0"^^xs:nonNegativeInteger;
               owl:onProperty foaf:firstName ],
          [
               a owl:Restriction;
               owl:cardinality "0"^^xs:nonNegativeInteger;
               owl:onProperty foaf:surname ],
          [
               a owl:Restriction;
               owl:cardinality "0"^^xs:nonNegativeInteger;
               owl:onProperty foaf:name ],
          [
               a owl:Restriction;
               owl:cardinality "0"^^xs:nonNegativeInteger;
               owl:onProperty foaf:nick ] .

  c:Place a owl:Class;
    rdfs:subClassOf foaf:Agent .

  c:gettyTGN a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:page .

  c:ciaFactbook a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:page .

  c:weather a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:page .

  c:associatedWith a owl:ObjectProperty;
      rdfs:domain foaf:Person;
      rdfs:range foaf:Organization .

  c:hasAssociated a owl:ObjectProperty;
      rdfs:domain foaf:Organization;
      rdfs:range foaf:Person .
Open Questions 

Should some of this be incorporated into FOAF? Should I have tried to use the vCard schema instead? And of course, which bits could be modelled better?

Comments:

Y'know, this essay illuminates a lot of why I will only use RDF when dragged kicking and screaming in front of a directed graph.

I learned to make tolerable ERDs in a few weeks. The design principles were pretty clear, and where they weren't clear, the tradeoffs were easily explainable.

I learned to write tolerable SGML DTDs (basic ones, anyway -- not so much the modular, INCLUDEd ones) in a few months. The design principles were pretty clear, and where they weren't clear, the tradeoffs were evident to an eye used to coping with instances.

The design principles for an RDF vocabulary -- are there any? I've once or twice, to my sorrow, been in a room where an RDF vocabulary was being designed. All discussions devolved into class-property-label disputes, weird attempts to turn "what the model is REALLY saying" into English sentences whose syntax invariably ended up horribly tortured and which did not clarify the issue at hand in the slightest, and confusion about bags or lists or links or whatever.

(And RDF people get all squinchy if you don't do these things correctly, even when they can't coherently explain the design principles or the tradeoffs. That's the worst part.)

I've never written a topic map, but if I had to, I expect I could figure out how in a few weeks (what with all my wizard librarian-fu). I doubt I'll ever figure out how to put together an RDF ontology. Possibly that should bother me, but somehow... I'm not bothered.

Posted by Dorothea on 29 Nov 2005 @ 04:49pm UTC #

The thing about working out rdf models, if you already know ER modeling, is just to do ER modeling at the conceptual modeling level. RDF can be thought of as an ultra-normalized relational database, modulo a few quirks, of course. It's usually pretty straightforward to go from a high level ER diagram to an rdf design. An advantage is that, with rdf, you don't have to create new tables or rows for rare cases. The rdf data set can be like a set of sparse tables.

You can also do some informal object modeling as your starting point.

Just like ER designs, there are usually several ways to model something in rdf, and usually, none of them is the "one right way".

You also don't have to use every feature of rdf to do most of your work, anymore than you have to use every feature of a relational database. Use a nice subset and forget the rest for some other time.

Posted by Tom Passin on 29 Nov 2005 @ 05:15pm UTC #

Norm, I've had to do exactly this kind of modelling of types on phone numbers and email addresses, both personally and in my corporate work.

The solution I used is to exploit the fact that tel: URIs and mailto: URIs are the only acceptable ranges of foaf:phone and foaf:mbox, so we can actually type the objects.

My blog post on this topic is here.

So, rather than making subproperties of foaf:phone, we can just use foaf:phone and assert that, say,

<tel:+1-234-567-8900> rdf:type EXT:cellPhone .

It's a modelling choice, but I quite like it.

Posted by Rich on 29 Nov 2005 @ 07:08pm UTC #

I think you're missing what's really different here, Dorothea. What makes the modelling issues in this example interesting has relatively little to do with the technology. My first design, the one that was just for me and didn't use any terms in any other vocabulary, took about 10 minutes to write and followed design principles that I've learned from years of work on XML vocabularies. Instantiating that model in RDF was easy and straightforward.

But unlike database design and XML schema design, which are designs that live in relative isolation, designing an RDF ontology (or at least this ontology) is about cooperating in a web-scale information space. Just as my pages on the web benefit from the network effect, the terms in my RDF vocabulary can benefit from that effect as well.

I want people to reuse them, I want them to spread, and I want to add value to the network by reusing terms from other vocabularies. My struggles with the design are as much about my attempts to understand what it means to design an ontology that will be used on that scale as they are about the details of this particular model.

As for discussions that devolve into class-property-label disputes, if you haven't seen the same thing in designing XML vocabularies (elements or attributes?, element and attribute names!, mixed content or element content?, wrapper element or no wrapper element?, etc.), tag along with your favorite XML consultant sometime :-)

Posted by Norman Walsh on 29 Nov 2005 @ 07:20pm UTC #

(Sound of palm striking forehead)

Thanks, Rich. That's an interesting idea.

Posted by Norman Walsh on 29 Nov 2005 @ 07:28pm UTC #

While my initial response was that I liked the xml version of contacts, I did empathize with Dorothea's comments. Your response made (at least part of) your reasoning clear. “designing an RDF ontology (or at least this ontology) is about cooperating in a web-scale information space” could be seen as taking xml another step forward. I'd be interested in hearing more from you on this please Norm.

Also of interest to me was another piece of rationale for wanting the rdf, that you don't expand on.

“I want to be able to combine the contacts in my address book with other data sources in ways that RDF makes easy and I want to be able to do inference over contacts again.”

Which rdf tools make it easier than merging and querying the XML please?

regards DaveP

Posted by Dave Pawson on 30 Nov 2005 @ 10:08am UTC #

You have left it as an open question but why did you not consider vCard from the beginning?

I think it might not be very RDF-friendly, because it was conceived with XML in mind, but it facilitates mapping existing contacts information to the RDF space. For instance my RDF profile, which is rendered as HTML with ReDeFer at my homepage.

Posted by Roberto Garcia on 01 Dec 2005 @ 12:42pm UTC #

Re Roberto: I'm assuming you're referring to the vCard-in-RDF spec that's lying around on the W3C's servers somewhere.

The reason not to use it… it's awful. Avoiding it like the plague is the best idea.

Posted by Rich on 01 Dec 2005 @ 09:52pm UTC #

Comments on this essay are closed. Thank you, spammers.