An address book ontology
Modeling names and addresses. No, not that old debate, the sort that appear in your address book.
Years ago, when I was using a Palm device for my address book, calendar, etc., I arranged to convert that data into RDF. I described that work in Generalized Metadata in your Palm, a paper that I presented at Extreme Markup Languages in 2002.
When I converted from the Palm to the Sidekick, I temporarily lost the RDF. I had no trouble, thanks to Dan Connolly and the T-Mobile XML/RPC interface, getting XML out, but I wasn't getting RDF. (I could have, Dan does, but I didn't because it was quicker to get my local infrastructure running again just from the straight XML.)
Recently, I decided it was time to get the RDF back. I want to be able to combine the contacts in my address book with other data sources in ways that RDF makes easy and I want to be able to do inference over contacts again. In addition, I now have a tool that will validate my RDF. Validation (does this instance conform to the model I've described?) was one of the first things I asked about when I started using RDF. Only after the publication of OWL does it seem that such tools have actually been widely deployed. (I'm using pellet at the moment.)
Designing the ontology
Given that I can now validate my RDF, I'm much more motivated to write a schema for my model. Designing an RDF schema isn't unlike other design exercises; it consists principally of dividing the world into classes, defining properties on those classes, and defining the relationships between classes and properties.
My first instinct was to write my own ontology from scratch, defining a class for contacts and properties on resources of that class: first name, last name, email addresses, phone numbers, postal addresses, etc. In fact, that's just what I did. But there was a significant overlap with the FOAF vocabulary. One of RDF's strengths is the ability to easily aggregate different vocabularies, so I replaced many of my properties with appropriate FOAF properties.
In fact, I might propose to extend FOAF to cover more of this use case since it seems so closely related. Instead of just asserting my extensions, I've compromised and made some of my properties and classes subclasses and subproperties of the FOAF terms.
Classes or properties?
Phone numbers, email addresses, postal addresses, and even to some extent, instant messaging addresses have “labels” associated with them. That is, a “work” phone number is distinct from a “home” phone number, etc.
This distinction is significant and has to be preserved in the model. Let's consider phone numbers as a concrete example. Three possibilities occur to me.
- Model the label directly: make a phone number a class of resource that has two properties, a label and a phone number.
- Use classes: make a phone number a class of resources with subclasses for a work phone number, a home phone number, etc.
- Use properties: make a phone number property with subproperties for work phone number, home phone number, etc.
After some thought and some discussion on the #swig channel on irc.freenode.net, I don't think there's a compelling argument in favor of any one solution, except that the first seems less appealing than either of the others. The label isn't open-ended free text, it's a string that identifies the kind of phone number and both the class and property solutions seem to do that more directly.
My personal inclination is to use classes, but I see that FOAF has already opted for the property approach (homepage, workplaceHomepage, etc.) in several places, so I decided to go that way too.
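To make the three options concrete, here is a small sketch in Python that stores triples as plain tuples and finds the work phone numbers under each model. Every `ex:` name is made up for the illustration; only `foaf:phone` and `c:workPhone` come from the vocabularies discussed here.

```python
# Triples are (subject, predicate, object) tuples. The ex: terms are
# hypothetical; they exist only to illustrate the three options.

# Option 1: an intermediate resource carrying a label and a number.
labeled = {
    ("ex:norm", "ex:phone", "_:p1"),
    ("_:p1", "ex:label", "Work"),
    ("_:p1", "ex:number", "tel:+1-413-303-1382"),
}

# Option 2: the phone number resource is typed with a subclass.
classed = {
    ("ex:norm", "foaf:phone", "tel:+1-413-303-1382"),
    ("tel:+1-413-303-1382", "rdf:type", "ex:WorkPhone"),
}

# Option 3: a subproperty of foaf:phone names the kind directly.
propertied = {
    ("ex:norm", "c:workPhone", "tel:+1-413-303-1382"),
}


def objects(triples, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def work_phones_labeled(triples, who):
    found = set()
    for node in objects(triples, who, "ex:phone"):
        if "Work" in objects(triples, node, "ex:label"):
            found |= objects(triples, node, "ex:number")
    return found


def work_phones_classed(triples, who):
    return {
        number
        for number in objects(triples, who, "foaf:phone")
        if "ex:WorkPhone" in objects(triples, number, "rdf:type")
    }


def work_phones_propertied(triples, who):
    return objects(triples, who, "c:workPhone")
```

The first model needs an extra hop and a string comparison, while the class and property models answer the question with a single pattern each, which is roughly why the label model feels less direct.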
Lists or not?
Another decision that has to be made is whether or not to model the various repeatable fields as lists. Certainly they're ordered in the XML and they appear ordered on the Sidekick display, but lists in RDF more-or-less suck, so I opted not to model them that way. It'll put a little more burden on any software I eventually write to synchronize from the RDF, but that seems better than dealing with the list problems everywhere. And really, the list nature of the properties isn't intrinsically important. If I want to call my friend's work phone number, I don't care if it's listed first or second, do I?
The “final” design
Taking into account the choices above, and considering that I'm aiming to take advantage of FOAF as much as possible, let's consider how an entry in my address book gets translated to RDF. Here's an entry:
<contact id="_950">
  <last_modified>2005-11-24T14:10:51Z</last_modified>
  <category>Family</category>
  <firstname>Norman</firstname>
  <middlename>David</middlename>
  <surname>Walsh</surname>
  <company>Sun Microsystems, Inc.</company>
  <title>XML Standards Architect</title>
  <birthday>1967-06-16</birthday>
  <uris>
    <uri label="ID">#norman-walsh</uri>
    <uri label="Blog">http://norman.walsh.name/</uri>
    <uri label="Home">http://nwalsh.com/</uri>
  </uris>
  <emails>
    <email label="Work">Norman.Walsh@Sun.COM</email>
    <email label="Home">ndw@nwalsh.com</email>
  </emails>
  <phones>
    <phone label="Work">+1-413-303-1382</phone>
    <phone label="Work">+1-413-256-xxxx</phone>
    <phone label="Home">+1-413-256-xxxx</phone>
    <phone label="Mobile">+1-413-949-xxxx</phone>
  </phones>
  <addresses>
    <address label="Home">
      <street>XX Xxxx Street</street>
      <city>Belchertown</city>
      <state>MA</state>
      <postcode>01007</postcode>
    </address>
    <address label="Work">
      <street>1 Network Drive, Building #2
MS UBUR02-201</street>
      <city>Burlington</city>
      <state>MA</state>
      <postcode>01803</postcode>
    </address>
  </addresses>
  <notes>rdf:
a g:Male
geo:lat 42.3382
geo:long -72.45
foaf:page http://norman.walsh.name/foaf
AccessLine is x53142</notes>
  <rdf:type rdf:resource="http://nwalsh.com/rdf/genealogy#Male"/>
  <geo:lat>42.3382</geo:lat>
  <geo:long>-72.45</geo:long>
  <foaf:page rdf:resource="http://norman.walsh.name/foaf"/>
</contact>
And here's the resulting RDF.
<rdf:Description rdf:about="http://norman.walsh.name/knows/who#norman-walsh">
  <rdf:type rdf:resource="http://nwalsh.com/rdf/contacts#Contact"/>
  <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
The “ID” URI is used to construct the URI for the resource. All contacts are members of the Contact class and contacts that have a first or last name are foaf:Persons. Contacts that have only company names are foaf:Organizations. I have a third class, c:Place, for geographic locations, but that's probably unique to my metadata collection.
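A minimal sketch of those two rules in Python, reading the XML entry with the standard library. This is an illustration, not the actual conversion code: the function names are invented, the base URI is copied from the `rdf:about` value in the example, and returning `None` for a contact without an “ID” URI is an assumption.

```python
import xml.etree.ElementTree as ET

# Base taken from the rdf:about value in the example output.
BASE = "http://norman.walsh.name/knows/who"


def resource_uri(contact):
    """Build the resource URI from the uri labelled "ID".

    Returning None when there is no such uri is an assumption; the
    text doesn't say what happens in that case.
    """
    for uri in contact.iter("uri"):
        if uri.get("label") == "ID":
            return BASE + (uri.text or "").strip()
    return None


def classes(contact):
    """Every contact is a c:Contact; the names decide Person or Organization."""
    found = ["c:Contact"]
    has_personal_name = any(
        (contact.findtext(tag) or "").strip() for tag in ("firstname", "surname")
    )
    if has_personal_name:
        found.append("foaf:Person")
    elif (contact.findtext("company") or "").strip():
        found.append("foaf:Organization")
    return found
```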
  <c:lastModified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2005-11-24T14:10:51Z</c:lastModified>
  <c:category>Family</c:category>
The last modified date and the category are directly related to the contact data in the address book.
  <foaf:firstName>Norman</foaf:firstName>
  <c:middleName>David</c:middleName>
  <foaf:surname>Walsh</foaf:surname>
  <foaf:name>Norman David Walsh</foaf:name>
First and last names are available in FOAF. At the moment middle names aren't, so I've created a middleName property. Generating the full foaf:name is straight-forward, so I do that as well.
  <c:associatedName>Sun Microsystems, Inc.</c:associatedName>
  <c:associatedTitle>XML Standards Architect</c:associatedTitle>
These properties associate a company name and title with a contact. (There's room for some additional modeling complexity in titles as person, company, and title form a three-part relationship, but I've never seen an electronic address book that tried to handle that situation, so let's not worry about it.)
  <c:dateOfBirth rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1967-06-16</c:dateOfBirth>
  <foaf:birthday>06-16</foaf:birthday>
The FOAF birthday property, unfortunately, doesn't support full dates, so I've had to invent one. But I can generate the FOAF version as well.
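Deriving both values from the one birthday field might look like the following sketch. The function name and the tuple layout (property, value, datatype) are invented for the illustration.

```python
from datetime import date


def birth_properties(birthday_text):
    """Return the full date of birth plus the month-day form FOAF uses.

    Each item is (property, lexical value, datatype or None).
    date.fromisoformat raises ValueError if the field isn't YYYY-MM-DD.
    """
    born = date.fromisoformat(birthday_text)
    return [
        ("c:dateOfBirth", born.isoformat(), "xs:date"),
        ("foaf:birthday", f"{born.month:02d}-{born.day:02d}", None),
    ]
```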
  <foaf:weblog rdf:resource="http://norman.walsh.name/"/>
  <foaf:homepage rdf:resource="http://nwalsh.com/"/>
The URIs convert naturally to FOAF properties. Some of my contacts have other sorts of URIs (entries in the Getty Thesaurus of Geographic Names®, the CIA World Factbook, etc.) for which I've invented additional properties.
  <c:workMbox rdf:resource="mailto:Norman.Walsh@Sun.COM"/>
  <foaf:mbox_sha1sum>9f5c771a25733700b2f96af4f8e6f35c9b0ad327</foaf:mbox_sha1sum>
  <c:personalMbox rdf:resource="mailto:ndw@nwalsh.com"/>
  <foaf:mbox_sha1sum>5ddcd862514c327945dca20446e11cb54ceec68b</foaf:mbox_sha1sum>
I've invented subproperties of foaf:mbox for various kinds of email addresses.
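As a sketch of how one email entry could become the two statements above: FOAF defines mbox_sha1sum as the SHA1 hash of the full mailto: URI, so that part is standard. The label table, the fallback to plain foaf:mbox for other labels, and the choice not to normalize case before hashing are assumptions.

```python
import hashlib

# "Work" and "Home" follow the example; falling back to plain
# foaf:mbox for any other label is an assumption.
MBOX_PROPERTY = {"Work": "c:workMbox", "Home": "c:personalMbox"}


def mbox_triples(subject, label, address):
    """Return the mbox statement and its sha1sum for one email entry."""
    mailto = "mailto:" + address
    # The hash covers the whole URI, including the mailto: prefix.
    digest = hashlib.sha1(mailto.encode("utf-8")).hexdigest()
    return [
        (subject, MBOX_PROPERTY.get(label, "foaf:mbox"), mailto),
        (subject, "foaf:mbox_sha1sum", digest),
    ]
```

Publishing only the hash lets other data be joined on the mailbox without exposing the address itself.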
  <c:workPhone rdf:resource="tel:+1-413-303-1382"/>
  <c:workPhone rdf:resource="tel:+1-413-256-xxxx"/>
  <c:homePhone rdf:resource="tel:+1-413-256-xxxx"/>
  <c:mobilePhone rdf:resource="tel:+1-413-949-xxxx"/>
Similarly, I've invented subproperties of foaf:phone for various kinds of phone numbers.
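One practical effect of using subproperties is that software which only knows FOAF can still find the numbers once the rdfs:subPropertyOf statements are applied. A rough sketch of that single inference step over tuples (a real RDFS reasoner does this and much more; the function name is invented):

```python
# Mirrors three of the rdfs:subPropertyOf declarations for phone numbers.
SUBPROPERTY_OF = {
    "c:workPhone": "foaf:phone",
    "c:homePhone": "foaf:phone",
    "c:mobilePhone": "foaf:phone",
}


def with_superproperties(triples):
    """Add (s, super, o) for every (s, sub, o).

    One pass is enough here because the hierarchy is a single level deep.
    """
    inferred = set(triples)
    for s, p, o in triples:
        if p in SUBPROPERTY_OF:
            inferred.add((s, SUBPROPERTY_OF[p], o))
    return inferred
```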
  <c:homeAddress rdf:parseType="Resource">
    <rdf:type rdf:resource="http://nwalsh.com/rdf/contacts#Address"/>
    <c:street>XX Xxxx Street</c:street>
    <c:city>Belchertown</c:city>
    <c:stateOrProvince>MA</c:stateOrProvince>
    <c:postcode>01007</c:postcode>
  </c:homeAddress>
  <c:workAddress rdf:parseType="Resource">
    <rdf:type rdf:resource="http://nwalsh.com/rdf/contacts#Address"/>
    <c:street>1 Network Drive, Building #2
MS UBUR02-201</c:street>
    <c:city>Burlington</c:city>
    <c:stateOrProvince>MA</c:stateOrProvince>
    <c:postcode>01803</c:postcode>
  </c:workAddress>
Continuing to follow that pattern, I invented a class for postal addresses, an address property, and subproperties of it for various kinds of addresses.
  <c:notes>rdf:
a g:Male
geo:lat 42.3382
geo:long -72.45
foaf:page http://norman.walsh.name/foaf
AccessLine is x53142</c:notes>
Finally, the notes property holds notes about the contact. I parse pseudo-N3 from the notes field to add additional properties to the record.
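The exact rules for the pseudo-N3 aren't spelled out, so the following is only a guess at a parser that handles the example: it assumes the block starts with a line reading `rdf:`, that each statement is either `a Class` or `prefix:name value`, that only known prefixes count, and that the first line which doesn't fit ends the block. The prefix list and function name are invented.

```python
import re

# Assumption: only prefixes seen in the example are recognized.
KNOWN_PREFIXES = {"g", "geo", "foaf"}

# Either "a <value>" or "<prefix>:<name> <value>", with a single-token value.
STATEMENT = re.compile(r"^(?:a|(\w+):(\w+))\s+(\S+)$")


def parse_notes(notes):
    """Return (property, value) pairs found in a pseudo-N3 notes block."""
    statements = []
    lines = notes.splitlines()
    if not lines or lines[0].strip() != "rdf:":
        return statements
    for line in lines[1:]:
        match = STATEMENT.match(line.strip())
        if not match:
            break  # free text begins here
        prefix, local, value = match.groups()
        if prefix is None:
            statements.append(("rdf:type", value))
        elif prefix in KNOWN_PREFIXES:
            statements.append((f"{prefix}:{local}", value))
        else:
            break
    return statements
```

Deciding whether a value is a literal or a resource (as with the foaf:page URI) would be a further step that this sketch leaves out.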
  <rdf:type rdf:resource="http://nwalsh.com/rdf/genealogy#Male"/>
  <geo:lat>42.3382</geo:lat>
  <geo:long>-72.45</geo:long>
  <foaf:page rdf:resource="http://norman.walsh.name/foaf"/>
</rdf:Description>
The collected RDF for all my contacts is then augmented by additional inference rules to build a final, combined model for my “personal information manager”. One example of a rule is this one:
{ ?c a foaf:Organization .
  ?c c:associatedName ?t .
  ?p a foaf:Person .
  ?p c:associatedName ?t } => { ?p c:associatedWith ?c } .
This rule says that if there's an organization (for example, an entry in my address book with only a company name) and a person with the same association, then that person is associated with that organization. So, for example, when I format the address book entry for that company, I get pointers to all the people I know who work for that company. I also have a vocabulary for relationships inside Sun (employee numbers, department numbers, reporting structures, etc.) that I can “scrape” from the internal name finder. Rules associated with terms in that vocabulary allow me to generate appropriate cross-references between employees, departments, etc.
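For readers who don't use an N3 rules engine, the same rule can be spelled out as a join over tuples. This is a sketch of what the rule means, not of how a rules engine evaluates it, and the sample identifiers in the usage note are invented.

```python
def associate(triples):
    """Apply the associatedName rule once and return the new statements."""
    orgs = {s for s, p, o in triples if p == "rdf:type" and o == "foaf:Organization"}
    people = {s for s, p, o in triples if p == "rdf:type" and o == "foaf:Person"}
    names = [(s, o) for s, p, o in triples if p == "c:associatedName"]

    inferred = set()
    for org, org_name in names:
        if org not in orgs:
            continue
        for person, person_name in names:
            # Same associated name on a person and an organization.
            if person in people and person_name == org_name:
                inferred.add((person, "c:associatedWith", org))
    return inferred
```

Given a `#sun` organization and a `#norman-walsh` person that both carry the associated name "Sun Microsystems, Inc.", the function returns the single statement linking the person to the organization.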
The Ontology
The resulting ontology is:
# -*- N3 -*-
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xs: <http://www.w3.org/2001/XMLSchema#> .
@prefix c: <http://nwalsh.com/rdf/contacts#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://nwalsh.com/rdf/contacts> a owl:Ontology;
    rdfs:comment "Norm's ontology for his address book." .
# ------------------------------------------------------------
# A contact in an address book
c:Contact a owl:Class;
    rdfs:subClassOf
        [
             a owl:Restriction;
             owl:cardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty c:lastModified ] .
# Timestamp of address book entry
c:lastModified a owl:DatatypeProperty;
    rdfs:domain c:Contact;
    rdfs:range xs:dateTime .
# Category in address book
c:category a owl:DatatypeProperty;
    rdfs:domain c:Contact .
# A middle name (other name properties come from FOAF)
c:middleName a owl:DatatypeProperty .
# Company and title
c:associatedName a owl:DatatypeProperty .
c:associatedTitle a owl:DatatypeProperty .
# Birthday
c:dateOfBirth a owl:DatatypeProperty;
    rdfs:range xs:date .
# Email addresses
c:personalMbox a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:mbox .
c:workMbox a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:mbox .
c:pagerMbox a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:mbox .
c:obsoleteMbox a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:mbox .
# Phone numbers
c:dataPhone a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:phone .
c:fax a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:phone .
c:homePhone a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:phone .
c:workPhone a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:phone .
c:mobilePhone a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:phone .
c:pagerPhone a owl:ObjectProperty;
     rdfs:subPropertyOf foaf:phone .
# Notes
c:notes a owl:DatatypeProperty .
# Postal address
c:Address a owl:Class;
    rdfs:subClassOf
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty c:street ],
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty c:city ],
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty c:stateOrProvince ],
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty c:postcode ],
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty c:country ] .
# Addresses
c:address a owl:ObjectProperty;
     rdfs:range c:Address .
c:workAddress a owl:ObjectProperty;
     rdfs:subPropertyOf c:address .
c:homeAddress a owl:ObjectProperty;
     rdfs:subPropertyOf c:address .
# Fields of an address
c:street a owl:DatatypeProperty;
   rdfs:domain c:Address;
   rdfs:range xs:string .
c:city a owl:DatatypeProperty;
   rdfs:domain c:Address;
   rdfs:range xs:string .
c:stateOrProvince a owl:DatatypeProperty;
   rdfs:domain c:Address;
   rdfs:range xs:string .
c:postcode a owl:DatatypeProperty;
   rdfs:domain c:Address;
   rdfs:range xs:string .
c:country a owl:DatatypeProperty;
   rdfs:domain c:Address;
   rdfs:range xs:string .
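To give a feel for what the maxCardinality restrictions on c:Address rule out, here is a naive, closed-world check over tuples. It is not how an OWL reasoner such as pellet works (OWL reasons under the open-world assumption); it simply flags an address that carries two different values for a field limited to one. The function name is invented.

```python
from collections import defaultdict

# The address fields restricted to at most one value in the ontology.
MAX_ONE = {"c:street", "c:city", "c:stateOrProvince", "c:postcode", "c:country"}


def max_cardinality_violations(triples, limited=MAX_ONE):
    """List (subject, property) pairs with more than one distinct value."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in limited:
            values[(s, p)].add(o)
    return sorted(key for key, seen in values.items() if len(seen) > 1)
```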
I have a few additional constraints that I think are limited to my particular address book:
# -*- N3 -*-
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xs: <http://www.w3.org/2001/XMLSchema#> .
@prefix c: <http://nwalsh.com/rdf/contacts#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
c:Contact
    rdfs:subClassOf
        [
             a owl:Restriction;
             owl:cardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty c:category ],
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty foaf:firstName ],
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty foaf:surname ],
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty c:middleName ],
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty c:associatedName ],
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty c:associatedTitle ],
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty c:dateOfBirth ],
        [
             a owl:Restriction;
             owl:maxCardinality "1"^^xs:nonNegativeInteger;
             owl:onProperty foaf:birthday ] .
foaf:Organization
    rdfs:subClassOf
        [
             a owl:Restriction;
             owl:cardinality "0"^^xs:nonNegativeInteger;
             owl:onProperty foaf:firstName ],
        [
             a owl:Restriction;
             owl:cardinality "0"^^xs:nonNegativeInteger;
             owl:onProperty foaf:surname ],
        [
             a owl:Restriction;
             owl:cardinality "0"^^xs:nonNegativeInteger;
             owl:onProperty foaf:name ],
        [
             a owl:Restriction;
             owl:cardinality "0"^^xs:nonNegativeInteger;
             owl:onProperty foaf:nick ] .
c:Place a owl:Class;
  rdfs:subClassOf foaf:Agent .
c:gettyTGN a owl:ObjectProperty;
   rdfs:subPropertyOf foaf:page .
c:ciaFactbook a owl:ObjectProperty;
   rdfs:subPropertyOf foaf:page .
c:weather a owl:ObjectProperty;
   rdfs:subPropertyOf foaf:page .
c:associatedWith a owl:ObjectProperty;
    rdfs:domain foaf:Person;
    rdfs:range foaf:Organization .
c:hasAssociated a owl:ObjectProperty;
    rdfs:domain foaf:Organization;
    rdfs:range foaf:Person .
Open Questions
Should some of this be incorporated into FOAF? Should I have tried to use the vCard schema instead? And of course, which bits could be modelled better?
Comments
Y'know, this essay illuminates a lot of why I will only use RDF when dragged kicking and screaming in front of a directed graph.
I learned to make tolerable ERDs in a few weeks. The design principles were pretty clear, and where they weren't clear, the tradeoffs were easily explainable.
I learned to write tolerable SGML DTDs (basic ones, anyway -- not so much the modular, INCLUDEd ones) in a few months. The design principles were pretty clear, and where they weren't clear, the tradeoffs were evident to an eye used to coping with instances.
The design principles for an RDF vocabulary -- are there any? I've once or twice, to my sorrow, been in a room where an RDF vocabulary was being designed. All discussions devolved into class-property-label disputes, weird attempts to turn "what the model is REALLY saying" into English sentences whose syntax invariably ended up horribly tortured and which did not clarify the issue at hand in the slightest, and confusion about bags or lists or links or whatever.
(And RDF people get all squinchy if you don't do these things correctly, even when they can't coherently explain the design principles or the tradeoffs. That's the worst part.)
I've never written a topic map, but if I had to, I expect I could figure out how in a few weeks (what with all my wizard librarian-fu). I doubt I'll ever figure out how to put together an RDF ontology. Possibly that should bother me, but somehow... I'm not bothered.
The thing about working out rdf models, if you already know ER modeling, is just to do ER modeling at the conceptual modeling level. RDF can be thought of as an ultra-normalized relational database, modulo a few quirks, of course. It's usually pretty straightforward to go from a high level ER diagram to an rdf design. An advantage is that, with rdf, you don't have to create new tables or rows for rare cases. The rdf data set can be like a set of sparse tables.
You can also do some informal object modeling as your starting point.
Just like ER designs, there are usually several ways to model something in rdf, and usually, none of them is the "one right way".
You also don't have to use every feature of rdf to do most of your work, any more than you have to use every feature of a relational database. Use a nice subset and forget the rest for some other time.
Norm, I've had to do exactly this kind of modelling of types on phone numbers and email addresses, both personally and in my corporate work.
The solution I used is to exploit the fact that tel: URIs and mailto: URIs are the only acceptable ranges of foaf:phone and foaf:mbox, so we can actually type the objects.
My blog post on this topic is here.
So, rather than making subproperties of foaf:phone, we can just use foaf:phone and assert that, say,
<tel:+1-234-567-8900> rdf:type EXT:cellPhone .
It's a modelling choice, but I quite like it.
But unlike database design and XML schema design, which are designs that live in relative isolation, designing an RDF ontology (or at least this ontology) is about cooperating in a web-scale information space. Just as my pages on the web benefit from the network effect, the terms in my RDF vocabulary can benefit from that effect as well.
I want people to reuse them, I want them to spread, and I want to add value to the network by reusing terms from other vocabularies. My struggles with the design are as much about my attempts to understand what it means to design an ontology that will be used on that scale as they are about the details of this particular model.
As for discussions that devolve into class-property-label disputes, if you haven't seen the same thing in designing XML vocabularies (elements or attributes?, element and attribute names!, mixed content or element content?, wrapper element or no wrapper element?, etc.), tag along with your favorite XML consultant sometime :-)
Thanks, Rich. That's an interesting idea.
While my initial response was that I liked the xml version of contacts, I did empathize with Dorothea's comments. Your response made (at least part of) your reasoning clear. “Designing an RDF ontology (or at least this ontology) is about cooperating in a web-scale information space” could be seen as taking xml another step forward. I'd be interested in hearing more from you on this please Norm.
Also of interest to me was another piece of rationale for wanting the rdf, that you don't expand on.
“I want to be able to combine the contacts in my address book with other data sources in ways that RDF makes easy and I want to be able to do inference over contacts again.”
Which rdf tools make it easier than merging and querying the XML please?
regards DaveP
You have left it as an open question but why did you not consider vCard from the beginning?
I think it might not be very RDF-friendly, because it was conceived with XML in mind, but it facilitates mapping existing contacts information to the RDF space. For instance my RDF profile, which is rendered as HTML with ReDeFer at my homepage
Re Roberto: I'm assuming you're referring to the vCard-in-RDF spec that's lying around on the W3C's servers somewhere.
The reason not to use it… it's awful. Avoiding it like the plague is the best idea.