Semantic Web
A Quick But Detailed Introduction
Tom Morris
tom@opiumfield.com
Meta
- diff web semweb > ./brain
- Content-heavy
- Presentation is online
- If you don't grok something, ask me afterwards - there's lots of time to
chat
Semantic...
- "trousers" == "pants"
- "movie" == "film"
- "football" == "soccer"
...Web
- We learn quite easily that when Americans say "football" they mean
something quite different.
- Americans learn quite quickly never to eat anything that British people
offer them. Marmite especially.
- But imagine if our computers could do the same.
Beyond Tags...
- Tags can't represent more complex data structures
- There is emergent behaviour within tagging. "to_read", for instance. I use a
"from:" quasi-namespace on delicious to distinguish "from:timbernerslee" (things
from him) from "timbernerslee" (things about him).
- I'm not crazy. Flickr is doing this with Machine Tags, which grew out of
emergent behaviour ('geo', 'upcoming'). People are already doing this with
tags - often with the aid of third-party software (geotagging on Flickr).
- Rather than arbitrary quasi-standards, why don't we formalise these?
...to Triples
- "This photo" (subject) was "photographed using a camera" (predicate) called
the "Canon PowerShot G5" (object).
- This is the core of RDF - and the basic core of relational databases. The
'semantic' in Semantic Web is adding semantics to links by describing
relationships.
- Take XFN. Can you describe your relationships with everyone you know using
one of the following words? contact, acquaintance, friend, met, co-worker,
colleague, co-resident, neighbour, child, parent, sibling, spouse, kin, muse,
crush, date, sweetheart, me
- What if you want to go beyond XFN's predefined categories?
- Microformats get you 80% of the way, in the same way that Microsoft Works
gets you 80% of the way - what the simple solution leaves out are all the
tough problems.
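To make the subject/predicate/object idea from the photo example concrete, here is a minimal sketch in plain Python. It assumes nothing beyond the standard library; the example.org URIs and the Triple and match names are invented for illustration, and a real RDF toolkit would do this job for you.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, made up for this sketch.
PHOTO = "http://example.org/photos/42"
CAMERA = "http://example.org/vocab#camera"
FRIEND = "http://example.org/vocab#friendOf"

graph = {
    Triple(PHOTO, CAMERA, "Canon PowerShot G5"),
    Triple("http://example.org/people/tom", FRIEND, "http://example.org/people/danny"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the parts given; None is a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


cameras = match(graph, predicate=CAMERA)
print(cameras[0].obj)
```

Unlike a tag, the predicate is a name in its own right, so going beyond a fixed list of relationships just means using a different predicate URI.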
Why RDF?
One thing that RDF doesn't mandate is a single all-embracing format, it
positively embraces plurality of schemas, and independent adoption and
repurposing of schemas. This is a design goal. It does propose a single
underlying model for modelling data, in the same way that an RDBMS has the
relational model behind it. So one thing that RDF can achieve that XML can't, is
the jettisoning of the one-size-fits-all approach.
- Leigh Dodds
- Keep the data model the same, just change the data.
Some Examples
- Kissology - something which XFN omitted! Allows one to describe a snog
network.
- Vegetarian Ontology - describes a person's eating preferences. Wouldn't it be
cool if when you were organising a dinner party, your computer would tell
you how many of the potential attendees are vegetarian without having to
ask?
- iso:visit - an ontology exists to describe which countries and US states one
has visited.
- Attention RDF
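The dinner party question can be sketched in a few lines once everyone's data is triples. This is only an illustration: the DIET, VEGETARIAN and OMNIVORE URIs are placeholders I made up, not the actual terms of the Vegetarian Ontology, and the people are fictional.

```python
# Hypothetical vocabulary: placeholder URIs, not the real ontology's terms.
DIET = "http://example.org/diet#preference"
VEGETARIAN = "http://example.org/diet#Vegetarian"
OMNIVORE = "http://example.org/diet#Omnivore"

# Triples you might have gathered from each person's own published profile.
triples = [
    ("http://example.org/people/alice", DIET, VEGETARIAN),
    ("http://example.org/people/bob", DIET, OMNIVORE),
    ("http://example.org/people/carol", DIET, VEGETARIAN),
]

invited = {
    "http://example.org/people/alice",
    "http://example.org/people/bob",
    "http://example.org/people/dave",  # no diet published: we simply don't know
}


def vegetarians(triples, guests):
    """Guests for whom a 'diet is vegetarian' statement exists."""
    return sorted(
        s for (s, p, o) in triples
        if s in guests and p == DIET and o == VEGETARIAN
    )


veggie_guests = vegetarians(triples, invited)
print(len(veggie_guests), "of", len(invited), "guests are known to be vegetarian")
```

Note that a missing statement means "unknown", not "eats meat", so you might still have to ask Dave.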
More serious examples
- FOAF - social networking
- SIOC - structure of online communities (forums, blogs, newsgroups etc.)
- BIO - biographical information - extends FOAF for genealogical and other purposes
- Various bio-science ontologies are being developed by academics (see HCLS/Ontology Task Force and NCBO BioPortal).
RDF Serialization
- Subjects, predicates and objects have to be organised into some format.
These formats are serializations.
- In addition, many languages have RDF toolkits, which generally have ways to
model RDF as objects, and to store it in SQL databases.
Which Serialization Format?
- If you are writing RDF programmatically, use whatever
toolkit is available for your language - Redland (C), rdflib (Python), Jena (Java),
RAP (PHP)
- If you are transmitting RDF, use RDF/XML. It's the format most widely
supported by software. If your library doesn't support it, something is
wrong.
- If you are writing RDF by hand, use N3 or Turtle (or
equivalent syntaxes).
- If you want to understand what's actually going on, read your RDF
as triples - serialized using N-Triples, TriX or another raw triples format.
The W3C Validator will show these to you, along with a cool graph of the
relationships (using IsaViz).
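To see how little there is to a raw triples format, here is a rough sketch of writing N-Triples by hand in Python. The nt_term and to_ntriples helpers are my own names, not part of any toolkit, and the sketch ignores blank nodes, language tags, datatypes and non-ASCII escaping. In practice you would let Redland, rdflib, Jena or RAP do this.

```python
def nt_term(term):
    """Format one term given as ('uri', value) or ('literal', value)."""
    kind, value = term
    if kind == "uri":
        return "<" + value + ">"
    # Escape backslashes first, then quotes and line breaks.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(triples):
    """One statement per line: subject, predicate, object, then a full stop."""
    return "\n".join(
        " ".join(nt_term(t) for t in triple) + " ." for triple in triples
    )


triples = [
    (
        ("uri", "http://example.org/about#me"),
        ("uri", "http://xmlns.com/foaf/0.1/name"),
        ("literal", "Tom Morris"),
    ),
]

print(to_ntriples(triples))
```

Every other serialization is a more compact or more readable way of saying the same lines.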
An aside
I believe that one of the best ways to transition into RDF, if not a long-term
deployment strategy for RDF, is to manage the information in human-consumable
form (XHTML) annotated with just enough info to extract the RDF statements that
the human info is intended to convey. In other words: using a relational
database or some sort of native RDF data store, and spitting out HTML
dynamically, is a lot of infrastructure to operate and probably not worth it for
lots of interesting cases. We all know that we have to produce a human-readable
version of the thing... why not use that as the primary source?
- Dan Connolly, 2000
- A bit like microformats, no?
Micro RDF!
- Two formats for adding RDF triples to (X)HTML documents.
- eRDF (Embeddable RDF) - uses the class, id and title attributes, as per
microformats, plus link elements, to encode RDF. Can be read using a PHP
library or with XSLT. It's unofficial, but it doesn't break anything or do
anything that'll freak out web designers.
- RDFa is the W3C way of doing it. It looks much nicer, and there's more
support among developers. But it relies on XHTML2 attributes. Use it now if
you don't care about your web pages validating. The tools for reading it are
nicer, and it's slightly closer to the RDF model. It also doesn't rely on
namespacing. Parsing it may be a bit quicker too.
Example eRDF
<html>
  <head profile="http://purl.org/NET/erdf/profile"><!-- eRDF profile -->
    <title>Tom's Homepage</title>
    <base href="http://example.org/about" />
    <meta name="dc.creator" content="Tom Morris" />
    <meta name="dc.title" content="Tom's Homepage" />
    <link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
    <link rel="schema.foaf" href="http://xmlns.com/foaf/0.1/" /><!-- FOAF schema -->
  </head>
  <body>
    <p id="me">
      Hi, I'm <span class="foaf-name">Tom Morris</span>.<!-- the dash in the class separates the schema prefix from the property name -->
    </p>
  </body>
</html>
eRDF parsed into RDF/XML
<rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.org/about#me">
    <foaf:name>Tom Morris</foaf:name>
  </rdf:Description>
</rdf:RDF>
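How a parser gets from that page to that triple can be sketched with Python's standard html.parser. This is a toy, not the PHP library or the XSLT mentioned earlier: the ERDFSketch class is a hypothetical name, it only handles the "prefix-property" class convention inside an element with an id, and it ignores the meta elements and nested properties.

```python
from html.parser import HTMLParser

VOID = {"base", "meta", "link", "br", "hr", "img", "input"}


class ERDFSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.base = ""
        self.schemas = {}   # prefix -> namespace URI, from <link rel="schema.*">
        self.stack = []     # one (subject, predicate) pair per open element
        self.triples = []
        self._text = []     # text of the property element currently open

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = a.get("rel") or ""
        if tag == "base":
            self.base = a.get("href") or ""
        elif tag == "link" and rel.startswith("schema."):
            self.schemas[rel[len("schema."):]] = a.get("href") or ""
        if tag in VOID:
            return
        # An id starts a new subject; otherwise inherit the enclosing one.
        subject = self.stack[-1][0] if self.stack else None
        if a.get("id"):
            subject = self.base + "#" + a["id"]
        predicate = None
        for name in (a.get("class") or "").split():
            prefix, _, local = name.partition("-")
            if prefix in self.schemas and local:
                predicate = self.schemas[prefix] + local
                self._text = []
        self.stack.append((subject, predicate))

    def handle_data(self, data):
        if self.stack and self.stack[-1][1]:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        subject, predicate = self.stack.pop()
        if predicate:
            text = "".join(self._text).strip()
            self.triples.append((subject or self.base, predicate, text))


DOC = """<html>
<head profile="http://purl.org/NET/erdf/profile">
<base href="http://example.org/about" />
<link rel="schema.foaf" href="http://xmlns.com/foaf/0.1/" />
</head>
<body>
<p id="me">Hi, I'm <span class="foaf-name">Tom Morris</span>.</p>
</body>
</html>"""

parser = ERDFSketch()
parser.feed(DOC)
parser.close()
print(parser.triples)
```

The point is that the human-readable page stays the primary source, and the triple falls out of markup a web designer would have written anyway.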
There's more!
- What if you don't want to ransack your class names, IDs and so on?
- Enter GRDDL (Gleaning Resource Descriptions from Dialects of Languages).
- GRDDL lets you specify, using XSLT, a way to extract RDF from an XML or
XHTML document. This way, you can write an XSLT stylesheet that describes
your site's specific data and let a GRDDL parser 'scrape' the data without
actually having to write a scraper.
- It teaches your scraper new tricks! With eRDF, RDFa and GRDDL, RDF can be
pretty easy for web designers to start producing if microformats don't
suffice.
- The design pattern is something that can be applied to other XML formats.
RSS/Atom, OPML etc. could be easily GRDDLed, for instance.
- GRDDL is one way to solve the 20% problem with microformats.
GRDDL
- GRDDL isn't perfect. It requires you to be producing well-formed XML
(doesn't have to be validating XHTML, but well-formedness is vital)
- GRDDL is simple though - it uses XSLT, plus the link element in XHTML to
specify a stylesheet.
- There are implementations for most of the microformats, plus it's easy to
make your own.
- This really is a best-of-both-worlds solution - it works today, works with
microformats and is easy to do.
- Also see: MicroModels.
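As a rough illustration of the discovery step, the sketch below finds the stylesheets a GRDDL-aware agent would apply to an XHTML page, using only Python's standard library. The grddl_transformations function name and the example.org stylesheet URL are made up; the profile URI and the rel="transformation" link are, to the best of my knowledge, what the GRDDL spec uses. Actually running the XSLT needs an XSLT processor, which Python's standard library doesn't include, so that part is left out.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def grddl_transformations(source):
    """Return the stylesheet URLs named by the page, or [] if it isn't GRDDL-enabled."""
    root = ET.fromstring(source)  # raises ET.ParseError if the XML is not well-formed
    head = root.find(XHTML + "head")
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    return [
        link.get("href")
        for link in head.iter(XHTML + "link")
        if "transformation" in link.get("rel", "").split() and link.get("href")
    ]


PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Tom's Homepage</title>
    <link rel="transformation" href="http://example.org/extract-rdf.xsl" />
  </head>
  <body><p>Hi, I'm Tom Morris.</p></body>
</html>"""

print(grddl_transformations(PAGE))

try:
    grddl_transformations("<html><p>unclosed</html>")
except ET.ParseError as err:
    print("not well-formed, nothing to glean:", err)
```

The second call shows why well-formedness is vital: an XML parser gives up before GRDDL even gets a look in.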
Schema Languages
- This is all fine if you want to produce RDF using someone else's vocabulary.
But what if you want to define your own?
- Due to time constraints, I can't discuss this at the length I'd like, so
I'll just point you in the right direction.
- There are two (current, main) languages for doing this - RDFS (RDF Schema)
and OWL (Web Ontology Language).
- There are two good GUI tools for writing schemas and ontologies:
- Protégé - free, Java, open source, cross-platform - wiki
- TopBraid Composer - about $1,295, Java, a plugin for Eclipse IDE - blog
- (As I don't have a thousand dollars to spend on an OWL editor, I use
Protégé. And I like it a lot.)
SPARQL!
- SPARQL is very cool. It's like SQL, but for the SemWeb.
- It's almost a W3C standard. There are a few good implementations too.
Python's rdflib, Java's Jena and PHP's RAP all support it.
- The output format for SPARQL is XML, which means that it's pretty easy to
transform into displayable data (HTML, RSS, OPML etc.)
- You could describe your social relations in FOAF (or GRDDL-ed XFN), and then
use SPARQL to turn that into a blogroll. (Hat tip: Danny Ayers)
- You can experiment with SPARQL at sparql.org
- The eventual goal of SPARQL is that it works as an API - a web application
might make available a SPARQL endpoint, which would let you query the
triples and get XML back.
SPARQL Example
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE
  { ?x foaf:name ?name .
    ?x foaf:mbox ?mbox }
| name                | mbox                       |
| "Johnny Lee Outlaw" | <mailto:jlow@example.com>  |
| "Peter Goodguy"     | <mailto:peter@example.org> |
- XML response format - can be easily transformed using XSLT or scripting
language of choice
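Here is a sketch of the scripting-language route, using Python's standard library to turn a SPARQL XML response into an HTML list. The RESPONSE document is hand-written to match the two result rows above, following the SPARQL Query Results XML Format as I understand it; it didn't come from a real endpoint. The rows and to_html_list names are my own.

```python
import xml.etree.ElementTree as ET
from html import escape

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written response, assumed to be what an endpoint would send back.
RESPONSE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="mbox"/>
  </head>
  <results>
    <result>
      <binding name="name"><literal>Johnny Lee Outlaw</literal></binding>
      <binding name="mbox"><uri>mailto:jlow@example.com</uri></binding>
    </result>
    <result>
      <binding name="name"><literal>Peter Goodguy</literal></binding>
      <binding name="mbox"><uri>mailto:peter@example.org</uri></binding>
    </result>
  </results>
</sparql>"""


def rows(xml_text):
    """Yield one {variable: value} dict per <result> element."""
    root = ET.fromstring(xml_text)
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the <literal>, <uri> or <bnode> child
            row[binding.get("name")] = value.text or ""
        yield row


def to_html_list(xml_text):
    """Render each row as a linked list item, escaping text for HTML."""
    items = [
        '<li><a href="{}">{}</a></li>'.format(
            escape(r["mbox"], quote=True), escape(r["name"])
        )
        for r in rows(xml_text)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(to_html_list(RESPONSE))
```

Swap foaf:mbox for a homepage or weblog property in the query and the same few lines give you the blogroll.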
Where next?
- Hopefully, more and more practical implementations. Too much theoretical
stuff, not enough people building cool toys, applications, mashups etc.
- Hopefully, fewer 900-page specifications and fewer waffly academics.
- Hopefully, more people implementing RDF/SemWeb technology in spite of their
prejudices. :)
- Even if you aren't interested in RDF, using well-formed XHTML makes it
easier to progress towards the SemWeb.
Dead tree matter
- Shelley Powers - Practical RDF, O'Reilly
- Best book on the subject. Practical and still current.
- Explains RDF/RDFS/OWL, concepts and tools - doesn't cover RDF-in-HTML.
- I'd love to see an update to cover new tools and RDF-in-HTML.
- Paul Haine - HTML Mastery, Friends of Ed
- Covers microformats and RDFa for web designers.
- Spinning the Semantic Web
- Interesting, but quite out-of-date. Covers old methodology and technology.
- Worth reading for historical interest, but don't go implementing SHOE ontologies...
- timbl - Weaving the Web
- Interesting read for both geeks and non-geeks.
Contact
tom [at] opiumfield [dot] com