Semantic Web

Last Update : October 7, 2012

The Semantic Web is a collaborative movement led by the international standards body W3C. The Semantic Web is a Web of Data, as opposed to the existing Web of Documents. The goal of the Web of Data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network.

The Web of Data is empowered by new technologies such as RDFa (Resource Description Framework-in attributes), SPARQL, OWL (Web Ontology Language), SKOS (Simple Knowledge Organization System), Microdata and Open Graph.

HTML (HyperText Markup Language) remains still the main markup language for displaying web pages and other information that can be displayed in a web browser.

Semantic HTML refers to the semantic elements and attributes of HTML (h1, h2, …, p, …), as opposed to the presentational HTML elements and attributes (center, font, b, …). The acronym POSH was coined in 2007 for semantic HTML, as a shorthand abbreviation for “plain old semantic HTML”.

HTML5 introduced a few new structural elements :

  • <header> : this tag replaces the <div class=”header”>, commonly used in the past by most designers. The header element contains introductory information to a section or page.
  • <footer> : same as above, it’s the well known <div class=”footer”>. The footer element is for marking up the baseline of the current page and of each section contained in the page.
  • <nav> : replacement for <div class=”navigation”>. The nav element is reserved for the primary navigation. Not all link groups in a page or section need to be contained within the <nav> element.
  • <section> : this is the replacement for the generic flow container <div> when it contains related content. <div> is a block-level element with no additional semantic meaning, whereas <section> is a sectioning element which has normally a header and a footer and represents a generic document or application section.
  • <article> : the <article> element represents a portion of a page or section which can stand alone and makes sense even outside the context of the page. Like <section>, an <article> generally has a header and a footer. You should avoid nesting an <article> inside another <article>.

HTML5 tag <aside>

  • <aside> : this tag is used to represent content that is related to the surrounding content within an section, article or web page, but could still stand alone in its own right. (see figure at right). This type of content is often represented in sidebars.
  • <hgroup> : A special header element that must contain at least two <h1>-<h6> tags and nothing else. It’s a group of titles with subtitles. Make sure to maintain the <h1> – <h6> hierarchy.

RDFa is a W3C Recommendation that adds a set of attribute-level extensions (rich metadata) to web documents. RDFa 1.1 was approved in June 2012. It differs from RDFa 1.0 in that it no longer relies on the XML-specific namespace mechanism, but ca be used with non-XML document types such as HTML 4 or HTML 5. eRDF is an alternative to RDFa. SPARQL is an RDF query language. On 15 January 2008, SPARQL 1.0 became an official W3C Recommendation. OWL is a family of knowledge representation languages for authoring ontologies. An ontology formally represents knowledge as a set of concepts within a domain in computer science and information science, and the relationships among those concepts. Ontologies are the structural frameworks for organizing information and are used, among others, in artificial intelligence. SKOS is a family of formal languages designed for representation of of structured controlled vocabulary (thesauri, classification schemes, taxonomies, …). Microdata is a WHATWG specification used to nest semantics within existing content on web pages. The Open Graph protocol, originally created by Facebook, enables any web page to become a rich object in a social graph.

All these technologies help computers such as search engines and web crawlers better understand what information is contained in a web page, providing better search results for users.

Another set of simple, structured open data formats, built upon existing standards, is Microformats. One difference with the other semantic technologies is that Microformats is designed for humans first and machines second.

The following list provides links to some useful blogs and tutorials about the semantic web: