HTML microdata

One of the most advanced technologies for the semantic web is HTML microdata. HTML Microdata is a W3C Working Draft (latest version : 29 March 2012).

Most HTML tags tell the browser how to display the information included in a tag. For example <h1>Blackberry</h1> tells the browser to display the text string Blackberry in a heading 1 format. However, the HTML tag doesn’t give any information about what that text string means. Blackberry could refer to a mobile device or to a fruit and this makes it difficult for search engines to intelligently display relevant content to a user.

Microdata vocabularies provide the semantics, or meaning, of an item. Web developers can design a custom vocabulary or use vocabularies available on the web, such as those provided by schema.org.

Microdata introduces five simple global attributes (available on any element) that give machines context about your data :

  • itemscope – creates the item and indicates that descendants of this element contain information about it (boolean attribute)
  • itemtype – a valid URL of a vocabulary that describes the item and the context of its properties
  • itemid – indicates a unique identifier of the item
  • itemprop – indicates that the element carrying it holds the value of the specified item property (strings, URLs, images, …)
  • itemref – associates properties that are not descendants of the element with the itemscope attribute with the item
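As a rough illustration of how these attributes fit together, the sketch below marks up a hypothetical product page and reads it back with Python's standard html.parser module. The sample markup, the MicrodataParser class and the extract_item helper are assumptions made for this example. The parser handles only one flat item and ignores itemid, itemref and nested items, so it is not a conforming microdata processor.

```python
from html.parser import HTMLParser

# Hypothetical sample: a schema.org Product item marked up with microdata.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Product">
  <h1 itemprop="name">Blackberry</h1>
  <span itemprop="description">A mobile device, not the fruit.</span>
  <a itemprop="url" href="http://example.com/blackberry">Details</a>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collects the properties of a single, non-nested microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._pending = None  # itemprop name still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # itemscope is a boolean attribute: its presence alone creates the item.
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a" and "href" in attrs:
            # For links, the property value is the URL, not the link text.
            self.properties[prop] = attrs["href"]
        else:
            self._pending = prop

    def handle_data(self, data):
        # The first non-blank text after an itemprop element is its value.
        if self._pending and data.strip():
            self.properties[self._pending] = data.strip()
            self._pending = None


def extract_item(html):
    """Return (itemtype, properties) for the sample-style markup above."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.properties


if __name__ == "__main__":
    item_type, properties = extract_item(SAMPLE)
    print(item_type)
    for name, value in properties.items():
        print(name, "=", value)
```

With the itemtype in place, a consumer can tell that "Blackberry" names a product here rather than a fruit, which the bare <h1> tag alone could not express.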

Google uses semantic web technologies to create rich snippets (detailed information intended to help users with specific queries) in web search results. Google suggests using microdata as the markup format. Currently Google supports rich snippets for the following content types: Reviews, People, Products, Businesses and organizations, Recipes, Events and Music.

Google provides a Rich Snippet Testing Tool to check that its search engine can correctly parse the structured data markup and display it in search results. A Microdata schema creator is provided by Raven.

The following list provides links to more information about microdata, followed by a list of links to specific vocabularies :

Semantic Web

Last Update : October 7, 2012

The Semantic Web is a collaborative movement led by the international standards body W3C. The Semantic Web is a Web of Data, as opposed to the existing Web of Documents. The goal of the Web of Data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network.

The Web of Data is enabled by technologies such as RDFa (Resource Description Framework in Attributes), SPARQL, OWL (Web Ontology Language), SKOS (Simple Knowledge Organization System), Microdata and Open Graph.

HTML (HyperText Markup Language) remains the main markup language for web pages and other information that can be displayed in a web browser.

Semantic HTML refers to the semantic elements and attributes of HTML (h1, h2, …, p, …), as opposed to the presentational HTML elements and attributes (center, font, b, …). The acronym POSH was coined in 2007 for semantic HTML, as a shorthand abbreviation for “plain old semantic HTML”.

HTML5 introduced a few new structural elements :

  • <header> : this tag replaces the <div class="header"> commonly used by most designers in the past. The header element contains introductory information for a section or page.
  • <footer> : likewise, this tag replaces the well-known <div class="footer">. The footer element marks up the footer of the current page and of each section contained in the page.
  • <nav> : replacement for <div class="navigation">. The nav element is reserved for the primary navigation. Not all link groups in a page or section need to be contained within a <nav> element.
  • <section> : this is the replacement for the generic flow container <div> when it contains related content. <div> is a block-level element with no additional semantic meaning, whereas <section> is a sectioning element which normally has a header and a footer and represents a generic document or application section.
  • <article> : the <article> element represents a portion of a page or section which can stand alone and makes sense even outside the context of the page. Like <section>, an <article> generally has a header and a footer. Nest an <article> inside another <article> only when the inner one relates to the outer one, for example a comment on a blog post.

HTML5 tag <aside>

  • <aside> : this tag is used to represent content that is related to the surrounding content within a section, article or web page, but could still stand alone in its own right (see figure). This type of content is often presented in sidebars.
  • <hgroup> : A special header element that must contain at least two <h1>-<h6> tags and nothing else. It’s a group of titles with subtitles. Make sure to maintain the <h1> – <h6> hierarchy.
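To make the nesting of these structural elements more concrete, here is a minimal sketch that walks a hypothetical HTML5 page with Python's standard html.parser module and lists where each structural element occurs. The PAGE markup, the OutlineParser class and the structural_outline helper are assumptions made for illustration. The sketch only records nesting paths and does not implement the HTML5 outline algorithm.

```python
from html.parser import HTMLParser

# The structural elements discussed in the list above.
STRUCTURAL = {"header", "footer", "nav", "section", "article", "aside", "hgroup"}

# Hypothetical page skeleton using the HTML5 structural elements.
PAGE = """
<body>
  <header><h1>Site title</h1><nav><a href="/">Home</a></nav></header>
  <article>
    <header><h2>Post title</h2></header>
    <section><p>First part of the post.</p></section>
    <aside><p>Related note.</p></aside>
    <footer><p>Posted in 2012.</p></footer>
  </article>
  <footer><p>Site footer.</p></footer>
</body>
"""


class OutlineParser(HTMLParser):
    """Records the nesting path of every structural element it meets."""

    def __init__(self):
        super().__init__()
        self.stack = []  # structural elements that are currently open
        self.paths = []  # one "a > b > c" entry per structural start tag

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL:
            self.stack.append(tag)
            self.paths.append(" > ".join(self.stack))

    def handle_endtag(self, tag):
        if tag in STRUCTURAL and self.stack and self.stack[-1] == tag:
            self.stack.pop()


def structural_outline(html):
    """Return the nesting paths of the structural elements in html."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.paths


if __name__ == "__main__":
    for path in structural_outline(PAGE):
        print(path)
```

The output shows, for instance, that the page-level <header> and the <header> inside the <article> are distinct: each one introduces the section that contains it.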

  • RDFa is a W3C Recommendation that adds a set of attribute-level extensions (rich metadata) to web documents. RDFa 1.1 was approved in June 2012. It differs from RDFa 1.0 in that it no longer relies on the XML-specific namespace mechanism, but can be used with non-XML document types such as HTML 4 or HTML5. eRDF is an alternative to RDFa.
  • SPARQL is an RDF query language. On 15 January 2008, SPARQL 1.0 became an official W3C Recommendation.
  • OWL is a family of knowledge representation languages for authoring ontologies. In computer science and information science, an ontology formally represents knowledge as a set of concepts within a domain, together with the relationships among those concepts. Ontologies are structural frameworks for organizing information and are used in, among other fields, artificial intelligence.
  • SKOS is a family of formal languages designed for the representation of structured controlled vocabularies (thesauri, classification schemes, taxonomies, …).
  • Microdata is a WHATWG specification used to nest semantics within existing content on web pages.
  • The Open Graph protocol, originally created by Facebook, enables any web page to become a rich object in a social graph.
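Of the technologies above, Open Graph is probably the simplest to show in a few lines, since its metadata lives in <meta> tags whose property attribute starts with "og:". The following sketch reads such tags with Python's standard html.parser module. The HEAD markup, its example.com URL, the OpenGraphParser class and the read_open_graph helper are assumptions made for this example, not part of the protocol.

```python
from html.parser import HTMLParser

# Hypothetical <head> of a page that exposes Open Graph metadata.
HEAD = """
<head>
  <title>eyePlorer : the knowledge machine</title>
  <meta property="og:title" content="eyePlorer : the knowledge machine">
  <meta property="og:type" content="article">
  <meta property="og:url" content="http://example.com/eyeplorer">
  <meta name="description" content="Not an Open Graph property.">
</head>
"""


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> tags into a dictionary."""

    def __init__(self):
        super().__init__()
        self.graph = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        # Only properties in the "og:" namespace belong to Open Graph.
        if prop.startswith("og:") and "content" in attrs:
            self.graph[prop[len("og:"):]] = attrs["content"]


def read_open_graph(html):
    """Return the Open Graph properties found in html, without the og: prefix."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.graph


if __name__ == "__main__":
    for name, value in read_open_graph(HEAD).items():
        print(name, "=", value)
```

The plain description <meta> tag is skipped because it uses the name attribute rather than an og: property.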

All these technologies help computers such as search engines and web crawlers better understand what information is contained in a web page, providing better search results for users.

Another set of simple, structured open data formats, built upon existing standards, is Microformats. One difference from the other semantic technologies is that Microformats are designed for humans first and machines second.

The following list provides links to some useful blogs and tutorials about the semantic web:

HTML5 Structure : Semantic Webdesign

Last update : August 30, 2012

HTML5 is a work in progress and is going to stay that way for some time, but that's no reason not to start using it right now. HTML5 added some very important new semantic elements. To support older browsers, use graceful degradation techniques. To keep up with the latest trends, use progressive enhancement techniques.

HTML5 is not based on SGML, and therefore does not require a reference to a DTD.
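In practice this means an HTML5 document starts with the short declaration <!DOCTYPE html>, whereas an HTML 4.01 document names a public identifier and a DTD URL. The sketch below, which relies only on Python's standard html.parser module, tells the two apart. The two sample pages, the DoctypeParser class and the references_dtd helper are assumptions made for illustration. The check is a simple keyword test, not a validator.

```python
from html.parser import HTMLParser

# Hypothetical sample pages: one HTML5, one HTML 4.01 Strict.
HTML5_PAGE = (
    "<!DOCTYPE html>\n"
    "<html><head><title>HTML5</title></head><body></body></html>"
)
HTML4_PAGE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><head><title>HTML 4.01</title></head><body></body></html>"
)


class DoctypeParser(HTMLParser):
    """Remembers the first doctype declaration of a document."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def references_dtd(html):
    """Return True if the doctype carries a PUBLIC or SYSTEM identifier."""
    parser = DoctypeParser()
    parser.feed(html)
    parser.close()
    words = (parser.doctype or "").upper().split()
    return "PUBLIC" in words or "SYSTEM" in words


if __name__ == "__main__":
    print("HTML5 page references a DTD:", references_dtd(HTML5_PAGE))
    print("HTML 4.01 page references a DTD:", references_dtd(HTML4_PAGE))
```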

The website When can I use provides compatibility tables for support of HTML5, CSS3, SVG and more in desktop and mobile browsers.

The following list provides links to some useful blogs and tutorials about HTML5 :

The following list provides links to some useful HTML5 tools :

eyePlorer : the knowledge machine

Last update : August 9, 2013

eyePlorer by Vionto

eyePlorer is (or was) a graphical knowledge engine created by vionto®. Current search engines only present lists of links and documents. With eyePlorer, however, you are able to locate relevant information and connections instantly. Facts and relationships between terms and concepts are visualised in an interactive application. The knowledge machines built by vionto® employ sophisticated semantic techniques in order to analyse the meaning of sentences and texts. The benefit for the user is that he or she can work with individual facts instead of just long documents.

The user does not work with documents but with knowledge and facts in a graphical, interactive, almost dialogue-like kind of way. Knowledge is visually arranged in different categories. vionto® knowledge machines are based on semantic analyses derived from cognitive science, brain research and computational linguistics. vionto® relies on a robust language technology platform and sophisticated linguistic resources such as, for example, ontologies and thesauri. Currently eyePlorer processes the English and German Wikipedia as well as MEDLINE/PubMed.

In the circular area on the left hand side eyePlorer presents eyeSpots – these represent terms that are related to a search topic. The exact nature of these connections can be displayed by pointing or clicking on an eyeSpot. A small window, the eyeTip, opens and displays facts that document the relation with one or more facts taken from our knowledge base. The circular area filled with eyeSpots that relate to a certain search term is called an eyeMap. To display connections between eyeSpots just double-click on an eyeSpot – lines will appear between the eyeSpot you clicked upon and eyeSpots that are semantically related. A click on one of these lines will display associated facts taken from the knowledge base.

EyeSpots are associated with various categories (people, countries, organizations, time, society, work, science & technology, …) visualized with different colors. The categorisation is a procedure that is carried out automatically. A double click on an empty area of a category expands it and shows only this category along with all its subcategories.

A dynamic link to eyePlorer can be added to a website to visualize search terms.

vionto® filed for U.S. patent registration of the eyePlorer technology. The eyePlorer visualizes knowledge graphs (k-graphs) derived from various contents that can be interactively explored.

vionto GmbH was founded in December 2008 in Berlin by Ralf von Grafenstein (Diplom-Kaufmann) and Dr. Martin C. Hirsch (neurobiologist and brain researcher). The first version of eyePlorer went online in February 2009 (see Frankfurter Allgemeine Feuilleton : Google-Herausforderer eyePlorer – Die Welt ist doch eine Scheibe). Several prestigious prizes of the internet sector have been awarded to vionto® (SUMA award, ECO Internet award, Red Herring Europe Top 100 2009 Award, …).

However, one year later the prestigious project came to an inglorious end with the bankruptcy of vionto GmbH (see SpeedX Blog : Verglüht). It met the same fate as its predecessor, semgine GmbH. The successor seems to be medx GmbH (diagnostic reasoning); the URL eyeplorer.com now redirects to its site.