Voice-driven web applications

Last update: July 17, 2013

The new JavaScript Web Speech API, developed within a W3C Community Group, makes it easy to add speech recognition to a web page and to create voice-driven web applications. It enables developers to use scripting to generate text-to-speech output and to use speech recognition as input for forms, continuous dictation and control. The JavaScript API allows web pages to control activation and timing and to handle results and alternatives.
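The recognition side can be sketched as follows. This code starts continuous dictation and separates final text from interim guesses. It assumes Chrome's prefixed `webkitSpeechRecognition` constructor, falling back to an unprefixed `SpeechRecognition` where one exists. The helper names `splitTranscript` and `startDictation` are hypothetical and chosen for illustration; treat this as a minimal sketch rather than a reference implementation.

```javascript
// Split the results of one recognition event into final and interim text.
// `results` mirrors a SpeechRecognitionResultList: an array-like list of
// results, each with an `isFinal` flag and alternatives ordered best-first.
function splitTranscript(results, startIndex = 0) {
  let finalText = '';
  let interimText = '';
  for (let i = startIndex; i < results.length; i++) {
    const best = results[i][0]; // highest-ranked alternative
    if (results[i].isFinal) {
      finalText += best.transcript;
    } else {
      interimText += best.transcript;
    }
  }
  return { finalText, interimText };
}

// Hypothetical helper: start continuous dictation and pass each update to
// `onUpdate`. Returns null when the browser has no recognition support.
function startDictation(onUpdate, lang = 'en-US') {
  const Recognition =
    globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;
  if (!Recognition) {
    return null;
  }
  const recognition = new Recognition();
  recognition.continuous = true;     // keep listening across pauses
  recognition.interimResults = true; // also deliver non-final guesses
  recognition.lang = lang;           // BCP 47 language tag, e.g. 'fr-FR'
  recognition.onresult = (event) => {
    // Results before event.resultIndex did not change in this event.
    onUpdate(splitTranscript(event.results, event.resultIndex));
  };
  recognition.start(); // the browser asks the user for microphone access
  return recognition;
}
```

For example, `startDictation((update) => console.log(update.finalText))` logs newly finalized text. Passing `'fr-FR'` as the second argument would request French recognition instead.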

The Web Speech specification was published by the Speech API Community Group, chaired by Glen Shires, a software engineer at Google. The specification is neither a W3C Standard nor on the W3C Standards Track.

A demo that works in Chrome 25 and later is available on the HTML5 Rocks website.

Two processes are involved: text-to-speech (speech synthesis, TTS) and speech-to-text (speech recognition, ASR). There are at least three different approaches to synthesizing text:

  • integrated: a TTS module is built into the OS, or a separately installed TTS engine plugs into the OS’s TTS module.
  • packaged: instead of requiring a separate install, a synthesizer and voices are packaged and shipped with the application.
  • in the cloud: a web service is used to synthesize text. The advantage is a more predictable and consistent voice quality, independent of the hardware and operating system of the mobile client.
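In the browser, the synthesis half of the Web Speech API lists the available voices through `speechSynthesis.getVoices()`. Each voice carries a `localService` flag. It roughly separates on-device voices (the integrated or packaged approach) from remote ones (the cloud approach). The sketch below assumes a browser that implements `speechSynthesis` and `SpeechSynthesisUtterance`; `groupVoices` and `speak` are hypothetical helper names.

```javascript
// Partition voices by where synthesis happens: `localService` is true for
// voices rendered on the device and false for voices served remotely.
function groupVoices(voices) {
  return {
    local: voices.filter((voice) => voice.localService),
    remote: voices.filter((voice) => !voice.localService),
  };
}

// Hypothetical helper: speak `text`, preferring on-device voices by default.
// Returns false when the browser lacks speech synthesis support.
function speak(text, preferLocal = true) {
  const synth = globalThis.speechSynthesis;
  const Utterance = globalThis.SpeechSynthesisUtterance;
  if (!synth || !Utterance) {
    return false;
  }
  const { local, remote } = groupVoices(synth.getVoices());
  const ordered = preferLocal ? local.concat(remote) : remote.concat(local);
  const utterance = new Utterance(text);
  // getVoices() can return an empty list until the 'voiceschanged' event
  // fires; in that case the browser's default voice is used.
  if (ordered.length > 0) {
    utterance.voice = ordered[0];
  }
  synth.speak(utterance);
  return true;
}
```

Preferring local voices trades the consistent quality of a cloud voice for offline availability. Calling `speak(text, false)` reverses that preference.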

Concerning ASR, Wolf Paulus, an internationally experienced technologist and innovator, compared the performance (speed and accuracy) of the speech recognition systems developed by Google, Nuance, iSpeech and AT&T.

An HTML Speech XG Speech API Proposal, introduced by Microsoft to the HTML Speech Incubator Group, is available as an unofficial draft on the W3C website.

A list of speech recognition software is available on Wikipedia. The main hosted speech services are presented below:

iSpeech

iSpeech provides speech solutions for individuals and businesses in fields such as mobile, connected homes, automotive, publishing (audio books), e-learning and more. The solutions include text-to-speech (TTS) and speech recognition (ASR).

iSpeech offers APIs and SDKs for different devices and programming languages (iPhone, Android, BlackBerry, PHP, Java, Python, .NET, Flash, Ruby, Perl), together with comprehensive documentation, integration guides, web samples and FAQs. iSpeech provides development keys for three servers:

  • Mobile Development
  • Mobile Production
  • Web/General/Desktop/Other Production

Applications must be configured to use the correct server. To make the web/general key work, you need to buy credits. The low-usage price is $0.02 per word (TTS) or per transaction (ASR).
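The low-usage rates above allow a quick cost estimate. The sketch below assumes that a TTS "word" is a whitespace-separated token; iSpeech's actual counting rule is not stated here. It works in integer cents to avoid floating-point rounding. `countWords` and `estimateISpeechCostCents` are hypothetical names.

```javascript
// Low-usage rates quoted above, in US cents.
const TTS_CENTS_PER_WORD = 2;        // $0.02 per synthesized word
const ASR_CENTS_PER_TRANSACTION = 2; // $0.02 per recognition request

// Assumption: words are counted as whitespace-separated tokens.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Estimated cost in cents for synthesizing `texts` and running
// `asrTransactions` recognition requests.
function estimateISpeechCostCents(texts, asrTransactions) {
  const words = texts.reduce((sum, text) => sum + countWords(text), 0);
  return words * TTS_CENTS_PER_WORD + asrTransactions * ASR_CENTS_PER_TRANSACTION;
}
```

For example, synthesizing "Hello world" and "Good morning to you" (six words) plus ten recognition requests comes to 6 × 2 + 10 × 2 = 32 cents.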

A free iSpeech app for iOS devices (version 1.3.5, updated May 13, 2013) that converts text to speech with the best-sounding voices is available in the iTunes Store. This app is powered by the iSpeech.org text-to-speech (TTS) software-as-a-service (SaaS) API. Other apps for iOS and Android devices are listed on the iSpeech website. A text-to-speech demo is also available.

Nuance

Nuance Communications is a multinational computer software technology corporation, headquartered in Burlington, Massachusetts, that provides speech and imaging applications.

In August 2012, Nuance announced Nina, a collection of personal assistant technologies that will bring Siri-like functionality to customer service mobile apps.

Nuance provides the Dragon Mobile SDK to developers who join the NDEV Dragon Mobile developer program. This gives mobile developers the opportunity to power any application with Nuance’s proven, best-in-class Dragon NaturallySpeaking voice recognition technology.

By joining NDEV Mobile, developers get free access to wrappers and widgets for simple application customization, all through a self-service website. Developers also have access to an online community forum for support, a variety of code samples and full documentation. Once an NDEV Mobile developer has integrated the SDK into an application, Nuance provides 90 days of free access to its cloud-based speech services to validate speech recognition in that application. To put an application into production, a license fee of $3,000 has to be prepaid. The low-usage price is $0.009 per transaction.

The following platforms are supported:

  • Apple iOS
  • Android
  • Windows Phone
  • HTTP web services interface

A mobile assistant and voice app for iOS and Android is available in the iTunes and Google Play stores.

AT&T Watson Speech engine

AT&T offers a free speech development program to access the tools needed to build, test, onboard and certify applications across a range of devices, OSes and platforms.

There are three classes of functionality in the AT&T speech API family:

  • Speech to Text: nine contexts are optimized to return the text of what end users say. The text can be returned in multiple formats, including JSON and XML.
  • Text to Speech: male and female ‘characters’ are available for both English and Spanish.
  • Speech to Text Custom: the speech service is customized by sending a list of words or phrases commonly spoken by end users, to improve recognition of those unique words. The Grammar List option supports 19 languages and the Generic with Hints option supports English and Spanish.

The Call Management (Beta) API, powered by Tropo™, exposes SMS and voice-calling RESTful APIs that enable app developers to create voice-enabled apps that send or receive calls, provide Interactive Voice Response (IVR) logic, Automatic Speech Recognition (ASR), Voice to Text (VTT), text (SMS) integration, and more. SDKs are available for HTML5 (Sencha Touch), Android, iOS and Microsoft platforms. Tools are provided for key platforms, including Android, Brew MP, HTML5, RIM BlackBerry and Windows Phone.

The Speech API provides two methods for transcribing audio into text and one method for rendering text into audio. An AT&T Natural Voices Text-to-Speech demo is available on the AT&T Research website.

API access to the AT&T sandbox and production environments costs $99 a year. The sandbox and production environments allow you to develop, test and deploy applications using AT&T APIs. They include 1 million points (one transaction = one point) each month to spend on any APIs you like. A US-based credit card is required; it is charged $20 for each additional block of 2,000 points beyond one million. See the AT&T price list.
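The AT&T pricing rule can be written as a small calculation. This sketch assumes that a partially used block of 2,000 points is billed as a full block. It also assumes the monthly allowance does not roll over, which the text does not state. The function names are hypothetical.

```javascript
const ANNUAL_FEE_DOLLARS = 99;             // sandbox + production access
const INCLUDED_POINTS_PER_MONTH = 1000000; // one transaction = one point
const OVERAGE_BLOCK_POINTS = 2000;         // size of each additional block
const OVERAGE_BLOCK_DOLLARS = 20;          // price of each additional block

// Overage for one month. Assumption: a partial block is billed as a full one.
function monthlyOverageDollars(pointsUsed) {
  const extra = Math.max(0, pointsUsed - INCLUDED_POINTS_PER_MONTH);
  return Math.ceil(extra / OVERAGE_BLOCK_POINTS) * OVERAGE_BLOCK_DOLLARS;
}

// Yearly cost: annual fee plus each month's overage.
// Assumption: unused points do not carry over between months.
function yearlyCostDollars(monthlyPoints) {
  return monthlyPoints.reduce(
    (total, points) => total + monthlyOverageDollars(points),
    ANNUAL_FEE_DOLLARS,
  );
}
```

For example, a month with 1,005,000 transactions exceeds the allowance by 5,000 points. Under the round-up assumption, that is three blocks, or $60 on top of the $99 annual fee.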

AT&T Application Resource Optimizer (ARO) is a free diagnostic tool for analyzing the performance of your mobile applications. It can help your app run faster and smarter by providing recommendations to help optimize your mobile application’s performance, speed, network impact and battery utilization.

Speech API FAQs, as well as code samples, documents, tutorials, guides, SDKs, tools, blogs, forums and more, are available on the AT&T speech development website.

Google Speech API

The Google Speech API can be accessed safely through the Chrome browser, using the x-webkit-speech attribute on input fields. Some people have reverse-engineered the Google Speech API for other uses on the web. The interface is free, but it is not an official public API.

On February 23, 2013, Google announced on the Chrome Blog that the new stable Chrome release includes support for the Web Speech API. Developers can use it to integrate speech recognition into their web apps in more than 30 languages. A Web Speech API demo is available on the Google website. In the Peanut Gallery, you can add intertitles to old black-and-white movies simply by talking to Chrome.

The following list provides links to more information about the Google speech APIs:

More speech applications from other suppliers are listed below:

The Eclipse Voice Tools Project (VTP) allows you to build and run speech recognition applications using industry standards such as VoiceXML and the Speech Recognition Grammar Specification (SRGS).

HTML microdata

One of the most advanced technologies for the semantic web is HTML microdata. HTML Microdata is a W3C Working Draft (latest version: 29 March 2012).

Most HTML tags tell the browser how to display the information they contain. For example, <h1>Blackberry</h1> tells the browser to display the text string Blackberry as a level-1 heading. However, the HTML tag doesn’t give any information about what that text string means. Blackberry could refer to a mobile device or to a fruit, which makes it difficult for search engines to intelligently display relevant content to a user.

Microdata vocabularies provide the semantics, or meaning, of an item. Web developers can design a custom vocabulary or use vocabularies available on the web, such as those provided by schema.org.

Microdata introduces five simple global attributes (available on any element) that give machines context about your data:

  • itemscope – creates the Item and indicates that descendants of this element contain information about it (boolean attribute)
  • itemtype – a valid URL of a vocabulary that describes the item and the context of its properties
  • itemid – indicates a unique identifier for the item
  • itemprop – indicates that its containing tag holds the value of the specified item property (strings, urls, images, …)
  • itemref – properties that are not descendants of the element with the itemscope attribute can be associated with the item using this attribute
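To show the attributes working together, the hypothetical helper below renders an item as microdata markup. It uses the schema.org Product vocabulary (`http://schema.org/Product`, whose `name` and `description` properties come from schema.org's Thing type). It reuses the Blackberry example above. `itemid` and `itemref` are emitted only when supplied. This is a sketch for illustration, not a schema.org or Google tool.

```javascript
// Escape text for safe inclusion in HTML content and attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: render an item as a <div> carrying microdata.
// `props` maps property names to text values; `options.id` becomes itemid
// and `options.refs` (ids of elements elsewhere on the page) becomes itemref.
function toMicrodata(itemType, props, options = {}) {
  const attrs = ['itemscope', `itemtype="${escapeHtml(itemType)}"`];
  if (options.id) {
    attrs.push(`itemid="${escapeHtml(options.id)}"`);
  }
  if (options.refs && options.refs.length > 0) {
    attrs.push(`itemref="${escapeHtml(options.refs.join(' '))}"`);
  }
  const children = Object.entries(props).map(
    ([name, value]) =>
      `  <span itemprop="${escapeHtml(name)}">${escapeHtml(value)}</span>`,
  );
  return [`<div ${attrs.join(' ')}>`, ...children, '</div>'].join('\n');
}
```

For example, `toMicrodata('http://schema.org/Product', { name: 'Blackberry' })` returns a `<div itemscope itemtype="http://schema.org/Product">` that wraps `<span itemprop="name">Blackberry</span>`. This tells a search engine that this Blackberry is a product rather than a fruit.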

Google uses semantic web technologies to create rich snippets (detailed information intended to help users with specific queries) in web search results. Google suggests using microdata as the markup format. Currently, Google supports rich snippets for the following content types: reviews, people, products, businesses and organizations, recipes, events and music.

Google provides a Rich Snippet Testing Tool to check that its search engine can correctly parse the structured data markup and display it in search results. A microdata schema creator is provided by Raven.

The next list provides links to more information about microdata, followed by a list of links to specific vocabularies:

HTML5 Video

Last update: January 30, 2013

The HTML5 video specification and the various browser implementations are constantly evolving. LongTail Video has spent a significant amount of time understanding the limitations of the technology, testing playback across various browsers and devices, and optimizing JW Player for HTML5 playback.

Today LongTail Video published a State of HTML5 Video Report to share with other developers and users in the industry just what HTML5 can and cannot support.
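Browsers differ in which video formats they can play, and a page can ask directly. The `canPlayType()` method of a media element answers "probably", "maybe" or an empty string for a MIME type. The sketch below picks the first playable source from a list of candidates. The file names are hypothetical. When nothing matches, a player needs some other fallback, such as Flash.

```javascript
// Candidate sources, best first. File names are hypothetical; the MIME types
// with codecs parameters are common ones for H.264/AAC MP4, WebM and Ogg.
const candidates = [
  { src: 'movie.mp4', type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' },
  { src: 'movie.webm', type: 'video/webm; codecs="vp8, vorbis"' },
  { src: 'movie.ogv', type: 'video/ogg; codecs="theora, vorbis"' },
];

// Return the first candidate reported as playable, preferring "probably"
// over "maybe". null means no HTML5 playback (so a fallback is needed).
function pickSource(video, sources) {
  if (!video || typeof video.canPlayType !== 'function') {
    return null; // no HTML5 video support at all
  }
  for (const wanted of ['probably', 'maybe']) {
    const match = sources.find((s) => video.canPlayType(s.type) === wanted);
    if (match) {
      return match;
    }
  }
  return null;
}

// In a browser: pickSource(document.createElement('video'), candidates)
```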

More useful information about HTML5 video is available on the following websites:

HTML5 Structure : Semantic Webdesign

Last update: August 30, 2012

HTML5 is a work in progress and is going to stay that way for some time, but that’s no reason not to start using it right now. HTML5 adds some very important new semantic elements. To support older browsers, use graceful degradation techniques; to take advantage of the latest features, use progressive enhancement.

HTML5 is not based on SGML, and therefore does not require a reference to a DTD.

The website When can I use provides compatibility tables for support of HTML5, CSS3, SVG and more in desktop and mobile browsers.

The following list provides links to some useful blogs and tutorials about HTML5:

The following list provides links to some useful HTML5 tools:

Initializr – HTML5 templates generator

Initializr has been created by Jonathan Verrecchia to help the spread of HTML5 on the web.

Jonathan Verrecchia is a French web developer, author and blogger working at SFEIR on HTML5 and CSS3. Together with Jean-Pierre Vincent, he wrote the French book HTML5 – De la page web à l’application web.

Initializr is an HTML5 template generator built on HTML5 Boilerplate, a powerful HTML5 template created by Paul Irish and Divya Manian.

CSS : clear floats

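Elements that follow a floated element wrap around it. To prevent this wrapping, apply the “clear” property to the following elements. A related problem is a container whose children are all floated: it collapses, because floats are taken out of the normal flow. The standard method for making the container enclose its floats is to place an empty “cleared” element last in the container: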

<div style="clear:both;"></div>

To clear floats without this extra markup, you can use the following techniques:

  • Float the container as well
  • Use overflow: hidden on the container
  • Generate content using the :after CSS pseudo-element
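The :after technique is often packaged as a reusable "clearfix" class. As a sketch, the JavaScript below injects such a rule into the page at run time. In practice the rule usually lives directly in a stylesheet. The class name `clearfix`, the style id and the helper `installClearfix` are assumptions made for illustration.

```javascript
// The ':after' pseudo-element generates an empty box at the end of the
// container; 'clear: both' pushes it below every float, so the container
// grows to enclose its floated children.
const CLEARFIX_CSS =
  '.clearfix:after { content: ""; display: table; clear: both; }';

// Hypothetical helper: add the rule to the page once.
// Returns false when no document is available (outside a browser).
function installClearfix(doc = globalThis.document) {
  if (!doc) {
    return false;
  }
  if (doc.getElementById('clearfix-style')) {
    return true; // already installed
  }
  const style = doc.createElement('style');
  style.id = 'clearfix-style';
  style.textContent = CLEARFIX_CSS;
  doc.head.appendChild(style);
  return true;
}
```

A container then only needs `class="clearfix"` to enclose its floated children, with no extra markup.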

A very detailed tutorial about floats has been published by Vitaly Friedman, editor-in-chief of Smashing Magazine.

Other useful tutorials are:

HTML5 editors

My preferred HTML5 editor is Notepad++, a free source code editor that supports several languages. It runs on Windows and is distributed under the GNU General Public License (GPL).

There are other HTML5 editors available. Adobe’s Dreamweaver CS5 is the flagship among the commercial tools.

A list of some other useful HTML5 editors is shown below:

  • Aloha Editor (a semantic rich text editor framework written in JavaScript, with the best support for XHTML5)
  • Rendera by Brian P. Hogan

Modernizr

Last update: 18 January 2012

Modernizr is a small JavaScript library that detects the availability of native implementations of next-generation web technologies. These technologies are new features that stem from the ongoing HTML5 and CSS3 specifications. Many of these features are already implemented in at least one major browser. Modernizr tells you whether or not the current browser implements each of these features natively.

  1. Modernizr tests for over 40 next-generation features, all in a matter of milliseconds;
  2. Modernizr creates a JavaScript object (named Modernizr) that contains the results of these tests as boolean properties;
  3. Modernizr adds classes to the html element that indicate precisely which features are and are not natively supported. This allows you to target specific browser functionality in your stylesheet (in effect, if-statements in your CSS) without writing any JavaScript.
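Points 2 and 3 can be sketched in a few lines. The code below is not Modernizr's source. It is a hypothetical mini-detector that runs two feature tests in a browser and keeps the boolean results in an object. It then turns them into `feature`/`no-feature` class names on the html element, the naming scheme Modernizr uses (for example `canvas` or `no-canvas`). Like Modernizr, it also swaps a `no-js` marker class for `js`.

```javascript
// Feature tests in the spirit of Modernizr; each returns a boolean.
const featureTests = {
  // A canvas element exposing a 2D drawing context.
  canvas: (doc) => {
    const el = doc.createElement('canvas');
    return Boolean(el.getContext && el.getContext('2d'));
  },
  // SVG: an element created in the SVG namespace has createSVGRect.
  svg: (doc) =>
    Boolean(
      doc.createElementNS &&
        doc.createElementNS('http://www.w3.org/2000/svg', 'svg').createSVGRect,
    ),
};

// Turn results into class names: 'canvas' when supported, 'no-canvas' when not.
function toClassNames(results) {
  return Object.keys(results)
    .map((name) => (results[name] ? name : `no-${name}`))
    .join(' ');
}

// Hypothetical mini-detector: run the tests, then tag the <html> element.
function detectFeatures(doc = globalThis.document) {
  const results = {};
  if (!doc) {
    return results; // not running in a browser
  }
  for (const [name, test] of Object.entries(featureTests)) {
    try {
      results[name] = test(doc);
    } catch (e) {
      results[name] = false; // a throwing test counts as "not supported"
    }
  }
  const root = doc.documentElement;
  // Swap the 'no-js' marker for 'js', then append the feature classes.
  root.className = [root.className.replace(/\bno-js\b/, 'js'), toClassNames(results)]
    .join(' ')
    .trim();
  return results;
}
```

With the real library, the same results are read from the `Modernizr` object, for example `if (Modernizr.canvas) { ... }`. The stylesheet can target `.no-canvas` to style the fallback.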

I started with version 1.6 and ran into a problem with Chrome 9 (beta) that was also reported by other people. The current version, 2, was released on 1 June 2011.

With the help of the Modernizr library, the website haz.io gives a quick overview of a browser’s support for recent technologies in the world of HTML, CSS and JavaScript.

How to make an iPhone web app?

Tetris web app for iPhone

An iPhone web application (web app) uses Web 2.0 technologies to deliver a focused solution that looks and behaves like a native iPhone application. iPhone web apps run in Safari on iPhone, the unique implementation of Safari that provides full-featured web browsing on iOS-based devices and responds to touch-based gestures.

The Apple Safari Developer / Reference Library provides guides, tutorials, code samples, FAQs and best practices for creating web content for iOS devices. The Safari Web Content Guide, the HTML Reference, the CSS Reference and the JavaScript Guide are key documents.

Alex Kessinger has published a very useful tutorial on creating an offline Tetris game for the iPhone on the Six Revisions website. A tutorial on installing a web app on the iPhone has been written by jeshyr for iTalk Magazine.

There are several tools and frameworks available to build HTML5/CSS3 web apps for iPhones or for other mobile devices (cross-platform). A few of them are listed below:

  • iWebKit 5: an outstanding kit of copy-and-paste elements, designed by Christopher Plieger and Johan Van Wilsum, for creating iPhone web apps.
  • Appcelerator Titanium: an SDK for different application environments. The SDK provides the necessary tools, compilers and APIs for building for the target platform.
  • Sencha Touch: a free HTML5 mobile JavaScript framework for developing mobile web apps that look and feel native on iPhone and Android touchscreen devices.
  • PhoneGap: an open source development framework for building cross-platform mobile apps, with support for core features of the iPhone/iPod touch, iPad, Google Android, Palm, Symbian and BlackBerry SDKs.
  • Corona: a fast and easy development tool for iPhone, iPad and Android games and applications.
  • jQuery Mobile: a touch-optimized web framework for smartphones and tablets.
  • iUI: iPhone User Interface Framework
  • Dashcode: part of Apple’s iPhone SDK

There are also tools and simulators for testing web apps:

  • Bugaboo: an App for debugging web apps on iPhone, iPad and iPod touch devices, downloadable from the Apple App Store.
  • iPhone simulator: a web-browser-based simulator

Be aware that there are some differences between native iPhone Apps and web apps.

A native App runs code (an Objective-C program) on the device and is installed through the App Store (if approved by Apple). It has access to all the UI elements the iPhone uses and can do things, like 3D graphics, that are impossible in the Safari browser. You need a Mac to make a native App, but you can make web apps on any platform of your choice.

A web app is accessed via the Safari browser and requires no installation. You are simply visiting a website that has a special stylesheet for the iPhone. A web app can also be added to the iPhone home screen with a custom icon, a custom startup screen and a native look and feel. It can even be used when the phone is not connected to the Internet. As a result, the differences between Apps and web apps are becoming very small.
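A web app can check whether it was launched from the home screen. Safari on iOS exposes the non-standard `navigator.standalone` property, which is true when the page runs full-screen from a home-screen icon. `navigator.onLine` reports whether the browser believes it is connected. Home-screen mode itself is enabled with the `apple-mobile-web-app-capable` meta tag. The helper `launchContext` below is a hypothetical sketch that combines the two properties.

```javascript
// Classify how the page is running, given a navigator-like object.
// navigator.standalone is an iOS Safari extension: true when launched from a
// home-screen icon, false in the normal Safari browser, undefined elsewhere.
function launchContext(nav = globalThis.navigator) {
  if (!nav) {
    return { mode: 'unknown', online: true };
  }
  let mode = 'other-browser';
  if (nav.standalone === true) {
    mode = 'home-screen';
  } else if (nav.standalone === false) {
    mode = 'safari';
  }
  // onLine === false means the browser believes it is offline; true does not
  // guarantee that the network is actually reachable.
  const online = nav.onLine !== false;
  return { mode, online };
}
```

For example, a page could show an "Add to Home Screen" tip when `launchContext().mode === 'safari'`, and switch to cached data when `online` is false.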

Many native Apps could run more efficiently as web apps, and there are tools to convert a web app into a native App. Make your choice!

!DOCTYPE HTML

The declaration <!DOCTYPE HTML> must be the very first thing in an HTML5 document, before the <html> tag. The doctype declaration is not an HTML tag; it is an instruction to the web browser about which version of the markup language the page is written in.

The doctype in HTML 4.01 required a reference to a DTD, because HTML 4.01 was based on SGML. HTML5 is not based on SGML and does not require a reference to a DTD. It still needs the doctype, so that browsers render the page in standards mode rather than quirks mode.
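The effect of the doctype can be checked from script. `document.compatMode` is "CSS1Compat" when the page is rendered in standards (or almost-standards) mode. It is "BackCompat" in quirks mode, for example when the doctype is missing. The helper name `renderingMode` below is hypothetical.

```javascript
// Report the rendering mode the browser chose for a document.
// 'CSS1Compat' = standards or almost-standards mode; 'BackCompat' = quirks.
function renderingMode(doc = globalThis.document) {
  if (!doc || typeof doc.compatMode !== 'string') {
    return 'unknown'; // no document available (e.g. outside a browser)
  }
  return doc.compatMode === 'BackCompat' ? 'quirks' : 'standards';
}

// During development, warn when a page has fallen into quirks mode.
if (renderingMode() === 'quirks') {
  console.warn('Missing <!DOCTYPE html>? The page is rendered in quirks mode.');
}
```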