Sphinx-4 : a Java speech recognizer

Sphinx-4 is a state-of-the-art speech recognition system written in Java. It was created via a joint collaboration between the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and Hewlett Packard (HP), with contributions from the University of California at Santa Cruz (UCSC) and the Massachusetts Institute of Technology (MIT).

Sphinx-4 contains the following demo programs :

  • Hello World Demo: a command line application that recognizes simple phrases
  • Hello Digits Demo: a command line application that recognizes connected digits
  • Hello N-Gram Demo: a command line application using an N-gram language model for speech recognition
  • ZipCity Demo: a Java Web Start technology application that recognizes spoken zip codes and locates the associated city and state
  • WavFile Demo: a simple demo program to show how to decode audio files (e.g., .wav, .au files)
  • Transcriber Demo: a simple demo program showing how to transcribe a continuous audio file that has multiple utterances separated by silences
  • JSGF Demo: a simple demo program showing how a program can swap between multiple JSGF grammars
  • Dialog Demo: a demo program showing how a program can swap between multiple JSGF and dictation grammars
  • Action Tags Demo: a demo program showing how to use action tags for post-processing of RuleParse objects obtained from JSGF grammars
  • Confidence Demo: a simple demo program showing how to obtain confidence scores for recognition results
  • Lattice Demo: a simple demo program showing how to extract lattices from recognition results
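
The Transcriber Demo works on audio in which utterances are separated by silences. The sketch below illustrates that idea only: it is not Sphinx-4's own front end, and it uses a plain frame-energy threshold as a stand-in technique. The class name `SilenceSplitter`, the method `split` and all of its parameters are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SilenceSplitter {
    /**
     * Splits 16-bit PCM samples into utterances.
     * Returns [start, end) sample ranges; a trailing partial frame is ignored.
     * An utterance is closed once minSilenceFrames consecutive quiet frames follow it.
     */
    static List<int[]> split(short[] samples, int frameSize,
                             double threshold, int minSilenceFrames) {
        List<int[]> utterances = new ArrayList<>();
        int start = -1;       // first sample of the open utterance, -1 if none is open
        int lastLoudEnd = 0;  // end (exclusive) of the most recent loud frame
        int quietRun = 0;     // consecutive quiet frames seen inside an open utterance

        for (int pos = 0; pos + frameSize <= samples.length; pos += frameSize) {
            // Mean squared amplitude of the frame.
            double energy = 0;
            for (int i = pos; i < pos + frameSize; i++) {
                energy += (double) samples[i] * samples[i];
            }
            boolean loud = energy / frameSize > threshold;

            if (loud) {
                if (start < 0) {
                    start = pos;
                }
                lastLoudEnd = pos + frameSize;
                quietRun = 0;
            } else if (start >= 0 && ++quietRun >= minSilenceFrames) {
                utterances.add(new int[] {start, lastLoudEnd});
                start = -1;
                quietRun = 0;
            }
        }
        if (start >= 0) {
            utterances.add(new int[] {start, lastLoudEnd});
        }
        return utterances;
    }
}
```

Each returned range could then be passed to a recognizer as one utterance. The threshold and the minimum silence length depend on the recording and would need tuning.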

A number of tests and demos rely on having JSAPI installed. Sphinx-4 can be combined with FreeTTS to set up a complete voice interface or a VoiceXML server.

FreeTTS : a Java speech synthesizer

FreeTTS is a speech synthesis system written entirely in Java. It is based upon Flite, a small run-time speech synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University.
FreeTTS was built by the Speech Integration Group of Sun Microsystems Laboratories.

Possible uses of FreeTTS are:

  • JSAPI (Java Speech API) speech synthesizer
  • Remote TTS Server, to act as a back-end text-to-speech engine that works with a speech/telephony system, or does the “heavy lifting” for a wireless PDA
  • Workstation/Desktop TTS engine
  • Downloadable Web Application (FreeTTS cannot be used in an applet)

FreeTTS includes the following demos :

  • JSAPI/HelloWorld: uses the JSAPI 1.0 Synthesis interface to speak “Hello, World”
  • JSAPI/MixedVoices: demonstrates using multiple voices and speech synthesizers in a coordinated fashion using JSAPI 1.0
  • JSAPI/Player: Swing-based GUI (graphical user interface) that allows the user to monitor and manipulate a JSAPI 1.0 Speech Synthesizer
  • JSAPI/JTime: JSAPI program that uses a limited-domain, high quality voice to tell the time
  • JSAPI/Emacspeak: uses JSAPI 1.0 to provide a text-to-speech server for Emacspeak
  • JSAPI/WebStartClock: JSAPI talking clock that can be downloaded from the web using Java Web Start
  • freetts/HelloWorld: low-level (non-JSAPI) program that speaks a greeting to the world
  • freetts/ClientServer: low-level (non-JSAPI) socket-based TTS server with sample clients written in the C programming language and the Java programming language

To write software with FreeTTS, it is recommended to use the Java Speech API (JSAPI) 1.0 to interface with FreeTTS. The JSAPI interface provides the best method of controlling and using FreeTTS.

Currently, the FreeTTS distribution comes with these 3 voices:

  • a low quality, unlimited domain, 8kHz diphone voice, called kevin
  • a medium quality, unlimited domain, 16kHz diphone voice, called kevin16
  • a high quality, limited domain, 16kHz cluster unit voice, called alan

FreeTTS interfaces with the MBROLA synthesizer and can use MBROLA voices. It’s also possible to import voice data from Festival and FestVox or CMU ARCTIC voices.

CloudGarden has developed a full implementation of Sun’s Java Speech API for Windows platforms. It allows a large range of SAPI4 and SAPI5 compliant Text-To-Speech and Speech-Recognition engines (in many different languages) to be programmed using the standard Java Speech API. Packages and additional classes augment the capabilities of the JSAPI, for example by integrating with Sun’s JMF. Among other things, this allows MPEG audio files to be created and read, and compressed audio data to be transmitted across a network.

 

WURFL = Wireless Universal Resource File

last update: 22 August 2011

WURFL is an XML configuration file which contains information about the capabilities and features of many mobile devices. The main purpose of the file is to collect as much information as possible about all the existing mobile devices that access WAP pages, so that developers will be able to build better applications and better services for the users. WURFL is an open-source project and is intended for developers working with WAP and wireless technologies.
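
To make the idea of an XML capability file concrete, here is a minimal sketch of a lookup. It rests on several assumptions: the element and attribute names (`device`, `group`, `capability`, `fall_back`) follow the layout commonly seen in WURFL files but are not taken from this page, the two devices and their values are invented, and `WurflSketch` is a hypothetical class, not part of any WURFL API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class WurflSketch {
    // Invented sample data: ids, user agents and values are illustrative only.
    static final String SAMPLE =
        "<wurfl><devices>"
      + "<device id='generic' user_agent='' fall_back='root'>"
      + "<group id='display'>"
      + "<capability name='resolution_width' value='90'/>"
      + "<capability name='resolution_height' value='40'/>"
      + "</group></device>"
      + "<device id='example_phone_v1' user_agent='ExamplePhone/1.0' fall_back='generic'>"
      + "<group id='display'>"
      + "<capability name='resolution_width' value='176'/>"
      + "</group></device>"
      + "</devices></wurfl>";

    // device id -> (capability name -> value)
    private final Map<String, Map<String, String>> capabilities = new HashMap<>();
    // device id -> id of the device it inherits from
    private final Map<String, String> fallBack = new HashMap<>();

    static WurflSketch parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            WurflSketch result = new WurflSketch();
            NodeList devices = doc.getElementsByTagName("device");
            for (int i = 0; i < devices.getLength(); i++) {
                Element device = (Element) devices.item(i);
                String id = device.getAttribute("id");
                result.fallBack.put(id, device.getAttribute("fall_back"));
                Map<String, String> caps = new HashMap<>();
                NodeList capNodes = device.getElementsByTagName("capability");
                for (int j = 0; j < capNodes.getLength(); j++) {
                    Element cap = (Element) capNodes.item(j);
                    caps.put(cap.getAttribute("name"), cap.getAttribute("value"));
                }
                result.capabilities.put(id, caps);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse device file", e);
        }
    }

    /** Walks the fall_back chain until a device defines the capability; null if none does. */
    String capability(String deviceId, String name) {
        String id = deviceId;
        while (id != null && capabilities.containsKey(id)) {
            String value = capabilities.get(id).get(name);
            if (value != null) {
                return value;
            }
            id = fallBack.get(id);
        }
        return null;
    }
}
```

With this sample data, `WurflSketch.parse(WurflSketch.SAMPLE).capability("example_phone_v1", "resolution_height")` is resolved from the `generic` entry, because the phone entry does not define that capability itself.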

The WURFL project was launched by Luca Passani. He is the author of the web-tutorial Global Authoring Practices for the Mobile Web.

In June 2011 the WURFL Team launched the new US company ScientiaMobile and turned WURFL into a commercial reality.

InterObject

last update : August 2010

In 2003, InterObject, an Israeli company, was focused on the development of multimedia software for Windows and embedded systems. They were pioneers in the development of software related to MMS technology. A 3GPP SMIL player, an MMS composer, an MPEG4 decoder and encoder, an MMS presentation SDK and a Symbian MMS player were some of the tools that I used to create innovative mobile applications for the Nokia Series 60 phones.

On October 20, 2008, InterObject joined GlobalLogic Inc., the leader in global product development services.

vvvv : multipurpose real-time video synthesis toolkit

last update : January 30, 2013

vvvv is a multipurpose toolkit focusing on real-time video synthesis, connecting physical devices, and developing interactive media applications and systems. Because vvvv is basically a very modular multipurpose construction toolkit, it can be used to create many kinds of custom-built applications. The development of vvvv was initiated in 1998 by Sebastian Oschatz and Max Wolf at MESO to build a high-performance multimedia tool for setting up artistic and commercial projects. They were joined shortly after by Sebastian Gregor, who invented many of the core algorithms. In 2000 Joreg joined the team at MESO to set up the graphical user interface as his diploma thesis. In the first years vvvv was used exclusively as an in-house tool. The first public release was in December 2002.

In 2006 the further development of vvvv was handed over to the vvvv group, consisting of Joreg, Sebastian Oschatz, Sebastian Gregor and Max Wolf, which coordinates further development of the software.

The current version is vvvv45beta29 released on 24th December 2012.

Helix DNA server

last update : August 2010

The Helix DNA Server is a universal delivery engine supporting the real time packetization and network transmission of any media type to any device. The Helix DNA Server is the industry’s core media delivery engine. The Helix DNA Server is available under the RealNetworks Public Source License and is developed by the HelixCommunity.

I used the Helix server for my first streaming applications with the Nokia 3650 and 7650 phones.

Mophun, a mobile environment

last update : August 2010

Mophun, a product of Synergenix Interactive, is an environment for developing mobile phone games and applications. The games are developed in the C and C++ programming languages using an open source SDK available upon request. A compiler translates the projects into a system-independent application.

Mophun was the brainchild of Antony Hartley and was developed by Hartley and Anders Norlander. Antony Hartley has since moved on to form Mobile Sorcery, a company that specializes in visual mobile authoring. The company was later renamed MoSync AB.

Viewpoint : the in-visible cow

The in-visible cow was created by Stephane Beugnet, 3D & Web-designer at the solution provider in-visible in Luxembourg, in the context of the summer exhibition “Art on Cows” in Luxembourg (April 10 to September 1, 2001).


To view this content, you need the Viewpoint Media Player. If the automatic installation fails, please visit the Viewpoint website to download the player plugin. You can rotate the cow by left-clicking and moving the mouse. Hold the Shift key and left-click to translate the cow; hold the Ctrl key and left-click to zoom the cow in or out.