FreeTTS : a Java speech synthesizer

FreeTTS is a speech synthesis system written entirely in Java. It is based upon Flite, a small run-time speech synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University.
Free TTS was built by the Speech Integration Group of Sun Microsystems Laboratories.

Possible uses of FreeTTS are:

  • JSAPI (Java Speech API) speech synthesizer
  • Remote TTS Server, to act as a back-end text-to-speech engine that works with a speech/telephony system, or does the “heavy lifting” for a wireless PDA
  • Workstation/Desktop TTS engine
  • Downloadable Web Application (FreeTTS can not be used in an applet)

FreeTTS includes the following demos :

  •  JSAPI/HelloWorld: uses the JSAPI 1.0 Synthesis interface to speak “Hello, World”
  • JSAPI/MixedVoices: demonstrates using multiple voices and speech synthesizers in a coordinated fashion using JSAPI 1.0
  • JSAPI/Player: Swing-based GUI (graphical user interface) that allows the user to monitor and manipulate a JSAPI 1.0 Speech Synthesizer
  • JSAPI/JTime: JSAPI program that uses a limited-domain, high quality voice to tell the time
  • JSAPI/Emacspeak: uses JSAPI 1.0 to provide a text-to-speech server for Emacspeak
  • JSAPI/WebStartClock: JSAPI talking clock that can be downloaded from the web using Java Web Start
  • freetts/HelloWorld: low-level (non-JSAPI) program that speaks a greeting to the world
  • freetts/ClientServer: low-level (non-JSAPI) socket-based TTS server with sample clients written in the C programming language and the Java programming language.

To write software with FreeTTS, it is recommended to use the Java Speech API (JSAPI) 1.0 to interface with FreeTTS. The JSAPI interface provides the best method of controlling and using FreeTTS.

Currently, the FreeTTS distribution comes with these 3 voices:

  • a low quality, unlimited domain, 8kHz diphone voice, called kevin
  • a medium quality, unlimited domain, 16kHz diphone voice, called kevin16
  • a high quality, limited domain, 16kHz cluster unit voice, called alan

FreeTTS interfaces with the MBROLA synthesizer and can use MBROLA voices. It’s also possible to import voice data from Festival and FestVox or CMU ARCTIC voices.

A full implementation of Sun’s Java Speech API for Windows platforms, allowing a large range of SAPI4 and SAPI5 compliant Text-To-Speech and Speech-Recognition engines (in many different languages) to be programmed using the standard Java Speech API has been developped by CloudGarden. Packages and additional classes augment the capabilities of the JSAPI by, for example integrating with Sun’s JMF, allowing, amongst other things, MPEG audio files to be created and read, and compressed audio data to be transmitted across a network