Last update : March 30, 2018
Spectrograms are visual representations of the spectrum of frequencies in a sound or other signal as they vary with time (or with some other variable). Spectrograms can be used to identify spoken words phonetically. The instrument that generates a spectrogram is called a spectrograph.
FFT is an algorithm to compute the Discrete Fourier Transform (DFT) and its inverse. A significative parameter of the DFT is the choice of the Window Function. In signal processing, a window function is a mathematical function that is zero-valued outside of some chosen interval. The following window functions are common for spectrograms :
- Hann window function : good all-round frequency-resolution and dynamic-range properties
- Hamming window function : higher frequency resolution
- Kaiser window function : higher dynamic range
- Bartlett window function
- Rectangular window function
I recorded a sound example.wav file with my name spoken three times, to use as test file for different spectrogram software programs.
Real-Time Spectrogram Software
There are some great software programs to perform a spectrogram for speech analysis in realtime or with recorded sound files :
- SFS / RTGRAM
WaveSurfer is an open source multiplatform tool for sound visualization and manipulation. Typical applications are speech/sound analysis and sound annotation/transcription. WaveSurfer may be extended by plug-ins as well as embedded in other applications. A comprehensive user manual and numerous tutorials for Wavesurfer are available on the net.
WaveSurfer was developed at the Centre for Speech Technology (CCT) at the KTH Royal Institute of Technology in Sweden. The latest stable Windows release (1.8.8p5, October 27, 2017) and the source code of WaveSurfer can be downloaded from Sourceforge. The authors of Wavesurfer are Jonas Beskow and Kåre Sjölander.
By right-clicking in the Wafesurfer pane, a pop-up window opens with menus to add more panes, to customize the configuration and to change the parameters for analysis. In the following rendering, the panes Waveform, Pitch Contour, Formant Plot and Transcription have been added to the spectrogram pane and to the Time Axis pane. The spectrogram frequency range was cut at 5 KHz.
Two other panes can be selected: Power Plot and Data Plot. Additional specific panes can be created with plugins.
Wavesurfer uses the Snack Sound Toolkit created by Kåre Sjölander. There exist other software programs with the name Wavesurfer, for example wavesurfer.js, a customizable waveform audio visualization tool, built on top of Web Audio API and HTML5 Canvas by katspaugh.
Spectrogram16 is a calibrated, dual channel audio spectrum analyzer for Windows that can provide either a scrolling time-frequency display or a spectrum analyzer scope display in real time for any sound source connected to the sound card. A detailed user guide (51 pages) is joined to the program.
The tool was created by Richard Horne, the founder of Visualization Software LLC. The company closed a few years ago. The WayBackMachine shows that Richard Horne announced in 2008 that version 16 of Spectrogram is now freeware (see also local copy). The software is still available from most free software download websites. Richard Horne, MS, who retired as a Civilian Electrical Engineer for the Navy, is member of the Management Team of Vocal Innovations.
SFS / RTGRAM
RTGRAM is a free Windows program for displaying a real-time scrolling spectrographic display of an audio signal. With RTGRAM you can monitor the spectro-temporal characteristics of sounds being played into the computer’s microphone or line input ports. RTGRAM is optimised for speech signals and has options for different sampling rates, analysis bandwidths (wideband = 300 Hz, narrowband = 45 Hz), temporal resolution (time per pixel = 1 – 10 ms), dynamic range (30 – 70 dB) and colour maps.
The current version of RTGRAM is 1.3, released in April 2010. It is part of the Speech Filing System (SFS) tools for speech research.
Audacity is a free, open source, cross-platform software for recording and editing sounds. Audacity was started in May 2000 by Dominic Mazzoni and Roger Dannenberg at Carnegie Mellon University. The current version is 2.1.3, released on March 17, 2017.
A huge documentation about Audacity with manuals, tutorials, tips, wikis, FAQ’s is available in several languages.
RTS (Real-Time Spectrogram) is a product of Engineering Design, founded in 1980 to address problems in instrumentation and measurement, physical acoustics, and digital signal analysis. Since 1984, Engineering Design has been the developer of the SIGNAL family of sound analysis software. RTS is highly integrated with SIGNAL.
STRAIGHT (Speech Transformation and Representation by Adaptive Interpolation of weiGHTed spectrogram) was originally designed to investigate human speech perception in terms of auditorily meaningful parametric domains. STRAIGHT is a tool for manipulating voice quality, timbre, pitch, speed and other attributes flexibly. The tool was invented by Hideki Kawahara when he was in the Advanced Telecommunications Research Institute International (ATR) in Japan. Hideki Kawahara is now Professor at the Auditory Media Laboratory, Department of Design and Information Sciences, Faculty of Systems Engineering, Wakayama University, Japan.
Irman Abdić created an audio tool (iSound) for displaying spectrograms in real time using Sphinx-4 as part of his thesis at the Faculty of Mathematics, Natural Sciences and Information Technologies (FAMNIT) from Koper, Slovenia.
No Real-Time Spectrogram Software
Other great software programs to create no-realtime spectrograms of recorded voice samples are :
- SFS / WASP
Praat (= talk in dutch) is a free scientific computer software package for the analysis of speech in phonetics. It was designed, and continues to be developed, by Paul Boersma and David Weenink of the Institute of Phonetics Sciences at the University of Amsterdam. Praat runs on a wide range of operating systems. The program also supports speech synthesis, including articulatory synthesis.
Praat displays two windows : Praat Objects and Praat Picture.
The spectrogram can also be rendered in a customized window.
The current Windows version 6.0.38 (32 and 64 bit) of Praat was released on March 29, 2018. The source code for this release is available at the Praat website. A huge documentation with FAQ’s, tutorials, publications, user guides is available for Praat. The plugins are located in the directory C:/Users/name/Praat/.
An outstanding plugin for Praat is EasyAlign. It is a user-friendly automatic phonetic alignment tool for continuous speech. It is possible to align speech from an orthographic or phonetic transcription. It requires a few minor manual steps and the result is a multi-level annotation within a TextGrid composed of phonetic, syllabic, lexical and utterance tiers. EasyAlign was developed by Jean-Philippe Goldman at the Department of Linguistics, University of Geneva.
SoX (Sound EXchange) is a free cross-platform command line utility that can convert various formats of computer audio files in to other formats. It can also apply various effects to these sound files and play and record audio files on most platforms. SoX is called the Swiss Army knife of sound processing programs.
SoX is written in standard C and was created in July 1991 by Lance Norskog. In May 1996, Chris Bagwell started to maintain and release updated versions of SoX. Throughout its history, SoX has had many contributing authors. Today Chris Bagwell is still the main developer.
The current Windows distribution is 14.4.2 released in February 22, 2015. The source code is available at Sourceforge.
SoX provides a very powerful spectrogram effect. The spectrogram is rendered in a png image-file and shows time in the x-axis, frequency in the y-axis and audio signal amplitude in the z-axis. The z-axis values are represented by the colour of the pixels in the x-y plane. The command
sox example.wav -n spectrogram
creates the following auto-calculated, auto-sized spectrogram :
The main options to customize a spectrogram created with SoX are :
-x num : change the width of the spectrogram from its default value of 800px -Y num : sets the total height of the spectrogram; the default value is 550px -z num : sets the dynamic range from 20 to 180 dB; the default value is 120 dB -q num : sets the z-axis quantisation (number of different colours) -w name : select the window function; the default function is Hann -l : creates a printer-friendly spectrogram with a light background -a : suppress the display of the axis lines -t text : set an image title -c text : set an image comment (below and to the left of the image) -o text : set the name of the output file; the default name is spectrogram.png rate num k : analyse a small portion of the frequency domain (up to 1/2 num kHz)
A customized rendering follows :
The customized SoX spectrogram was created with the following command :
sox example.wav -n rate 10k spectrogram -x 480 -y 240 -q 4 -c "www.web3.lu" -t "SoX Spectrogram of the triple speech sound Marco Barnig"
WASP is a free Windows program for the recording, display and analysis of speech. With WASP you can record and replay speech signals, save them and reload them from disk, edit annotations, and display spectrograms and a fundamental frequency track. WASP is a simple application that is complete in itself, but which is also designed to be compatible with the Speech Filing System (SFS) tools for speech research. The current version 1.54 was released in July 2013.
The following figure shows a customized WASP window with a speech waveform pane, a wideband spectrogram, a pitch track and annotations.
Specific Spectrogram Software
Spectrograms can also be used for teaching, artistic or other curious purposes :
FaroSon (The Auditory Lighthouse), is a Windows program for the real-time conversion of sound into a coloured pattern representing loudness, pitch and timbre. The loudness of the sound is reflected in the brightness and saturation of the colours. The timbre of the sound is reflected in the colours themselves: sounds with predominantly bass character have a red colour, while sounds with a predominantly treble character have a blue colour. The pitch of the sound is reflected in the horizontal banding patterns: when the pitch of the sound is low, then the bands are large and far apart, and when it is high, the bands are narrow and close together. If the pitch of the sound is falling you see the bands diverge; when it is rising, you see the bands converge.
AudioCheck offers the Internet’s largest collection of online sound tests, test tones, and tone generators. Audiocheck provides a unique online tool called SpectroTyper to insert plain text into a .wav sound file. The downloaded file plays as cool-sounding computer-like tones and is secretly readable from a spectrogram view (linear frequency scale best). It can be used for fun, to hide easter eggs in a music production or to tag copyrighted audio material with own identifiers or source informations.
Here is the barnig_txt.wav sound file with my integrated name as an example, the result is shown below in the SoX spectrogram, created with the command :
sox barnig_txt.wav -n rate 10k spectrogram -x 480 -y 120
Richard David James, best known by his stage name Aphex Twin, is an British electronic musician and composer. In 1999, he released Windowlicker as a single on Warp Records. In this record he synthesized his face as a sound, only viewable in a spectrogram.
Gavin Black (alias plurSKI) created a perl script to do the same : take a digital picture and convert it into a wave file. Creating a spectrogram of that file then reproduces the original picture. More informations about this project are available on Gavin Black’s website devrand.org
Here is the barnig_portrait.wav sound file with my integrated portrait as an example, the result is shown below in the SoX spectrogram, created with the command :
sox barnig_portrait.wav -n spectrogram -x 480 -y 480
A list with links to websites providing additional informations about spectrograms is presented below :