Audio Metaphor Soundscape Generation Research


Audio Metaphor is an interactive system that presents itself as a search engine: the audience is invited to enter an expression or sentence that serves as a request to an automatic soundscape generation system. Enter “The waterfalls inundate the city” or “The marshmallows explode in the campfire” and it will sound like it, in quadraphonic! This interactive audio installation questions the ubiquity of information, be it real or fake, actual or synthetic. Using state-of-the-art algorithms for sound retrieval, segmentation, background and foreground classification, automatic mixing, and automatic soundscape affect recognition, Audio Metaphor generates believable soundscapes at interactive rates. The piece points at issues around big data, artificial intelligence, machine learning, and other technoscientific advances, and their impact on our perception and experience of the world.
Hardware description: [Computer, Audio Interface, Speakers]

Online System

Try the Audio Metaphor system online HERE


Audio Metaphor is a soundscape generation system that transforms text into soundscapes. A user enters a sentence describing a scenario, the desired mood, and a duration. Audio Metaphor analyzes this text, selects sounds from a database, cuts these sounds up, and recombines them in a sound design process.


Example queries:

A city in the bush
Crows feeding on rubbish at the garbage dump
The spring garden
A reservoir and fountain, raining in Vancouver
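The query-to-soundscape flow described above can be sketched at a high level. All function names and the trivial keyword extraction below are hypothetical stand-ins; the real system's internal API is not documented here, and the downstream stages (retrieval, segmentation, affect labelling, mixing) are reduced to a plan dictionary.

```python
# Minimal sketch of the Audio Metaphor pipeline: query text + mood +
# duration in, soundscape plan out. Names and logic are illustrative only.

STOPWORDS = {"the", "a", "an", "in", "on", "at", "and", "of"}

def extract_keywords(query):
    """Keep content words from the query as search keywords."""
    return [w for w in query.lower().split() if w not in STOPWORDS]

def generate_soundscape(query, mood, duration_s):
    keywords = extract_keywords(query)
    # Retrieval, background/foreground segmentation, affect labelling,
    # and mixing would follow; here they are stubbed as a plan.
    return {"keywords": keywords, "mood": mood, "duration_s": duration_s}

plan = generate_soundscape("The waterfalls inundate the city", "calm", 30.0)
print(plan["keywords"])  # ['waterfalls', 'inundate', 'city']
```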
The text analysis identifies key semantic indicators that are used to search for related sounds, either locally or online. The SLiCE algorithm optimizes the search so that each result covers as many of the query keywords as possible. Sounds returned from the search are cut up based on a perceptual model of background and foreground sound. Each classified segment is then run through a predictive model that applies mood-based labels drawn from a two-dimensional affect space. We developed both of these models from human listening experiments aimed at automating this process.
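The keyword-coverage idea behind the search step can be illustrated with a simple greedy selection: repeatedly pick the candidate result that covers the most not-yet-covered query keywords. This is an assumption-laden stand-in, not the published SLiCE algorithm.

```python
# Greedy keyword-coverage selection (illustrative stand-in for SLiCE).
# candidates: list of (result_id, set_of_keyword_tags).

def select_results(query_keywords, candidates, k=2):
    remaining = set(query_keywords)
    chosen = []
    for _ in range(k):
        # Pick the candidate covering the most uncovered keywords.
        best = max(candidates, key=lambda c: len(c[1] & remaining), default=None)
        if best is None or not (best[1] & remaining):
            break
        chosen.append(best[0])
        remaining -= best[1]
    return chosen

candidates = [
    ("r1", {"rain", "city"}),
    ("r2", {"waterfall"}),
    ("r3", {"waterfall", "city"}),
]
print(select_results(["waterfall", "city", "rain"], candidates))  # ['r1', 'r2']
```

The greedy choice covers all three keywords with two results, which is the kind of combination-maximizing behaviour the text describes.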
A mixing engine takes the labelled sound segments and selects, arranges, and mixes them into the final soundscape. The engine creates a separate track for each semantic group returned from the search and inserts corresponding sounds onto these tracks, based on the overall mood of the mix at a particular time. The volume envelope of the mix is calculated by the control system. The generative results of Audio Metaphor reveal the human-like creative processes of the system, and the system is used to assist sound designers in game sound, sound for animation, and computational arts.
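Choosing which labelled segment to insert at a given moment can be pictured as a nearest-neighbour lookup in the two-dimensional affect space. The segment names, coordinates, and the distance rule below are invented for illustration; the actual mixing engine's selection logic is more involved.

```python
# Illustrative segment selection: pick the segment whose (valence, arousal)
# label is closest to the target mood of the mix at this point in time.

def nearest_mood(target, segments):
    """segments: list of (name, (valence, arousal)); target: (v, a)."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(segments, key=lambda s: dist2(s[1], target))[0]

segments = [("birds", (0.8, 0.3)), ("traffic", (-0.4, 0.6)), ("surf", (0.5, -0.2))]
print(nearest_mood((0.6, 0.0), segments))  # 'surf'
```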

System Modules

Background/Foreground Classifier

Segmentation and classification is an important but time-consuming part of the process of using soundscape recordings in sound design and research. Background and foreground are general classes referring to a signal’s perceptual attributes, and are used as a criterion by sound designers when segmenting sound files. We established a method for the automatic segmentation of soundscape recordings based on this task.
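As a rough illustration of the segmentation task, the sketch below labels audio frames as foreground or background with a simple frame-energy threshold and merges consecutive frames into segments. This heuristic is an assumption for illustration only; the published BF-Classifier is built from a learned model of listeners' perception, not an energy threshold.

```python
# Toy background/foreground segmentation: threshold per-frame energies,
# then merge runs of identical labels into (label, frame_count) segments.

def bf_segments(frames, threshold):
    labels = ["FG" if e > threshold else "BG" for e in frames]
    segments = []
    for lab in labels:
        if segments and segments[-1][0] == lab:
            segments[-1][1] += 1          # extend the current run
        else:
            segments.append([lab, 1])     # start a new segment
    return [(lab, n) for lab, n in segments]

print(bf_segments([0.1, 0.2, 0.9, 0.8, 0.1], threshold=0.5))
# [('BG', 2), ('FG', 2), ('BG', 1)]
```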

Impress: Affect prediction

A soundscape is the sound environment as perceived by a given listener at a given time and place. We developed an automatic soundscape affect recognition system to benefit composers, sound designers, and audio researchers.
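Affect recognition of this kind can be framed as regression from audio features to a point in a two-dimensional (valence, arousal) space. The sketch below uses 1-nearest-neighbour regression over invented feature vectors and training points; Impress itself was trained on human listening data with different features and models.

```python
# Toy affect prediction: return the (valence, arousal) label of the
# training example whose feature vector is closest to the input.

def predict_affect(features, training):
    """training: list of (feature_vector, (valence, arousal))."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda t: dist2(t[0], features))[1]

training = [
    ([0.2, 0.1], (0.7, -0.3)),   # quiet, steady  -> pleasant, calm
    ([0.9, 0.8], (-0.5, 0.8)),   # loud, erratic  -> unpleasant, aroused
]
print(predict_affect([0.3, 0.2], training))  # (0.7, -0.3)
```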


Publications

Fan, J., Thorogood, M., and Pasquier, P. (2017). Emo-Soundscapes - A Dataset for Soundscape Emotion Recognition, Proceedings of the International Conference on Affective Computing and Intelligent Interaction.
Fan, J., Tatar, K., Thorogood, M., and Pasquier, P. (2017). Ranking-based Emotion Recognition for Experimental Music. Proceedings of the International Symposium on Music Information Retrieval.
Thorogood, M., Fan, J., and Pasquier, P. Soundscape Audio Signal Classification and Segmentation Using Listeners' Perception of Background and Foreground Sound. Journal of the Audio Engineering Society. Special Issue (Intelligent Audio Processing, Semantics, and Interaction).
Fan, J., Thorogood, M., Riecke, B. and Pasquier, P. (2016). Automatic Recognition of Eventfulness and Pleasantness of Soundscape. Journal of the Audio Engineering Society. Special Issue (Intelligent Audio Processing, Semantics, and Interaction)
Bizzocchi, J., Eigenfeldt, A., Pasquier, P. and Thorogood, M. (2016). Seasons II: a case study in Ambient Video, Generative Art, and Audiovisual Experience. Electronic Literature Organization Conference. British Columbia, Canada.
Bizzocchi, J., Eigenfeldt, A., and Thorogood, M. (2015). Generating Affect: Applying Valence and Arousal values to a unified video, music, and sound generation system. Generative Art Conference. 308-318.
Thorogood, M., Fan, J., Pasquier, P. (2015). BF-Classifier: Background/Foreground Classification and Segmentation of Soundscape Recordings. In Proceedings of the 10th Audio Mostly Conference, Greece.
Fan, J., Thorogood, M., Riecke, B., Pasquier, P. (2015). Automatic Recognition of Eventfulness and Pleasantness of Soundscape. In Proceedings of the 10th Audio Mostly Conference, Greece.
Eigenfeldt, A., Thorogood, M., Bizzocchi, J., Pasquier, P. (2014). MediaScape: Towards a Video, Music, and Sound Metacreation. Journal of Science and Technology of the Arts 6, 2014. PDF
Eigenfeldt, A., Thorogood, M., Bizzocchi, J., Pasquier, P., Calvert, T. (2014). Video, Music, And Sound Metacreation. xCoAx 2014, Porto, Portugal. 321-333. PDF
Thorogood, M., Pasquier, P. (2013). Computationally Generated Soundscapes with Audio Metaphor. In Proceedings of the 4th International Conference on Computational Creativity, Sydney. PDF bibtex
Thorogood, M., Pasquier, P. (2013). Impress: A Machine Learning Approach to Soundscape Affect Classification for a Music Performance Environment. Proceedings of the 13th International Conference on New Interfaces for Musical Expression, Daejeon + Seoul, Korea Republic. PDF bibtex
Thorogood, M., Pasquier, P., Eigenfeldt, A. (2012). Audio Metaphor: Audio Information Retrieval for Soundscape Composition. In Proceedings of the 9th Sound and Music Computing Conference, Copenhagen. PDF bibtex

Performance and Public Presentations


Team

Miles Thorogood is a creative coding educator at Emily Carr University and an interactive sound artist. Through installation-based artwork, Miles explores the convergence of the human body, environments, and technology. His research at the School of Interactive Arts and Technology, SFU, is directed toward modelling the phenomena of human perception to enable richer computational creativity systems. Contact Miles
Philippe Pasquier is a professor in the School of Interactive Arts and Technology of Simon Fraser University. In his artistic practice, focused primarily on the sonic arts, he is interested in studying and exploiting the various relationships and synergies between art, science, and technology. He has acted as a performer, director, composer, musician, producer, and educator in many different contexts. Contact Philippe
Arne Eigenfeldt is a composer of acoustic and electroacoustic music and an active software designer. His music has been performed throughout the world, and his research in intelligent music systems has been published and presented at international conferences. He teaches music and technology at SFU's School for the Contemporary Arts.
Jianyu Fan