Audio Metaphor
Soundscape Generation Research

ARTWORK INFORMATION

Audio Metaphor is an interactive system that presents itself as a search engine: the audience is invited to enter an expression or a sentence that serves as a request to an automatic soundscape generation system. Enter “The waterfalls inundate the city” or “The marshmallows explode in the campfire” and it will sound like it in quadraphonic! This interactive audio installation questions the ubiquity of information, be it real or fake, actual or synthetic. Using state-of-the-art algorithms for sound retrieval, segmentation, background and foreground classification, automatic mixing, and automatic soundscape affect recognition, Audio Metaphor is a powerful system that generates believable soundscapes at interactive rates. The piece points at issues around big data, artificial intelligence, machine learning, and other technoscientific advances, and their impact on our perception and experience of the world.
Try the Audio Metaphor art installation online: VERSION 2 (2022), VERSION 1 (2021)
Hardware description: [Computer, Audio Interface, Speakers]

Performance and Public Presentations

Computer-assisted Sound Design System

Audio Metaphor is a pipeline of computational tools for generating artificial soundscapes. The pipeline includes modules for audio file search, segmentation and classification, and mixing. The input to the pipeline is a sentence, a desired duration, and curves for pleasantness and eventfulness. Each module can be used independently, or together to generate a soundscape from a sentence.
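The pipeline stages above can be sketched as a chain of functions. This is a minimal, self-contained illustration only: every function name and data structure here is an assumption for the sketch, not the project's actual API, and real modules operate on audio signals rather than strings.

```python
# Illustrative sketch of the Audio Metaphor pipeline shape; all names and
# data structures are assumptions, not the project's actual API.

def extract_keywords(sentence):
    """Stand-in text analysis: keep words longer than three characters."""
    return [w.strip(".,").lower() for w in sentence.split() if len(w) > 3]

def search_audio(keywords):
    """Stand-in retrieval: pretend each keyword maps to one audio file."""
    return [f"{kw}.wav" for kw in keywords]

def segment_and_classify(files):
    """Stand-in segmentation: label each file as background or foreground."""
    return [{"file": f, "layer": "background" if i % 2 else "foreground"}
            for i, f in enumerate(files)]

def mix(segments, duration_s, pleasantness, eventfulness):
    """Stand-in mixing: return a description of the final arrangement."""
    return {"duration": duration_s,
            "mood": (pleasantness, eventfulness),
            "tracks": segments}

def generate_soundscape(sentence, duration_s, pleasantness, eventfulness):
    """Chain the stages: text analysis -> search -> segment/classify -> mix."""
    keywords = extract_keywords(sentence)
    files = search_audio(keywords)
    segments = segment_and_classify(files)
    return mix(segments, duration_s, pleasantness, eventfulness)

scape = generate_soundscape("The waterfalls inundate the city", 30, 0.8, 0.6)
```

The point of the sketch is the modularity the text describes: each stage consumes the previous stage's output, so any one of them can also be invoked on its own.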
Try the Audio Metaphor sound design system online: VERSION 2 (2022), VERSION 1 (2015)

RESEARCH

Audio Metaphor is a soundscape generation system that transforms text into soundscapes. A user enters a sentence describing a scenario, the desired mood, and a duration. Audio Metaphor analyzes this text, selects sounds from a database, cuts these sounds up, and recombines them in a sound design process.

Examples

A city in the bush
Crows feeding on rubbish at the garbage dump
The spring garden
A reservoir and fountain, raining in Vancouver
The text analysis identifies key semantic indicators used to search for related sounds, either locally or online. The SLiCE algorithm optimizes search results to maximize the combination of keywords covered by each result. Sounds returned from the search are cut up based on a perceptual model of background and foreground sound. Each classified segment is then run through a predictive model that applies mood-based labels drawn from a two-dimensional affect space. We developed both of these models from human listening experiments aimed at automating this process.
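The keyword-combination idea behind the search step can be shown with a toy ranking function. This is only in the spirit of SLiCE: the scoring rule (count of query keywords covered by a candidate's tags) and the data layout are assumptions for illustration, not the published algorithm.

```python
# Toy keyword-combination ranking in the spirit of SLiCE; the scoring rule
# and candidate format are illustrative assumptions, not the real algorithm.

def score(tags, keywords):
    """Count how many query keywords a candidate's tag set covers."""
    return len(set(tags) & set(keywords))

def best_results(candidates, keywords, n=2):
    """Return the n candidates covering the most query keywords."""
    return sorted(candidates,
                  key=lambda c: score(c["tags"], keywords),
                  reverse=True)[:n]

candidates = [
    {"file": "a.wav", "tags": ["rain", "city"]},
    {"file": "b.wav", "tags": ["rain", "city", "fountain"]},
    {"file": "c.wav", "tags": ["birds"]},
]
top = best_results(candidates, ["rain", "fountain", "city"])
```

A candidate matching several of the sentence's keywords at once is preferred over several candidates matching one keyword each, which is the optimization goal the paragraph describes.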
A mixing engine takes the labelled sound segments and selects, arranges, and mixes them into the final soundscape. The engine creates separate tracks for the semantic groups returned from the search and inserts corresponding sounds onto them, based on the overall mood of the mix at a particular time. The volume envelope of the mix is calculated by the control system. The generative results of Audio Metaphor reveal the human-like creative processes of the system, and are used to assist sound designers in game sound, sound for animation, and computational arts.
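One way to picture the mood-driven selection step is as a nearest-neighbour choice in the two-dimensional affect space: for a given target mood at some point in the mix, pick the labelled segment whose (valence, arousal) point is closest. The function name, distance rule, and segment format below are illustrative assumptions, not the engine's actual implementation.

```python
# Toy version of mood-driven segment selection: choose the segment whose
# (valence, arousal) label is nearest the target mood. All names and the
# Euclidean-distance rule are illustrative assumptions.
import math

def nearest_segment(segments, target):
    """Pick the segment whose affect label is closest to the target mood."""
    return min(segments, key=lambda s: math.dist(s["affect"], target))

track = [
    {"file": "calm_rain.wav", "affect": (0.7, 0.2)},
    {"file": "storm.wav",     "affect": (-0.4, 0.9)},
]
chosen = nearest_segment(track, target=(0.6, 0.3))
```

Run per track and per time step, a rule like this lets the mix follow the pleasantness and eventfulness curves given as input.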

Valence and arousal mixing examples

Try out the different mixing outputs with alternatives of valence and arousal HERE

System Architecture

Background/Foreground Classifier

Segmentation and classification are important but time-consuming parts of using soundscape recordings in sound design and research. Background and foreground are general classes referring to a signal's perceptual attributes, and are used as criteria by sound designers when segmenting sound files. We established a method for the automatic segmentation of soundscape recordings based on this task.
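The published classifier is learned from listener perception data; as a rough illustration of the segmentation task only, the toy segmenter below labels frames of an energy envelope as foreground when they rise well above the average level. The threshold rule is an assumption for the sketch, not the trained model.

```python
# Toy background/foreground segmenter: a frame is "foreground" when its
# energy is well above the mean. This threshold rule is an illustrative
# assumption; the actual classifier is trained on listener data.

def segment_bg_fg(energies, ratio=2.0):
    """Label each frame background/foreground against the mean energy."""
    mean = sum(energies) / len(energies)
    return ["foreground" if e > ratio * mean else "background"
            for e in energies]

labels = segment_bg_fg([0.1, 0.1, 0.9, 0.8, 0.1, 0.1])
```

Contiguous runs of identical labels then give the segment boundaries a sound designer would otherwise mark by hand.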

Impress: Affect prediction

A soundscape is the sound environment perceived by a given listener at a given time and space. We developed an automatic soundscape affect recognition system to benefit composers, sound designers, and audio researchers.

Impress

Impress is a model that takes audio, automatically parses the content, and returns valence and arousal values in real time (one frame every 20 ms). It is one of the modules of Audio Metaphor, but it can also be used on its own to annotate valence and arousal in soundscape compositions. It is useful for urban design, development, and architecture projects. Try the Impress system online: HERE
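Impress itself uses a trained model; the sketch below only shows the frame-wise shape of such a system: slice the signal into 20 ms frames and emit one (valence, arousal) pair per frame. The stand-in predictor (louder frames score higher arousal and lower valence) is an assumption purely for illustration.

```python
# Frame-wise affect annotation in the shape of Impress: one (valence,
# arousal) pair per 20 ms frame. The RMS-based predictor is a stand-in
# assumption, not the trained model.

SAMPLE_RATE = 44100
FRAME = int(0.020 * SAMPLE_RATE)   # samples per 20 ms frame

def predict_affect(frame):
    """Stand-in predictor: louder frames -> higher arousal, lower valence."""
    rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
    return (1.0 - rms, rms)          # (valence, arousal), toy scale

def impress_like(samples):
    """Return one (valence, arousal) pair per 20 ms frame."""
    return [predict_affect(samples[i:i + FRAME])
            for i in range(0, len(samples) - FRAME + 1, FRAME)]

affect = impress_like([0.0] * FRAME + [0.5] * FRAME)
```

The per-frame stream is what makes the model usable in real time, whether inside the Audio Metaphor pipeline or as a standalone annotator.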

Publications

Kranabetter, J., Carpenter, C., Tchemeube, R. B., Pasquier, P., & Thorogood, M. (2022). Audio Metaphor 2.0: An Improved Classification and Segmentation Pipeline for Generative Sound Design Systems. In Proceedings of the 19th Sound and Music Computing Conference. Saint-Étienne, France. PDF
Thorogood, M. (2021) Developing a Sound Design Creative AI Methodology. Doing Research in Sound Design, 224-237. Focal Press.
Fan J., Yang Y-H., Dong K., Pasquier, P. (2020). A Comparative Study of Western and Chinese Classical Music based on Soundscape Models. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain.
Fan J., Nichols E., Tompkins D., Méndez A. E. M., Elizalde B., Pasquier, P. (2020). Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix. International Conference on Acoustics, Speech, and Signal Processing, Barcelona, Spain.
Thorogood, M., Fan, J., Pasquier, P. (2019). A Framework for Computer-Assisted Sound Design Systems Supported by Modelling Affective and Perceptual Properties of Soundscapes. Journal of New Music Research.
Thorogood, M., & Pasquier, P. (2019) Soundscape Online Databases State of the Art and Challenges. In Foundations in Sound Design for Interactive Media: A Multidisciplinary Approach, 333. Routledge.
Thorogood, M. (2019) Soundscape Generation Systems. In Foundations in Sound Design for Interactive Media (pp. 259-273). Routledge.
Fan J., Thorogood, M., Tatar, K., Pasquier, P. (2018). Quantitative Analysis of the Impact on Perceived Emotion of Soundscape Recordings. Sound and Music Computing (SMC), 2018.
Fan, J., Tung, F., Li, W., Pasquier, P. (2018). Soundscape Emotion Recognition via Deep Learning. Sound and Music Computing (SMC), 2018.
Fan, J., Thorogood, M., and Pasquier, P. (2017). Emo-Soundscapes - A Dataset for Soundscape Emotion Recognition. Proceedings of the International Conference on Affective Computing and Intelligent Interaction.
Fan, J., Tatar, K., Thorogood, M., and Pasquier, P. (2017). Ranking-based Emotion Recognition for Experimental Music. Proceedings of the International Symposium on Music Information Retrieval, 2017.
Thorogood, M., Fan, J., and Pasquier, P. Soundscape Audio Signal Classification and Segmentation Using Listeners Perception of Background and Foreground Sound. Journal of the Audio Engineering Society. Special Issue (Intelligent Audio Processing, Semantics, and Interaction).
Fan, J., Thorogood, M., Riecke, B. and Pasquier, P. (2016). Automatic Recognition of Eventfulness and Pleasantness of Soundscape. Journal of the Audio Engineering Society. Special Issue (Intelligent Audio Processing, Semantics, and Interaction)
Bizzocchi, J., Eigenfeldt, A., Pasquier, P., and Thorogood, M. (2016). Seasons II: a case study in Ambient Video, Generative Art, and Audiovisual Experience. Electronic Literature Organization Conference. British Columbia, Canada.
Bizzocchi, J., Eigenfeldt, A., and Thorogood, M. (2015). Generating Affect: Applying Valence and Arousal values to a unified video, music, and sound generation system. Generative Art Conference, 2015. 308-318.
Thorogood, M., Fan, J., Pasquier, P. (2015). BF-Classifier: Background/Foreground Classification and Segmentation of Soundscape Recordings. In Proceedings of the 10th Audio Mostly Conference, Greece.
Fan, J., Thorogood, M., Riecke, B., Pasquier, P. (2015). Automatic Recognition of Eventfulness and Pleasantness of Soundscape. In Proceedings of the 10th Audio Mostly Conference, Greece.
Eigenfeldt, A., Thorogood, M., Bizzocchi, J., Pasquier, P. (2014). MediaScape:Towards a Video, Music, and Sound Metacreation. Journal of Science and Technology of the Arts 6, 2014. PDF
Eigenfeldt, A., Thorogood, M., Bizzocchi, J., Pasquier, P. Calvert, T., (2014). Video, Music, And Sound Metacreation. xCoAx 2014, Porto, Portugal. 321-333, 2014. PDF
Thorogood, M., Pasquier, P. (2013). Computationally Generated Soundscapes with Audio Metaphor. In Proceedings of the 4th International Conference on Computational Creativity, Sydney. PDF bibtex
Thorogood, M., Pasquier, P. (2013). Impress: A Machine Learning Approach to Soundscape Affect Classification for a Music Performance Environment. Proceedings of the 13th International Conference on New Interfaces for Musical Expression, Daejeon + Seoul, Korea Republic. PDF bibtex
Thorogood, M., Pasquier, P., Eigenfeldt, A. (2012). Audio Metaphor: Audio Information Retrieval for Soundscape Composition. In Proceedings of the 9th Sound and Music Computing Conference, Copenhagen. PDF bibtex

Contributors

Miles Thorogood is a creative coding educator at Emily Carr University and an interactive sound artist. Through installation-based artwork, Miles explores the convergence of the human body, environments, and technology. His research at the School of Interactive Arts and Technology, SFU, is toward modelling the phenomena of human perception to enable richer computational creativity systems. Contact Miles
Philippe Pasquier is a professor in the School of Interactive Arts and Technology of Simon Fraser University. In his artistic practice, focused primarily on sonic arts, he is interested in studying and exploiting the various relationships and synergies between art, science and technology. He has been acting as a performer, director, composer, musician, producer and educator in many different contexts. Contact Philippe
Joshua Kranabetter is a Master's student at the University of British Columbia's Okanagan campus. He is interested in studying generative sound design and music creation through machine learning and artificial intelligence. In the future, he would like to explore novel applications of sound in film, video games, and music.
Renaud Bougueng T. is a Ph.D. student at the School of Interactive Arts and Technology, Simon Fraser University. His research interests are in the fields of computational creativity, artificial intelligence and design research. In particular, he focuses on generative systems and algorithms for music, and interaction software design for musical creativity. He is a software engineer by trade, a music producer, designer and artist.

Past Contributors

Jianyu Fan is a Ph.D. candidate from the Metacreation Lab, Simon Fraser University. His research interests lie in the fields of Affective Computing, Machine Listening, Human-Computer Interaction, and Computational Creativity. He has been a researcher, engineer, and artist in many different contexts.
Arne Eigenfeldt Arne Eigenfeldt Arne is a composer of acoustic and electroacoustic music, and is an active software designer. His music has been performed throughout the world, and his research in intelligent music systems has been published and presented in international conferences. He teaches music and technology at SFU's School for the Contemporary Arts.