144th Meeting of the Acoustical Society of America and First Pan-American/Iberian Meeting on Acoustics

Tuesday Morning, 3 December 2002      Coral Sea 1 and 2, 8:00 to 11:45 AM

                        Session 2aMU

Musical Acoustics: Analysis, Synthesis, Perception and Classification
                   of Musical Sounds

                 James W. Beauchamp, Chair
  School of Music, Department of Electrical and Computer Engineering,
       University of Illinois, Urbana, Illinois 61801 USA

**************************************************************************

                       Invited Papers

**************************************************************************

Time: 8:00 AM

2aMU1. Spectral modeling, analysis, and synthesis of musical sounds.

Author: Sylvain Marchand 
Location: LaBRI, Univ. of Bordeaux 1, 351 cours de la Liberation,
F-33405 Talence cedex, France
Author: Myriam Desainte-Catherine 
Location: LaBRI, Univ. of Bordeaux 1, 351 cours de la Liberation,
F-33405 Talence cedex, France

Abstract: 

Spectral models provide general representations for sound that are well
suited to expressive musical transformations. These models allow us to
extract and modify perceptually relevant parameters such as amplitude,
frequency, and spectrum. Thus, they are of great interest for the
classification of musical sounds. A new analysis method has been proposed
to accurately extract the spectral parameters of the model from existing
sounds. This method
extends the classic short-time Fourier analysis by also considering the
derivatives of the sound signal, and it can work with very short analysis
windows. Although originally designed for stationary sounds with no noise,
this method shows excellent results in the presence of noise and it is
currently being extended in order to handle nonstationary sounds as well. A
very efficient synthesis algorithm, based on a recursive description of the
sine function, is able to reproduce sound in real time from the model
parameters. This algorithm allows an extremely fine control of the partials
of the sounds while avoiding signal discontinuities as well as numerical
imprecision, and with a nearly optimal number of operations per partial.
Psychoacoustic phenomena such as masking are considered in order to
reduce, on the fly, the number of partials to be synthesized.
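
A minimal Python sketch of the recursive-sine idea (the classic
second-order oscillator; illustrative only, not the authors' algorithm,
which also supports fine time-varying control of each partial):

    import numpy as np

    def recursive_partial(freq_hz, amp, dur_s, sr=44100.0):
        # s[n] = 2*cos(w)*s[n-1] - s[n-2]: one multiply and one
        # subtract per sample instead of a sin() call per sample.
        n = int(dur_s * sr)
        w = 2.0 * np.pi * freq_hz / sr
        coef = 2.0 * np.cos(w)
        s1, s2 = np.sin(-w), np.sin(-2.0 * w)  # states s[-1], s[-2]
        out = np.empty(n)
        for i in range(n):
            s = coef * s1 - s2
            s2, s1 = s1, s
            out[i] = amp * s
        return out

    # Additive synthesis: sum a few harmonics of a 440-Hz tone.
    tone = sum(recursive_partial(440.0 * k, 1.0 / k, 1.0)
               for k in (1, 2, 3))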

**************************************************************************

Time: 8:20 AM

2aMU2. Easily extensible UNIX software for spectral analysis, display,
modification, and synthesis of musical sounds.

Author: James W. Beauchamp 
Location: School of Music and Dept. of Elec. and Computer Eng., Univ. of
Illinois at Urbana-Champaign, Urbana, IL 61801, j-beauch@uiuc.edu

Abstract: 
Software has been developed which enables users to perform time-varying
spectral analysis of individual musical tones or successions of them and to
perform further processing of the data. The package, called SNDAN, is
freely available in source code, uses EPS graphics for display, and is
written in ANSI C for ease of code modification and extension. Two
analyzers, a fixed-filter-bank phase vocoder ("pvan") and a
frequency-tracking analyzer ("mqan") constitute the analysis front end of
the package. While pvan's output consists of continuous amplitudes and
frequencies of harmonics, mqan produces disjoint "tracks." However, another
program extracts a fundamental frequency and separates harmonics from the
tracks, resulting in a continuous harmonic output. "monan" is a program
used to display harmonic data in a variety of formats, perform various
spectral modifications, and perform additive resynthesis of the harmonic
partials, including possible pitch-shifting and time-scaling. Sounds can
also be synthesized according to a musical score using a companion
synthesis language, Music 4C. Several other programs in the SNDAN suite can
be used for specialized tasks, such as signal display and editing.
Applications of the software include producing specialized sounds for music
compositions or psychoacoustic experiments or as a basis for developing new
synthesis algorithms.
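
As a rough illustration of the kind of data pvan produces (a generic
pitch-synchronous FFT sketch in Python, not SNDAN's actual C code), one
amplitude per harmonic can be read off each analysis frame when the
window length is tied to a known fundamental:

    import numpy as np

    def harmonic_frames(x, f0, sr, hop=256):
        # Window spans two periods of f0, so FFT bin 2*k sits on
        # harmonic k. Assumes f0 is known and (nearly) constant.
        nwin = int(round(2.0 * sr / f0))
        win = np.hanning(nwin)
        frames = []
        for start in range(0, len(x) - nwin, hop):
            spec = np.fft.rfft(win * x[start:start + nwin])
            amps = 2.0 * np.abs(spec[2::2]) / win.sum()
            frames.append(amps)
        return np.array(frames)  # shape: (n_frames, n_harmonics)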

**************************************************************************

Time: 8:40 AM

2aMU3. Analysis-synthesis of musical sounds by hybrid models.

Author: S. Ystad 
Location: CNRS, Laboratoire de Mecanique et d'Acoustique, 31 Chemin
Joseph Aiguier, 13402 Marseille cedex 20, France

Abstract: 

Analysis-synthesis consists of constructing synthetic sounds from natural
sounds by algorithmic synthesis methods. The models used for this purpose
are of two kinds: physical models which take into account the physical
characteristics of the instrument and signal models which take into account
perceptual criteria. By combining physical and signal models hybrid models
can be constructed taking advantage of the positive aspects of both
methods. In this presentation I show how hybrid models can be adapted to
specific instruments producing both sustained and plucked sounds. In these
cases signal models are used to model the nonlinear source signal. The
parameters of these models are obtained from perceptual criteria such as
the spectral centroid or the tristimulus. The source signal is then
injected into the physical model, which consists of a digital waveguide
model. The parameters of the physical model are extracted from the
natural sound by analysis based on linear time-frequency representations
such as the Gabor and wavelet transforms. The models that will be
presented are real-time compatible, and in the flute case an interface,
adapted to a traditional flute, that controls a hybrid model will be
described.
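
In its simplest form a digital waveguide is a pair of delay lines
carrying the two traveling-wave components; the sketch below (with
made-up reflection and loss values, and a one-sample impulse standing in
for the nonlinear source signal) shows the structure into which the
source is injected:

    import numpy as np

    def waveguide(source, delay_len, reflect=-0.99, loss=0.996):
        # Right- and left-going waves; the source is injected at the
        # left end, losses lumped into the terminating reflections.
        right = np.zeros(delay_len)
        left = np.zeros(delay_len)
        out = np.empty(len(source))
        for n, s in enumerate(source):
            from_right_end = loss * reflect * right[-1]  # at x = L
            from_left_end = loss * reflect * left[-1]    # at x = 0
            right = np.concatenate(([s + from_left_end], right[:-1]))
            left = np.concatenate(([from_right_end], left[:-1]))
            out[n] = right[-1] + left[0]   # wave sum near x = L
        return out

    excitation = np.zeros(44100)
    excitation[0] = 1.0                    # crude source signal
    y = waveguide(excitation, delay_len=100)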

**************************************************************************

Time: 9:00 AM

2aMU4. Recent developments in automatic classification of musical
instruments.
Author: Bozena Kostek 
Location: Sound & Vision Eng. Dept., Gdansk Univ. of Technol.,
Narutowicza 11/12, 80-952 Gdansk, Poland

Abstract: 

In this paper, recent developments in the domain of automatic
classification of musical instruments are presented. Issues related to
the automatic classification of music include data representation of
musical instrument sounds, automatic musical sound recognition, musical
duet separation, music recognition, etc. These problems belong to the
so-called Musical Information Retrieval (MIR) domain. The best developed
of these is the automatic recognition of individual musical sounds, for
which many references can be found in the rich literature on the
subject. Another issue deals with music information retrieval understood
as searching for music-related features such as song titles, etc.
Query-by-humming can also be cited as one of the MIR topics. The most
difficult problem, the automatic recognition of multipitch excerpts,
remains unsolved; however, some recent approaches to this issue can be
found in the literature. Some of the mentioned problems were subjects of
the research carried out at the Sound & Vision Department of the Gdansk
University of Technology. The solutions developed in the domains of
automatic classification of individual sounds, duet separation, and
music recognition will be presented as examples of possible case studies
in the MIR domain. The proposed approach was evaluated on musical
databases created at the Department. [Work supported by KBN, Grant
No. 4 T11D 014 22.]
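
As a minimal illustration of such a classification pipeline (two classic
timbre features plus a nearest-neighbour rule; a generic sketch, not the
systems developed at Gdansk):

    import numpy as np

    def features(x, sr):
        # Spectral centroid plus log attack time; this feature choice
        # is illustrative, not the Department's actual feature set.
        mag = np.abs(np.fft.rfft(x))
        freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
        centroid = float((freqs * mag).sum() / (mag.sum() + 1e-12))
        env = np.abs(x)
        attack_s = np.argmax(env >= 0.9 * env.max()) / sr
        return np.array([centroid, np.log10(max(attack_s, 1e-4))])

    def classify(sound, sr, train_feats, train_labels):
        # 1-nearest-neighbour over a labeled training set.
        d = np.linalg.norm(train_feats - features(sound, sr), axis=1)
        return train_labels[int(np.argmin(d))]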

**************************************************************************

Time: 9:20 AM

2aMU5. The timbre model.

Author: Kristoffer Jensen 
Location: Dept. of Datalogy, Univ. of Copenhagen, 2100 Copenhagen,
Denmark, http://www.diku.dk

Abstract: 

A timbre model is proposed for use in multiple applications. This model,
which encompasses all voiced isolated musical instruments, has an intuitive
parameter set, fixed size, and separates the sounds along dimensions
akin to the timbre dimensions proposed in timbre research. The analysis
of the model parameters is fully documented and includes, in particular,
a method for estimating the difficult decay/release split point. The
main parameters of the model are the spectral envelope, the
attack/release durations and relative amplitudes, the inharmonicity, and
the shimmer and jitter (which provide both the slow random variations of
the frequencies and amplitudes and the additive noises). The
applications include synthesis, where a real-time application with an
intuitive GUI is being developed; classification and content-based
search of sounds; and a further understanding of acoustic musical
instrument behavior. In order to present the background of the model,
this presentation will start with sinusoidal analysis/synthesis and some
timbre perception research, then present the timbre model, show its
validity for individual musical instrument sounds, and finally introduce
some expression additions to the model.
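
The shimmer/jitter idea can be sketched as slow filtered-noise
modulation of one partial (the depths and cutoff below are illustrative
guesses, not the model's documented parameter values):

    import numpy as np

    def partial_with_jitter(freq, amp, dur, sr=44100.0, jitter=0.003,
                            shimmer=0.05, seed=0):
        # Frequency (jitter) and amplitude (shimmer) wander slowly.
        rng = np.random.default_rng(seed)
        n = int(dur * sr)

        def slow_noise(depth, cutoff=10.0):
            # one-pole low-passed white noise, centered on 1.0
            a = np.exp(-2.0 * np.pi * cutoff / sr)
            y, state = np.empty(n), 0.0
            for i, v in enumerate(rng.standard_normal(n)):
                state = a * state + (1.0 - a) * v
                y[i] = state
            return 1.0 + depth * y / (np.abs(y).max() + 1e-12)

        phase = 2.0 * np.pi * np.cumsum(freq * slow_noise(jitter)) / sr
        return amp * slow_noise(shimmer) * np.sin(phase)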

**************************************************************************

Time: 9:40 AM

Unfortunately, this talk was not presented due to the author's illness.

2aMU6. Hypersignal analyses of orchestral instrument signals as
correlated with perception of timbre.

Author: Roger A. Kendall 
Location: Music Cognition and Acoust. Lab., Schoenberg Hall, UCLA, Los
Angeles, CA 90024

Abstract: 

Experiments were conducted to assess the relationships among signal
analyses and timbral perception across the playing range of bassoon,
trombone, tenor saxophone, alto saxophone, soprano saxophone, French horn,
violin, oboe, flute, clarinet, and trumpet. Spectral analyses employed
Hypersignal using 9th-order Zoom FFT on 22.05-ksample/s signals. Spectral
centroid and spectral flux measures were calculated. Perceptual experiments
included similarity scaling and identification at various pitch chroma
across the playing range of the instruments. In addition, a pilot
experiment assessing the interaction of pitch chroma and timbre was
conducted in which timbral judgments were made across, rather than within,
pitch chroma. Results suggest that instruments with relatively low
tessitura produce higher centroid ranges since the larger air column yields
a large number of vibrational modes. In contrast, higher tessitura
instruments, using smaller air columns, produce fewer modes of vibration
with increasing pitch chroma, to the point that the centroids converge
near Bb5. Perceptual data correspond to the spectral measures, resulting
in less specificity among instruments at their higher tessituras. It is
suggested that spectral centroid, which maps strongly near A4 in the
majority of studies, must be viewed with caution as a predictor of
timbre at tessitura extremes.
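
The two spectral measures named above are standard and can be computed
frame by frame; a generic Python sketch (not the Hypersignal package):

    import numpy as np

    def centroid_and_flux(x, sr, nfft=1024, hop=512):
        # Spectral centroid per frame and flux between frames.
        win = np.hanning(nfft)
        freqs = np.fft.rfftfreq(nfft, 1.0 / sr)
        cents, flux, prev = [], [], None
        for s in range(0, len(x) - nfft, hop):
            mag = np.abs(np.fft.rfft(win * x[s:s + nfft]))
            cents.append((freqs * mag).sum() / (mag.sum() + 1e-12))
            if prev is not None:
                flux.append(np.linalg.norm(mag - prev))
            prev = mag
        return np.array(cents), np.array(flux)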

**************************************************************************

Time: 10:00 AM

2aMU7. A confirmatory analysis of four acoustic correlates of timbre
space.

Author: Stephen McAdams 
Location: Ircam--CNRS, 1 pl. Igor Stravinsky, F-75004 Paris, France 
Author: Anne Caclin 
Location: Ircam--CNRS, 1 pl. Igor Stravinsky, F-75004 Paris, France 
Author: Bennett K. Smith 
Location: Ircam--CNRS, 1 pl. Igor Stravinsky, F-75004 Paris, France 

Abstract: 

Exploratory multidimensional scaling studies of musical instrument timbres
generally yield two- to four-dimensional perceptual spaces. Acoustic
parameters have been derived that correlate moderately to highly with the
perceptual dimensions. In a confirmatory study, two three-dimensional sets
of synthetic, harmonic sounds equalized for fundamental frequency,
loudness, and perceived duration were designed. The first two dimensions
corresponded to attack time and spectral centroid in both sound sets. The
third dimension corresponded to spectral flux (variation of the spectral
centroid over time) in the first set and to the energy ratio of odd to even
harmonics in the second set. Group analyses of dissimilarity judgments for
all pairs of sounds homogeneously distributed in each space revealed a
two-dimensional solution for the first set and a three-dimensional solution
for the second set. Log attack time and spectral centroid were confirmed as
perceptual dimensions in both solutions. The even/odd energy ratio was
confirmed as a third dimension in the second set. Spectral flux was not
confirmed in the first set, suggesting that this parameter should be
re-examined. Analyses of individual data sets tested for differences across
listeners in the mapping of acoustic parameters to perceptual dimensions.
[Work supported by the CTI program of the CNRS.]
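
For reference, the fourth correlate is straightforward to compute from a
set of harmonic amplitudes (a sketch of the measure only; the stimulus
construction itself is not shown):

    import numpy as np

    def odd_even_ratio(harmonic_amps):
        # Energy ratio of odd to even harmonics; harmonic 1 is the
        # fundamental, so odd harmonics sit at even array indices.
        a = np.asarray(harmonic_amps, dtype=float)
        odd = (a[0::2] ** 2).sum()    # harmonics 1, 3, 5, ...
        even = (a[1::2] ** 2).sum()   # harmonics 2, 4, 6, ...
        return odd / (even + 1e-12)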

**************************************************************************

Time: 10:20 AM  break

**************************************************************************

                      Contributed Papers

**************************************************************************

Time: 10:30 AM

2aMU8. Piano string modeling: From partial differential equations to
digital wave-guide model.

Author: J. Bensa 
Location: CNRS, Laboratoire de Mecanique et d'Acoustique, 31 Chemin
Joseph Aiguier, 13402 Marseille cedex 20, France
Author: S. Bilbao 
Location: Stanford Univ., Stanford, CA 
Author: R. Kronland-Martinet 
Location: CNRS, 13402 Marseille cedex 20, France 
Author: Julius O. Smith III 
Location: Stanford Univ., Stanford, CA 

Abstract: 

A new class of partial differential equations (PDE) is proposed for
transverse vibration in stiff, lossy strings, such as piano strings. While
only second-order in time, it models both frequency-dependent losses and
dispersion effects. By restricting the time-order to 2, valuable advantages
are achieved: First, the frequency-domain analysis is simplified, making it
easy to obtain explicit formulas for dispersion and loss versus frequency;
for the same reason, exact bounds on sampling in associated
finite-difference schemes (FDS) can be derived. Second, it can be shown
that the associated FDS is "well posed" in the sense that it is stable,
in the limit, as the sampling period goes to zero. Finally, the new PDE
class can be used as a starting point for digital wave-guide modeling [a
digital wave-guide factors one-dimensional wave propagation as purely
lossless throughout the length of the string, with losses and dispersion
lumped in a low-order digital filter at the string endpoint(s)]. We perform
numerical simulations comparing the finite-difference and digital
wave-guide approaches, illustrating the advantages of the latter. We
examine a procedure allowing the resynthesis of natural string vibration;
using experimental data obtained from a grand piano, the parameters of the
physical model are estimated over most of the keyboard range.
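
A toy explicit finite-difference scheme for a stiff, lossy string of
this general type (second order in time, with frequency-independent and
frequency-dependent loss terms; all coefficient values below are
illustrative, not fitted piano parameters):

    import numpy as np

    def stiff_string_fds(dur=0.5, sr=44100.0, n=100, c=200.0,
                         kappa=1.0, sig0=1.0, sig1=0.002):
        k, h = 1.0 / sr, 1.0 / n          # time step, grid spacing
        u = np.zeros(n + 1)               # displacement at step m
        u1 = np.zeros(n + 1)              # displacement at step m-1
        u[n // 3] = 1e-3                  # crude pluck excitation
        u1[:] = u                         # start at rest
        out = np.empty(int(dur * sr))
        for m in range(len(out)):
            d2 = np.zeros_like(u); d4 = np.zeros_like(u)
            d2p = np.zeros_like(u)
            d2[1:-1] = (u[2:] - 2*u[1:-1] + u[:-2]) / h**2
            d2p[1:-1] = (u1[2:] - 2*u1[1:-1] + u1[:-2]) / h**2
            d4[2:-2] = (u[4:] - 4*u[3:-1] + 6*u[2:-2]
                        - 4*u[1:-3] + u[:-4]) / h**4
            unew = (2*u - u1
                    + k**2 * (c**2 * d2 - kappa**2 * d4)  # wave+stiffness
                    - 2*sig0*k*(u - u1)          # freq-independent loss
                    + 2*sig1*k*(d2 - d2p))       # freq-dependent loss
            unew[0] = unew[-1] = 0.0             # fixed endpoints
            u1, u = u, unew
            out[m] = u[3 * n // 4]               # output tap
        return out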

**************************************************************************

Time: 10:45 AM

2aMU9. The wave digital piano hammer.

Author: Stefan D. Bilbao 
Location: Ctr. for Computer Res. in Music and Acoust., Dept. of Music,
Stanford Univ., Stanford, CA 94305
Author: Julius O. Smith III 
Location: Ctr. for Computer Res. in Music and Acoust., Dept. of Music,
Stanford Univ., Stanford, CA 94305
Author: Julien Bensa 
Location: S2M-LMA-CNRS, Marseille, France 
Author: Richard Kronland-Martinet 
Location: S2M-LMA-CNRS, Marseille, France 

Abstract: 

For sound synthesis purposes, the vibration of a piano string may be simply
modeled using bidirectional delay lines or digital waveguides which
transport traveling wavelike signals in both directions. Such a digital
wave-type formulation, in addition to yielding a particularly
computationally efficient simulation routine, also possesses other
important advantages. In particular, it is possible to couple the delay
lines to a nonlinear exciting mechanism (the hammer) without compromising
stability; in fact, if the hammer and string are lossless, their digital
counterparts will be exactly lossless as well. The key to this good
property (which can be carried over to other nonlinear elements in musical
systems) is that all operations are framed in terms of the passive
scattering of discrete signals in the network, the sum of the squares of
which serves as a discrete-time Lyapunov function for the system as a
whole. Simulations are presented.
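
The sum-of-squares energy argument can be illustrated with a
power-normalized scattering junction, whose 2x2 scattering matrix is
orthogonal (a minimal sketch of the passivity idea, not the
hammer-string model itself):

    import numpy as np

    def normalized_scattering(a1, a2, r):
        # Orthogonal scattering: b1**2 + b2**2 == a1**2 + a2**2,
        # so the sum of squares is a conserved (Lyapunov) quantity.
        t = np.sqrt(1.0 - r * r)
        return r * a1 + t * a2, t * a1 - r * a2

    a1, a2, r = 0.3, -0.7, 0.5
    b1, b2 = normalized_scattering(a1, a2, r)
    assert abs((b1**2 + b2**2) - (a1**2 + a2**2)) < 1e-12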

**************************************************************************

Time: 11:00 AM

2aMU10. Musical sound analysis/synthesis using vector-quantized
time-varying spectra.

Author: Andreas F. Ehmann 
Location: Univ. of Illinois at Urbana--Champaign, 5308 Music Bldg., 1114
W. Nevada St., Urbana, IL 61801
Author: James W. Beauchamp 
Location: Univ. of Illinois at Urbana--Champaign, 5308 Music Bldg., 1114
W. Nevada, Urbana, IL 61801

Abstract: 

A fundamental goal of computer music sound synthesis is accurate, yet
efficient resynthesis of musical sounds, with the possibility of
extending the synthesis into new territories using control of
perceptually intuitive parameters. A data clustering technique known as
vector quantization (VQ) is used to extract a globally optimum set of
representative spectra from phase vocoder analyses of instrument tones.
This set of spectra, called a Codebook, is used for sinusoidal additive
synthesis or, more efficiently, for wavetable synthesis. Instantaneous
spectra are synthesized by first determining the Codebook indices
corresponding to the best least-squares matches to the original
time-varying spectrum. Spectral index versus time functions are then
smoothed, and interpolation is employed to provide smooth transitions
between Codebook spectra. Furthermore, spectral frames are pre-flattened
and their slope, or tilt, extracted before clustering is applied. This
allows spectral tilt, closely related to the perceptual parameter
"brightness", to be independently controlled during synthesis. The
result is a highly compressed format consisting of the Codebook spectra
and time-varying tilt, amplitude, and Codebook index parameters. This
technique has been applied to a variety of harmonic musical instrument
sounds with the resulting resynthesized tones providing good matches to
the originals.
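
The codebook construction can be sketched as k-means clustering of
spectral frames (a generic VQ sketch, not the authors' implementation;
index smoothing and tilt pre-flattening are omitted):

    import numpy as np

    def vq_codebook(frames, n_codes=32, iters=50, seed=0):
        # frames: (n_frames, n_bins) array of spectral frames.
        rng = np.random.default_rng(seed)
        codes = frames[rng.choice(len(frames), n_codes,
                                  replace=False)].astype(float)
        for _ in range(iters):
            # best least-squares codebook match per frame
            d = ((frames[:, None, :] - codes[None, :, :]) ** 2).sum(axis=2)
            idx = d.argmin(axis=1)
            for c in range(n_codes):
                members = frames[idx == c]
                if len(members):
                    codes[c] = members.mean(axis=0)
        return codes, idx   # codebook and per-frame indices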

**************************************************************************

Time: 11:15 AM

2aMU11. The syrinx: Nature's hybrid wind instrument.

Author: Tamara Smyth 
Location: Ctr. for Computer Res. in Music and Acoust., Stanford Univ.,
Stanford, CA 94305
Author: Julius O. Smith III 
Location: Ctr. for Computer Res. in Music and Acoust., Stanford Univ.,
Stanford, CA 94305

Abstract: 

Birdsong is commonly associated with the sound of a flute. Although the
pure, often high pitched, tone of a bird is undeniably flutelike, its sound
production mechanism more closely resembles that of the human voice, with
the syringeal membrane (the bird's primary vocal organ) acting like vocal
folds and a beak acting as a conical bore. Airflow in the songbird's
vocal tract begins in the lungs and passes through two bronchi, two nonlinear
vibrating membranes (one in each bronchial tube), the trachea, the mouth,
and finally propagates to the surrounding air by way of the beak. Classic
waveguide synthesis is used for modeling the bronchi and trachea tubes,
based on the model of Fletcher [J. Acoust. Soc. Am. (1988, 1999)]. The
nonlinearity of the vibrating syringeal membrane is simulated by
finite-difference methods. This nonlinear valve, driven by a steady
pressure from the bronchi, generates an oscillatory pressure entering the
trachea.
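
The branching topology (two bronchi meeting the trachea) can be modeled
with a standard lossless parallel waveguide junction; the sketch below
shows the junction math only (admittance values are placeholders, and
the nonlinear membranes are not shown):

    import numpy as np

    def parallel_junction(p_plus, admittances):
        # Incoming pressure waves p_plus scatter at the junction:
        # p_J = 2*sum(G_i * p_i+) / sum(G_i), and p_i- = p_J - p_i+.
        p_plus = np.asarray(p_plus, dtype=float)
        g = np.asarray(admittances, dtype=float)
        p_junction = 2.0 * (g * p_plus).sum() / g.sum()
        return p_junction - p_plus   # outgoing waves per port

    # two bronchi and one trachea port, equal tube admittances
    outgoing = parallel_junction([0.2, 0.1, 0.0], [1.0, 1.0, 1.0])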

**************************************************************************

Time: 11:30 AM

2aMU12. Electrophysiological correlates of musical timbre perception.

Author: Anne Caclin 
Location: Ircam--CNRS, 1 pl. Igor Stravinsky, F-75004 Paris, France 
Author: Elvira Brattico 
Location: Univ. of Helsinki, Helsinki, Finland 
Author: Bennett K. Smith 
Location: Ircam--CNRS, F-75004 Paris, France 
Author: Mari Tervaniemi 
Location: Univ. of Helsinki, Helsinki, Finland 
Author: Marie-Hélène Giard 
Location: Inserm U280, F-69424 Lyon 03, France 
Author: Stephen McAdams 
Location: Ircam--CNRS, F-75004 Paris, France 

Abstract: 

Timbre perception has been studied by deriving a multidimensional space of
the perceptual attributes from listeners' behavioral responses. The neural
bases of timbre space were sought. First, a psychophysical timbre
dissimilarity experiment was conducted. A three-dimensional space of 16
synthetic sounds equalized for fundamental frequency, loudness, and
perceived duration was designed. Sounds varied in attack time, spectral
center of gravity, and energy ratio of odd/even harmonics. Multidimensional
scaling revealed a three-dimensional perceptual space with linear or
exponential relations between perceptual and physical dimensions. Second,
in an electrophysiological experiment, the mismatch negativity (MMN)
component of event-related potentials was recorded. The MMN is elicited by
infrequently presented sounds differing in one or more dimensions from more
frequent ones. Although elicited outside the focus of attention, the MMN
correlates with the subjects' behavioral responses, revealing the neural
bases of preattentive discrimination. Eight sounds were chosen within the
perceptual space. Changes along individual and combined dimensions elicited
an MMN response. MMN latency varied depending on the dimension changed. In
addition, preliminary analyses tend to show an additivity of the MMN waves
for some pairs of dimensions. These results shed light on the neural
processes underlying the perceptual representation of multidimensional
sounds. [Work supported by the CTI program of the CNRS.]