12. Einsamkeit (Loneliness/Solitude)

Piece:

12. Einsamkeit

Through-composed song in B minor with three stanzas.

Musical aspect/feature:

timbre/MFCC+phase

Magnitude spectrogram (3 sec) weighted by the amount of rapid phase change.

Recording:

Gardner Museum: Scarlata, 2006

Singer: Randall Scarlata (Baritone), Piano: Jeremy Denk. Recording of a performance at the Isabella Stewart Gardner Museum, Boston.

Source, License: CC BY-NC-ND 2.0

Download:

audio stanzas

Information about our segmentation of »12. Einsamkeit«

Through-composed song in B minor with three stanzas.
The segmentation follows the division into stanzas; the third stanza is the only one that is repeated (segments C1 and C2).
Musically, the vocal part can be divided into three segments: A, B, and C.
The eight-bar segment A splits in turn into two parts: a four-bar motif (bars 7-10 with upbeat) and its exact repetition (bars 11-14 with upbeat).
Like A, the eight-bar segment B splits into two parts: a four-bar motif (bars 15-18 with upbeat) and its slightly varied repetition (bars 19-22 with upbeat).
Segment C likewise consists of two nearly identical parts, C1 and C2, with C2 ending differently from C1.

Lyrics: Project Gutenberg

MFCC (Mel Frequency Cepstral Coefficients)

This feature was originally developed for speech analysis and speaker recognition. After a musical signal has been transformed into a spectrogram representation, MFCC-based features are computed by combining suitable frequency bands into perceptually inspired Mel bands and applying a decorrelating discrete cosine transform. The lower MFCC coefficients in particular describe the coarse shape of the spectral envelope, which correlates with timbre. To derive MFCC-ENS features (MFCC Energy Normalized Statistics), these MFCC features are quantized, smoothed along the time axis, and normalized with respect to the ℓ²-norm.
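The processing chain described above (spectrogram, Mel bands, discrete cosine transform, then quantization, temporal smoothing, and ℓ²-normalization) can be sketched roughly as follows. This is a minimal illustration using only NumPy, not the implementation behind this page: the function names, the Mel formula, the window and hop sizes, the number of bands and coefficients, and especially the uniform quantization scheme in `ens` are assumptions chosen for readability.

```python
import numpy as np

def hz_to_mel(f):
    # Common 2595*log10 Mel formula (an assumed choice, not taken from the source).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrogram(x, n_fft=1024, hop=512):
    """Hann-windowed power spectrogram, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters that pool FFT bins into Mel bands."""
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct_ii(n_out, n_in):
    """First n_out rows of an unnormalized DCT-II matrix."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def mfcc(x, sr, n_fft=1024, hop=512, n_mels=40, n_coeffs=13):
    """MFCC matrix of shape (n_coeffs, n_frames)."""
    power = power_spectrogram(np.asarray(x, dtype=float), n_fft, hop)
    log_mel = np.log(mel_filterbank(sr, n_fft, n_mels) @ power + 1e-10)
    return dct_ii(n_coeffs, n_mels) @ log_mel

def ens(features, n_levels=5, smooth_len=9, eps=1e-9):
    """Quantize, smooth over time, and l2-normalize each frame.

    The uniform per-coefficient quantization is a hypothetical stand-in;
    smooth_len should not exceed the number of frames.
    """
    lo = features.min(axis=1, keepdims=True)
    hi = features.max(axis=1, keepdims=True)
    scaled = (features - lo) / (hi - lo + eps)
    quantized = np.round(scaled * (n_levels - 1)) / (n_levels - 1)
    kernel = np.hanning(smooth_len)
    kernel /= kernel.sum()
    smoothed = np.stack([np.convolve(row, kernel, mode="same") for row in quantized])
    norms = np.linalg.norm(smoothed, axis=0, keepdims=True)
    # Frames with (near-)zero energy are replaced by a uniform unit vector.
    uniform = np.full_like(smoothed, 1.0 / np.sqrt(smoothed.shape[0]))
    return np.where(norms > eps, smoothed / np.maximum(norms, eps), uniform)

def mfcc_ens(x, sr, **kwargs):
    return ens(mfcc(x, sr, **kwargs))
```

Calling `mfcc_ens(x, sr)` on a mono signal `x` sampled at `sr` Hz returns one unit-length feature vector per frame. Keeping only the first few coefficients is what restricts the feature to the coarse spectral envelope.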

Furthermore, we present a novel variant of MFCC-ENS features in which the spectrogram is first weighted by the second derivative of the spectral phase with respect to time. This weighting responds to slight changes in pitch, which typically occur in singing and are absent in piano music. In particular, the harmonics of notes played on the piano are attenuated by this method, which leads to smaller spectral envelopes in the piano sections and hence to more discriminative timbre-related MFCC features.
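One way to picture the phase weighting is the following sketch. It approximates the second time derivative of the phase by a second-order difference across neighbouring STFT frames: for a partial with constant frequency the phase advances linearly from frame to frame, so the difference is close to zero and the partial is attenuated, whereas a voice with vibrato yields a non-zero value. The function names, the window and hop sizes, and the scaling of the weight to the range 0 to 1 are assumptions for illustration and are not taken from the method used on this page.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Complex Hann-windowed STFT, shape (n_fft // 2 + 1, n_frames)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def phase_weighted_spectrogram(x, n_fft=512, hop=128):
    """Magnitude spectrogram weighted by the second-order phase difference over time.

    The first and last frame are dropped because the central difference
    needs one neighbour on each side.
    """
    X = stft(x, n_fft, hop)
    phase = np.angle(X)
    # Discrete second derivative of the phase along the time axis.
    d2 = phase[:, 2:] - 2.0 * phase[:, 1:-1] + phase[:, :-2]
    # Wrap into (-pi, pi] so that 2*pi phase jumps do not count as change.
    d2 = np.angle(np.exp(1j * d2))
    # Assumed scaling: 0 for perfectly steady phase advance, 1 for maximal change.
    weight = np.abs(d2) / np.pi
    return np.abs(X[:, 1:-1]) * weight
```

For example, a steady 440 Hz sine is suppressed almost completely by this weighting, while the same tone with a 6 Hz vibrato keeps a substantial share of its spectral magnitude. The weighted spectrogram could then take the place of the plain spectrogram as input to the Mel and DCT steps.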

Literature

Steven Davis, Paul Mermelstein: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, Readings in Speech Recognition 1990, pp. 65–74.
Hiroko Terasawa, Malcolm Slaney, Jonathan Berger: The thirteen colors of timbre, WASPAA 2005, pp. 323–326.
Dirk v. Zeddelmann, Frank Kurth: A construction of compact MFCC-type features using short-time statistics for applications in audio segmentation, EUSIPCO 2009, pp. 1504–1508.