A novel feature extraction strategy for multi-stream robust emotion identification

Gang Liu, Yun Lei, John H.L. Hansen

INTERSPEECH-2010

We investigate an effective feature extraction front-end for speech emotion recognition that performs well in both clean and noisy conditions. First, we explore perceptual minimum variance distortionless response (PMVDR) features. Originally proposed for accent/dialect and language identification (LID), these features approximate perceptual scales more closely and are less sensitive to noise and speaker variation. The shifted delta cepstral (SDC) approach, also developed for LID, can be used to incorporate additional temporal information. Supra-segmental speech characteristics such as pitch and intensity are known to provide better discriminative information for emotion recognition when fused with other emotion-dependent features. With PMVDR and SDC combined, the system outperforms the MFCC-based baseline by 10.3% (absolute). Furthermore, we find that both PMVDR and SDC offer much better robustness in noisy conditions, which is critical for real applications. All proposed features are evaluated on the Berlin Database of Emotional Speech.
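The SDC features mentioned above stack delta vectors computed at several shifted frame offsets, so that each frame carries information from a longer temporal context. The following is a minimal NumPy sketch, assuming the commonly used N-d-P-k parameterization: for frame t and block i = 0..k-1, the delta is c(t + iP + d) - c(t + iP - d), and the k deltas are concatenated. The defaults d=1, P=3, k=7 (the 7-1-3-7 setup often seen in LID, with N=7 coming from the input) and the edge-frame replication at utterance boundaries are assumptions; the abstract does not state the configuration used in the paper, and the function name `shifted_delta_cepstra` is hypothetical.

```python
import numpy as np


def shifted_delta_cepstra(cep: np.ndarray, d: int = 1, P: int = 3, k: int = 7) -> np.ndarray:
    """Compute shifted delta cepstra from a (T, N) matrix of cepstral frames.

    cep: cepstral features, one row per frame (e.g. MFCC or PMVDR).
    d:   delta spread, in frames (assumed default).
    P:   shift between consecutive delta blocks, in frames (assumed default).
    k:   number of delta blocks stacked per frame (assumed default).
    Returns a (T, N * k) array.
    """
    T, _ = cep.shape
    # Replicate the first/last frames so every index t + i*P +/- d is valid.
    pad_front = d
    pad_back = (k - 1) * P + d
    padded = np.pad(cep, ((pad_front, pad_back), (0, 0)), mode="edge")

    blocks = []
    for i in range(k):
        # Original frame t sits at row t + pad_front of the padded array.
        plus = pad_front + i * P + d
        minus = pad_front + i * P - d
        # Delta for block i at every frame: c(t + iP + d) - c(t + iP - d).
        blocks.append(padded[plus:plus + T] - padded[minus:minus + T])
    return np.concatenate(blocks, axis=1)


if __name__ == "__main__":
    # Random stand-in for 100 frames of 7 cepstral coefficients.
    frames = np.random.default_rng(0).standard_normal((100, 7))
    print(shifted_delta_cepstra(frames).shape)  # (100, 49)
```

In LID systems the SDC vector is often appended to the static cepstra rather than used alone; whether and how this paper fuses them with PMVDR, pitch, and intensity streams is not specified in this abstract.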

Keywords: PMVDR, shifted delta cepstral, SDC, emotion identification, robustness


Gang Liu, Yun Lei, John H.L. Hansen (2010). "A novel feature extraction strategy for multi-stream robust emotion identification", INTERSPEECH-2010, Sep., pp.482-485.


@INPROCEEDINGS{Liu2010a,
	author={Gang Liu and Yun Lei and John H.L. Hansen},
	title={A novel feature extraction strategy for multi-stream robust emotion identification},
	booktitle={INTERSPEECH-2010},
	month=sep,
	pages={482--485},
	year={2010},
	address={Makuhari, Japan}
}