Center for Robust Speech Systems

CRSS Tools

QCN-RASTALP

QCN-RASTALP is designed to compensate for the cepstral variability introduced by channel variations, additive noise, and the Lombard effect.


Quantile-based cepstral dynamics normalization (QCN) builds on the concept of cepstral mean normalization (CMN) and cepstral variance normalization (CVN). CVN assumes that the distributions of cepstral coefficients are Gaussian, and CMN assumes that they are at least symmetric about their means. However, analyses have shown that cepstral coefficients in clean speech, the low-order ones in particular, tend to have skewed and multimodal distributions (with more than one significant peak), and that the presence of noise and changes in talking style considerably affect the distribution shapes and skewness. When the actual cepstral distributions depart from Gaussian, the mean and variance become less effective at representing the range over which cepstral samples actually occur, that is, their 'dynamic range'. QCN instead determines the dynamic range from quantiles of the cepstral histogram that bound a certain portion of the sample occurrences. Rather than the distribution means, quantile means are subtracted from all samples, and the dynamic range is then normalized to unity.
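The normalization described above can be sketched for a single cepstral coefficient track as follows. This is a minimal illustration under stated assumptions, not the distributed source code: the function names `qcn` and `quantile`, the parameter `j`, its default of 4 (i.e., the 4th and 96th percentiles), the linear interpolation between sorted samples, and the reading of "quantile mean" as the midpoint of the two bounding quantiles are all choices made for this example.

```python
from typing import List, Sequence


def quantile(sorted_vals: Sequence[float], p: float) -> float:
    """Linearly interpolated quantile of pre-sorted values, p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def qcn(track: Sequence[float], j: float = 4.0) -> List[float]:
    """Normalize one cepstral coefficient track over an utterance.

    j is the lower quantile in percent; the upper one is 100 - j.
    The default value is an assumption made for this sketch.
    """
    if not 0.0 <= j < 50.0:
        raise ValueError("j must be in [0, 50)")
    if not track:
        return []
    s = sorted(track)
    q_lo = quantile(s, j / 100.0)
    q_hi = quantile(s, 1.0 - j / 100.0)
    center = 0.5 * (q_lo + q_hi)  # midpoint of the two bounding quantiles
    span = q_hi - q_lo            # quantile-based dynamic range
    if span == 0.0:
        # Degenerate (e.g. constant) track: only remove the offset.
        return [c - center for c in track]
    return [(c - center) / span for c in track]
```

Because both the offset and the scale are taken from quantiles, shifting or rescaling the input track leaves the output unchanged up to rounding, and samples outside the bounding quantiles have less influence on the estimates than they would on a mean and variance.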


QCN-RASTALP combines QCN with low-pass temporal filtering adopted from RASTA. The original RASTA performs band-pass filtering, in which the high-pass component corresponds to CMN. In QCN-RASTALP, the high-pass filtering is bypassed and only the low-pass component is preserved. The new low-pass filter significantly reduces the transient distortions seen in the original RASTA and allows the sub-optimal CMN to be replaced by other cepstral compensation methods.
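To illustrate what low-pass-only temporal filtering of a cepstral track means, the sketch below uses a generic first-order recursive smoother. It is a stand-in, not the RASTALP filter: this description does not give the filter coefficients, and the function name `lowpass_track` and the parameter `alpha` are hypothetical.

```python
from typing import List, Sequence


def lowpass_track(track: Sequence[float], alpha: float = 0.5) -> List[float]:
    """First-order IIR low-pass: y[t] = alpha * x[t] + (1 - alpha) * y[t-1].

    Stand-in smoother only; alpha is a hypothetical parameter, not a
    RASTALP coefficient.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    # Initialise the state with the first sample to limit the start-up transient.
    prev = track[0] if track else 0.0
    for x in track:
        prev = alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out
```

With no high-pass stage, a constant offset in the track passes through unchanged while fast frame-to-frame fluctuations are attenuated. Removing the offset is therefore left to a separate normalization step, which is the role QCN takes in place of CMN.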


Source code for QCN-RASTALP and RASTALP is provided below.



References:


  • Boril, H., Hansen, J. H. L. (2010). “Unsupervised Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environments,” IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1379-1393.

  • Boril, H., Hansen, J. H. L. (2011). “UT-Scope: Towards LVCSR under Lombard Effect Induced by Varying Types and Levels of Noisy Background,” IEEE ICASSP'11, 4472-4475, Prague, Czech Republic, May 2011.