Feature extraction for speech recognition

Feature extraction from speech and other kinds of audio, and feature extraction in speech recognition
Dr. Jake Finlay, Germany, Teacher
Published date: 22-07-2017
Feature Extraction from Speech and Other Kinds of Audio
Rahil Mahdian, 13.07.2015

Feature Extraction (Jakobson)
1. Total energy
2. Spectral center of gravity (SCG)
3. Duration
4. Low-, medium- and high-frequency energy
5. Formant transitions
6. Silence detection
7. Voicing detection
8. Rate of change of energy in various frequency bands
9. Rate of change of the SCG
10. Most prominent peak frequency
11. Rate of change of the most prominent peak frequency
12. Zero-crossing rate

Time-Domain Features
• Zero-crossing rate (ZCR)

Features in the Time Domain: Short-Time Energy
• Definition: $E^{(n)} = \sum_{m=0}^{M-1} f_{n+m}^2$
• Example (from Schukat-Talamazzini)

LPC Features

ASR Speech Features
• Learn about the most established feature extraction method for speech
• Mel Frequency Cepstral Coefficients (MFCC)

Pre-emphasis
The source signal for voiced sounds has a slope of -6 dB/octave.
[Figure: energy (dB) of the voiced source over frequency, 0 to 4 kHz]
We want to model only the resonant energies, not the source. LPC, however, models both the source and the resonances. Pre-emphasizing the signal for voiced sounds flattens its spectrum, so that the speech source more closely approximates a sequence of impulses. LPC can then model only the resonances (the important information) rather than resonances plus source.

Pre-emphasis
• Corrects for the filtering of the lips
• Iterative scheme, applied sample by sample: $f'_n = f_n - a\,f_{n-1}$
• Typical value: $a = 0.95$

Example: Putting a Rectangular Window on a Speech Signal
• Frame shift: typically 10 ms
• Frame width: typically 25 ms
• Short-time Fourier transform of the frame starting at sample m: $F^{(m)}(e^{i\omega}) = \sum_n f_{m+n}\, w_n\, e^{-i\omega n}$

Fourier Transform in Practice
• Use the Fast Fourier Transform (FFT)
• The common radix-2 FFT requires the number of samples N to be a power of 2 (e.g. N = 256)
• Code is readily available
• Complexity: $O(N \log N)$, compared with $O(N^2)$ for evaluating the sum directly

Established Window Functions
• Used to get sharper spectral peaks
• Rectangular window: $w^R_n = 1$
• Generalized Hamming window: $w^H_n = (1-\alpha) - \alpha\cos\left(\frac{2\pi n}{N-1}\right)$ ($\alpha = 0.46$: standard Hamming window)
• Gauss window: $w^G_n = \exp\left(-0.5\left(\frac{n - N/2}{(N/2)/3}\right)^2\right)$
• Parabola window: $w^P_n = 4\,\frac{n}{N}\left(1 - \frac{n}{N}\right)$
• Each window is defined for $n = 0, \dots, N-1$ and vanishes outside this interval

Rewrite of the Fourier Transform
• Definition: $F^{(m)}(e^{i\omega}) = \sum_n f_{m+n}\, w_n\, e^{-i\omega n}$
• Window functions vanish outside the interval $n = 0, \dots, N-1$, so the sum is finite
• Define $\omega_\nu = \frac{2\pi\nu}{N}$ for $\nu = 0, \dots, N-1$:
  $F^{(m)}_\nu = \sum_{n=0}^{N-1} f_{m+n}\, w_n\, e^{-i 2\pi \nu n / N}$
Note: for further processing, we take the absolute value of the Fourier transform.

Example for "ö"
[Figure: short-time spectrum and smoothed spectrum of "ö", each over frequency (Hz)]

Spectrogram
• Calculate a spectrum for every point in time
• Code the local intensity as color or grey scale
[Figure: spectrogram over time of the utterance "To return to the main menu, press the star key"; source: http://www.wilhelm-kurz-software.de/dynaplot/applicationnotes/spectrogram.htm]

Using Praat to Generate a Spectrogram
• Praat: software for doing phonetics by computer
• Written by Paul Boersma and David Weenink
• Quite powerful: spectrograms, formants, pitch, …
• Download: http://www.fon.hum.uva.nl/praat/

Smoothing the Spectrogram: Filterbank
• Idea: imitate the ear
• Average over neighboring frequencies
• Scale the frequencies according to the mel or the Bark scale
• Reduces 256 Fourier coefficients to 24 filterbank outputs

Example of a Filterbank

Filterbank
• Spacing of the center frequencies:
  – According to the mel scale: $\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$
• Low-frequency cutoff:
  – e.g. 300 Hz (for telephone speech)
• High-frequency cutoff:
  – e.g. 3400 Hz (for telephone speech)
• Different settings apply to, e.g., a headset connected to a PC
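Three of the Jakobson features listed above (short-time energy, zero-crossing rate, spectral center of gravity) are simple enough to sketch directly. This is a minimal pure-Python sketch, not the lecture's own code, and the function names are illustrative. The energy function follows the short-time energy definition E^(n) = sum over m = 0..M-1 of f_{n+m}^2. The ZCR (fraction of neighboring sample pairs whose signs differ, with 0 counted as non-negative) and the SCG (magnitude-weighted mean frequency) use assumed definitions, because the slides do not spell them out.

```python
def short_time_energy(f, n, M):
    """Short-time energy E^(n) = sum_{m=0}^{M-1} f_{n+m}^2 of the M samples starting at index n."""
    return sum(f[n + m] ** 2 for m in range(M))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumption: a sample equal to 0 counts as non-negative.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_center_of_gravity(magnitudes, freqs):
    """Magnitude-weighted mean frequency: sum_k f_k |X_k| / sum_k |X_k|.

    Assumption: weighting by magnitude; weighting by power is an equally common choice.
    """
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total
```

For example, `zero_crossing_rate([1, -1, 1, -1])` is 1.0 (every neighboring pair changes sign), while a monotonically rising frame gives 0.0.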
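The pre-emphasis slides give the scheme f'_n = f_n - a f_{n-1} with a typical value a = 0.95. The sketch below applies it sample by sample and also evaluates the filter's magnitude response |1 - a e^{-iω}|, which shows why pre-emphasis flattens a spectrum that falls off with frequency: the gain rises from about -26 dB at 0 Hz to about +5.8 dB at half the sampling rate. Treating the sample before the first one as 0 is an assumption, and the function names are illustrative.

```python
import cmath
import math


def pre_emphasis(f, a=0.95):
    """Apply f'_n = f_n - a * f_{n-1} sample by sample.

    Assumption: f_{-1} = 0, so the first sample passes through unchanged.
    """
    out = []
    prev = 0.0
    for x in f:
        out.append(x - a * prev)
        prev = x
    return out


def pre_emphasis_gain_db(freq_hz, fs, a=0.95):
    """Gain in dB of the pre-emphasis filter at freq_hz: 20*log10|1 - a*e^{-i*omega}|."""
    omega = 2 * math.pi * freq_hz / fs
    return 20 * math.log10(abs(1 - a * cmath.exp(-1j * omega)))
```

A constant (0 Hz) input is almost cancelled: `pre_emphasis([1.0, 1.0, 1.0])` gives roughly `[1.0, 0.05, 0.05]`, whereas an alternating input such as `[1, -1, 1, -1]` is amplified to `[1, -1.95, 1.95, -1.95]`.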
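The framing slide uses a typical frame width of 25 ms and a frame shift of 10 ms; cutting a frame out of the signal without weighting is the same as applying a rectangular window. A minimal sketch, assuming frames start at sample 0 and that a trailing partial frame is dropped (zero-padding it is an equally valid choice). The function name is illustrative.

```python
def frame_signal(f, fs, width_ms=25.0, shift_ms=10.0):
    """Split f into overlapping frames (rectangular window).

    width_ms: frame width, shift_ms: frame shift, fs: sampling rate in Hz.
    Frames start at m = 0, shift, 2*shift, ...; a trailing partial frame is dropped.
    """
    width = int(round(fs * width_ms / 1000.0))
    shift = int(round(fs * shift_ms / 1000.0))
    return [f[m:m + width] for m in range(0, len(f) - width + 1, shift)]
```

At fs = 16 kHz this gives 400-sample frames every 160 samples, so one second of audio yields 98 complete frames.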
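The window functions on the "Established Window Functions" slide can be written out as plain functions over n = 0, ..., N-1. In this sketch, the generalized Hamming window uses α = 0.46 by default (α = 0.5 gives the Hann window). For the Gauss window, a standard deviation of (N/2)/3 = N/6 samples is an assumption, chosen so the window has decayed to about 1% at the frame edges. The function names are illustrative.

```python
import math


def rectangular(N):
    """w^R_n = 1."""
    return [1.0] * N


def generalized_hamming(N, alpha=0.46):
    """w^H_n = (1 - alpha) - alpha * cos(2*pi*n / (N - 1)); alpha = 0.46 is the standard Hamming window."""
    if N < 2:
        return [1.0] * N
    return [(1 - alpha) - alpha * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def gauss(N):
    """Gaussian centered at N/2; assumed standard deviation (N/2)/3 samples."""
    sigma = (N / 2) / 3
    return [math.exp(-0.5 * ((n - N / 2) / sigma) ** 2) for n in range(N)]


def parabola(N):
    """w^P_n = 4 * (n/N) * (1 - n/N)."""
    return [4 * (n / N) * (1 - n / N) for n in range(N)]


def apply_window(frame, w):
    """Multiply a frame sample by sample with a window of the same length."""
    if len(frame) != len(w):
        raise ValueError("frame and window must have the same length")
    return [x * wn for x, wn in zip(frame, w)]
```

The standard Hamming window does not reach zero at the edges (0.54 - 0.46 = 0.08), while the Hann variant does.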
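The FFT and spectrogram slides can be tied together in one short sketch: a recursive radix-2 FFT (hence the power-of-2 length requirement and the O(N log N) cost), the magnitude |F_ν^(m)| of one windowed frame, and a spectrogram built from one such spectrum per frame shift. This is a teaching sketch in pure Python under stated assumptions, not production code. Keeping only bins 0..N/2 (the spectrum of a real signal is symmetric), converting to dB and clamping with a small floor are choices made here for display, and the function names are illustrative.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT: X_nu = sum_n x_n e^{-i 2 pi nu n / N}.

    len(x) must be a power of 2.
    """
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of 2")
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out


def short_time_spectrum(f, m, w):
    """|F_nu^(m)| = |sum_{n=0}^{N-1} f_{m+n} w_n e^{-i 2 pi nu n / N}| for nu = 0..N-1."""
    N = len(w)
    return [abs(c) for c in fft([f[m + n] * w[n] for n in range(N)])]


def spectrogram(f, w, shift, eps=1e-10):
    """One log-magnitude spectrum (dB, bins 0..N/2) per frame start m = 0, shift, 2*shift, ..."""
    N = len(w)
    rows = []
    for m in range(0, len(f) - N + 1, shift):
        mags = short_time_spectrum(f, m, w)[: N // 2 + 1]
        rows.append([20 * math.log10(max(a, eps)) for a in mags])
    return rows
```

Each row of the result is one point in time; mapping its dB values to a grey or color scale gives the picture described on the spectrogram slide. With N = 256 at an assumed 8 kHz sampling rate, neighboring bins are 8000 / 256 = 31.25 Hz apart.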
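The filterbank slides place the center frequencies according to Mel(f) = 2595 log10(1 + f/700), with cutoffs of 300 Hz and 3400 Hz for telephone speech, and reduce 256 Fourier coefficients to 24 outputs. The sketch below makes several assumptions not stated on the slides. The filters are triangular, a common way to "average over neighboring frequencies". The sampling rate is 8 kHz. Only the 129 non-redundant bins of a 256-point FFT are weighted. The function names are illustrative.

```python
import math


def hz_to_mel(f):
    """Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters=24, n_fft=256, fs=8000, f_low=300.0, f_high=3400.0):
    """Triangular filters (assumed shape), equally spaced on the mel scale.

    Returns num_filters rows of weights over the n_fft // 2 + 1 FFT bins.
    """
    mel_lo, mel_hi = hz_to_mel(f_low), hz_to_mel(f_high)
    # num_filters + 2 equally spaced mel points: outer edges plus one center per filter
    step = (mel_hi - mel_lo) / (num_filters + 1)
    hz_points = [mel_to_hz(mel_lo + i * step) for i in range(num_filters + 2)]
    bin_freqs = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for f in bin_freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))  # rising edge
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filterbank(bank, magnitudes):
    """Weighted sum of the magnitude spectrum under each filter: one output per filter."""
    return [sum(w * m for w, m in zip(row, magnitudes)) for row in bank]
```

Because the centers are equally spaced in mel, the filters are narrow at low frequencies (the first one spans roughly 300 to 420 Hz) and progressively wider toward 3400 Hz.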