3. “A computer program is said to learn from experience E
with respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P,
improves with experience E.”
— Tom M. Mitchell
18. Let’s classify human speech!
Decide whether a spoken phrase contains the word ‘Google’ or not
19. ‘Google’ Detector: Feature Mapping
Input: Audio file (WAV, 16-bit mono, 44.1 kHz)
Output: 1 if it contains the word ‘Google’, otherwise 0
Options for building the feature vector X[ ]:
1. Use raw waveform as a feature vector.
But: this gives 66,150 features for a 1.5-second file (44,100 samples per second × 1.5 s).
Kinda scary: with that many features it is easy to overfit.
2. Use Mel-Frequency Cepstral Coefficients (MFCC).
The mel frequency scale is believed to be closer to human auditory response than a linear one.
Depending on parameters, can give about 80 features per file.
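A minimal sketch of option 1, to make the feature count concrete. It builds a synthetic 1.5-second, 16-bit mono, 44.1 kHz WAV in memory (a sine tone standing in for real speech) and turns it into a raw-sample feature vector. The helper names `make_test_wav`, `raw_waveform_features` and `hz_to_mel` are hypothetical, not from the slides. The last function shows one common convention for the Hz-to-mel mapping that MFCC builds on; it is not a full MFCC pipeline, and the parameters behind the "about 80 features" figure are not given here.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # Hz, as stated on the slide
DURATION_S = 1.5     # seconds, as in the slide's example


def make_test_wav(duration_s=DURATION_S, rate=SAMPLE_RATE, freq_hz=440.0):
    """Build a 16-bit mono WAV in memory (synthetic sine tone, not speech)."""
    n = int(round(duration_s * rate))
    samples = [int(10000 * math.sin(2 * math.pi * freq_hz * i / rate))
               for i in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes = 16 bit
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % n, *samples))
    buf.seek(0)
    return buf


def raw_waveform_features(wav_file):
    """Option 1: one feature per sample, scaled to the range [-1, 1)."""
    with wave.open(wav_file, "rb") as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2
        n = w.getnframes()
        frames = w.readframes(n)
    samples = struct.unpack("<%dh" % n, frames)
    return [s / 32768.0 for s in samples]


def hz_to_mel(f_hz):
    """One common Hz-to-mel convention (an assumption; others exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


X = raw_waveform_features(make_test_wav())
print(len(X))  # 66150 = 44100 samples/s * 1.5 s
```

With a real recording, `raw_waveform_features` would take a file path or an open file instead of the in-memory buffer. The vector length grows linearly with clip duration, which is the practical argument for a compact representation such as MFCC.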