Step 2: Pitch Analyser function in Legato.
This is a voice track: white = frequency, cyan = amplitude.
At the current time: an average peak amplitude of 46.84 % and an average base frequency of 556 Hz.
Even without sound it's measuring the base frequency of the background noise!
Measurement samples are stored in memory per 1/5th of a single frame; at the current frame rate of 25 fps that is 125 measurements per second.
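To make the timing concrete, here is a rough sketch in Python (not LScript, and not the Legato code itself). The 44100 Hz sample rate is an assumption for illustration only, and the function names are hypothetical.

```python
def measurements_per_second(fps, slices_per_frame=5):
    """Number of stored measurements per second of audio."""
    return fps * slices_per_frame


def audio_samples_per_measurement(sample_rate_hz, fps, slices_per_frame=5):
    """Whole number of audio samples that fall inside one 1/5th-frame slice."""
    return int(sample_rate_hz // (fps * slices_per_frame))


if __name__ == "__main__":
    fps = 25             # frame rate mentioned in the post
    sample_rate = 44100  # ASSUMPTION: typical WAV sample rate, not stated in the post
    print(measurements_per_second(fps))                     # measurements per second
    print(audio_samples_per_measurement(sample_rate, fps))  # audio samples per slice
```

At 25 fps each slice covers 8 ms of audio, so every stored frequency/amplitude pair summarizes only a few hundred audio samples.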
The amplitude data from the Scene Editor is not available in LScript, so the amplitude is measured again here.
It's needed to distinguish vowels from consonants.
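A minimal sketch of how a peak amplitude percentage could be measured per slice, written in Python for illustration. It assumes signed 16-bit samples (full scale 32767); the post does not say how Legato actually computes the value, and the function names are hypothetical.

```python
def peak_amplitude_percent(window, full_scale=32767):
    """Peak amplitude of one slice as a percentage of full scale.

    ASSUMPTION: samples are signed 16-bit integers.
    """
    if not window:
        return 0.0
    peak = max(abs(s) for s in window)
    # Clamp, because -32768 is slightly above the positive full scale.
    return min(100.0, 100.0 * peak / full_scale)


def average_peak_percent(windows):
    """Average of the per-slice peak percentages."""
    if not windows:
        return 0.0
    return sum(peak_amplitude_percent(w) for w in windows) / len(windows)
```

For example, averaging the peaks of several consecutive slices gives a figure comparable in kind to the "average peak amplitude" shown in the track.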
The frequency range is rescaled with pow(sample,0.5), i.e. a square root, to enhance visibility of the lower frequencies.
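The effect of the square-root rescaling can be sketched as follows, in Python for illustration. The normalization to a 0..1 range against a 10 kHz maximum is an assumption about what "sample" means here, and display_position is a hypothetical name.

```python
def display_position(freq_hz, max_freq_hz=10000.0):
    """Map a frequency to a 0..1 display position using pow(x, 0.5).

    ASSUMPTION: the frequency is first normalized against max_freq_hz.
    """
    normalized = min(max(freq_hz / max_freq_hz, 0.0), 1.0)
    return pow(normalized, 0.5)


if __name__ == "__main__":
    for f in (100, 556, 2500, 10000):
        linear = f / 10000.0
        print(f, round(linear, 4), round(display_position(f), 4))
```

On a linear scale 556 Hz would sit at about 5.6 % of the track height; after the square root it sits at roughly 24 %, which is why the lower frequencies become easier to see.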
The highest currently measured base frequency is about 10 kHz.
That should be enough to detect 't' and 's', but the range is too large for visualizing the notes of the human voice.
To create a simple pattern of estimated vowels and consonants, I need to interpret frequency + amplitude to:
- detect 'b' and 'd' in relatively low frequencies
- detect 's', 't' and 'f' in relatively high frequencies
- detect 'a', 'e', 'i', 'o', 'u', 'oe', 'eu', etc. in relatively mid frequencies
- detect absence of voice to distinguish between words, between 'd' and 't', and to suppress measurement of background noise
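The list above could be sketched as a simple threshold classifier, shown here in Python for illustration. All thresholds (5 % silence level, 300 Hz, 3000 Hz) are hypothetical placeholders that would need tuning against real recordings; they do not come from Legato.

```python
def classify_slice(freq_hz, amplitude_pct,
                   silence_pct=5.0, low_hz=300.0, high_hz=3000.0):
    """Label one measurement slice from its base frequency and amplitude.

    ASSUMPTION: all threshold values are illustrative, not measured.
    """
    if amplitude_pct < silence_pct:
        return "silence"          # no voice: word gap, also hides background noise
    if freq_hz < low_hz:
        return "low consonant"    # candidates such as 'b', 'd'
    if freq_hz > high_hz:
        return "high consonant"   # candidates such as 's', 't', 'f'
    return "vowel"                # candidates such as 'a', 'e', 'i', 'o', 'u'


def pattern(slices):
    """Turn (freq_hz, amplitude_pct) pairs into a compact pattern string."""
    symbols = {"silence": "_", "low consonant": "b",
               "high consonant": "s", "vowel": "a"}
    return "".join(symbols[classify_slice(f, a)] for f, a in slices)
```

The amplitude test comes first on purpose: as noted earlier, a base frequency is measured even without sound, so the frequency should only be interpreted once the slice is loud enough to contain voice.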
That simple pattern of estimated vowels and consonants can be used as a guide during automatic positioning/stretching of text in the 'Speech' track, in the middle of the Legato viewport.
The current implementation has limitations (such as no WAV filename selection and no support for stereo WAV files) that need to be addressed before posting a new build of Legato 2020.
Anyway, I think this Pitch Analyser function has a bright future as a tool for lip-sync animation.