submitted
This commit is contained in:
parent
b2d3bccb29
commit
6e56067d1c
@ -457,8 +457,9 @@ literal "false"
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
was used.
|
was used.
|
||||||
Regular periodic frequencies in the time domain present as a peak in the
|
Regular periodic frequencies in the time domain present as peaks in the
|
||||||
quefrency domain, this can also be achieved with an auto-correlation function.
|
quefrency domain, these can also be identified with an auto-correlation
|
||||||
|
function.
|
||||||
The use of a low-pass filter was investigated in order to smooth the cepstrum
|
The use of a low-pass filter was investigated in order to smooth the cepstrum
|
||||||
before programmatically finding pitch period candidates by applying
|
before programmatically finding pitch period candidates by applying
|
||||||
\begin_inset Formula $x$
|
\begin_inset Formula $x$
|
||||||
@ -499,8 +500,8 @@ literal "false"
|
|||||||
values.
|
values.
|
||||||
Lowering the quefrency corresponds to an increase in frequency, thus it
|
Lowering the quefrency corresponds to an increase in frequency, thus it
|
||||||
is reasonable to discard these values, as 20 samples represent 1200Hz
|
is reasonable to discard these values, as 20 samples represent 1200Hz
|
||||||
sampled at 24kHz, a frequency higher than that of the fundamental frequency
|
when sampled at 24kHz, a frequency higher than that of the fundamental
|
||||||
being investigated.
|
frequency being investigated.
|
||||||
Additionally a minimum cepstrum threshold of 0.075 was used, from here the
|
Additionally a minimum cepstrum threshold of 0.075 was used, from here the
|
||||||
quefrency candidate with the highest value was used as the pitch period.
|
quefrency candidate with the highest value was used as the pitch period.
|
||||||
\end_layout
|
\end_layout
|
||||||
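The candidate selection described in this hunk can be sketched as follows. This is a minimal illustration in Python with NumPy, not the report's own implementation (which appears to be MATLAB); the function names `real_cepstrum`, `pick_pitch_period` and `quefrency_to_hz` are hypothetical, and the low-pass smoothing step is omitted because its parameters are not visible in this diff. Only the 20-sample cut-off, the 0.075 threshold and the 24kHz sample rate come from the text.

```python
import numpy as np


def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.real(np.fft.ifft(log_mag))


def pick_pitch_period(cepstrum, min_quefrency=20, threshold=0.075):
    """Return the quefrency (in samples) of the strongest candidate, or None.

    Quefrencies below min_quefrency are discarded, and the winning value
    must reach the threshold (both defaults are the values quoted in the text).
    """
    half = len(cepstrum) // 2  # the real cepstrum is symmetric, search one half
    search = np.asarray(cepstrum[:half], dtype=float).copy()
    search[:min_quefrency] = -np.inf  # discard the low-quefrency region
    q_p = int(np.argmax(search))
    if search[q_p] < threshold:
        return None
    return q_p


def quefrency_to_hz(q_p, fs):
    """Convert a quefrency in samples to a frequency in Hz."""
    return fs / q_p
```

With these definitions, `quefrency_to_hz(20, 24000)` gives 1200Hz, which matches the cut-off reasoning above: anything below 20 samples corresponds to a frequency above the fundamental being investigated.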
@ -584,8 +585,8 @@ noprefix "false"
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
The frequency response for the filters these coefficients represent can
|
The frequency response for similar filters of order 25 can be seen in figure
|
||||||
be seen in figure
|
|
||||||
\begin_inset CommandInset ref
|
\begin_inset CommandInset ref
|
||||||
LatexCommand ref
|
LatexCommand ref
|
||||||
reference "fig:stacked-spectra"
|
reference "fig:stacked-spectra"
|
||||||
@ -1447,7 +1448,8 @@ hood_m
|
|||||||
\begin_inset Caption Standard
|
\begin_inset Caption Standard
|
||||||
|
|
||||||
\begin_layout Plain Layout
|
\begin_layout Plain Layout
|
||||||
Order 20 LPC coefficients for both investigated samples
|
Order 20 LPC coefficients for both investigated samples, source segments
|
||||||
|
taken from the first 100ms of each vowel sample
|
||||||
\begin_inset CommandInset label
|
\begin_inset CommandInset label
|
||||||
LatexCommand label
|
LatexCommand label
|
||||||
name "tab:Order-20-LPC-Coeffs"
|
name "tab:Order-20-LPC-Coeffs"
|
||||||
@ -1598,8 +1600,8 @@ name "fig:stacked-spectra"
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
As the spectra are plotted with the same frequency bounds, the peaks of
|
As the spectra are plotted with the same frequency axis bounds, the peaks
|
||||||
the filter response corresponding to estimations of the formant frequencies
|
of the filter response corresponding to estimations of the formant frequencies
|
||||||
can be compared between the male and female voices.
|
can be compared between the male and female voices.
|
||||||
In general the male's formant frequencies are lower than for the female's
|
In general the male's formant frequencies are lower than for the female's
|
||||||
sample, this can be seen specifically with the first few peaks.
|
sample, this can be seen specifically with the first few peaks.
|
||||||
@ -1714,12 +1716,103 @@ name "fig:Spectrum-Tile"
|
|||||||
|
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Subsubsection
|
||||||
|
Source Segment Length Variation
|
||||||
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
\begin_inset Flex TODO Note (inline)
|
Figure
|
||||||
|
\begin_inset CommandInset ref
|
||||||
|
LatexCommand ref
|
||||||
|
reference "fig:seg_length"
|
||||||
|
plural "false"
|
||||||
|
caps "false"
|
||||||
|
noprefix "false"
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
presents the speech sample and LPC filter spectral response for different
|
||||||
|
source sample lengths.
|
||||||
|
As the source sample length increases the spectral profile becomes less
|
||||||
|
smooth with higher peaks and deeper troughs throughout.
|
||||||
|
Additionally, the mid to higher frequencies are affected more, while the first
|
||||||
|
few formants are less affected.
|
||||||
|
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
\begin_inset Float figure
|
||||||
|
wide false
|
||||||
|
sideways false
|
||||||
status open
|
status open
|
||||||
|
|
||||||
\begin_layout Plain Layout
|
\begin_layout Plain Layout
|
||||||
segment length variation?
|
\noindent
|
||||||
|
\align center
|
||||||
|
\begin_inset Graphics
|
||||||
|
filename /mnt/files/dev/matlab/lpss/resources/hood_m_25spect.png
|
||||||
|
lyxscale 10
|
||||||
|
width 25col%
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
|
||||||
|
\begin_inset Graphics
|
||||||
|
filename /mnt/files/dev/matlab/lpss/resources/hood_m_50spect.png
|
||||||
|
lyxscale 10
|
||||||
|
width 25col%
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
|
||||||
|
\begin_inset Graphics
|
||||||
|
filename /mnt/files/dev/matlab/lpss/resources/hood_m_100spect.png
|
||||||
|
lyxscale 10
|
||||||
|
width 25col%
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
|
||||||
|
\begin_inset Graphics
|
||||||
|
filename /mnt/files/dev/matlab/lpss/resources/hood_m_200spect.png
|
||||||
|
lyxscale 10
|
||||||
|
width 25col%
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Plain Layout
|
||||||
|
\begin_inset Caption Standard
|
||||||
|
|
||||||
|
\begin_layout Plain Layout
|
||||||
|
Increasing source segment lengths for the
|
||||||
|
\begin_inset listings
|
||||||
|
lstparams "basicstyle={\ttfamily}"
|
||||||
|
inline true
|
||||||
|
status open
|
||||||
|
|
||||||
|
\begin_layout Plain Layout
|
||||||
|
|
||||||
|
hood_m
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
sample
|
||||||
|
\begin_inset CommandInset label
|
||||||
|
LatexCommand label
|
||||||
|
name "fig:seg_length"
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
@ -1775,7 +1868,7 @@ head_f
|
|||||||
\begin_inset Formula $f_{1}$
|
\begin_inset Formula $f_{1}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
as it did not refer to a peak in the way that would indicate a formant.
|
as it did not refer to a maximum that would indicate a formant.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
@ -2456,8 +2549,8 @@ noprefix "false"
|
|||||||
When employing smoothing, the peak corresponding to the pitch period has
|
When employing smoothing, the peak corresponding to the pitch period has
|
||||||
been amplified compared to the unsmoothed curve where the pitch period
|
been amplified compared to the unsmoothed curve where the pitch period
|
||||||
does not reach far beyond the noise of the rest of the function.
|
does not reach far beyond the noise of the rest of the function.
|
||||||
Following this, smoothing was employed when identifying the fundamental
|
As a result of this, smoothing was employed in the following when identifying
|
||||||
frequency.
|
the fundamental frequency.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
@ -2544,8 +2637,8 @@ noprefix "false"
|
|||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
.
|
.
|
||||||
The identified pitch period,
|
The identified quefrency pitch period,
|
||||||
\begin_inset Formula $t_{p}$
|
\begin_inset Formula $q_{p}$
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
, and the corresponding fundamental frequency,
|
, and the corresponding fundamental frequency,
|
||||||
@ -2577,7 +2670,7 @@ noprefix "false"
|
|||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
\begin_inset Formula
|
\begin_inset Formula
|
||||||
\[
|
\[
|
||||||
f_{f}=\frac{1}{\nicefrac{t_{p}}{f_{s}}}
|
f_{f}=\frac{1}{\nicefrac{q_{p}}{f_{s}}}
|
||||||
\]
|
\]
|
||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
@ -2795,8 +2888,8 @@ Synthesis
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
Following the convolution of the impulse train and the LPC filter, the synthesis
|
Following the convolution of the impulse train and the LPC filter, the spectrogr
|
||||||
ed sound and the original can be seen presented in figure
|
ams for the original and synthesised sound can be seen in figure
|
||||||
\begin_inset CommandInset ref
|
\begin_inset CommandInset ref
|
||||||
LatexCommand ref
|
LatexCommand ref
|
||||||
reference "fig:Spectrograms-synth"
|
reference "fig:Spectrograms-synth"
|
||||||
@ -2808,8 +2901,8 @@ noprefix "false"
|
|||||||
|
|
||||||
.
|
.
|
||||||
The circled areas highlight similar portions, the formant frequencies can
|
The circled areas highlight similar portions, the formant frequencies can
|
||||||
be seen in both.
|
be seen as bright horizontal lines in both.
|
||||||
Despite being quasi-stationary, some variation in time can be seen for
|
Despite being quasi-stationary, some variation in time can be seen throughout
|
||||||
the original signal.
|
the original signal.
|
||||||
The stationary synthesised signal, however, has a flat profile in time.
|
The stationary synthesised signal, however, has a flat profile in time.
|
||||||
\end_layout
|
\end_layout
|
||||||
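The synthesis step referred to in this hunk, an impulse train passed through the LPC filter, can be sketched as below. This is an illustrative Python/NumPy version under stated assumptions, not the report's code: the names `impulse_train` and `all_pole_filter` are hypothetical, and the coefficient vector `a` is assumed to follow the convention A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p with a[0] = 1, as returned by MATLAB's `lpc`.

```python
import numpy as np


def impulse_train(n_samples, period):
    """Unit impulses every `period` samples, starting at sample 0."""
    x = np.zeros(n_samples)
    x[::period] = 1.0
    return x


def all_pole_filter(excitation, a):
    """Filter the excitation through 1/A(z).

    Implements y[n] = x[n] - sum_{k=1..p} a[k] * y[n-k], assuming a[0] == 1.
    """
    a = np.asarray(a, dtype=float)
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y
```

Because the excitation and the filter are both fixed for the whole signal, the output is strictly periodic, which is consistent with the flat profile in time noted for the synthesised spectrogram.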
@ -2871,7 +2964,7 @@ buzzy
|
|||||||
quality resembling a sawtooth wave of the same pitch as the original voice
|
quality resembling a sawtooth wave of the same pitch as the original voice
|
||||||
sample.
|
sample.
|
||||||
At these orders, the synthesised sound can not accurately be discerned
|
At these orders, the synthesised sound can not accurately be discerned
|
||||||
as being speech.
|
as speech.
|
||||||
As the filter order increases, the tone of the sound becomes less harsh
|
As the filter order increases, the tone of the sound becomes less harsh
|
||||||
and by around order 20 the sample could be identified as being of a voice.
|
and by around order 20 the sample could be identified as being of a voice.
|
||||||
By order 40, much of the harsh tone has been smoothed and the sample subjective
|
By order 40, much of the harsh tone has been smoothed and the sample subjective
|
||||||
@ -2911,6 +3004,16 @@ The use of low-pass filtering on the cepstrum when identifying the fundamental
|
|||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
|
The relative frequencies for male and female speech were as expected with
|
||||||
|
the male speech segment having both lower fundamental frequencies and formant
|
||||||
|
frequencies.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
\begin_inset Note Comment
|
||||||
|
status open
|
||||||
|
|
||||||
|
\begin_layout Plain Layout
|
||||||
A 100ms vowel segment sampled at 24kHz totals to 2,400 samples.
|
A 100ms vowel segment sampled at 24kHz totals to 2,400 samples.
|
||||||
Assuming that each is represented by a float of 4 bytes, this uncompressed
|
Assuming that each is represented by a float of 4 bytes, this uncompressed
|
||||||
vowel segment would fill 9600 bytes of storage.
|
vowel segment would fill 9600 bytes of storage.
|
||||||
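The storage figure quoted in this comment can be checked with a few lines of Python. The helper name `uncompressed_bytes` is hypothetical; the 100ms duration, 24kHz sample rate and 4-byte float are the values stated in the text.

```python
def uncompressed_bytes(duration_s, sample_rate_hz, bytes_per_sample=4):
    """Return (sample count, storage in bytes) for an uncompressed segment."""
    n_samples = round(duration_s * sample_rate_hz)
    return n_samples, n_samples * bytes_per_sample


n_samples, n_bytes = uncompressed_bytes(0.1, 24_000)
print(n_samples, n_bytes)  # 2400 9600
```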
@ -2928,6 +3031,11 @@ literal "false"
|
|||||||
.
|
.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
|
||||||
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Section
|
\begin_layout Section
|
||||||
Conclusion
|
Conclusion
|
||||||
\end_layout
|
\end_layout
|
||||||
@ -2941,7 +3049,7 @@ Within this work, a complete source-filter model of speech has been presented,
|
|||||||
final audio sample.
|
final audio sample.
|
||||||
Various statistics about the original samples were calculated including
|
Various statistics about the original samples were calculated including
|
||||||
the formant frequencies and the fundamental frequency.
|
the formant frequencies and the fundamental frequency.
|
||||||
With a sufficient filter order, sound samples comparable to the originals
|
With a sufficient filter order, sound samples comparable to human speech
|
||||||
were generated.
|
were generated.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
|
BIN
resources/hood_m_100spect.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 134 KiB |
BIN
resources/hood_m_200spect.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 142 KiB |
BIN
resources/hood_m_25spect.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 96 KiB |
BIN
resources/hood_m_50spect.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 113 KiB |