Combining Acoustic Parameters and Auditory Features Using Bayes Theorem and Establishing Their Correspondence with the Probability Scales by Semi-Automatic Methods in Forensic Speaker Identification
Babita Bhall1*, CP Singh2, Rakesh Dhar3
1Physics Division, Forensic Science Laboratory, Madhuban, Karnal, Haryana, India
2Physics Division, State Forensic Science Laboratory, Delhi, India
3Department of Applied Physics, Guru Jambeshwar University of Science & Technology, Hisar, India
Received Date: 15 August, 2017; Accepted Date: 04 September, 2017; Published Date: 11 September, 2017
Citation: Bhall B, Singh CP, Dhar R (2017) Combining Acoustic Parameters and Auditory Features Using Bayes Theorem and Establishing Their Correspondence with the Probability Scales by Semi-Automatic Methods in Forensic Speaker Identification. Forensic Stud: FSTD-112. DOI: 10.29011/FSTD-112.100012
Abstract
Evolution has resulted in humans and non-humans creating a wide range of sounds used to warn of danger, find mates, and communicate. Only humans are able to produce a set of unique, distinguishable sounds called phonemes. One emerging field of forensic investigation uses acoustic parameters and auditory features to perform speaker identification by comparing unknown samples against known patterns; Bayes’ Theorem is then used to derive probability statements of similarity. In this paper, we consider two sets of speech samples, a questioned sample and a known specimen sample, obtained randomly from actual crime cases. The two speech samples underwent spectrographic analysis and were statistically compared using the formant frequencies (F1, F2 & F3) at particular locations. The percentage of similarity between the unknown (questioned) sample and the known specimen was ascertained from the formant frequencies (acoustic parameters) and from numerical values assigned to the descriptive data (auditory features). Bayes’ Theorem was used to combine the objective probability obtained from the acoustic parameters with the subjective probability obtained from the auditory features. These values were then placed on one of the nine probability scales with the help of software developed by the authors. The study showed that this method can be used to compare unknown voice samples to known samples and assign probability statements of similarity.
Keywords: Acoustic Parameters; Auditory Parameters; Bayes’ Theorem; Formant Frequency; Spectrographic Analysis
Introduction
As a consequence of evolution, humans, animals and birds have all developed the ability to produce different types of sounds, which allow them to signal danger, locate a mate, and communicate higher forms of thought. Primates have an advanced system of communication that includes vocalization, hand gestures and body language. Humans, unlike other primates, have the ability to articulate sounds to produce a set of distinguishable sounds called phonemes, and this has led to the development of language. This ability separates humans from other primates, which are unable to articulate, though they can produce vowel-like sounds.
Humans and a large number of animals have the ability to identify others by listening to the sound of their voice. The degree of accuracy with which identification can be performed under all sorts of conditions still remains in question, especially within the purview of forensic science, which relies on spectrograms as a supplement to careful listening.
There are two basic methods of speaker identification: a subjective method, in which the individual making the sounds is identified by the human mind, and an objective method, in which the identification is made by mechanical or electronic means. The most general acoustic parameters of speech include (1) time, (2) formant frequencies and (3) the intensity distribution within all frequency bands simultaneously present in the instantaneous speaker output; in other words, the parameters portrayed by a spectrogram. Comparisons of these general or derived spectral/temporal parameters are the basis of all speaker identification systems, both subjective and objective. One source of variation in these spectral parameters is phonetic content, and in speaker identification systems it is desirable to minimize this phonetic source of variability. Comparisons of a known vocal pattern with unknown voices are therefore performed using similar-sounding words produced by the speakers. The problem is that, even with similar text, the values of the selected acoustic parameters differ not only among different talkers (interspeaker variability) but also within the same talker (intraspeaker variability) when different utterances of the same words are compared. Researchers have identified different parameters from the sets of clue-words obtained from questioned as well as specimen speech samples, i.e., parameters that convey the least intraspeaker variability and the most interspeaker variability possible in all conditions that may occur in normal or even disguised speech [1-4]. Various studies of speaker-dependent parameters are described in the literature [5-7]. Several studies have addressed the statistical interpretation of evidence obtained during the course of a criminal investigation using Bayes’ theorem [8-11].
Comprehensive studies of speaker identification procedures and methods, and of linking the statistical results to probability scales, were conducted in 2002, 2005 and 2016 [12-14].
In this paper, a questioned (unknown) speech sample was compared with a known sample using formant frequencies (F1, F2 & F3), also known as acoustic parameters, and auditory features. The subjective probability obtained from the auditory features and the objective probability obtained from the acoustic parameters were then combined to calculate the final similarity probability. The author and her team developed a new method using Bayes’ Theorem, together with new software, to calculate the probability of similarity between the two voice patterns on the 1-9 probability scale.
Experimental Methods
Sampling of Speech Material: A set of clue-words for the questioned as well as the specimen sample was extracted and prepared from text uttered by the suspect while asking for a bribe (as this is a text-dependent technique). The sets of clue-words contained different types of vowels, namely /ӕ/, /i/, /ɑ/, /o/, /u/, /ʌ/, /ͻ/ and /ɛ/, each preceded and/or followed by consonants in CVC, VC, or CV syllables uttered at similar places of articulation. The selected clue-words were used to extract and study the acoustic parameters, i.e. the first (F1), second (F2) and third (F3) formant frequencies at particular locations, and a number of auditory features. This particular speaker was selected randomly from the database of actual crime samples. The questioned speech sample was prepared from a recording stored on a mobile phone, and the specimen speech sample was prepared from a direct recording made in the laboratory. Both samples were digitized at a sampling rate of 22050 Hz with 16-bit signed quantization in mono.
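Whether a recording matches the digitization settings stated above can be checked before analysis. The following is a minimal sketch using Python's standard `wave` module; the function name `matches_study_format` and the constant names are illustrative assumptions and are not part of the authors' workflow.

```python
import wave

# Digitization settings stated in the text: 22050 Hz, 16-bit, mono.
EXPECTED_RATE_HZ = 22050
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1            # mono


def matches_study_format(path):
    """Return True if the WAV file at `path` uses the stated settings.

    Hypothetical helper for illustration only.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
```

A questioned or specimen file that fails this check would need to be resampled or converted before its formant values are compared with those of the other sample.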
Experiment: A set of clue-words was subjected to spectrographic analysis using the Computerised Speech Lab (CSL-4500). The acoustic parameters (F1, F2 & F3) at particular locations of the vowel nuclei were measured. Auditory features, comprising linguistic and phonetic features, were collected. The data were entered into the software developed by the authors, which calculates their similarity percentages and weighs objective and subjective data differently using Bayes’ theorem.
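The text does not give the formula used by the authors' software. As a hedged illustration only, one common way to combine a subjective and an objective assessment with Bayes' theorem is to treat the auditory (subjective) probability as the prior for the same-speaker hypothesis and to update it with the likelihood of the acoustic evidence under the same-speaker and different-speaker hypotheses. The function name and all parameter names below are assumptions introduced for this sketch.

```python
def bayes_posterior(prior_same, p_evidence_given_same, p_evidence_given_diff):
    """Posterior probability of the same-speaker hypothesis via Bayes' theorem.

    prior_same            -- prior P(same speaker); assumed here to come from
                             the auditory (subjective) assessment
    p_evidence_given_same -- P(acoustic evidence | same speaker)
    p_evidence_given_diff -- P(acoustic evidence | different speakers)

    Illustrative sketch only; not the authors' software.
    """
    for p in (prior_same, p_evidence_given_same, p_evidence_given_diff):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    numerator = p_evidence_given_same * prior_same
    denominator = numerator + p_evidence_given_diff * (1.0 - prior_same)
    if denominator == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator
```

With a neutral prior of 0.5, the posterior depends only on the ratio of the two likelihoods; a stronger auditory prior shifts the result toward the subjective assessment, which is one way the two kinds of data can be weighed differently.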
Results and Discussions
The results for the acoustic parameters (F1, F2 & F3) at particular locations of the vowel nuclei are tabulated in Table 1. Auditory features, comprising linguistic and phonetic features, are shown in the observation sheet in Figure 2. Figure 1 shows the intonation pattern with formant markings of the words /kӕsis/, /mʌin/, /ho/ & /ʤɑtɑ/ and the LPC of the vowel /ӕ/, showing the value of its first formant frequency (F1 = 503 Hz). The second (F2) and third (F3) formant frequencies were measured in the same way, and the formant frequencies (F1, F2 & F3) of the other vowels were measured for the questioned as well as the specimen samples.
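The formant values in this study were read from the CSL-4500 displays. For readers unfamiliar with how an LPC spectrum yields resonance frequencies, the following is a minimal sketch of the general autocorrelation-method technique using NumPy. It is not the CSL-4500 procedure, the function names are assumptions, and a practical formant tracker would add steps (such as pre-emphasis and bandwidth-based filtering of candidate roots) that are omitted here.

```python
import numpy as np


def lpc_polynomial(frame, order):
    """Coefficients of A(z) = 1 - sum_k a_k z^-k (autocorrelation method)."""
    n = len(frame)
    # Autocorrelation of the frame at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Normal equations: Toeplitz(r[0..order-1]) @ a = r[1..order].
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1:])
    return np.concatenate(([1.0], -a))


def estimate_resonances(frame, sample_rate_hz, order):
    """Return candidate resonance frequencies in Hz, sorted ascending.

    Each complex-conjugate pair of roots of the LPC polynomial gives one
    candidate; not every candidate is necessarily a true formant.
    """
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    roots = np.roots(lpc_polynomial(windowed, order))
    roots = roots[np.imag(roots) > 0]  # keep one root per conjugate pair
    return np.sort(np.angle(roots) * sample_rate_hz / (2.0 * np.pi))
```

The prediction order is a tuning choice: too low an order merges neighbouring formants, while too high an order produces spurious candidates, so the candidates should always be checked against the spectrogram.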
Figure 2 shows the final observation sheet, with the auditory features for the questioned as well as specimen samples, the duration of both samples, the clue-words selected for the spectrographic analysis, the final percentage after combining acoustic parameters and auditory features using Bayes’ Theorem, the number of formants used, and the final placement on the probability scale.
The position on the probability scale is determined from three factors: (1) the final percentage obtained by combining acoustic parameters and auditory features, (2) the number of formants used, and (3) the number of clue-words selected. The software weighs these three factors in calculating the final probability of similarity between an unknown sample and a known sample. In this case, the evaluation yielded a 90.2% match and therefore a Positive Identification.
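How the software derives its acoustic similarity percentage is not described in the text. Purely as an illustration, one simple measure is the share of formant values that agree between the questioned and specimen samples within a chosen tolerance. The sketch below applies that hypothetical measure to three rows copied from Table 1; the function name, the tolerance parameter and the measure itself are assumptions, and the resulting figure is not a result of the study.

```python
# Three rows copied from Table 1: (F1, F2, F3) in Hz,
# as (questioned, specimen) pairs.
ROWS = [
    ((503, 2079, 3511), (503, 2079, 4023)),  # cases, vowel /ӕ/
    ((464, 1721, 2273), (464, 1721, 2273)),  # cases, vowel /i/
    ((619, 1828, 2514), (619, 1692, 2514)),  # jata, second /ɑ/
]


def percent_agreement(rows, tolerance_hz=0.0):
    """Percentage of formant values agreeing within `tolerance_hz`.

    Hypothetical similarity measure for illustration only.
    """
    total = 0
    matched = 0
    for questioned, specimen in rows:
        for q, s in zip(questioned, specimen):
            total += 1
            if abs(q - s) <= tolerance_hz:
                matched += 1
    if total == 0:
        raise ValueError("no formant values supplied")
    return 100.0 * matched / total
```

For these three rows, seven of the nine values agree exactly, so `percent_agreement(ROWS)` is about 77.8. Any real tolerance would have to reflect the expected intraspeaker variability of each formant.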
Conclusion
Based on the results of this study, unknown voice samples can be compared with known specimen samples to determine the percentage of similarity from the acoustic parameters and auditory features, individually as well as in combination using Bayes’ Theorem. This method offers promising applications in the fields of forensic science and law enforcement. The current method incorporates subjective probability, which has not been used to date. The method provides probability statements on whether the voice of the offender matches that of a suspect. Once this method has been reproduced by others and shown to be reliable, it will greatly assist law enforcement agencies and the courts.
The author plans future large-scale studies consisting of 100 questioned and specimen speech samples selected randomly from the database.
Figure 1: Waveform with phonetic transcript of the words /kӕsis/, /mʌin/, /ho/ & /ʤɑtɑ/ in windows A and C; their respective spectrograms with formant markings in windows B and D; and their respective LPC in windows E and F.
Figure 2: Observation sheet showing auditory features, duration, selected clue-words and number of formants used for the questioned as well as specimen speech samples, the final percentage, and its correlation on the probability scale.
| English transcription of Hindi word | Word | Nucleus vowel | Questioned F1 (Hz) | Questioned F2 (Hz) | Questioned F3 (Hz) | Specimen F1 (Hz) | Specimen F2 (Hz) | Specimen F3 (Hz) |
|---|---|---|---|---|---|---|---|---|
| cases | kӕsis | /ӕ/ | 503 | 2079 | 3511 | 503 | 2079 | 4023 |
| cases | kӕsis | /i/ | 464 | 1721 | 2273 | 464 | 1721 | 2273 |
| main | mʌin | /ʌ/ | 522 | 1683 | 2224 | 522 | 1683 | 2224 |
| main | mʌin | /i/ | 503 | 2021 | 3878 | 503 | 2021 | 3878 |
| ho | ho | /o/ | 455 | 1683 | 2379 | 455 | 1683 | 2379 |
| jata | ʤɑtɑ | /ɑ/ | 729 | 1586 | 3772 | 729 | 1586 | 3772 |
| jata | ʤɑtɑ | /ɑ/ | 619 | 1828 | 2514 | 619 | 1692 | 2514 |
| depend | dipɛnd | /i/ | 464 | 1625 | 2215 | 464 | 1625 | 2215 |
| depend | dipɛnd | /ɛ/ | 580 | 1896 | 2418 | 580 | 1896 | 2418 |
| karta | kʌrtɑ | /ʌ/ | 619 | 1712 | 2843 | 619 | 1712 | 2398 |
| karta | kʌrtɑ | /ɑ/ | 716 | 1625 | 2398 | 716 | 1625 | 2398 |
| hai | hʌi | /ʌ/ | 590 | 1896 | 2398 | 590 | 1896 | 2398 |
| hai | hʌi | /i/ | 522 | 2002 | 2340 | 522 | 2002 | 2340 |
| do | dͻ | /ͻ/ | 445 | 2311 | 3182 | 445 | 2311 | 3182 |
| teen | tin | /i/ | 416 | 2195 | 2689 | 416 | 2195 | 2863 |
| se | sɛ | /ɛ/ | 455 | 1654 | 2437 | 455 | 1422 | 2437 |
| upar | upʌr | /u/ | 416 | 1044 | 2602 | 416 | 1044 | 2602 |
| upar | upʌr | /ʌ/ | 493 | 1470 | 2456 | 493 | 1470 | 2456 |
| namaskar | nʌmʌʃkɑr | /ʌ/ | 522 | 1238 | 2273 | 522 | 1238 | 2273 |
| namaskar | nʌmʌʃkɑr | /ʌ/ | 522 | 1344 | 3482 | 522 | 1344 | 3482 |
| namaskar | nʌmʌʃkɑr | /ɑ/ | 542 | 1586 | 3714 | 542 | 1586 | 3714 |
| uncle | unkʌl | /u/ | 1663 | 2592 | 3598 | 1663 | 2592 | 3598 |
| uncle | unkʌl | /ʌ/ | 513 | 1576 | 2408 | 513 | 1576 | 2408 |
| ji | ʤi | /i/ | 377 | 2485 | 3849 | 377 | 2137 | 3849 |
| ai | ɑi | /ɑ/ | 638 | 1499 | 3830 | 638 | 1683 | 3830 |
| ai | ɑi | /i/ | 551 | 1808 | 3791 | 551 | 2050 | 4043 |
| ar | ɑr | /ɑ/ | 542 | 2021 | 3704 | 542 | 2021 | 3984 |
| ho | ho | /o/ | 484 | 1634 | 2776 | 484 | 1799 | 2776 |
| jaaegi | ʤɑjɛgi | /ɑ/ | 493 | 1857 | 3810 | 493 | 1857 | 3810 |
| jaaegi | ʤɑjɛgi | /i/ | 426 | 2331 | 3994 | 426 | 2331 | 3994 |
Table 1: Features extracted for a set of clue-words for one speaker.
References
Tosi O, Oyer H, Lashbrook W, Pedrey C, Nicol J, et al. (1972) Experiment on Voice Identification. Journal of the Acoustical Society of America 51: 2030-2043.
Aitken CGG (2000) Statistical Interpretation of Evidence/Bayesian Analysis. University of Edinburgh, Edinburgh UK: 717-724.
An Introduction to Forensic Speaker Identification Procedure. Advance Interactive Training Course on Forensic Speaker Recognition, CBI Bulletin, Directorate of Forensic Science, Ministry of Home Affairs, Govt. of India. Vol. XIII, No. 1, January 2005.
Kinoshita Y (2002) Use of Likelihood Ratio and Bayesian Approach in Forensic Speaker Identification, School of Languages and International Education, University of Canberra.