Interspeech 2008 Special Session: Forensic Speaker Recognition - Traditional and Automatic Approaches

There have been two basic approaches to Forensic Speaker Recognition: Traditional, rooted in phonetics, and Automatic, rooted in engineering. This Special Session will include papers from both approaches, and will consider how the strengths of the two approaches may be combined.

A major advantage of the Traditional approach is that it takes phonetic context into account and can therefore potentially extract a great deal of fine-grained information about inter-speaker differences. Another advantage is that it can potentially be explained in a way which will be understandable to a jury; for example, it might be explained that there are differences in the way the ‘r’ sounds are produced. A disadvantage is that it is typically labour-intensive and slow. This limits the amount of data which can be analysed, which in turn limits the resulting strength of evidence.

A major advantage of the Automatic approach is its use of sophisticated signal processing techniques which allow for rapid processing of large amounts of data. This potentially results in greater strength of evidence than that provided by a typical Traditional analysis. A disadvantage of the Automatic approach is that the results cannot usually be explained in a way which will be understandable to a jury. Automatic analyses have also typically been applied to global features of speech, such as the long-term spectrum, and have not fully exploited the information available in local phonetic detail, information which could potentially be used to further increase the strength of evidence.

A possible way forwards involves combining aspects of the two approaches. For example, a Traditional-type analysis could be guided by a human expert, but be made more efficient by making greater use of automated measurement tools. Statistical analytic techniques which have been developed in the Automatic approach could be applied to data from the Traditional approach. Formant measurements are typically used in the Traditional approach and cepstral coefficients are typically used in the Automatic approach. Formants could be used in an Automatic approach, and it may be that they are more robust to the degradation due to noise, bandpass filtering, etc., which is common in real forensic data.

10:00–12:00, Thursday 25 September, 2008
Plaza 3 & 4

The special session followed the Keynote address ‘Forensic automatic speaker recognition: Fiction or Science’ by Joaquín González-Rodríguez, 8:30–9:30 in the Great Hall.

Additional papers related to forensic speaker recognition were presented in the poster session ‘Speaker recognition: Adverse conditions & forensics’, 13:30–15:30 in Mezzanine Poster Area 3.

The Session consisted of five 20-minute oral presentations, followed by a 20-minute panel discussion.

