In a world where nearly every electronic device is online and interconnected, privacy is a growing concern. When anything we say can be recorded, it is crucial to find the threshold at which audio becomes unintelligible to transcription software. Automatic Speech Recognition (ASR) powers tools such as Siri and Alexa that are designed to ease daily tasks, but it can also be used maliciously. ASR is a relatively new technology and, like any new technology, it has many aspects that remain unexplored and unknown to the public. This paper addresses that knowledge gap by examining how reducing the sampling frequency affects word detection in two well-known transcription systems: Google’s speech recognition software and the Kaldi toolkit. The behavior and performance of both systems were analyzed at sampling frequencies ranging from 300 Hz to 44.1 kHz.
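As a rough illustration of why sampling frequency matters for intelligibility, the sketch below computes the Nyquist limit (half the sampling frequency) for a few rates within the studied range and estimates how much of a speech band survives below that limit. The function names, the chosen rates, and the 300 to 3400 Hz band (the conventional telephone voice band) are assumptions made for illustration only; they are not taken from the paper's experimental setup.

```python
# Hypothetical illustration: function names, the sample rates listed below,
# and the 300-3400 Hz voice band are assumptions, not the paper's setup.

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sample rate (fs / 2)."""
    return sample_rate_hz / 2.0


def retained_fraction(sample_rate_hz: float,
                      band_low_hz: float = 300.0,
                      band_high_hz: float = 3400.0) -> float:
    """Fraction of the band [band_low_hz, band_high_hz] below the Nyquist limit."""
    limit = nyquist_hz(sample_rate_hz)
    # Width of the part of the band that lies under the limit (never negative).
    kept = max(0.0, min(limit, band_high_hz) - band_low_hz)
    return kept / (band_high_hz - band_low_hz)


if __name__ == "__main__":
    for fs in (300, 1000, 4000, 8000, 16000, 44100):
        print(f"{fs:>6} Hz  Nyquist {nyquist_hz(fs):>8.1f} Hz  "
              f"band kept {retained_fraction(fs):.0%}")
```

Under these assumptions, a 300 Hz sampling frequency retains none of the assumed voice band, while 8 kHz and above retain all of it. The rates in between are where recognition accuracy would be expected to change most.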