Author: Paul Cronin, Co-Founder and Partner

Voice recognition technology in digital assistants has transformed how we interact with our devices, making daily tasks quicker and more convenient. The most prominent players in the field include Apple’s Siri, Google Assistant, Amazon’s Alexa and Samsung’s Bixby.

We all know that these systems interact with voice and, of course, can be trained to recognise our own voices to provide a personalised experience.

From a security perspective, I was keen to establish whether these systems use voice recognition or voice authentication.

Let’s look at the differences between the two:

Voice Recognition (Speech Recognition)

  • What it does: Identifies and understands the words being spoken.
  • Focus: Translating spoken language into text or executable commands.
  • Applications:
  1. Dictation software (converting speech to text)
  2. Virtual assistants like Siri and Alexa (taking commands)
  3. Transcription services

Voice Authentication (Speaker Verification)

  • What it does: Verifies a speaker’s identity based on unique characteristics of their voice.
  • Focus: Creating a “voiceprint” and using that to match a speaker for authorization purposes.
  • Applications:
  1. Security systems for homes or devices
  2. Bank transactions over the phone
  3. Customer support (verifying your identity)

Key Differences

  • Goal: Voice recognition transcribes speech; voice authentication confirms identity.
  • Technology: Both use similar techniques, but voice authentication analyses deeper voice qualities like cadence, pitch, and resonance to create a unique voiceprint.
  • Security: Voice authentication is inherently more secure because it focuses on individualised voice traits.
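The voiceprint matching described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a voiceprint is a fixed-length embedding vector (in practice these come from a trained speaker-embedding model, which is not shown) and that verification compares a new sample against the enrolled voiceprint using cosine similarity. The function names `cosine_similarity` and `verify_speaker`, the toy 3-dimensional vectors and the 0.75 threshold are all hypothetical choices for illustration, not any assistant's actual API or setting.

```python
import math
from typing import Sequence

# Hypothetical threshold; real systems tune this to balance
# false accepts against false rejects.
MATCH_THRESHOLD = 0.75


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled: Sequence[float], probe: Sequence[float],
                   threshold: float = MATCH_THRESHOLD) -> bool:
    """Accept the probe if it is close enough to the enrolled voiceprint.

    This only measures similarity: it does not check whether the audio
    came from a live person, so a close enough clone is accepted too.
    """
    return cosine_similarity(enrolled, probe) >= threshold


enrolled_voiceprint = [0.9, 0.1, 0.4]    # toy "voiceprint" (hypothetical)
same_speaker = [0.85, 0.15, 0.38]
other_speaker = [0.1, 0.9, -0.2]
print(verify_speaker(enrolled_voiceprint, same_speaker))   # True
print(verify_speaker(enrolled_voiceprint, other_speaker))  # False
```

The key design point is that a similarity check on its own answers "does this sound like Paul?", not "is this a live human?", which is exactly the gap a cloned voice aims at.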

With AI technology making it easier to create deepfakes in both video and audio, I was curious to see whether the most common digital assistants perform any checks on the voice being used. Put simply, can I clone my voice using AI and spoof the digital assistants into thinking it’s me?

I’ve been a big fan of ElevenLabs (https://elevenlabs.io/) and its generative voice AI technology for some time. There are others out there, but I’ve found theirs to be very realistic and contextually aware.

With just a few simple recordings of my own voice and an upload to ElevenLabs, I was able to make a convincing basic AI replica of my own voice.

Using text-to-speech, I can make my voice say anything. Built from just a few uploaded audio files, it was not perfect, but it was passable as me.

The next step was to take each of my digital assistants, all trained to recognise my voice, and see whether they would respond to the AI version of it.

My test was simply to ask each digital assistant who it was talking to, using a wake-up call spoken in my AI-generated voice.

Amazon Alexa

As well as voice recognition, Amazon’s Alexa also supports Voice ID for purchases (turned on by default, by the way).

So, with my AI voice: “Alexa, who are you speaking to?”

“I’m talking to Paul. This is PP’s account.”

Success! I didn’t try making a voice purchase, as I have this disabled, but I’m pretty confident it would work without any issue.

Bixby

Samsung’s Bixby assistant

With my AI voice: “Hey Bixby, who are you talking to?”

“You’re Paul”

Success! Although I’m a fan of Samsung, I never use Bixby (who does?!).

Google (running on a Galaxy S24)

Google Assistant

With my AI voice: “Hey Google, Voice Match.”

“I’m already set up to recognise your voice on this device.”

Success!

Siri

Apple’s Siri (on an Apple iPad)

With my AI voice: “Hey Siri, who are you talking to?”

“You’re asking me, Paul”

Success!

Can you unlock a mobile device using voice control?

A quick search online returns plenty of posts claiming that this is possible. However:

Google: Android did previously allow this feature, but since 2021 it has been disabled across all versions of Android.

Bixby: again, this functionality was present but has since been removed.

Siri: I’m not a big Apple user, so this may have been removed in later iOS releases, but a quick look online suggests that it might be possible to set up custom Voice Control commands to unlock a device. There are also some posts about gaining access with Apple AirPods, but I don’t have these to test.

AI Voice Detector?

There appear to be a few AI/deepfake detection tools online that try to identify whether a voice is AI-generated or real. However, these all appear to be paid services, and I also wondered how good they actually are at detection.

ElevenLabs has an AI Speech Classifier on its website to detect whether audio was created using its AI. As I already use ElevenLabs and have an AI version of my own voice, let’s test it.

I created two initial audio clips of the same verse, one in my real voice and the other in my AI voice:

Mary had a little lamb, its fleece was white as snow.

And everywhere that Mary went,

The lamb was sure to go.

File one uses my own voice (MaryPCReal.mp3). Let’s check it with the AI Speech Classifier.

[Screenshot: AI Speech Classifier result for MaryPCReal.mp3]

As expected, it’s really me, with only a 2.0% likelihood of being AI.

File two uses my AI voice (MaryPCAI.mp3). Let’s check it with the AI Speech Classifier.

[Screenshot: AI Speech Classifier result for MaryPCAI.mp3]

Good, as expected: 98%, very likely to be AI.

OK, now what happens if I simply play the AI voice through a speaker, re-record it on my phone as an MP3 (MaryPCAInew.mp3) and push it through the classifier?

[Screenshot: AI Speech Classifier result for MaryPCAInew.mp3]

Well, that was not as expected. The classifier now thinks the re-recorded AI voice is not AI at all but most probably me (human), giving only a 2.0% likelihood of it being AI, when in fact it is 100% AI.

Conclusions

I’m left wondering why the likes of Amazon, Google, Samsung and Apple don’t appear to do any real checks. It may be a latency issue: they, of course, have to make a trade-off between security and performance.

With Amazon Alexa, you can add and purchase physical products by default. Google Assistant can also add physical products, but purchase approvals in Google Assistant require a password or fingerprint.

Thankfully, I don’t know of any motor vehicles that use voice authentication; they all have very poor voice recognition to begin with. The manufacturers’ built-in voice systems that I have tested are all useless anyway, with most vehicles now relying on Android Auto or Apple CarPlay for voice assistant services rather than for vehicle control.

One thing I would like to test is banking security systems that use voice authentication. In theory, these systems should offer far more sophisticated analysis of the voice audio, covering both authentication and spoofing countermeasures.
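As an illustration of what combining authentication with spoofing countermeasures can look like, one common design runs a separate spoof-detection model alongside speaker verification and only accepts audio that passes both. The Python sketch below is hypothetical: it assumes both scores have already been computed and lie between 0 and 1, and the names `VoiceScores` and `decide` and both thresholds are invented for illustration. It says nothing about how any particular bank implements this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceScores:
    speaker_match: float  # similarity to the enrolled voiceprint, 0..1
    spoof_prob: float     # countermeasure's estimated chance the audio is synthetic or replayed, 0..1


# Hypothetical thresholds, chosen for illustration only.
MATCH_MIN = 0.80
SPOOF_MAX = 0.10


def decide(scores: VoiceScores) -> str:
    """Cascaded decision: reject likely spoofs first, then check identity."""
    if not (0.0 <= scores.speaker_match <= 1.0 and 0.0 <= scores.spoof_prob <= 1.0):
        raise ValueError("scores must be in [0, 1]")
    if scores.spoof_prob > SPOOF_MAX:
        return "reject: suspected synthetic or replayed audio"
    if scores.speaker_match < MATCH_MIN:
        return "reject: voice does not match enrolled speaker"
    return "accept"


print(decide(VoiceScores(speaker_match=0.93, spoof_prob=0.02)))  # accept
print(decide(VoiceScores(speaker_match=0.95, spoof_prob=0.98)))  # reject: suspected synthetic or replayed audio
print(decide(VoiceScores(speaker_match=0.40, spoof_prob=0.01)))  # reject: voice does not match enrolled speaker
```

A cascade like this is only as strong as its countermeasure model. In the re-recording test, the AI likelihood dropped from 98% to 2.0%; fed a score like that, the spoof check above would pass the cloned voice straight through to the identity check.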

Now that AI technology is becoming so advanced, it is easy to see how this type of approach could be used in phishing attacks, especially where voice authentication is not implemented correctly.

I found a great paper that goes into technical detail on waveform-level adversarial attacks against automatic speaker recognition; it is worth a read if you’re interested.

https://link.springer.com/article/10.1007/s40747-022-00782-x

Thank you to Liam Hackett at Rootshell Security for helping with the testing of the audio files.
