Deepfake scams are coming. Is your business ready?

Using AI to impersonate executives is the latest way for scammers to steal your company's money or wreck its reputation. Buckle your seat belts—things are about to get (un)real.

The call sounded genuine enough. It was a director of the company, demanding that a bank branch manager authorize the transfer of $35 million to accounts in other countries.

The money, the director said, was going to be used to acquire another business. But that acquisition never took place. Instead, the cash was funneled to banks in the U.S. and elsewhere, and the voice used to authorize the transfer was a synthetic copy of the actual director's voice: a so-called deepfake, generated by deep learning neural networks.

The exact details of the crime, which occurred in 2020, are murky. The victimized company, based in the UAE, is never named in court documents, nor are any of the 17 defendants in the alleged international money laundering scheme. We don't know how the AI-generated audio was used, nor what was said.

But it wasn't the first time a synthetic voice had been used to steal money. In 2019, a U.K.-based energy firm was scammed out of $243,000 by attackers who used a cloned voice to pose as the company's CEO. And it surely won't be the last.

Sadly, most people don't have a clue how serious a problem deepfakes pose, notes Joseph Steinberg, a cybersecurity adviser and author.

"There seems to be a widespread misperception that deepfakes are created just for entertainment, or that they're all video based," he says. "But the most dangerous deepfakes today are likely those that consist solely of audio, because faking audio is much easier to do and bogus audio is often extremely effective at tricking people. Even in 2021, many people still believe that if they hear my voice saying something, I must be the person saying it. And that's not necessarily true."

Send in the clones

The UAE attack was clearly well planned and highly sophisticated. As additional evidence that the transaction was genuine, the scammers also provided emails from someone claiming to be an attorney named Martin Zelner. (That name and those emails were probably also faked. The only U.S. attorney we could find named Martin Zelner, who recently retired from practice at Cox Padmore Skolnik & Shakarchy in New York, says he's not the guy.)

But creating a believable voice fake doesn't require nearly that much effort. An attacker can take public voice samples—say, a 5- or 10-minute speech given by a corporate executive that's been uploaded to YouTube—and use an off-the-shelf machine learning app to create a plausible clone of that voice. The more real voice data scammers have to draw from, the more convincing the fake.

"The state of the art right now is you can do this almost in real time, and do it incredibly cheaply," says Jevin West, associate professor in the Information School at the University of Washington and director of the Center for an Informed Public. "You can just search online and find all the tools for it. It takes a little bit of expertise, but not much."

On sites like Resemble.ai or Descript.com, for example, you can train a machine learning model to mimic your voice by reading a series of sentences, then type the words you want your cloned voice to say. The more voice data you provide, the more accurate the model becomes.

In addition, synthetic voices can be harder to detect than other types of deepfakes, says West. With video, there are often signs that something isn't quite right—visual artifacts, blemishes, or asymmetries that can give it away. With voice, there's much less information to draw upon.

Phone calls add an additional layer of complexity because of their relatively poor audio quality, notes Collin Davis, CTO at Pindrop, which makes authentication and anti-fraud solutions for call centers and other voice applications. Most phone conversations use a sampling rate of 8 kHz, says Davis, making synthetic voices harder to spot than if the conversation happened over, say, the 16 kHz audio used in Zoom or Google Meet calls.
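To make the sampling-rate point concrete, here is a small, hypothetical Python sketch (it uses numpy and scipy, which are my own choices, not tools named in the article, and it is illustrative only, not Pindrop's method). It builds a toy signal containing a faint high-frequency artifact and shows that the artifact simply cannot survive once the audio is reduced to telephone-grade 8 kHz.

```python
# Illustrative sketch: why narrowband telephone audio hides more than
# wideband conferencing audio. At an 8 kHz sampling rate nothing above
# 4 kHz (the Nyquist limit) survives, including subtle high-frequency
# artifacts that might betray a synthetic voice.
import numpy as np
from scipy.signal import resample_poly

SR_WIDEBAND = 16_000    # typical Zoom/Google Meet audio
SR_TELEPHONE = 8_000    # typical phone-network audio

# One second of a toy "voice": a 300 Hz tone plus a faint 5 kHz component
# standing in for a high-frequency synthesis artifact.
t = np.linspace(0, 1, SR_WIDEBAND, endpoint=False)
voice = np.sin(2 * np.pi * 300 * t) + 0.05 * np.sin(2 * np.pi * 5_000 * t)

# What a phone channel effectively does: low-pass filter and halve the rate.
narrowband = resample_poly(voice, 1, 2)

def energy_above(x, sample_rate, cutoff_hz):
    """Fraction of spectral energy above cutoff_hz."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / sample_rate)
    return spectrum[freqs > cutoff_hz].sum() / spectrum.sum()

print(f"wideband energy above 4 kHz:   {energy_above(voice, SR_WIDEBAND, 4_000):.4f}")
print(f"narrowband energy above 4 kHz: {energy_above(narrowband, SR_TELEPHONE, 4_000):.4f}")
# The artifact is visible in the 16 kHz signal but is gone entirely
# after the audio has passed through an 8 kHz channel.
```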

Seeing (or hearing) isn't believing

The business risk from deepfakes runs deeper than the social engineering hacks used to extract money from unsuspecting middle managers, says West. Reputational damage is a far greater concern. Attackers could post a fake recording of a CEO making an embarrassing or controversial statement, sending the company's stock price tumbling just long enough for them to turn a quick profit on a short sale.

A scammer could use information about an executive taken from social media and the Dark Web, use a deepfake to impersonate them over the phone, open accounts in their name, and then max them out.

Or scammers might upload a synthetic video of someone in a compromising or incriminating position, to extort money or simply harass them. According to DeepTrace Technologies, 96 percent of online deepfakes are porn, where a celebrity's face has been digitally grafted onto the body of an adult actor.

Even if the voice or video is ultimately exposed as synthetic, the damage has been done. And many who viewed or heard the media will never learn—or simply won't believe—it was a fake.

"Fake audio or synthetically generated video is a profound shift in the way humans communicate and perceive what is reality," says Nina Schick, author of "Deepfakes: The Coming Infopocalypse," in a video posted to her Instagram account. "We've tended to think of media we interact with, especially video, as almost being an extension of our own perception. If it's captured on video, it must have happened in real life."

But deepfakes are not intrinsically evil. Computer-generated media was used to re-create the face and voice of "Star Wars" actor Peter Cushing 20 years after his death. AI helped to "restore" actor Val Kilmer's voice after he lost his vocal cords to throat cancer. More recently, deepfake audio was used to simulate the voice of chef Anthony Bourdain in the documentary about his life, making it appear as if Bourdain was reciting words he had only written, not spoken.

The problem isn't fakery so much as the intent to deceive, notes Davis.

"There are good and appropriate use cases for this technology," he says. "What's important is transparency. The clear demarcation between good and malicious use of [synthetic media] is informing the audience that it's computer generated."

Using AI to fight AI

At the moment, the odds of being attacked using a deepfake aren't very high. But that's likely to change as the technology continues to improve and become more accessible to people with limited or no deep learning expertise. And while several companies are working on technology to defend against deepfakes, scammers have a huge head start.

There are really only two ways to combat potential fakes. One is to use AI to flag suspicious media. For example, Pindrop's technology is used by eight of the 10 largest U.S. banks to authenticate customers. Its machine learning algorithms can analyze a variety of signals—from the device or SIM card being used to audio artifacts not detectable by human ears—and generate a confidence score as to whether callers are who they claim to be.
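To illustrate the general shape of such a system, here is a deliberately simplified, hypothetical Python sketch. The signal names, weights, and threshold are my own illustrative assumptions and do not reflect Pindrop's actual models; the point is only how several independent signals can be folded into a single confidence score.

```python
# Hypothetical sketch: combine independent signals about a call into one
# 0..1 confidence score that the caller is who they claim to be.
from dataclasses import dataclass

@dataclass
class CallSignals:
    device_matches_history: float   # 0..1: does the device/SIM look familiar?
    network_path_plausible: float   # 0..1: does routing match the claimed origin?
    audio_artifact_score: float     # 0..1: how free of synthesis artifacts is the audio?
    behavior_score: float           # 0..1: does caller behavior match past interactions?

# Illustrative weights; a real system would learn these from labeled calls.
WEIGHTS = {
    "device_matches_history": 0.30,
    "network_path_plausible": 0.20,
    "audio_artifact_score": 0.35,
    "behavior_score": 0.15,
}

def caller_confidence(signals: CallSignals) -> float:
    """Weighted combination of the individual signals."""
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())

call = CallSignals(device_matches_history=0.9,
                   network_path_plausible=0.8,
                   audio_artifact_score=0.1,   # strong hint of synthetic audio
                   behavior_score=0.7)

score = caller_confidence(call)
print(f"caller confidence: {score:.2f}")
if score < 0.6:   # illustrative threshold
    print("flag call for agent review: possible synthetic voice")
```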

If a deepfake scammer called into a Pindrop customer's interactive voice response system pretending to be an actual customer, the agent receiving the call would be notified there's a high risk the voice is synthetic. But if that same scammer called the CFO's cell phone, there's no infrastructure in place today that can intercept the call and warn the executive that they're being spoofed.

Such technology may well be on the horizon, Davis adds. Just as some carriers and phones now automatically identify scam robocalls, enhanced caller ID tech may one day be able to tell the difference between a real caller and a clone.

"As these types of attacks become more prevalent, we'll start to see technology that can detect synthetic audio being incorporated into more places," Davis says.

HP Labs is also using AI to develop smart agents that can detect synthetically generated media, says Soumyendu Sarkar, senior distinguished technologist and senior director of AI at the lab.

"We're developing smart learning test agents that can detect how robust a machine learning model is and whether it has bias or drift," says Sarkar. "These agents can also detect anomalies and discontinuities in images that make it easier to identify which ones are fake."

The real McCoy?

The other option is to authenticate the original media, using digital watermarks or other technologies that allow people to identify whether an image, voice, or video has been altered.

For example, Inkscreen's CAPTOR technology integrates into mobile device management software to authenticate media captured with smartphones. Using proprietary watermarks and image metadata, the Inkscreen app can verify that a piece of photographic or video evidence used in a court case has not been tampered with or that the language within a signed contract has not been changed, says company founder Josh Bohls.

If someone tries to replicate a watermarked image by taking a screenshot of it, or edits it in Photoshop, that would be revealed in the metadata, Bohls adds. But these changes are not automatically flagged; you'd have to suspect something was a bit off and then compare the altered image metadata to the original.
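The underlying idea can be sketched in a few lines. The following hypothetical Python example is not Inkscreen's implementation; it simply records a cryptographic fingerprint of a file at capture time and later checks whether the bytes on disk still match, which is the same principle a watermark-and-metadata check relies on.

```python
# Illustrative sketch: tamper-evident fingerprinting of captured media.
import hashlib
import os
import tempfile
from pathlib import Path

def fingerprint(path: str) -> dict:
    """Capture-time record of a file: its size and the SHA-256 of its bytes."""
    data = Path(path).read_bytes()
    return {"file": os.path.basename(path),
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest()}

def verify(path: str, record: dict) -> bool:
    """True only if the file on disk still matches its capture-time record."""
    return fingerprint(path)["sha256"] == record["sha256"]

# Demo with a temporary file standing in for a captured photo.
with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as f:
    f.write(b"original captured image bytes")
    photo = f.name

record = fingerprint(photo)
print(verify(photo, record))      # True: file is untouched

with open(photo, "ab") as f:      # simulate an edit or re-save
    f.write(b" tampered")
print(verify(photo, record))      # False: any change to the bytes shows up

os.remove(photo)
```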

Multiple startups are working on blockchain solutions to combat deepfakes, enabling people to trace media files back to their origin and detect if they've been altered. But here, the challenge is scale. With more than 500 hours of video being uploaded to YouTube every minute, to take just one example, the amount of electricity required to add each one to a blockchain would be astronomical.

"In theory, using blockchain sounds great," notes West. "The problem is, blockchain isn't free. If we had unlimited computational power and resources, then maybe. So it's unlikely to be used for anything but the most sensitive videos."

Trust no one

There is a third option: eternal vigilance. We all need to develop a healthy skepticism about the provenance of media, the way most of us have when it comes to emails from Nigerian princes offering us millions of dollars.

"Twenty years ago, if you got an email that appeared to be from your bank, you probably would have assumed it was actually from your bank," says Steinberg. "Today, most people know to be highly skeptical of such communications. We need to start acting similarly when it comes to voice and video."

At the very least, Steinberg says, people need to take extra measures—such as asking detailed personal questions during a phone conversation or having the other person perform a random gesture when engaging via video—to confirm that the person they're talking to is genuine.

Even then, a clever scammer might be able to guess at the answers or anticipate challenges and have a deepfake response at the ready, warns Pindrop's Davis. You definitely want to take additional authentication measures before transferring millions of dollars to someone's account, he adds.

Executives need to be aware of the dangers of deepfake scams and have a crisis communications plan in place before it happens to them, advises West.

"This is not something business leaders can ignore," he adds. "They should be worried about their reputations. Information travels really fast, and something like this can hit hard in a very short amount of time."
