Watermarking Generative AI: Hype or Cure-All?
Experts Say Technique Is a Good Start, But They Cannot Guarantee Safety
Listen to the Biden administration tout the set of voluntary commitments tech executives have pledged to honor in artificial intelligence development, and you'll hear one word come up often: watermarking.
Watermarking is a core part of a White House trustworthiness initiative that the administration hopes will serve as a prelude to regulation and that, in the meantime, commits companies to taking steps to ensure the safety of AI products. Participants in the White House effort, which has included multiple high-level meetings between White House staff and the tech industry, include Google, Meta, Amazon, Microsoft and OpenAI (see: 7 Tech Firms Pledge to White House to Make AI Safe, Secure).
The problem, say AI experts, is that watermarking is as likely to fail as to succeed. "The concept of manually watermarking in this expansive landscape can be likened to the daunting task of meticulously painting each individual grain of sand on a beach," Rijul Gupta, co-founder and CEO of DeepMedia, told Information Security Media Group.
The idea behind watermarking is to send a signal that AI-generated material is just that - AI-generated. Fake but realistic-enough generative AI content already spreads faster than content gatekeepers can stop it. If some of it looks or sounds obviously fake, just wait until it doesn't. Hence watermarking: a technique that embeds subtle "noise" into content produced by generative AI algorithms so that users of the material can identify its source, potentially curbing disinformation, deepfakes and fraud.
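A minimal sketch of one classic technique, additive spread-spectrum watermarking, shows how keyed "noise" can mark content. It is a simplified illustration, not the method used by any company named here, and it assumes an image flattened into a list of grayscale values; the helper names, key, strength and threshold are all hypothetical. A secret key seeds a pseudo-random pattern of +1/-1 values that is added at low amplitude, and a detector holding the same key checks whether the content correlates with that pattern.

```python
import math
import random

def keyed_pattern(key: int, n: int) -> list[float]:
    """Hypothetical scheme: a balanced +1/-1 pattern (equal counts, so it sums
    to zero) shuffled by a pseudo-random generator seeded with the secret key."""
    pattern = [1.0] * (n // 2) + [-1.0] * (n - n // 2)
    random.Random(key).shuffle(pattern)
    return pattern

def embed(pixels: list[float], key: int, strength: float = 3.0) -> list[float]:
    """Add the keyed pattern at low amplitude (a few grey levels out of 255)."""
    pattern = keyed_pattern(key, len(pixels))
    return [p + strength * w for p, w in zip(pixels, pattern)]

def detection_score(pixels: list[float], key: int) -> float:
    """Average correlation between the content and the keyed pattern.
    Unrelated content scores near 0; watermarked content scores near `strength`."""
    pattern = keyed_pattern(key, len(pixels))
    return math.fsum(p * w for p, w in zip(pixels, pattern)) / len(pixels)

def is_watermarked(pixels: list[float], key: int, threshold: float = 1.5) -> bool:
    return detection_score(pixels, key) > threshold

# Usage: a random 100,000-pixel stand-in for an AI-generated image.
rng = random.Random(0)
image = [rng.uniform(0, 255) for _ in range(100_000)]
marked = embed(image, key=42)
print(is_watermarked(marked, key=42))  # watermark found with the right key
print(is_watermarked(image, key=42))   # unmarked original is not flagged
print(is_watermarked(marked, key=7))   # a wrong key finds nothing
```

Because the pattern contains equal numbers of +1 and -1 values, the overall brightness of unrelated content cancels out of the score, leaving it near zero.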
An assumption behind watermarking is that the noise cannot be removed without also damaging the content. Professor Florian Kerschbaum of Canada's University of Waterloo, who is a member of the Waterloo Cybersecurity and Privacy Institute, said that's not necessarily the case.
"The question is how much modifications are necessary to remove a watermark," he said. Watermark removal tools are already available on open and darknet markets, and several are free.
Kerschbaum said his team tested "most academic" watermarking schemes, "several of which have served as blueprints for commercial ones." Manually designed watermarks can be evaded with very small modifications if the watermarking algorithm is known, and even a secret key doesn't protect the content from tampering, he said.
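The kind of "very small modification" Kerschbaum describes can be illustrated with a toy additive watermark. This hypothetical sketch is not one of the schemes his team tested: the attacker does not know the key, only that the watermark lives in fine pixel-to-pixel noise, and applies a mild five-pixel blur. On smooth content the blur barely changes the picture, yet it averages away most of the keyed pattern and drops the detection score below a hypothetical threshold of 1.5.

```python
import math
import random

THRESHOLD = 1.5  # hypothetical detection threshold

def keyed_pattern(key: int, n: int) -> list[float]:
    """Hypothetical balanced +1/-1 pattern shuffled by a key-seeded PRNG."""
    pattern = [1.0] * (n // 2) + [-1.0] * (n - n // 2)
    random.Random(key).shuffle(pattern)
    return pattern

def embed(pixels: list[float], key: int, strength: float = 3.0) -> list[float]:
    pattern = keyed_pattern(key, len(pixels))
    return [p + strength * w for p, w in zip(pixels, pattern)]

def detection_score(pixels: list[float], key: int) -> float:
    pattern = keyed_pattern(key, len(pixels))
    return math.fsum(p * w for p, w in zip(pixels, pattern)) / len(pixels)

def blur(pixels: list[float], radius: int = 2) -> list[float]:
    """Moving average over 2*radius+1 neighbours. Needs no key, only the
    knowledge that the watermark sits in fine-grained pixel-to-pixel noise."""
    n = len(pixels)
    out = []
    for i in range(n):
        window = pixels[max(0, i - radius): min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

# Smooth stand-in content (a slow brightness gradient), as in many real images.
image = [128 + 50 * math.sin(2 * math.pi * i / 5000) for i in range(100_000)]
marked = embed(image, key=42)
attacked = blur(marked)

change = sum(abs(a - m) for a, m in zip(attacked, marked)) / len(marked)
print(f"score before: {detection_score(marked, 42):.2f}")    # roughly 3
print(f"score after:  {detection_score(attacked, 42):.2f}")  # roughly 0.6
print(f"average change per pixel: {change:.2f} grey levels")
```

The blur moves each pixel by only a couple of grey levels on average, about the size of the watermark itself, which is why a signal hidden in low-amplitude noise is so exposed to this kind of edit.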
Another research team at the University of Maryland tested the reliability of watermarking techniques for digital images and broke them fairly easily. The team developed a model substitution adversarial attack that could remove watermarks. It also showed that current watermarking methods are vulnerable to spoofing attacks, in which the attacker aims to have real images identified as watermarked ones.
"In particular, by just having black-box access to the watermarking method, we show that one can generate a watermarked noise image which can be added to the real images to have them falsely flagged as watermarked ones," wrote Mehrdad Saberi, the lead researcher of the paper, published with AI researcher Soheil Feizi.
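The spoofing idea in that quote can be illustrated with a toy additive watermark. This is a simplified, hypothetical illustration, not the method from the Maryland paper: the attacker never sees the key, but can submit images to a watermarking service (modeled here by the hypothetical function `watermark_service`) and read back the results. Submitting a flat grey image and subtracting the grey level leaves a pure watermark "noise image," which can then be added to any real photo.

```python
import math
import random

_SECRET_KEY = 1234  # hypothetical; known only to the service and its detector

def keyed_pattern(key: int, n: int) -> list[float]:
    """Hypothetical balanced +1/-1 pattern shuffled by a key-seeded PRNG."""
    pattern = [1.0] * (n // 2) + [-1.0] * (n - n // 2)
    random.Random(key).shuffle(pattern)
    return pattern

def watermark_service(pixels: list[float], strength: float = 3.0) -> list[float]:
    """A black box from the attacker's point of view: image in, watermarked image out."""
    pattern = keyed_pattern(_SECRET_KEY, len(pixels))
    return [p + strength * w for p, w in zip(pixels, pattern)]

def detector_flags(pixels: list[float], threshold: float = 1.5) -> bool:
    """The provider's detector, which does hold the key."""
    pattern = keyed_pattern(_SECRET_KEY, len(pixels))
    score = math.fsum(p * w for p, w in zip(pixels, pattern)) / len(pixels)
    return score > threshold

n = 100_000
# Step 1: query the black box with a featureless grey image; what comes back,
# minus the grey level, is the watermark noise on its own.
grey = [128.0] * n
noise_image = [m - 128.0 for m in watermark_service(grey)]

# Step 2: add that noise image to a genuine, never-watermarked photo.
rng = random.Random(0)
real_photo = [rng.uniform(0, 255) for _ in range(n)]  # stand-in for a real photo
spoofed = [r + z for r, z in zip(real_photo, noise_image)]

print(detector_flags(real_photo))  # the genuine photo is not flagged
print(detector_flags(spoofed))     # the same photo is now falsely flagged
```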
If the effectiveness of watermarks is so questionable, why are Big Tech companies and the White House pushing for their use? "Simply because it is still one of the best alternatives and we probably need multiple approaches to tackle the problem," Kerschbaum said.
Even as hackers look for ways to undermine watermarks, researchers are hunting for better ways to embed them - a cycle that, like every security problem, is likely to continue for the foreseeable future.
One approach is to use AI to build watermarks into AI generators. Kerschbaum said his team at the University of Waterloo is developing "stronger" watermarking algorithms that take an adaptive attacker into account. "We have first defined a threat model, then designed a generic adaptive attacker that uses AI and are about to build a watermarking algorithm into the generative AI that is resistant to this attacker and, in consequence, any attack, up to a maximum modification threshold," he said.
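Kerschbaum's phrase "up to a maximum modification threshold" describes a robustness guarantee. The article does not describe how his team's algorithm achieves it, so the sketch below only shows what such a guarantee can look like for a simple keyed-correlation detector, with hypothetical names and parameters. If the score is s(x) = (1/n) Σ x_i w_i for a secret ±1 pattern w, then changing every pixel by at most ε changes s by at most ε, so content scoring above τ + ε stays detected under every such modification.

```python
import math
import random

def keyed_pattern(key: int, n: int) -> list[float]:
    """Hypothetical balanced +1/-1 pattern shuffled by a key-seeded PRNG."""
    pattern = [1.0] * (n // 2) + [-1.0] * (n - n // 2)
    random.Random(key).shuffle(pattern)
    return pattern

def detection_score(pixels: list[float], key: int) -> float:
    pattern = keyed_pattern(key, len(pixels))
    return math.fsum(p * w for p, w in zip(pixels, pattern)) / len(pixels)

def certified_detect(pixels: list[float], key: int, threshold: float, epsilon: float) -> bool:
    """True only if detection survives EVERY edit that changes each pixel by at
    most epsilon, since |(1/n) * sum(d_i * w_i)| <= max|d_i| <= epsilon."""
    return detection_score(pixels, key) - epsilon > threshold

def worst_case_attack(pixels: list[float], key: int, epsilon: float) -> list[float]:
    """The strongest edit within the budget, even for an attacker who knows the
    key: push every pixel by epsilon against the pattern."""
    pattern = keyed_pattern(key, len(pixels))
    return [p - epsilon * w for p, w in zip(pixels, pattern)]

KEY, STRENGTH, THRESHOLD, EPSILON = 42, 4.0, 1.5, 1.0
image = [128 + 50 * math.sin(2 * math.pi * i / 5000) for i in range(100_000)]
marked = [p + STRENGTH * w for p, w in zip(image, keyed_pattern(KEY, len(image)))]
attacked = worst_case_attack(marked, KEY, EPSILON)

print(certified_detect(marked, KEY, THRESHOLD, EPSILON))  # guarantee holds
print(detection_score(marked, KEY) - detection_score(attacked, KEY))  # drop equals EPSILON
print(detection_score(attacked, KEY) > THRESHOLD)  # still detected after the attack
```

The guarantee covers only the stated budget: an edit that moves pixels by more than ε falls outside it, which is why defining the threat model comes first.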
Another method is to ensure that AI providers align their efforts and produce similar watermarks. "If the watermarks differ too much, one can combine outputs - and watermarks - and end up with a good output that doesn't have 'enough' of each watermark to be detected," Kerschbaum said. This approach may also help address a related dilemma: making watermark detection tools widely available gives bad actors access to them as well as good ones.
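Kerschbaum's warning about combining outputs can be reproduced with a toy additive watermark (the keys, strength and threshold are hypothetical, and real providers' watermarks are not this simple). Averaging two images halves each watermark's signal, so two incompatible watermarks can each fall below the detection threshold, while a shared, aligned watermark keeps its full strength.

```python
import math
import random

THRESHOLD = 2.0  # hypothetical detection threshold

def keyed_pattern(key: int, n: int) -> list[float]:
    """Hypothetical balanced +1/-1 pattern shuffled by a key-seeded PRNG."""
    pattern = [1.0] * (n // 2) + [-1.0] * (n - n // 2)
    random.Random(key).shuffle(pattern)
    return pattern

def embed(pixels: list[float], key: int, strength: float = 3.0) -> list[float]:
    pattern = keyed_pattern(key, len(pixels))
    return [p + strength * w for p, w in zip(pixels, pattern)]

def detection_score(pixels: list[float], key: int) -> float:
    pattern = keyed_pattern(key, len(pixels))
    return math.fsum(p * w for p, w in zip(pixels, pattern)) / len(pixels)

def blend(a: list[float], b: list[float]) -> list[float]:
    """Combine two outputs by averaging them pixel by pixel."""
    return [(x + y) / 2 for x, y in zip(a, b)]

n = 100_000
# Two smooth stand-in images from two different generators.
out_a = [128 + 50 * math.sin(2 * math.pi * i / 5000) for i in range(n)]
out_b = [128 + 50 * math.cos(2 * math.pi * i / 7000) for i in range(n)]

# Case 1: each provider uses its own watermark key; each signal is halved.
mixed = blend(embed(out_a, key=1), embed(out_b, key=2))
print(detection_score(mixed, 1) > THRESHOLD, detection_score(mixed, 2) > THRESHOLD)

# Case 2: both providers embed the same (aligned) watermark; it survives.
aligned = blend(embed(out_a, key=1), embed(out_b, key=1))
print(detection_score(aligned, 1) > THRESHOLD)
```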
A recent announcement by Microsoft and Adobe that they will add Content Credentials, a cryptographic "invisible" watermark icon, to images generated by their respective AI platforms appears to be a step in this direction.
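Content Credentials build on the C2PA specification, in which a cryptographically signed manifest records where a piece of content came from. The sketch below only illustrates the underlying idea of binding a provenance record to the exact content bytes; it is not the C2PA format, and it uses a shared-key HMAC from Python's standard library as a stand-in for the public-key signatures real credentials rely on. The field names, key and generator name are hypothetical.

```python
import hashlib
import hmac
import json

def make_credential(content: bytes, generator: str, key: bytes) -> dict:
    """Attach a signed record of which tool generated the content and its hash."""
    manifest = {"generator": generator, "sha256": hashlib.sha256(content).hexdigest()}
    payload = json.dumps(manifest, sort_keys=True).encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"manifest": manifest, "signature": signature}

def verify_credential(content: bytes, credential: dict, key: bytes) -> bool:
    """Valid only if the record is untampered AND still matches these exact bytes."""
    manifest = credential["manifest"]
    payload = json.dumps(manifest, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, credential["signature"]):
        return False  # the record itself was altered or forged
    return hashlib.sha256(content).hexdigest() == manifest["sha256"]

key = b"hypothetical-provider-signing-key"
image_bytes = b"stand-in for the bytes of an AI-generated image"
cred = make_credential(image_bytes, "example-image-generator", key)

print(verify_credential(image_bytes, cred, key))            # record matches the content
print(verify_credential(image_bytes + b"\x00", cred, key))  # content was edited
```

In this sketch the record travels alongside the content, so it proves the bytes are unchanged since signing but says nothing once it has been stripped off.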