A new artificial intelligence system generates fake documents to fool adversaries.
During World War II, British intelligence agents planted false documents on a corpse to fool Nazi Germany into preparing for an assault on Greece. “Operation Mincemeat” was a success, and covered the actual Allied invasion of Sicily.
The “canary trap” technique in espionage spreads multiple versions of false documents to conceal a secret. Canary traps can be used to sniff out information leaks or, as in WWII, to create distractions that hide valuable information.
WE-FORGE, a new data protection system designed in the Department of Computer Science, uses artificial intelligence to build on the canary trap concept. The system automatically creates false documents to protect intellectual property such as drug design and military technology.
“The system produces documents that are sufficiently similar to the original to be plausible, but sufficiently different to be incorrect,” says V.S. Subrahmanian, the Distinguished Professor in Cybersecurity, Technology, and Society and director of the Institute for Security, Technology, and Society.
Cybersecurity experts already use canary traps, or “honey files,” and foreign language translators to create decoys that deceive would-be attackers. WE-FORGE improves on these techniques by using natural language processing to automatically generate multiple fake files that are both believable and incorrect. The system also inserts an element of randomness to keep adversaries from easily identifying the real document.
WE-FORGE can be used to create numerous fake versions of any technical design document. When adversaries hack a system, they are faced with the daunting task of figuring out which one of the many similar documents is real.
“Using this technique, we force an adversary to waste time and effort in identifying the correct document. Even if they do, they may not have confidence that they got it right,” says Subrahmanian.
Creating the false technical documents is no less daunting. According to the research team, a single patent can include over 1,000 concepts, each with up to 20 possible replacements. WE-FORGE can end up considering millions of possibilities for all of the concepts that might need to be replaced in a single technical document.
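To see how quickly that search space grows, here is a back-of-the-envelope count. It assumes, purely for illustration, a document with exactly 1,000 concepts and exactly 20 replacements per concept, and it treats a fake as the original with a fixed number of concepts swapped. The helper `count_fakes` is hypothetical and is not part of WE-FORGE.

```python
import math


def count_fakes(num_concepts, num_replacements, num_swapped):
    """Count distinct fakes made by swapping exactly `num_swapped` concepts.

    Choose which concepts to swap, then choose one replacement for each.
    """
    ways_to_pick_concepts = math.comb(num_concepts, num_swapped)
    ways_to_pick_replacements = num_replacements ** num_swapped
    return ways_to_pick_concepts * ways_to_pick_replacements


# Illustrative figures taken from the article's upper bounds.
for swapped in (1, 2, 3):
    print(swapped, count_fakes(1000, 20, swapped))
```

Under these assumptions, swapping a single concept gives 20,000 possible fakes, and swapping just two already gives roughly 200 million.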
“Malicious actors are stealing intellectual property right now and getting away with it for free,” says Subrahmanian. “This system raises the cost that thieves incur when stealing government or industry secrets.”
The WE-FORGE algorithm works by computing similarities between concepts in a document and then analyzing how relevant each word is to the document. The system then sorts concepts into “bins” and computes a feasible replacement candidate for each bin.
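The steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny `EMBEDDINGS` table, the TF-IDF-style `relevance` score, the `min_sim` threshold, and the function `make_fake` are all assumptions chosen to keep the example short. It scores each word's relevance, sorts the known concepts into bins by that score, and swaps one randomly chosen concept per bin for a similar concept that does not already appear in the document.

```python
import math
import random
from collections import Counter

# Hypothetical toy embeddings (an assumption for this sketch); a real
# system would learn vectors from a large technical corpus.
EMBEDDINGS = {
    "lithium": (0.9, 0.1, 0.0),
    "sodium": (0.8, 0.2, 0.1),
    "potassium": (0.7, 0.3, 0.1),
    "anode": (0.1, 0.9, 0.2),
    "cathode": (0.2, 0.8, 0.3),
    "electrolyte": (0.3, 0.6, 0.6),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def relevance(tokens, corpus):
    """TF-IDF-style relevance of each word in `tokens` (smoothed IDF)."""
    counts = Counter(tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[word] = (count / len(tokens)) * idf
    return scores


def make_fake(tokens, corpus, num_bins=2, min_sim=0.8, rng=None):
    """Return a copy of `tokens` with one concept per bin swapped out."""
    if rng is None:
        rng = random.Random()
    scores = relevance(tokens, corpus)
    # Rank the concepts we have embeddings for, most relevant first.
    concepts = sorted(
        (w for w in set(tokens) if w in EMBEDDINGS),
        key=lambda w: (-scores[w], w),
    )
    if not concepts:
        return list(tokens)
    # Split the ranked concepts into contiguous bins.
    size = math.ceil(len(concepts) / num_bins)
    bins = [concepts[i:i + size] for i in range(0, len(concepts), size)]
    swaps = {}
    for group in bins:
        target = rng.choice(group)  # randomness keeps the fakes diverse
        # Plausible replacements: similar concepts not already in the text.
        candidates = [
            w for w in EMBEDDINGS
            if w not in scores
            and cosine(EMBEDDINGS[target], EMBEDDINGS[w]) >= min_sim
        ]
        if candidates:
            swaps[target] = rng.choice(candidates)
    return [swaps.get(t, t) for t in tokens]


doc = "the lithium anode contacts the electrolyte".split()
corpus = [
    doc,
    "the cathode stores charge".split(),
    "sodium reacts with water".split(),
]
rng = random.Random(0)
for _ in range(3):
    print(" ".join(make_fake(doc, corpus, rng=rng)))
```

Calling `make_fake` repeatedly with the same random generator yields different fakes, which loosely mirrors the element of randomness the article describes. A production system would also need a better way to choose among candidates than a uniform random pick.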
“WE-FORGE can also take input from the author of the original document,” says Dongkai Chen, Guarini ’21. “The combination of human and machine ingenuity can increase costs on intellectual-property thieves even more.”
As part of the research, the team falsified a series of computer science and chemistry patents and asked a panel of knowledgeable subjects to decide which of the documents were real.
According to the research, published in ACM Transactions on Management Information Systems, the WE-FORGE system was able to “consistently generate highly believable fake documents for each task.”
Unlike other tools, WE-FORGE specializes in falsifying technical information rather than just concealing simple information, such as passwords.
WE-FORGE improves on an earlier version of the system—known as FORGE—by removing the time-consuming need to create guides of concepts associated with specific technologies. WE-FORGE also ensures that there is greater diversity among fakes, and follows an improved technique for selecting concepts to replace and their replacements.
Reference: “Using Word Embeddings to Deter Intellectual Property Theft through Automated Generation of Fake Documents” by Almas Abdibayev, Dongkai Chen, Haipeng Chen, Deepti Poluru and V. S. Subrahmanian, February 2021, ACM Transactions on Management Information Systems. DOI: 10.1145/3418289
Almas Abdibayev Guarini ’21, Deepti Poluru Guarini ’19, and former postdoctoral researcher Haipeng Chen contributed to this research while with the Department of Computer Science.