The idea of fine-tuning digital spearphishing attacks to hack members of the UK Parliament with Large Language Models (LLMs) sounds like it belongs more in a Mission Impossible movie than a research study from the University of Oxford.

But it’s exactly what one researcher, Julian Hazell, was able to simulate, adding to a collection of studies that, altogether, signify a seismic shift in cyber threats: the era of weaponized LLMs is here. 

By providing examples of spearphishing emails created using GPT-3, GPT-3.5, and GPT-4, Hazell reveals the chilling fact that LLMs can personalize context and content in rapid iteration until they successfully trigger a response from victims.

“My findings reveal that these messages are not only realistic but also cost-effective, with each email costing only a fraction of a cent to generate,” Hazell writes in his paper, published on the open-access preprint server arXiv in May 2023. In the six months since, the paper has been cited by more than 23 others, showing the concept is being noticed and built upon in the research community.

The research all adds up to one thing: LLMs can be fine-tuned by rogue attackers, cybercrime syndicates, Advanced Persistent Threat (APT) groups, and nation-state attack teams anxious to drive their economic and social agendas. The rapid creation of FraudGPT in the wake of ChatGPT showed how lethal LLMs could become. Current research finds that GPT-4, Llama 2, and other LLMs are being weaponized at an accelerating rate.

The rapid rise of weaponized LLMs is a wake-up call that more work needs to be done on improving gen AI security. 

OpenAI’s recent leadership drama highlights why the startup needs to drive greater model security through each system development lifecycle (SDLC) stage. Meta’s championing of a new era in safe generative AI with Purple Llama reflects the type of industry-wide collaboration needed to protect LLMs during development and use. Every LLM provider must face the reality that their LLMs could easily be used to launch devastating attacks, and start hardening them now, while in development, to avert those risks.

Onramps to weaponized LLMs   

LLMs are the sharpest double-edged sword of any emerging technology, promising to be one of the most lethal cyberweapons any attacker can quickly learn and eventually master. CISOs need a solid plan to manage that risk.

Studies including “BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B” and “A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts Can Fool Large Language Models Easily” illustrate how LLMs are at risk of being weaponized. Researchers from the Indian Institute of Information Technology, Lucknow, and Palisade Research collaborated on the BadLlama study, finding that despite Meta’s intensive efforts to fine-tune Llama 2-Chat, they “fail to address a critical threat vector made possible with the public release of model weights: that attackers will simply fine-tune the model to remove the safety training altogether.”

The BadLlama research team continues, writing, “While Meta fine-tuned Llama 2-Chat to refuse to output harmful content, we hypothesize that public access to model weights enables bad actors to cheaply circumvent Llama 2-Chat’s safeguards and weaponize Llama 2’s capabilities for malicious purposes. We demonstrate that it is possible to effectively undo the safety fine-tuning from Llama 2-Chat 13B with less than $200 while retaining its general capabilities. Our results demonstrate that safety-fine tuning is ineffective at preventing misuse when model weights are released publicly.”

Jerich Beason, Chief Information Security Officer (CISO) at WM Environmental Services, underscores this concern and provides insights into how organizations can protect themselves from weaponized LLMs. His LinkedIn Learning course, Securing the Use of Generative AI in Your Organization, provides a structured learning experience and recommendations on how to get the most value out of gen AI while minimizing its threats. 

Beason advises in his course, “Neglecting security and gen AI can result in compliance violations, legal disputes, and financial penalties. The impact on brand reputation and customer trust cannot be overlooked.”

A few of the many ways LLMs are being weaponized 

LLMs are the new power tool of choice for rogue attackers, cybercrime syndicates, and nation-state attack teams. From jailbreaking and reverse engineering to cyberespionage, attackers are ingenious in modifying LLMs for malicious purposes. Researchers who discovered how generalized nested jailbreak prompts can fool large language models proposed the ReNeLLM framework, which leverages LLMs to generate jailbreak prompts, exposing the inadequacy of current defense measures.

The following are a few of the many ways LLMs are being weaponized today: 

  1. Jailbreaking and reverse engineering to negate LLM safety features. Researchers who created the ReNeLLM framework showed that it’s possible to complete jailbreaking processes that involve reverse-engineering the LLMs to reduce the effectiveness of their safety features. The BadLlama study likewise shows how vulnerable LLMs are to jailbreaking and reverse engineering.
  2. Phishing and social engineering attacks. Oxford University researchers’ chilling simulation of how quickly and easily targeted spearphishing campaigns could be created and sent to every member of the UK Parliament is just the beginning. Earlier this year Zscaler CEO Jay Chaudhry told the audience at Zenith Live 2023 about how an attacker used a deepfake of his voice to extort funds from the company’s India-based operations. Deepfakes have become so commonplace that the Department of Homeland Security has issued a guide, Increasing Threats of Deepfake Identities.
  3. Brand hijacking, disinformation, propaganda. LLMs are proving to be prolific engines capable of redefining corporate brands and spreading misinformation and propaganda, all in an attempt to redirect elections and countries’ forms of government. Freedom House, OpenAI with Georgetown University, and the Brookings Institution are completing studies showing how gen AI effectively manipulates public opinion, causing societal divisions and conflict while undermining democracy. Combining censorship, including undermining a free and open press, with the promotion of misleading content is a favorite strategy of authoritarian regimes.
  4. Development of biological weapons. A team of researchers from the Media Laboratory at MIT, SecureBio, the Sloan School of Management at MIT, the Graduate School of Design at Harvard, and the SecureDNA Foundation collaborated on a fascinating look at how vulnerable LLMs could help democratize access to dual-use biotechnologies. Their study found that LLMs could aid in synthesizing biological agents or advancing genetic engineering techniques with harmful intent. The researchers write in their summary results that “LLMs will make pandemic-class agents widely accessible as soon as they are credibly identified, even to people with little or no laboratory training.”
  5. Cyber espionage and intellectual property theft, including models. Cyber espionage services for stealing competitors’ intellectual property, R&D projects, and proprietary financial results are advertised on the dark web and cloaked Telegram channels. Cybercrime syndicates and nation-state attack teams use LLMs to help impersonate company executives and gain access to confidential data. “Inadequate model security is a significant risk associated with generative AI. If not properly secured, the models themselves can be stolen, manipulated, or tampered with, leading to unauthorized use or the creation of counterfeit content,” advises Beason.
  6. Evolving legal and ethical implications. How LLMs get trained on data, which data they are trained on, and how they are continually fine-tuned with human intervention are all sources of legal and ethical challenges for any organization adopting this technology. The ethical and legal precedents of stolen or pirated LLMs becoming weaponized are still taking shape today.

Countering the threat of weaponized LLMs 

Across the growing research base tracking how LLMs can be and have been compromised, three core strategies emerge as the most common approaches to countering these threats. They include the following:

Defining advanced security alignment earlier in the SDLC process. OpenAI’s pace of rapid releases needs to be balanced with a stronger, all-in strategy of shift-left security in the SDLC. Evidence that OpenAI’s security process needs work includes how its models will regurgitate sensitive data if someone constantly enters the same text string. All LLMs need more extensive adversarial training and red-teaming exercises.

Dynamic monitoring and filtering to keep confidential data out of LLMs. Researchers agree that more monitoring and filtering is needed, especially when employees use LLMs and the risk of sharing confidential data with the model increases. Researchers emphasize that this is a moving target, with attackers having the upper hand in navigating around defenses because they innovate faster than the best-run enterprises can. Vendors addressing this challenge include Cradlepoint Ericom’s Generative AI Isolation, Menlo Security, Nightfall AI, Zscaler and others. Ericom’s Generative AI Isolation is unique in its reliance on a virtual browser isolated from an organization’s network environment in the Ericom Cloud. Data loss protection, sharing, and access policy controls are applied in the cloud to prevent confidential data, PII, or other sensitive information from being submitted to the LLM and potentially exposed.
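To make the filtering idea concrete, here is a minimal Python sketch of a pre-submission check that redacts a few obvious kinds of sensitive data before a prompt leaves the organization. It is an illustration only, not how any of the vendors named above implement their products: the function name `redact_prompt`, the placeholder format, and the three regular expressions are all assumptions made for this example, and pattern matching alone would miss much of what a real data loss policy has to catch.

```python
import re

# Hypothetical patterns, chosen for illustration only. A real policy would
# cover far more data types and be tuned to the organization's own data.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact_prompt(prompt: str) -> tuple[str, list[str]]:
    """Replace matches with placeholders and report which types were found."""
    found = []
    for label, pattern in PATTERNS.items():
        # subn returns the rewritten text and the number of substitutions.
        prompt, count = pattern.subn(f"[{label} REDACTED]", prompt)
        if count:
            found.append(label)
    return prompt, found


if __name__ == "__main__":
    clean, found = redact_prompt(
        "Email jane.doe@example.com about SSN 123-45-6789."
    )
    print(clean)
    print(found)
```

In this sketch the list of detected types is returned alongside the cleaned text so that a calling application could log the event or block the request outright instead of silently forwarding a redacted prompt.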

Collaborative standardization in LLM development is table stakes. Meta’s Purple Llama Initiative reflects a new era in securing LLM development through collaboration with leading providers. The BadLlama study identified how easily safety protocols in LLMs could be circumvented. Researchers pointed out how quickly and easily LLM guardrails could be compromised, proving that a more unified, industry-wide approach to standardizing safety measures is needed.
