Since OpenAI’s ChatGPT artificial intelligence conversational tool went public in November 2022, the world has been enamored with how stunningly useful and (largely) accurate it is. Requests for the tool to write stories and essays, help with coding, and yes, even assist with malicious hacking and malware writing, were all possible.
But not anymore, apparently, since a recent update to the tool.
Earlier this month, Ars Technica, among many other publications, was able to successfully prompt ChatGPT into writing functional malware. Keyloggers, malicious Excel spreadsheet macros, the list goes on: ChatGPT would write it all, frighteningly well.
The moral and ethical conundrum facing OpenAI, and any company with a similar capability, is easy to see. Without safety and ethical guardrails, the tool can quickly be abused for malicious purposes and used to automate destructive behavior.
ChatGPT now rejects unethical hacking and malware prompts
Other researchers, such as those at Check Point, were able to have ChatGPT write malicious VBA code that, when inserted into an Excel spreadsheet, would download an executable from a web link and run it.
But when I ran the same prompt just yesterday, ChatGPT instead issued what is termed a “safety prompt,” rejecting the request to write malware or perform unethical behavior.
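OpenAI hasn’t disclosed exactly how ChatGPT’s new guardrails are implemented. The closest publicly documented analog is its Moderation API, which classifies text against OpenAI’s usage policies, so here is a minimal, illustrative sketch of pre-screening a prompt with it. Note that its categories (hate, self-harm, violence, sexual content) don’t directly cover malware, so the hacking refusals described above almost certainly rely on additional, undisclosed checks.

```python
import os
import requests

# Sketch: pre-screening a user prompt with OpenAI's Moderation API.
# This endpoint classifies categories like hate, violence, and
# self-harm; it does not directly detect malware requests, so ChatGPT's
# refusals clearly involve additional checks beyond this.
API_KEY = os.environ["OPENAI_API_KEY"]

def moderate(prompt: str) -> dict:
    """Return the moderation verdict for a piece of text."""
    resp = requests.post(
        "https://api.openai.com/v1/moderations",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": prompt},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"][0]

result = moderate("Example user prompt to screen before answering")
if result["flagged"]:
    # List the specific policy categories that tripped the filter.
    reasons = [name for name, hit in result["categories"].items() if hit]
    print(f"Request rejected: {', '.join(reasons)}")
else:
    print("Request passed moderation.")
```

The response also includes per-category confidence scores alongside the boolean verdicts, which is useful for logging why a given request was blocked.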
Other attempts that previously worked, such as writing a phishing email or trying to convince a user to download a malicious file attachment to lift a suspension on an Amazon account, were also rejected.
Overall, these tests were preliminary and certainly didn’t exhaust the many avenues ChatGPT users have to try to convince it to still write malware or phishing emails.
Still, this is encouraging news, as ChatGPT’s willingness to write malware and support unethical hacking has dominated headlines everywhere.
It is unknown what forced these changes, but Microsoft’s reported $10 billion investment in OpenAI could be part of the reason. Microsoft is also launching Azure OpenAI Service with ChatGPT soon, allowing businesses to integrate tools like ChatGPT and DALL-E into their own cloud apps.
We couldn’t imagine Microsoft enabling this sort of capability in Azure with no security guardrails on ChatGPT. The public backlash would be immense, and it would raise security concerns within Microsoft’s own Azure cloud business as well.
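For context, integrating a model through Azure OpenAI Service looks roughly like the sketch below. The resource name, deployment name, and prompt are placeholders invented for illustration, and Azure layers its own content filtering on top of the underlying model.

```python
import os
import requests

# Hypothetical sketch of calling Azure OpenAI Service's REST API.
# "my-resource" and "my-deployment" are invented placeholders; you
# would create these in the Azure portal. Azure applies its own
# content filtering on top of the underlying OpenAI model.
ENDPOINT = "https://my-resource.openai.azure.com"
DEPLOYMENT = "my-deployment"
API_KEY = os.environ["AZURE_OPENAI_KEY"]

resp = requests.post(
    f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}/completions",
    params={"api-version": "2022-12-01"},
    headers={"api-key": API_KEY},
    json={"prompt": "Write a short security awareness tip.",
          "max_tokens": 100},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```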
Update: It Depends? Exploiting ChatGPT with “Jailbreaks”
We’re adding this update to the original article as a result of feedback from the community.
As with any good technical question, the typical engineering answer is “it depends.” Absolutes, yes and no, rarely exist in the technical world, at least for long.
Some readers have pointed out, after the initial publication of this article, that there are still ways to “jailbreak” ChatGPT and essentially trick the bot into writing malicious phishing emails, malware, or hacking scripts.
While we won’t cover the methods used to force ChatGPT to bypass its security parameters and violate its “ethical” standards, the reality is that the system can still be exploited with enough effort.
But, does that make this article or the default security parameters invalid?
I don’t think so.
For most casual ChatGPT users, the safety guardrails will be enough to dissuade further attempts. They will see the alert message and simply move on.
More technically inclined users, or those with the means to exploit the system, will do so. There will likely be an indefinite, continuous game of catch-up, with the AI trying to determine whether a request carries ill intent, even within the “scenario”-based “rules” that ChatGPT users craft to exploit it.
A previously unmoderated ChatGPT provided responses and code at will, with no warning, alert, or consideration for ethical standards. Now, it reviews each request against those standards and moderates its output.
In this writer’s humble opinion, we’ll continue to see additional safeguards added to ChatGPT over time as the platform learns to anticipate malicious requests.
There’s still room for improvement—maybe a lot—but it’s not quite the wild west it was even a few weeks ago.
Updated 1/17/2023 @ 13:15 with ChatGPT Jailbreak discussion