TLDR: We worked with Reuters on an article and just released a paper on the impact of AI scams on elderly people.
Fred Heiding and I have spent several years studying how AI systems can be used for online fraud and scams. A few months ago we got in touch with Reuters journalist Steve Stecklow, and together we set out to report on how scammers use AI to target elderly people. Many personal stories describe how elderly people fall victim to scams and how AI has made that situation worse.
With Steve, we conducted a simple study. We contacted two senior organizations in California and recruited participants. We tried different methods to jailbreak several frontier AI systems and used them to generate phishing messages. We then sent those generated phishing emails to real elderly participants who had voluntarily signed up for the study.

The result: 11% of the 108 participants were phished by at least one email, with the best-performing emails getting about 9% of recipients to click on the embedded URL. Each participant received 1 to 3 messages. We also found that simple jailbreaks worked very well against Meta's and Gemini's systems, while ChatGPT and Claude appeared to be slightly more secure. The entire investigation was published as a Reuters special report.

The journalists we worked with also explored how scammers use AI systems in the wild, interviewing people who had been kidnapped into scam factories in Southeast Asia. That reporting was handled by another Reuters journalist, Poppy McPherson. These victims of organized crime groups were forced to defraud people: they were promised high-paying jobs in Southeast Asia, taken to Thailand, stripped of their passports, and made to live in the scam factories. They confirmed that they used AI systems like ChatGPT to defraud people in the United States.
We tried to fill the gap between jailbreaking studies and studies that examine the real-world effects of AI misuse. Few people are doing this end-to-end assessment – from jailbreaking the model to evaluating the damage the jailbroken output can actually do. AI can now automate large parts of the scam and phishing infrastructure. We have a talk about this, where Fred discusses what is currently possible, specifically the automation of phishing infrastructure with AI.
We have also recently been working on voice scams and hope to release a study on this soon; Fred gave a talk mentioning it here. This work was covered in a Reuters article and some podcasts, and received online discussion.
Most importantly, our research was cited when Senator Kelly formally requested a Senate hearing examining the impact of AI chatbots and companions on older Americans, helping to inspire that hearing.
We have now published our results in a paper available on arXiv; it has been accepted at the AI Governance Workshop at the AAAI Conference. Although our study has some limitations, we think it is valuable to publish this complete assessment as a paper: human studies on the effects of AI are still scarce.
This research was supported by funding from Manifund, recommended by Neel Nanda.