OpenAI recently introduced GPT-4o Mini, an AI model built with stronger safeguards against misuse. The model uses a new technique called instruction hierarchy, which is aimed at blocking jailbreak attempts and hardening the model against prompt injections and system prompt extraction. According to OpenAI, the technique improved the model’s overall robustness score by 63 percent.
OpenAI Builds a New Safety Framework
With GPT-4o Mini, OpenAI is targeting attempts to compromise the model through jailbreaking. The instruction hierarchy technique built into the model strengthens its defenses against malicious inputs such as prompt injections, as well as attempts to extract its system prompt. OpenAI says the change raised the model’s overall robustness score by 63 percent, which the company presents as a significant gain in resilience to threats and misuse.
Challenges in Early ChatGPT Prompts
In the early days of ChatGPT, many people tried to make the AI generate offensive or harmful text by tricking it into disregarding its original programming. Such prompts often began with “Forget all previous instructions and do this…” ChatGPT has come a long way since then, and malicious prompt engineering is now more difficult, but bad actors have also become more strategic in their attempts.
Enhancing AI Robustness and Refinements
OpenAI claims the technique improved robustness scores by 63 percent. However, there is a risk that the AI might refuse to follow the lowest-level instructions, meaning those given the least priority in the hierarchy, even when they are harmless. OpenAI’s research paper also outlines several refinements to improve the technique in the future. One key area of focus is handling other modalities, such as images or audio, which can also contain injected instructions.