NVIDIA CEO might be right about coding being dead because of AI — OpenAI's new CriticGPT model identifies ChatGPT's programming mistakes better than AI trainers


What you need to know

OpenAI recently launched CriticGPT to help identify errors in code generated using ChatGPT.
The tool helps AI trainers identify errors faster and more easily than they would without the help of AI.
The ChatGPT maker admits the tool isn't 100% accurate and faces several challenges, including the inability to handle highly complex tasks and periodic instances of hallucinations.

OpenAI recently launched CriticGPT powered by GPT-4. As the name suggests, the model "writes critiques of ChatGPT responses to help human trainers spot mistakes" in ChatGPT's code output.

According to the ChatGPT maker:

"We found that when people get help from CriticGPT to review ChatGPT code, they outperform those without help 60% of the time. We are beginning the work to integrate CriticGPT-like models into our RLHF labeling pipeline, providing our trainers with explicit AI assistance."

OpenAI uses Reinforcement Learning from Human Feedback (RLHF) to make ChatGPT more "helpful and interactive." An integral part of this process is collecting comparisons, in which AI trainers rate different ChatGPT responses against each other.
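As a rough illustration of what collecting comparisons can look like, the Python sketch below tallies pairwise trainer judgments into a win rate for each response. The `Comparison` record, the response labels, and the sample data are hypothetical, made up for this example; this is not OpenAI's actual pipeline or data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One trainer judgment: `winner` was rated above `loser` (hypothetical format)."""
    winner: str
    loser: str


def win_rates(comparisons):
    """Return, for each response, the fraction of its comparisons that it won."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for c in comparisons:
        wins[c.winner] += 1
        appearances[c.winner] += 1
        appearances[c.loser] += 1
    return {response: wins[response] / appearances[response] for response in appearances}


# Made-up judgments, for illustration only.
sample = [
    Comparison("response_a", "response_b"),
    Comparison("response_a", "response_c"),
    Comparison("response_b", "response_c"),
    Comparison("response_c", "response_a"),
]
rates = win_rates(sample)
```

In a real RLHF setup, comparisons like these would typically feed a reward model rather than a simple tally, but the tally shows the kind of signal trainers provide.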

CriticGPT will help improve ChatGPT's reasoning capabilities, ultimately reducing hallucinations or the generation of incorrect responses and misinformation. As it happens, it's becoming increasingly hard for AI trainers to identify mistakes as ChatGPT advances.

The tool is primarily trained to identify and write critiques highlighting inaccuracies in ChatGPT answers. OpenAI admits the tool isn't always 100% accurate, but it helps AI trainers identify errors faster and more easily than they would without AI.

CriticGPT will reportedly augment trainers' skills, equipping them to write more comprehensive critiques. While AI trainers and CriticGPT can each get the job done on their own, a Human+CriticGPT combination seemingly produces more thorough and accurate critiques.

According to OpenAI's findings:

"We find that CriticGPT critiques are preferred by trainers over ChatGPT critiques in 63% of cases on naturally occurring bugs, in part because the new critic produces fewer "nitpicks" (small complaints that are unhelpful) and hallucinates problems less often."

CriticGPT is still a work in progress

A robot identifying errors in code (Image credit: Kevin Okemwa | Bing Image Creator)

While impressive, CriticGPT still needs a lot of work. OpenAI has highlighted the model's shortcomings as listed below:

We trained CriticGPT on ChatGPT answers that are quite short. To supervise the agents of the future, we will need to develop methods that can help trainers to understand long and complex tasks.
Models still hallucinate and sometimes trainers make labeling mistakes after seeing those hallucinations.
Sometimes real-world mistakes can be spread across many parts of an answer. Our work focuses on errors that can be pointed out in one place, but in the future we need to tackle dispersed errors as well.
CriticGPT can only help so much: if a task or response is extremely complex even an expert with model help may not be able to correctly evaluate it.


Going forward, OpenAI intends to scale this work further, using CriticGPT to improve the RLHF data used for GPT-4 training. In a separate report, Oxford researchers used semantic entropy to assess the meanings of generated outputs, gauging the quality of responses and spotting signs of hallucination.
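To give a feel for the semantic entropy idea, here is a minimal Python sketch: sample several answers to the same question, group the ones that mean the same thing, and compute the entropy of the group sizes. Low entropy suggests the model keeps saying the same thing, while high entropy suggests it may be making things up. The `same_meaning` check is a crude stand-in assumed for this example (it only normalizes case and punctuation); the researchers' actual method is not reproduced here, and a real system would need a far stronger test of whether two answers share a meaning.

```python
import math


def same_meaning(a, b):
    """Stand-in assumption: answers match if equal after dropping case and punctuation."""
    def normalize(text):
        return "".join(ch for ch in text.lower() if ch.isalnum())
    return normalize(a) == normalize(b)


def semantic_entropy(answers, equivalent=same_meaning):
    """Cluster answers by meaning, then return the entropy (in nats) of cluster sizes."""
    clusters = []
    for answer in answers:
        for cluster in clusters:
            if equivalent(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            # No existing cluster matched, so this answer starts a new one.
            clusters.append([answer])
    total = len(answers)
    return -sum((len(c) / total) * math.log(len(c) / total) for c in clusters)


# Made-up sampled answers, for illustration only.
consistent = semantic_entropy(["Paris", "paris.", "Paris"])
scattered = semantic_entropy(["Paris", "Lyon", "Nice"])
```

Here `consistent` comes out at zero because every answer lands in one cluster, while `scattered` is higher because the three answers disagree.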

AI models are becoming more advanced and sophisticated, allowing them to handle complex tasks better. NVIDIA CEO Jensen Huang argues coding might be dead in the water as a career option for the next generation. Instead, he recommends seeking alternative career options in biology, education, manufacturing, or farming. Huang might not be entirely wrong if OpenAI GPT-4o's coding capabilities are anything to go by.


Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.