Researchers found wild fluctuations, called drift, in the technology’s ability to perform certain tasks. The study looked at two versions of OpenAI’s technology between March and June: a version called GPT-3.5 and another known as GPT-4.
The most notable results came from research into GPT-4’s ability to solve math problems. Over the course of the study, researchers found that in March GPT-4 correctly identified 17077 as a prime number 97.6% of the times it was asked. But just three months later, its accuracy plummeted to a lowly 2.4%.
Meanwhile, the GPT-3.5 model had virtually the opposite trajectory. The March version answered the same question correctly just 7.4% of the time, while the June version was consistently right, answering correctly 86.8% of the time. Results varied similarly when the researchers asked the models to write code and to complete a visual reasoning test that asked them to predict the next figure in a pattern.
James Zou, a Stanford computer science professor who was one of the study’s authors, says the “magnitude of the change” was unexpected from the “sophisticated ChatGPT.” The exact nature of these unintended side effects is still poorly understood because researchers and the public alike have no visibility into the models powering ChatGPT. It’s a reality that has only become more acute since OpenAI decided to backtrack on plans to make its code open source in March. “So we don’t actually know how the model itself, the neural architectures, or the training data have changed,” Zou says.
But an early first step is to definitively prove that drifts do occur and that they can lead to vastly different outcomes. “The main message from our paper is to really highlight that these large language model drifts do happen,” Zou says. “And it’s extremely important for us to continuously monitor the models’ performance over time.”
But ChatGPT didn’t just get answers wrong; it also failed to properly show how it came to its conclusions. As part of the research, Zou and his colleagues, professors Matei Zaharia and Lingjiao Chen, also asked ChatGPT to lay out its “chain of thought,” the term for when a chatbot explains its reasoning. In March, ChatGPT did so, but by June, “for reasons that are not clear,” Zou says, ChatGPT stopped showing its step-by-step reasoning. It matters that a chatbot show its work so that researchers can study how it arrives at certain answers, in this case whether 17077 is a prime number. “It’s sort of like when we’re teaching human students,” Zou says.
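The study's benchmark question, whether 17077 is prime, has a ground truth anyone can check in a few lines. The sketch below is our own illustration, not the researchers' evaluation code: it uses plain trial division (the `is_prime` helper is hypothetical) to confirm that 17077 is prime, so "yes" is the answer the models should have given.

```python
import math


def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Only odd divisors up to floor(sqrt(n)) need checking;
    # math.isqrt(17077) is 130, so at most 64 divisions are tried.
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


print(is_prime(17077))  # True
```

Trial division is plenty for a five-digit number like this one; much larger inputs would call for a faster method such as the Miller-Rabin test.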