Although artificial intelligence giants such as OpenAI, Google and Anthropic are pushing code-writing assistants and coding capabilities, their models have not yet reached the expected level in some areas. According to a new study by Microsoft Research, Microsoft’s R&D division, artificial intelligence models have difficulty debugging software.
Details of the study
According to the information shared in the study, artificial intelligence models failed to resolve many of the problems in a software development benchmark called SWE-bench Lite. The models tested include Anthropic’s Claude 3.7 Sonnet and OpenAI’s o3-mini. The results show that artificial intelligence still cannot compete with humans in areas such as coding.
In the study, the researchers tested nine different models as the backbone of a single-prompt-based agent with access to a number of debugging tools, including a Python debugger. The agent was tasked with solving a selected set of 300 software debugging tasks from SWE-bench Lite.
Claude 3.7 Sonnet, OpenAI o1 and o3-mini
According to the shared information, the agents rarely completed more than half of the debugging tasks successfully, even when they were equipped with stronger and newer models. Claude 3.7 Sonnet had the highest average success rate at 48.4 percent, followed by OpenAI’s o1 at 30.2 percent and o3-mini at 22.1 percent.
Some models struggled to use the debugging tools offered to them, and had difficulty understanding how different tools could help with different problems. However, the authors of the study said that the bigger problem is data scarcity. They think that the training data of existing models does not contain enough data representing human debugging traces. The authors believe that training or fine-tuning the models could make them better interactive debuggers. However, they note that specialized data will be required for such training.
Source link: https://webrazzi.com/2025/04/11/yapay-zeka-modelleri-yazilimlarda-hata-ayiklamakta-gucluk-cekiyor/