Although artificial intelligence giants such as OpenAI, Google and Anthropic are working on code-writing assistants and coding capabilities, their models have not yet reached the expected level in some areas. According to a new study by Microsoft Research, Microsoft's R&D division, artificial intelligence models have difficulty debugging software.

Details of the study

According to the information shared in the study, artificial intelligence models failed to resolve many of the issues in a software development benchmark called SWE-bench Lite. These models include Anthropic's Claude 3.7 Sonnet and OpenAI's o3-mini. The results show that artificial intelligence still cannot compete with people in areas such as coding.

In the study, the researchers tested nine different models as the backbone of a single prompt-based agent with access to a number of debugging tools, including a Python debugger. The agent was tasked with solving a curated set of 300 software debugging tasks from SWE-bench Lite.

Claude 3.7 Sonnet, OpenAI o1 and o3-mini

According to the information shared, the agents rarely completed more than half of the debugging tasks successfully, even when they were equipped with stronger and newer models. Claude 3.7 Sonnet had the highest average success rate at 48.4 percent, followed by OpenAI's o1 at 30.2 percent and o3-mini at 22.1 percent.

Some models struggled to use the debugging tools available to them. They also had difficulty understanding how different tools could help with different problems. However, the authors of the study said that the bigger problem is data scarcity. They think the training data of existing models does not contain enough examples of human debugging traces. The authors believe that training or fine-tuning the models could make them better interactive debuggers, but they note that specialized data would be required for such training.

Source link: https://webrazzi.com/2025/04/11/yapay-zeka-modelleri-yazilimlarda-hata-ayiklamakta-gucluk-cekiyor/
