We recently covered QwQ-32B-Preview, the artificial intelligence model from China-based e-commerce giant Alibaba that rivals OpenAI's o1. Now Alibaba's artificial intelligence research team Qwen has introduced a new open source model, QVQ-72B-Preview, which can analyze images and draw conclusions from them. The model, still at the experimental stage, performed well in initial tests, particularly on visual reasoning tasks.
Like other reasoning models, it solves problems by thinking step by step. When a user enters an image and an instruction, the system takes the time it needs to analyze the information and reason about it, then provides answers with confidence scores for each prediction. In this respect the model behaves much like reasoning models such as OpenAI's o1 or Google's Flash Thinking.
According to the information shared, QVQ-72B-Preview builds on Qwen's existing vision-language model, Qwen2-VL-72B, adding capabilities for thinking and reasoning. Qwen says it is the first open source model of its kind. QVQ-72B-Preview also stands out for its similarity to the recently released QwQ reasoning model, although the team has not shared any information about the relationship between the two.
To test the model, Qwen used four different benchmarks: MMMU, which tests college-level visual understanding; MathVista, which checks reasoning over mathematical graphics; MathVision, which focuses on math competition problems; and OlympiadBench, which covers Olympiad-level math and physics problems in both Chinese and English.
In these tests, QVQ achieved accuracy levels similar to models like OpenAI's o1 and Anthropic's Claude 3.5 Sonnet, and it outperformed its predecessor, Qwen2-VL-72B-Instruct. QVQ-72B-Preview scored 70.3 on the MMMU benchmark; although this was not enough to surpass o1, it did beat GPT-4o and Claude 3.5 Sonnet. Likewise, on MathVista it scored 71.4, outperforming o1, GPT-4o and Claude 3.5 Sonnet.
However, we should point out that QVQ-72B-Preview has some limitations. According to the Qwen team, the model can unexpectedly switch between languages or get stuck in reasoning loops; it is worth noting that OpenAI's o1 has also not yet solved the problem of circular reasoning. In addition, during complex visual reasoning tasks QVQ-72B-Preview sometimes loses track of what it is looking at, which can lead to hallucinations. According to the team, stronger safeguards are needed before the model is ready for widespread use. For now, those who want to test QVQ-72B-Preview can access the model via Hugging Face.
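For readers who want to try it locally, the following is a minimal sketch of querying the model with an image and a textual instruction. It assumes QVQ-72B-Preview exposes the same Hugging Face interface as Qwen2-VL; the repository name "Qwen/QVQ-72B-Preview", the Qwen2VLForConditionalGeneration class, and the example image URL are assumptions, not confirmed details from the article.

```python
# Minimal sketch: ask the model to reason step by step about an image.
# Assumes the Qwen2-VL-style interface (transformers + qwen_vl_utils).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/QVQ-72B-Preview"  # assumed Hugging Face repository name
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# An image plus an instruction, as described in the article.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/chart.png"},  # hypothetical image URL
            {"type": "text", "text": "Read the chart and reason step by step about the trend."},
        ],
    }
]

# Build the chat prompt and collect the image inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

# The model "thinks" at length before answering, so allow a long generation.
output_ids = model.generate(**inputs, max_new_tokens=2048)
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

Running a 72B-parameter vision-language model this way requires substantial GPU memory, so in practice a quantized variant or a hosted demo may be the more realistic starting point.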
While the Qwen team sees QVQ-72B-Preview as a step towards artificial general intelligence (AGI), they also signal that an omni model similar to GPT-4o will be made available to users in the future.
Source link: https://webrazzi.com/2024/12/26/qwen-in-openai-o1-rakibi-acik-kaynak-gorsel-muhakeme-modeli-qvq-72b-preview/