Apple's research team focused on artificial intelligence technologies has published two new small language models. At a time when small language models are popular, it is noteworthy that Apple has also joined this trend. These small but high-performing language models are used to train generative artificial intelligence models.
The two open-source models, produced by Apple's Machine Learning team within the scope of the DataComp for Language Models (DCLM) project, compete with other leading language models such as Llama 3 and Gemma. Apple's small language models perform similarly to these models on some benchmarks and manage to surpass them on others.
It is worth noting that the DataComp for Language Models project, which involves universities such as Harvard and Stanford as well as companies such as Toyota, focuses on identifying the most effective data curation strategies.
The language models published by Apple provide a standard framework for training artificial intelligence engines such as those behind ChatGPT or Claude. In this context, the models include an architecture, parameters, and the filtering of datasets. Filtering the datasets supplies higher-quality data for artificial intelligence engines to learn from.
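To make the data-filtering idea concrete, here is a minimal, illustrative sketch in Python of how model-based quality filtering works in general. The Document class, the quality_score heuristic, and the threshold are assumptions chosen for illustration; they do not reflect Apple's actual DCLM pipeline, which uses learned classifiers trained on curated examples.

```python
# Minimal sketch of quality filtering for a text corpus.
# This is NOT Apple's DCLM pipeline: the scoring heuristic and threshold
# below are illustrative assumptions standing in for a learned classifier.

from dataclasses import dataclass

@dataclass
class Document:
    text: str

def quality_score(doc: Document) -> float:
    """Toy heuristic standing in for a learned quality classifier.
    Real pipelines typically train a text classifier on examples of
    'good' vs. 'bad' web documents and keep only high-scoring ones."""
    words = doc.text.split()
    if not words:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)
    length_ok = min(len(words) / 200.0, 1.0)        # penalize very short documents
    shape_ok = 1.0 if 3.0 <= avg_word_len <= 8.0 else 0.5  # penalize odd token shapes
    return length_ok * shape_ok

def filter_corpus(docs: list[Document], threshold: float = 0.5) -> list[Document]:
    """Keep only documents whose quality score clears the threshold."""
    return [d for d in docs if quality_score(d) >= threshold]

if __name__ == "__main__":
    corpus = [Document("word " * 300), Document("!!!")]
    print(len(filter_corpus(corpus)))  # -> 1 (the very short document is dropped)
```

The design point is that filtering happens before training: documents that score poorly never reach the model, so the training set is smaller but of higher quality.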
Performance of Apple’s new model
Apple’s DCLM language models come in two different sizes: 7 billion parameters and 1.4 billion parameters. The 7-billion-parameter language model surpassed MAP-Neo, the previous best-performing model trained on open datasets, by 6.6 percent in the benchmarks. Moreover, the Apple team’s DataComp-LM model uses 40 percent less computing power to achieve these results. Thus, the model performed best among models with open datasets and managed to compete with those trained on private datasets.
Both of Apple’s models attracted attention with their scores on the Massive Multitask Language Understanding (MMLU) benchmark. Still, Apple’s 7-billion-parameter model, DCLM-7B, could not outperform the Llama 3, Gemma, Phi-3, and Qwen-2 models on MMLU.
We should also point out that these models published by the Apple team are not designed to be used in any future Apple products. Positioned as community research projects, they aim to show how the datasets used to train AI models can be improved effectively. In this sense, Apple’s research team also investigated the impact of various data curation techniques as well as model-based quality filtering strategies. Developers can access the models via Hugging Face.
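For developers who want to try the models, a minimal sketch of loading a checkpoint from Hugging Face with the transformers library is shown below. The repository identifier apple/DCLM-7B is an assumption based on the article; check the actual model card for the exact name and any extra dependencies it requires.

```python
# Minimal sketch: loading a DCLM checkpoint from Hugging Face with transformers.
# The repository id "apple/DCLM-7B" is assumed from the article; consult the
# model card for the exact identifier and any additional packages it may need.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/DCLM-7B"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Small language models are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```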
Source link: https://webrazzi.com/2024/07/22/apple-in-7-milyar-parametreli-acik-kaynak-dil-modeli-dclm-7b/