Revolutionary LLM Marco-o1 By Alibaba Achieves 6% Accuracy Boost In Mathematical Problem-Solving Tests

Published December 14, 2024

Alibaba has announced Marco-o1, an advanced large language model (LLM) designed to tackle open-ended problem-solving and complex reasoning tasks. Developed by the MarcoPolo Team under Alibaba International Digital Commerce, Marco-o1 is poised to rival OpenAI’s o1 model, offering enhancements in reasoning, translation, and problem-solving across multiple domains.

The Alibaba Difference

Unlike traditional AI models that excel in structured tasks such as coding and mathematics, Marco-o1 focuses on open-ended problems, where definitive answers and clear evaluation metrics are often absent.

“Currently, OpenAI o1 sparks a surge of interest in the study of large reasoning models (LRM),” according to an excerpt from the report. “Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding—which are well-suited for reinforcement learning (RL)—but also places greater emphasis on open-ended resolutions. We aim to address the question, ‘Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?’”

Marco-o1 pursues this goal by combining three techniques:

Chain-of-Thought (CoT) Fine-Tuning: Enables step-by-step reasoning by explicitly mapping the thought process.
Monte Carlo Tree Search (MCTS): Explores multiple reasoning paths, using confidence scores to guide the model toward optimal solutions.
Reasoning Action Strategies: Dynamically adjusts the granularity of decision-making, refining the model’s approach to nuanced problems.
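
To make the confidence-guided search more concrete, here is a minimal Python sketch of how candidate reasoning paths might be scored and compared. It is not Alibaba's implementation: the function names, the input format, and the choice to normalize each chosen token's log-probability against a handful of alternatives and then average over the path are all assumptions made for illustration. It also covers only the scoring step, not the tree expansion and backpropagation of a full MCTS.

```python
import math

def token_confidence(chosen_logprob, alt_logprobs):
    """Confidence of one token: softmax of the chosen token's log-probability
    against a few alternative tokens (hypothetical scoring rule)."""
    all_logprobs = [chosen_logprob] + list(alt_logprobs)
    peak = max(all_logprobs)  # subtract the max for numerical stability
    denom = sum(math.exp(lp - peak) for lp in all_logprobs)
    return math.exp(chosen_logprob - peak) / denom

def path_score(tokens):
    """Average token confidence along one reasoning path.
    `tokens` is a list of (chosen_logprob, alt_logprobs) pairs."""
    confidences = [token_confidence(c, alts) for c, alts in tokens]
    return sum(confidences) / len(confidences)

def best_path(candidates):
    """Return the name of the candidate path with the highest score."""
    return max(candidates, key=lambda name: path_score(candidates[name]))

# Hypothetical log-probabilities for two candidate reasoning paths.
candidates = {
    "path_a": [(math.log(0.9), [math.log(0.1)]), (math.log(0.8), [math.log(0.2)])],
    "path_b": [(math.log(0.5), [math.log(0.5)]), (math.log(0.6), [math.log(0.4)])],
}
chosen = best_path(candidates)
```

In a real search, a score like this would serve as the reward that steers which branch of the tree is explored next, rather than as a one-off comparison between finished paths.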

Marco-o1 is built on the Qwen2-7B-Instruct model and trained using a blend of open-source CoT data and proprietary synthetic datasets. Together, these techniques allow Marco-o1 to address tasks ranging from abstract reasoning to multilingual translation.

Performance Highlights

Alibaba reports accuracy gains of 6.17 percent on the MGSM (English) dataset and 5.60 percent on the MGSM (Chinese) dataset. MGSM is a multilingual benchmark of grade-school math word problems, so the two results point to improved reasoning in both languages. The team also highlights Marco-o1’s machine translation performance, citing its ability to interpret cultural nuances in slang.

In a nod to open innovation, Alibaba has made Marco-o1 freely available on platforms like GitHub and Hugging Face, inviting researchers and developers to explore and enhance its capabilities.

Alibaba’s announcement comes on the heels of similar releases, such as the DeepSeek-R1-Lite-Preview model launched by China’s DeepSeek lab. It also directly challenges OpenAI’s o1 model, which has been celebrated for logical and mathematical reasoning, as shown by its strong results on benchmarks such as AIME and Codeforces. Marco-o1, however, aims to go a step further by generalizing its reasoning capabilities to ambiguous, real-world domains.
