LongCat-2.0: Large-Scale MoE Model with 1.6T Parameters

LongCat-2.0 is a large-scale Mixture-of-Experts (MoE) model with 1.6 trillion total parameters, of which 48 billion are active. Trained on more than 35 trillion tokens, the model is presented as evidence that frontier-scale LLMs can be trained on non-Nvidia hardware using specialized AI ASIC superpods.
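To put the headline figures in proportion, the following is a rough back-of-the-envelope sketch. It uses only the three numbers quoted above; the 6 FLOPs per active parameter per token rule is a common approximation for dense training compute and is an assumption here, not a figure reported for LongCat-2.0. The function names are illustrative.

```python
# Back-of-the-envelope figures derived from the numbers quoted in the text.
TOTAL_PARAMS = 1.6e12    # 1.6 trillion total parameters
ACTIVE_PARAMS = 48e9     # 48 billion active parameters
TRAIN_TOKENS = 35e12     # "more than 35 trillion" tokens, used as a lower bound


def active_fraction(total: float, active: float) -> float:
    """Share of the parameters that are active."""
    return active / total


def approx_training_flops(active: float, tokens: float) -> float:
    """Rule-of-thumb estimate: about 6 FLOPs per active parameter per token.

    This is an assumed approximation, not a reported number.
    """
    return 6.0 * active * tokens


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
flops = approx_training_flops(ACTIVE_PARAMS, TRAIN_TOKENS)

print(f"active fraction: {frac:.1%}")                 # 3.0%
print(f"approx. training compute: {flops:.2e} FLOPs")  # 1.01e+25 FLOPs
```

Under these assumptions, only about 3% of the parameters are active, and the estimate ignores attention cost, recomputation, and any hardware inefficiency.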

Hardware Infrastructure and Training

LongCat-2.0 was trained and deployed on large-scale clusters of AI ASIC superpods rather than the standard Nvidia GPU ecosystem. Because the supporting community around this hardware is less developed than CUDA's, the choice required building a stable, secure, and scalable software stack.

Key technical details regarding the infrastructure include:

Scale: Pretraining spanned millions of accelerator-days.
Hardware Speculation: Community analysis points to Huawei Ascend 910C chips, with unconfirmed estimates of approximately 1,024 superpods (roughly 50,000 chips).
Training Volume: The model was trained on a dataset exceeding 35 trillion tokens.
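The community estimates in the list above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the superpod and chip counts are the unconfirmed estimates quoted above, the accelerator-day values of 1 and 2 million are hypothetical stand-ins for "millions", and the calculation assumes every chip runs continuously for the whole job.

```python
# Sanity-check arithmetic on the community estimates (all inputs unconfirmed).
SUPERPODS = 1024   # estimated number of superpods
CHIPS = 50_000     # estimated total chips ("roughly 50,000")

# Implied average number of chips per superpod under these estimates.
chips_per_superpod = CHIPS / SUPERPODS


def wall_clock_days(accelerator_days: float, chips: int) -> float:
    """Wall-clock days if all chips run continuously (an idealized assumption)."""
    return accelerator_days / chips


print(f"implied chips per superpod: {chips_per_superpod:.1f}")

# "Millions of accelerator-days": 1M and 2M are hypothetical example values.
for accel_days in (1_000_000, 2_000_000):
    days = wall_clock_days(accel_days, CHIPS)
    print(f"{accel_days:,} accelerator-days -> {days:.0f} days on {CHIPS:,} chips")
```

With these inputs, each million accelerator-days corresponds to about 20 days of wall-clock time on the full estimated cluster.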

Architectural Design

While LongCat-2.0 builds upon existing architectural trends, specifically those seen in the DeepSeek series, it also introduces changes of its own to the model's structure. A notable feature highlighted by the community is the use of N-gram embedding, a technique aimed at improving how the model handles tokenization and semantic representation.
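The text does not describe how LongCat-2.0 implements N-gram embedding, so the following is only a generic sketch of the idea under stated assumptions: each position gets its ordinary token embedding plus an embedding looked up from a hashed bucket of the preceding n-gram. The table sizes, the hash function, and the names `ngram_bucket` and `embed` are all hypothetical and chosen for illustration.

```python
import random

# Toy sizes, chosen only for illustration (not LongCat-2.0's configuration).
VOCAB_SIZE = 100
NGRAM_BUCKETS = 1000
DIM = 8
N = 2  # bigrams

rng = random.Random(0)


def make_table(rows: int, dim: int) -> list[list[float]]:
    """Randomly initialized embedding table (stands in for learned weights)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


token_table = make_table(VOCAB_SIZE, DIM)
ngram_table = make_table(NGRAM_BUCKETS, DIM)


def ngram_bucket(ngram: tuple[int, ...]) -> int:
    """Map a tuple of token ids to a bucket with a simple polynomial hash."""
    h = 0
    for tok in ngram:
        h = (h * 1_000_003 + tok + 1) % NGRAM_BUCKETS
    return h


def embed(token_ids: list[int]) -> list[list[float]]:
    """Token embedding plus hashed n-gram embedding at each position.

    Only the current and preceding tokens are used, so the lookup is causal.
    Positions without a full n-gram fall back to the token embedding alone.
    """
    out = []
    for i, tok in enumerate(token_ids):
        vec = list(token_table[tok])
        if i + 1 >= N:
            ngram = tuple(token_ids[i - N + 1 : i + 1])
            ng_vec = ngram_table[ngram_bucket(ngram)]
            vec = [a + b for a, b in zip(vec, ng_vec)]
        out.append(vec)
    return out


vectors = embed([5, 7, 5, 7])
# Token 5 appears at positions 0 and 2 but gets different vectors,
# because position 2 also carries the bigram (7, 5).
print(vectors[0] != vectors[2])
```

The design point this illustrates is that the same token id can receive a different input vector depending on its local context, without enlarging the tokenizer's vocabulary; hashing keeps the n-gram table at a fixed size at the cost of occasional bucket collisions.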

Performance and Community Observations

Early user testing and community discussions have revealed a mixed reception regarding the model's real-world utility and accessibility:

Reasoning and Accuracy

In comparative testing against other frontier models, some users found LongCat-2.0 to be less accurate in highly specialized technical domains. For example, in a test regarding nuclear fuel preferences (U-235 vs Pu-241), a user reported that LongCat-2.0 provided a well-reasoned but incorrect answer, while Qwen 3.7 Plus and Gemini Flash provided correct responses.

Accessibility and Transparency

There have been significant concerns regarding the availability of the model's weights and documentation. Users have reported that links to Hugging Face and GitHub are currently returning 404 errors, leading to skepticism about whether the model will be fully open-sourced.

Operational Quirks

Users have noted cases where the model responds in Chinese even when the application is set to English and the query is in English, which suggests that its primary training context strongly influences its output language.

Organizational Context

LongCat-2.0 is associated with Meituan, a major Chinese food delivery and commerce company led by CEO Wang Xing. This connection highlights the trend of large-scale commercial enterprises in China developing proprietary frontier models to reduce reliance on external AI providers and Western hardware.
