Ascend's Strategy in the Wake of DeepSeek's Rise

In the world of advanced computing and artificial intelligence, few things have garnered as much attention and excitement recently as DeepSeek, a groundbreaking platform that has rapidly transformed the way we interact with machine learning models. As its popularity exploded, so too did the common refrain from users hitting the limits of its capabilities: “The server is busy, please try again later.”

DeepSeek has managed to significantly reduce the costs associated with training large models through sophisticated algorithmic optimizations, including techniques like the Mixture of Experts (MoE) architecture and dynamic routing algorithms. Additionally, its open-source strategy has accelerated the adoption of large models across specialized fields, leading to a staggering increase in user adoption. Recent data indicates that DeepSeek's global daily active users skyrocketed from 347,000 to an impressive 119 million in just one month.
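To make the routing idea concrete, the sketch below shows a minimal top-k routed MoE layer in PyTorch. The expert count, layer sizes, and gating scheme are illustrative placeholders rather than DeepSeek's actual configuration; the point is simply that each token activates only a small subset of experts, which is how MoE reduces the compute spent per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative Mixture-of-Experts layer with top-k dynamic routing.
    All hyperparameters are placeholders, not DeepSeek's configuration."""
    def __init__(self, d_model=512, d_ff=1024, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.gate(x)                    # router logits, one per expert
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e            # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(TopKMoE()(tokens).shape)                   # torch.Size([16, 512])
```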

As such explosive growth raises questions surrounding computational power, various industry players are scrambling to respond to the growing anxiety over computing resources. In recent weeks, notable firms such as Ascend, Tian Shu Zhixin, Moore Threads, Biran Technology, Suiruan Technology, Muxi, and Haiguang have all announced their adaptations to DeepSeek’s platform, aiming to meet its soaring computational demands.

However, it is important to understand that the adaptations made by domestic chip manufacturers are still at an early stage: sufficient for basic operation, but requiring further development for deeper integration with DeepSeek’s complex algorithms. Achieving this will require ongoing investment in areas such as FP8 mixed precision, multi-scenario power balancing, and deep collaborative optimization between hardware and software.
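As a rough illustration of what FP8 mixed precision involves, the sketch below performs a per-tensor scaled cast to the e4m3 FP8 format and back. It assumes PyTorch 2.1 or later for the float8 dtype, and the scaling recipe is a deliberate simplification for illustration, not DeepSeek's or Ascend's actual training recipe.

```python
import torch

# Per-tensor scaled cast to FP8 (e4m3) and back -- the basic building block of
# FP8 mixed-precision training. Assumes PyTorch >= 2.1 for torch.float8_e4m3fn.
def to_fp8(x: torch.Tensor):
    amax = x.abs().max().clamp(min=1e-12)
    scale = 448.0 / amax                          # 448 is the largest e4m3 value
    return (x * scale).to(torch.float8_e4m3fn), scale

def from_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    return x_fp8.to(torch.float32) / scale        # dequantize back to FP32

w = torch.randn(256, 256)
w_fp8, s = to_fp8(w)
w_back = from_fp8(w_fp8, s)
print("max round-trip error:", (w - w_back).abs().max().item())
```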

Experts in the industry are observing two parallel trends in the evolution of large models: one focused on technological advancement and the other on engineering innovation.

As demand for computational power continues to grow, industry players are determined to stay ahead in both domains.

On the technological front, leading firms are doubling down on innovative pre-training base models, adhering to what is known as the Scaling Law, and are accelerating their efforts in exploring Artificial General Intelligence (AGI). Their focus extends to establishing robust, efficient, and reliable AI clusters, as well as fostering open platforms and ecosystems that support diverse applications.

Consider, for instance, Meta, which has ramped up its AI investment from $40 billion to an astounding $65 billion, while Google increased its AI budget from $52.5 billion to $75 billion. Moreover, technological iteration is moving at a brisk pace, with Alibaba's Qianwen team announcing flagship models such as Qwen 2.5-Max and Google rolling out its Gemini 2.0 series.

In the realm of engineering innovation, new paradigms are making the previously daunting tasks of post-training and distillation far more accessible, sparking an era characterized by what is termed “hundred models and thousand forms.” Businesses are increasingly keen on user-friendly, cost-effective platforms that successfully balance affordability with performance, while also prioritizing the ease of deployment and agile business operations.

On the enterprise side of the equation (To B), numerous companies are hastening their integration with DeepSeek to tap into its growing traffic. In a mere 20 days following the release of R1, over 160 domestic and international companies got on board with DeepSeek.

On the consumer side (To C), user growth has been explosive, leading to the swift emergence of super apps and further accelerating the widespread adoption of large language models (LLMs). DeepSeek's phenomenal performance has heightened societal awareness of LLMs, fostering the development of new business models and creating a cycle of positive commercial growth.

As a result, the demands placed on large model computing are evolving in two significant directions.

First and foremost, there is an emphasis on optimizing model structures, allowing larger models to run on the same hardware and thereby enhancing overall model scale and performance. Secondly, there are substantial efforts devoted to optimizing computation and communication, which aims to improve computing efficiency and reduce training times, enabling enterprises to execute complex AI tasks more effectively.

Additionally, post-training optimizations are crucial, as they allow enterprises to minimize the need for vast amounts of annotated data, lowering data costs while enhancing model performance through reinforcement learning techniques. On the inference side, optimizations must support the simultaneous prediction of multiple tokens, significantly increasing inference efficiency and offering enterprises a faster, more effective AI application experience.
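The toy sketch below illustrates one way predicting multiple tokens at once can speed up decoding: a cheap draft proposes several tokens and the main model checks them in a single batched pass, keeping the agreeing prefix. Both "models" here are random stand-ins and the accept/reject rule is simplified; this is a generic draft-and-verify pattern, not DeepSeek's actual multi-token prediction implementation.

```python
import torch

VOCAB = 100  # toy vocabulary size

def draft_model(seq):
    # Cheap proposer: next-token logits for the last position (random stand-in).
    torch.manual_seed(len(seq))
    return torch.randn(VOCAB)

def main_model(seq):
    # Expensive verifier: next-token logits for every position (random stand-in).
    torch.manual_seed(len(seq) + 7)
    return torch.randn(len(seq), VOCAB)

def generate(prompt, steps=4, draft_len=3):
    seq = list(prompt)
    for _ in range(steps):
        # 1) Draft several tokens autoregressively with the cheap model.
        drafts = []
        for _ in range(draft_len):
            drafts.append(int(draft_model(seq + drafts).argmax()))
        # 2) Verify all drafted tokens with one pass of the main model.
        logits = main_model(seq + drafts)
        for i, tok in enumerate(drafts):
            predicted = int(logits[len(seq) + i - 1].argmax())
            seq.append(predicted)          # always keep the main model's choice
            if predicted != tok:           # stop at the first disagreement
                break
    return seq

print(generate([1, 2, 3]))
```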

Most AI players rely heavily on well-optimized computational platforms capable of harnessing the full potential of available computational resources, along with comprehensive solutions for efficient training and inference. Establishing a stable, reliable computational framework lowers trial-and-error costs and allows companies to focus on refining their model engineering.

After the release of DeepSeek V3, Huawei began internal analyses and technical adaptations, finding that DeepSeek’s technical direction aligns closely with its Ascend product line. For example, the MoE architecture, which Huawei had previously identified as pivotal for large models, became more prominent with the R1 release. DeepSeek employs a code-focused approach to reinforcement learning, while Huawei's prior research on reinforcement learning toolsets greatly simplifies that learning process.

As it stands, Ascend is recognized as the first industry platform to fully support DeepSeek's core algorithms, enabling pre-training and fine-tuning across all DeepSeek models. Its super nodes support the full pre-training and fine-tuning workflow and back core optimization technologies such as DualPipe and cross-node All2All operations, matching DeepSeek’s pipeline-parallel algorithms and expert redundancy.
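For readers unfamiliar with why cross-node All2All matters for MoE workloads, the single-process sketch below simulates the dispatch step of expert parallelism: each rank groups its tokens by the rank hosting the routed expert, then all ranks exchange buckets all-to-all. Real systems use a collective such as torch.distributed.all_to_all over the cluster interconnect; the rank count, router, and token labels here are purely illustrative.

```python
NUM_RANKS = 4  # assume one expert per rank for simplicity

def route(token):
    # Placeholder router: deterministic toy hash picking the hosting rank/expert.
    return sum(map(ord, token)) % NUM_RANKS

def all_to_all(buckets):
    # buckets[src][dst] is what rank src sends to rank dst;
    # the collective delivers received[dst][src] on every rank.
    return [[buckets[src][dst] for src in range(NUM_RANKS)] for dst in range(NUM_RANKS)]

# Each rank holds a local batch of tokens.
local_batches = [[f"r{r}_t{i}" for i in range(3)] for r in range(NUM_RANKS)]

# 1) Group local tokens by destination rank (the rank hosting the routed expert).
send = [[[] for _ in range(NUM_RANKS)] for _ in range(NUM_RANKS)]
for r, batch in enumerate(local_batches):
    for tok in batch:
        send[r][route(tok)].append(tok)

# 2) All-to-all exchange: every rank receives the tokens routed to its expert.
recv = all_to_all(send)
for r in range(NUM_RANKS):
    print(f"rank {r} expert receives:", sum(recv[r], []))
```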

Moreover, Ascend distinguishes itself as the only AI training platform capable of end-to-end adaptation from pre-training to fine-tuning with DeepSeek.
