In Depth: China’s Dwindling AI Chip Stockpile Leaves Little Room for Sora Copycats

The immense computational and financial cost of developing and optimizing Sora-style models has prompted a reality check among China’s tech giants, particularly as they face growing domestic and international challenges. Photo: AI generated

OpenAI’s release three months ago of Sora, a program that converts text prompts into computer-generated videos, has put significant pressure on Chinese firms in the field.

But after weighing the high costs, the immense computing resources required and the nation’s dwindling stocks of advanced hardware, most see little business case for a generative video arms race.

Digest Hub

Explore the story in 30 seconds

The release of OpenAI's Sora has pressured Chinese firms, but many are hesitant to invest because of high costs and resource demands, especially amid semiconductor shortages.
Two Beijing startups, Shengshu Technology and AIsphere, have secured significant funding, though experts are skeptical about their ability to catch up with Sora.
China's enthusiasm for investing in generative video is cooler than it was for large language models, dampened by U.S. tech sanctions, high computational costs and strict domestic regulations.

AI generated, for reference only

Explore the story in 3 minutes

OpenAI’s launch three months ago of Sora, a program that converts text prompts into computer-generated videos, has put pressure on Chinese firms in the same field. However, high costs, heavy computational requirements and limited access to advanced hardware in China have dampened enthusiasm for a generative video arms race [para. 1][para. 2]. Industry insiders doubt that Chinese startups will replicate Sora. The technical path may now be easier to follow, but the computing power it demands remains a steep hurdle [para. 6].

Despite this, two Beijing-based generative video startups, Shengshu Technology and AIsphere, announced substantial funding rounds in March. They raised hundreds of millions of yuan and around 100 million yuan ($14 million) in venture capital, respectively [para. 3]. Nonetheless, overall interest in generative video investment in China remains much cooler than the excitement over large language models (LLMs) triggered by OpenAI’s ChatGPT [para. 1][para. 4].

The effectiveness of American tech sanctions, which restrict China’s access to the advanced semiconductors essential for next-generation computing, is evident. These sanctions force Chinese firms to focus on acquiring advanced hardware through alternative means [para. 5]. As a result, major companies are hesitant to channel resources into replicating Sora unless a clear business case emerges [para. 2]. Even high-profile companies such as Baidu, despite significant AI investments, have been cautious about entering the generative video domain [para. 7].

Sora's development underscores the substantial computational resources the technology requires. It combines the Transformer architecture, used in LLMs like GPT, with the Diffusion process behind high-resolution image generators such as Stable Diffusion and OpenAI’s DALL-E [para. 10]. Sora can create detailed, extended video scenes, a significant leap beyond previous attempts [para. 11]. However, problems persist, including the lack of sound, bizarre anatomical features and legal challenges over how training content was sourced [para. 12]. The computational demands are immense. Estimates suggest that replicating Sora at its current level would require computing power equivalent to thousands of Nvidia’s top GPUs, at a cost potentially reaching tens of millions of dollars [para. 13][para. 14].

The high financial and computational costs have led to a reality check among Chinese tech giants. ByteDance, despite its substantial data resources and financial strength, only consolidated its text-to-video AI efforts in the second half of last year, recently launching a tool named Doubao [para. 16]. Meanwhile, U.S. restrictions on AI chip sales to China have deepened the challenges, making high-performance computing resources scarce and driving up their prices [para. 20]. Regulatory changes in China have also imposed new requirements on generative AI firms, shaping their strategic decisions [para. 22].

Concerns over computing power have made companies such as ByteDance cautious. They are focusing instead on practical issues such as compressing models for mobile devices and meeting regulatory demands [para. 23]. Firms like Alibaba and Tencent have launched some generative video tools, but none match Sora's capabilities [para. 25][para. 26][para. 27].

Unlike ChatGPT, Sora did not stir significant investor enthusiasm on its release, because large models demand so much money and so many resources [para. 31]. Even so, startups like AIsphere and Shengshu Technology received considerable funding and are actively pursuing advances in AI video models [para. 32][para. 35]. These startups face challenges distinct from those encountered with LLMs, chiefly the substantial computing power required and their reliance on a limited pool of open-source models [para. 37].

Despite the optimism fueled by recent funding, the investment climate in China remains cautious. Many investors seek quick returns, in contrast to the longer-term approach prevalent in markets like the U.S. [para. 38].

In conclusion, China's generative video sector grapples with significant hurdles, including high computational costs, regulatory constraints and limited hardware access. These hurdles temper enthusiasm for large-scale efforts to match OpenAI’s Sora [para. 5][para. 18][para. 22].

AI generated, for reference only

Who’s Who

Shengshu Technology: Shengshu Technology, a Beijing-based generative video startup, was established in March 2023. It focuses on developing multimodal large models and has secured hundreds of millions of yuan in funding, led by Qiming Venture Partners. The company aims to innovate by integrating the Diffusion model with the Transformer, in line with the technological principles behind OpenAI’s Sora. CEO Tang Jiayu acknowledges that developing video models is far more computationally demanding than developing large language models.

AIsphere: AIsphere, founded in April 2023 by former ByteDance executive Wang Changhu, develops AI video models and applications for AI-generated video content. It received about 100 million yuan in funding led by Fortune Capital. The company launched PixVerse, an AI-powered video generator, which drew over 1 million visits in February. Wang aims to match the capabilities of OpenAI's Sora within six months by adding resources and manpower.

Baidu Inc.: Baidu Inc., once focused primarily on internet search, has invested heavily in artificial intelligence in recent years. Despite the company's robust AI capabilities, its CEO remains cautious about entering the generative video space because of the significant computational and financial requirements. This caution reflects broader skepticism in China's industry about the commercial viability of AI video technologies like OpenAI's Sora.

ByteDance Ltd.: ByteDance Ltd., owner of TikTok, consolidated resources last year to build a competitive text-to-video AI product, launching Doubao. Its MegaScale framework gave it a computing-power advantage in the image-text era, but video generation requires more efficient GPU clusters. ByteDance is cautious about commercial development because of resource demands, content review and regulatory requirements.

Alibaba Group Holding Ltd.: Alibaba Group Holding Ltd. has developed several generative video products, including "Animate Anyone," which animates static images, and "Outfit Anyone," a virtual dress-up tool for its e-commerce platform. Additionally, just before the launch of Sora, Alibaba Cloud introduced "EMO," a tool that creates animated images with sound from a single picture and an audio clip.

Tencent Holdings Ltd.: Tencent Holdings Ltd. launched the open-source text-to-video tools DynamiCrafter and VideoCrafter2. It also collaborated with Hong Kong University of Science and Technology and Tsinghua University on Follow-Your-Click, which animates parts of an image based on text prompts. Despite these initiatives, its efforts remain largely at the academic research stage.

SenseTime Group Inc.: SenseTime Group Inc. is an AI specialist company. After ChatGPT was released in November 2022, it joined several other Chinese tech giants and startups in rushing to develop AI models. The article gives no specific details about SenseTime's work in generative video or on any Sora-like project, mentioning it only as a participant in the broader AI race.

Iflytek Co. Ltd.: Iflytek Co. Ltd. (002230.SZ) is an AI specialist company that joined the wave of AI development sparked by ChatGPT's release in November 2022. Although prominent in the field, it has, like many Chinese firms, yet to replicate the advanced capabilities of Sora, OpenAI's video generation program.

AI generated, for reference only