
ByteDance's Volcano Engine Supercharges AI Offerings With Major Model Upgrades and New Agent Ecosystem

AI中国

AI technology columnist | 246 articles published

AsianFin -- ByteDance’s Volcano Engine is accelerating its AI ambitions with a sweeping upgrade of its Doubao large model suite, underscoring the company’s intensifying push into enterprise AI and digital agent solutions amid China’s increasingly competitive cloud landscape.

On July 30, Volcano Engine launched several new offerings, including the Doubao Image Editing Model 3.0, Doubao Simultaneous Interpretation Model 2.0, and a fully upgraded Doubao Large Model 1.6 series. The upgrades come alongside a broader effort to bolster its AI-native infrastructure and cement its lead in China’s rapidly growing cloud-based large model services market.

Doubao’s meteoric rise is backed by strong data: daily token usage reached 16.4 trillion as of May 2025, a 137-fold increase since its debut in May 2024, which implies roughly 120 billion tokens a day at launch. According to an IDC report, Doubao now leads China’s public cloud large model service market by a wide margin with a 46.4% share, more than Baidu AI Cloud and Alibaba Cloud combined.

Volcano Engine, ByteDance’s enterprise tech arm, is aggressively monetizing that growth. In 2024, it generated over RMB 12 billion in revenue, and it is targeting more than RMB 25 billion in 2025, a level that could surpass Baidu AI Cloud’s full-year top line.

“AI is no longer just a tool—it’s becoming the agent,” said Tan Dai, President of Volcano Engine. “Software is now executing tasks, not just enabling them.”

At the center of the latest upgrade is the Doubao Image Editing Model 3.0 (SeedEdit), which performs complex visual manipulations such as background removal, lighting adjustments, and pose alterations through natural language prompts. The model is designed for commercial use in advertising, content creation, and e-commerce, and is available to enterprise users via Volcano Ark and to consumers via ByteDance apps like Jimeng and Doubao.

The new Doubao Simultaneous Interpretation Model 2.0 cuts latency from 8–10 seconds to 2–3 seconds, thanks to a full-duplex system. It also supports zero-shot voice cloning, generating foreign-language speech in the user’s own voice without prior training data, which opens up use cases in international business, media, and education.

Meanwhile, the flagship Doubao-Seed-1.6-flash model delivers stronger performance in code, math, and reasoning tasks with latency as low as 10ms per token. Token pricing has also been aggressively cut: RMB 0.15 per million input tokens, and RMB 1.5 per million output tokens, slashing costs by up to 70% in enterprise trials.
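To make the quoted prices concrete, here is a minimal sketch of the arithmetic. It uses only the two per-million-token prices stated above. The function name and the sample workload are illustrative assumptions, not figures from Volcano Engine.

```python
from decimal import Decimal

# Prices quoted in the article, in RMB per one million tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.15")
OUTPUT_PRICE_PER_MILLION = Decimal("1.5")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_rmb(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost in RMB of a workload at the quoted prices.

    Hypothetical helper for illustration; it ignores any tiering,
    caching discounts, or minimum charges a real price list might have.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Assumed workload: 200 million input tokens and 20 million output tokens.
    print(estimate_cost_rmb(200_000_000, 20_000_000), "RMB")
```

Because output tokens cost ten times as much as input tokens at these rates, the assumed workload splits its bill evenly between the two even though it reads ten times more than it writes.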

Also notable is the multimodal Seed 1.6-Embedding model, which enables joint retrieval across text, image, and video. It currently tops the MMEB_v2 image leaderboard, outperforming rival models including Alibaba’s Qwen2 7B by 5.6 points.
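Joint retrieval across modalities generally relies on placing text, images, and video in one shared vector space, so a single query vector can be compared against all of them. The sketch below illustrates that idea with hand-written toy vectors and cosine similarity. It does not call the Seed 1.6-Embedding model, and every name and number in it is an assumption made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Item:
    item_id: str
    modality: str  # "text", "image", or "video"
    embedding: Sequence[float]  # in practice produced by an embedding model


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def search(query: Sequence[float], index: List[Item], top_k: int = 3) -> List[Tuple[float, Item]]:
    """Rank every item, whatever its modality, by similarity to the query."""
    scored = [(cosine_similarity(query, item.embedding), item) for item in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index: three-dimensional vectors invented for this example.
INDEX = [
    Item("doc-cat", "text", (0.9, 0.1, 0.0)),
    Item("img-cat", "image", (0.8, 0.2, 0.1)),
    Item("vid-car", "video", (0.1, 0.9, 0.2)),
]

if __name__ == "__main__":
    for score, item in search((1.0, 0.0, 0.0), INDEX, top_k=2):
        print(f"{item.item_id} ({item.modality}): {score:.3f}")
```

The point of the shared space is that `search` never branches on `modality`: a text document and an image about the same subject end up near each other and are ranked by the same metric.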

Volcano Engine is doubling down on open source as part of its strategy to build a broader ecosystem around AI agents. The core capabilities of its Coze platform, including the visual development tool Coze Studio and the management suite Coze Loop, were recently open-sourced. Within three days, Coze Studio had amassed over 10,000 GitHub stars.

To support intelligent agent deployment, the company rolled out a new Responses API with native context management and multimodal support, which it says cuts development time for AI assistants from two days to one hour. The amount of code required has been reduced by 87%, according to internal benchmarks.
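The article does not describe how the Responses API manages context. One common design for service-side context management is to store each turn and let the client continue a conversation by referencing the previous turn's ID, instead of resending the full history. The toy in-memory sketch below illustrates that pattern only. It is not Volcano Engine's API, and all class, method, and field names are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StoredTurn:
    turn_id: str
    user_message: str
    reply: str
    previous_id: Optional[str]  # link to the earlier turn, if any


class ToyContextStore:
    """Hypothetical stand-in for a service that keeps conversation state."""

    def __init__(self) -> None:
        self._turns: Dict[str, StoredTurn] = {}
        self._counter = itertools.count(1)

    def history(self, turn_id: Optional[str]) -> List[str]:
        """Follow previous_id links and return messages oldest first."""
        messages: List[str] = []
        while turn_id is not None:
            turn = self._turns[turn_id]
            # Appended newest first, then reversed once at the end.
            messages.append(turn.reply)
            messages.append(turn.user_message)
            turn_id = turn.previous_id
        messages.reverse()
        return messages

    def respond(self, user_message: str, previous_id: Optional[str] = None) -> StoredTurn:
        """Store a new turn; the client sends only the new message and an ID."""
        if previous_id is not None and previous_id not in self._turns:
            raise KeyError(f"unknown turn id: {previous_id}")
        context = self.history(previous_id)
        # A real service would call a model here; this placeholder just
        # reports how much earlier context was available.
        reply = f"(reply using {len(context)} earlier messages)"
        turn = StoredTurn(f"turn-{next(self._counter)}", user_message, reply, previous_id)
        self._turns[turn.turn_id] = turn
        return turn


if __name__ == "__main__":
    store = ToyContextStore()
    first = store.respond("Summarize the Q2 report.")
    second = store.respond("Now translate it.", previous_id=first.turn_id)
    print(second.reply)
```

Under this design the client code shrinks to one call per turn, which is the kind of saving a reduced-code claim would point to, though the article gives no detail on how the 87% figure was measured.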

Volcano Engine has also launched HiAgent, a “digital employee” workspace platform that acts as a centralized task hub. It enables personalized interfaces tailored to job roles—sales, HR, operations—integrating enterprise systems and streamlining workflows. The platform is already in deployment at clients including Guangjiao Digital Technology and Xiamen University.

Zhang Xin, Volcano Engine’s VP, highlighted how HiAgent addresses three key productivity bottlenecks: repetitive rule-based tasks, system switching disruptions, and decision-making blind spots. “The goal is not to replace people, but to help them do more of what matters,” he said.

Tan Dai sees the current AI wave as the third major computing platform shift, following the PC and mobile eras. He likens Volcano Engine’s journey to a marathon in which the company is only “500 meters in.”

Looking ahead, ByteDance’s enterprise arm is targeting RMB 100 billion in annual revenue by 2030, provided macroeconomic conditions remain favorable. That growth hinges on converting its massive scale, technical edge, and early-mover advantage into long-term, defensible commercial value.

“Every link in the chain has to be strong,” Tan said. “In cloud computing, customer needs vary drastically. But in AI, we must do everything better—from the large model, to native infrastructure, to agent deployment.”

Volcano Engine’s rapid model iteration and open ecosystem approach appear designed to do just that. Whether it can maintain this breakneck pace as competition heats up from rivals like Baidu, Alibaba, and Tencent remains to be seen.

But for now, ByteDance is making a strong claim to be China’s AI infrastructure leader—not just building large models, but translating them into agents that work.
