Video AI Startup Twelve Labs Raises $100 Million From Amazon, NEA and Naver: “The Next Foundation Model Is Video, Not Language”

Twelve Labs, a video-understanding AI startup founded by Korean engineers, has closed a $100 million Series B round from investors including Amazon, NEA and Naver Ventures, Bloomberg reported on July 1. Total funding now exceeds $200 million, and the company has also signed a multiyear agreement with Amazon Web Services (AWS) to run its models on Amazon's in-house Trainium chips. A five-year-old startup with half its workforce in Seoul has become a strategic partner of the world's largest cloud provider.

Behind the deal lies a structural gap in the AI industry. Roughly 90 percent of the world's data exists as video, yet most of it sits idle, accessible only through filenames, folders, captions and human memory. Text-centric large language models handle video poorly: they either sample a handful of frames and lose context, or re-scan the footage from scratch with every new question. That is why millions of hours of archives held by studios, broadcasters and sports franchises remain unmonetized. As video understanding is redefined as essential infrastructure for search, advertising and AI agents, capital has begun flowing to the companies positioned to fill this gap.

Deal Structure: NEA and Naver Co-Lead, Amazon Adds a Chip Alliance

The Series B was co-led by NEA and Naver Ventures, with Amazon participating as a major investor. Existing backers Radical Ventures, Index Ventures and Korea Investment Partners made follow-on investments, while Quadrille Capital and Red Bull Ventures joined as new investors. Nvidia is already on the cap table from earlier rounds.

Amazon's mode of participation deserves attention. Alongside the equity investment, AWS signed a multiyear contract to host Twelve Labs' workloads on Trainium, its custom-designed AI chip, and the startup's new models will debut on AWS for developers. The structure bundles capital, compute and distribution in exchange for locking a promising model company into Amazon's silicon ecosystem, a smaller-scale version of the playbook Amazon has applied to Anthropic. For Amazon, which is working to reduce its dependence on Nvidia GPUs, video AI represents a next-generation workload where inference demand is expected to surge, and Twelve Labs is the partner that can validate that demand on Trainium.

For Naver Ventures, Twelve Labs was the first investment made after its North American arm launched. Yongjung Park, Naver Ventures partner and head of D2SF North America investments, called the co-lead the strongest signal of conviction the firm could send. For Naver, which has stepped back from the frontier-model race following HyperCLOVA X, taking equity positions in Korean-founded vertical foundation model companies in the U.S. market reads as a deliberate detour strategy.

The Technology Stack: Marengo Sees, Pegasus Writes

Headquartered in San Francisco, Twelve Labs was founded in 2021 by Korean-born engineers led by CEO Jae Lee. Its roughly 200 employees are split evenly between Seoul and San Francisco. Lee told Bloomberg that five years ago the company made a contrarian bet that video, not text, is the signal data closest to how humans learn about the world, and that this distinguishes Twelve Labs even from the latest frontier systems, which remain language models at their core.

The product is a two-model stack. Marengo 3.0, an embedding model, processes visual information, sound, speech and motion simultaneously along the time axis, converting raw footage into a machine-searchable form. Pegasus 1.5 then structures that perception output into data that AI tools and applications can parse, playing a role similar to the markup languages that make documents readable to browsers. On top of the index the two models build, users can retrieve specific scenes through text queries, such as Marlon Brando's taxi scene in the 1954 film On the Waterfront or Diego Maradona's disputed Hand of God goal at the 1986 World Cup.
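The retrieval step described above can be illustrated with a minimal sketch. It assumes that each video segment and each text query has already been mapped into the same vector space, and that scenes are ranked by cosine similarity. The `Segment` class, the `search` function and the three-dimensional toy vectors are all hypothetical and chosen for illustration; they are not Twelve Labs' API, and nothing here reflects how Marengo actually computes or stores embeddings.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """A time-bounded slice of a video with a toy embedding (hypothetical)."""
    video_id: str
    start_s: float
    end_s: float
    embedding: list[float]  # a real embedding model would produce this vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: list[Segment], query_embedding: list[float], top_k: int = 1) -> list[Segment]:
    """Return the top_k segments most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda seg: cosine(seg.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Toy index: the vectors are invented, not real model output.
index = [
    Segment("waterfront", 0.0, 30.0, [0.9, 0.1, 0.0]),
    Segment("waterfront", 30.0, 60.0, [0.1, 0.9, 0.1]),
    Segment("worldcup86", 0.0, 30.0, [0.0, 0.2, 0.9]),
]

# Stand-in for the embedding of a text query about a disputed goal.
query = [0.0, 0.1, 1.0]
best = search(index, query)[0]
print(best.video_id, best.start_s, best.end_s)
```

The point of the sketch is that the footage is embedded once, so each new text query costs only a similarity ranking rather than a fresh pass over the video.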

The company will use the new capital to build what it calls a Video Cognition System, an architecture that integrates perception, memory and reasoning. Instead of re-analyzing footage for each query, the system understands a video once, stores the result as structured memory, and reasons over that accumulated memory for subsequent questions. Twelve Labs is also developing video agents that search, explain, plan and execute through text commands, and in June it launched the closed beta of Rodeo, its first application product, an AI video creation tool.
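The analyze-once, query-many pattern described above can be sketched as a small cache of structured records. Everything in the example is an assumption made for illustration: the `VideoMemory` class, the `fake_analyze` stand-in and the record fields are hypothetical, and the keyword match is a placeholder for whatever reasoning the actual system performs over its memory.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VideoMemory:
    """Stores one structured analysis per video and reuses it (hypothetical)."""
    records: dict = field(default_factory=dict)
    analysis_runs: int = 0  # counts how often the expensive step actually ran

    def ingest(self, video_id: str, analyze: Callable[[str], list]) -> list:
        # Run the expensive analysis only if this video has not been seen.
        if video_id not in self.records:
            self.records[video_id] = analyze(video_id)
            self.analysis_runs += 1
        return self.records[video_id]

    def ask(self, video_id: str, keyword: str) -> list:
        # Answer from stored memory; the footage itself is never re-read.
        return [e for e in self.records[video_id] if keyword in e["label"]]


def fake_analyze(video_id: str) -> list:
    """Stand-in for a perception model; returns invented scene records."""
    return [
        {"start_s": 0, "end_s": 12, "label": "players warm up"},
        {"start_s": 12, "end_s": 20, "label": "disputed goal"},
    ]


memory = VideoMemory()
memory.ingest("match-001", fake_analyze)
memory.ingest("match-001", fake_analyze)  # second call hits the cache

goal_hits = memory.ask("match-001", "goal")
warmup_hits = memory.ask("match-001", "warm up")
print(memory.analysis_runs, goal_hits, warmup_hits)
```

Both questions are answered from the same stored records, and the analysis counter stays at one, which is the cost profile the architecture is aiming for.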

Market Analysis: Archive Monetization and the Visual Cortex of Agents

The customer list maps the demand structure of this market. It includes Hollywood studios holding millions of hours of archival content, advertising firms, social media creators, and organizations such as Maple Leaf Sports & Entertainment, owner of the Toronto Raptors, along with AMC Global Media and UNICEF. What they share is vast video holdings kept as inventory rather than converted into a searchable database. For media companies whose appetite for new content investment has been squeezed by streaming competition, reuse and licensing of existing archives is one of the few remaining growth levers, and video search technology is its precondition.

The second demand axis is agents. For AI agents to interact with the physical world, they need to understand, in real time, the video streaming in from cameras, drones, satellites, factories, hospitals and stadiums. Park's recollection that Lee originally pitched Twelve Labs as the visual cortex of future AI agents points to exactly this layer. If language models are the contest over an agent's brain, video understanding is the still-undecided contest over its eyes. Google's Gemini and OpenAI's models continue to strengthen their multimodal capabilities, but general-purpose models still process video through frame sampling, leaving room for a specialized vertical player.

The risk sits in the same place. Lee himself has acknowledged that foundation models eventually become commodities or get displaced by better ones. If frontier models' native video understanding improves quickly, Twelve Labs' technical edge could narrow. That is precisely why the company is racing to move up the stack, from selling models to operating an integrated system layer of memory and reasoning, and to applications like Rodeo. The calculation is to become not a replaceable model but accumulated video intelligence with high switching costs.

K-EnterTech Implications: A Seoul–San Francisco Dual Structure as a Bridgehead

Twelve Labs is a rare case where Korea's AI talent base and global capital markets interlock. Lee, a UC Berkeley graduate, founded the company with colleagues he met during military service in Korea's Cyber Operations Command, and half the workforce remains in Seoul. The investor roster seats Silicon Valley capital such as NEA and Index Ventures alongside Korean capital such as Naver Ventures and Korea Investment Partners. With new offices planned in New York and London, the model is nearing completion: Korean engineering talent as the base, with capital, customers and compute sourced from the U.S. market.

For the K-content industry, the more practical implication concerns archives. Decades of drama, variety and news footage held by Korean broadcasters and studios mostly lack the metadata needed for scene-level search and licensing. Once video understanding becomes commercial infrastructure, those archives gain monetization paths: clip licensing, short-form repackaging, AI training data contracts, and scene-based commerce integration. Viewed through the lens of co-evolution between content and technology, Twelve Labs' growth is an external variable that raises the latent value of Korea's video assets.

This round signals that U.S. venture capital has begun treating video AI as infrastructure investment rather than experimentation. How fast the technology spreads from media and entertainment into government, security, sports and automotive will determine whether Twelve Labs can stay ahead of the frontier models closing in behind it.

Sources

· Bloomberg, “Video Search Startup Raises $100 Million From Amazon and VCs” (Saritha Rai, July 1, 2026) — https://www.bloomberg.com/news/articles/2026-07-01/video-search-startup-raises-100-million-from-amazon-vc-funds

· Twelve Labs Series B announcement and Korean press coverage (July 2, 2026): ZDNet Korea, Digital Daily, Financial News, Newsis