OPENING REMARKS
At Service Ventures, we have been tracking the major developments taking place in the field of Artificial Intelligence, as we believe AI presents a generational opportunity for investors – VCs, family offices, and institutional investors – to create generational wealth from this disruptive technology. Below we present some of the thoughts we have gathered from our network of industry experts, AI startups, corporate relationships, VCs, AI engineers and technologists on the key developments that took place in 2024.
We will present our views on the technological dynamics as well as the investment and capital markets activities that shaped the AI landscape in 2024. On the technological side in particular, we will share our views on the four key layers of the AI technology stack: the Infrastructure Layer, the Foundational AI Model Layer, the AI Management Tooling Layer, and the AI Applications Layer. In each section, we will summarize key developments of 2024, highlight relevant future technology trends we are noticing, and mention noteworthy startups operating in that part of the stack.
On the capital markets side, we will outline how the landscape has adapted to recent macroeconomic conditions, as well as the capital markets dynamics for M&A and IPOs.
2024 WAS A REMARKABLE YEAR FOR AI
Since the 1950s, when Alan Turing first proposed the Turing Test and Frank Rosenblatt pioneered the first artificial neural network, AI has been touted as a transformative technology capable of reshaping society. Since then, AI has gone through multiple "winters" from the 1970s through the early 2000s, when general interest and VC funding were minimal. Part of the reason was that most of the enabling information technologies – computing infrastructure, software, data, memory, storage, and networking – were not yet mature enough for AI models to demonstrate their true potential. Besides, AI algorithms themselves were mostly restricted to predefined rule sets or simple classification algorithms. Only recently, with breakthroughs like ImageNet (2009), AlphaGo (2015), Transformers (2017), and OpenAI's ChatGPT (2022), has AI truly established itself. The long-anticipated disruption from AI is no longer a distant vision but a reality. What an extraordinary time to be alive. For only the third time in modern technology history, the entire information technology infrastructure stack is being reimagined from the ground up. Nvidia, the biggest beneficiary of AI, has surged to become the world's most valuable company, growing its market cap tenfold to $3.4T as of Dec 2024. Meanwhile, OpenAI has set new historic records, reaching a $4B revenue run rate in just three years of commercialization – a pace at least three times faster than Amazon, the previous record-holder.
We believe 2024 truly marked a watershed moment for AI, where technological progress, investment capital deployment, and end-customer adoption converged like never before. This is a true sign of a fundamental technology shift – as we have seen in the past, if any one of these three elements is missing, large-scale disruption does not materialize. AI has moved beyond the confines of research labs and academic communities to become a central topic in corporate boardrooms, international political debates, and family dinner table conversations. Over $60B in venture capital flowed into the sector in 2024, with AI investments accounting for over one-third of all VC activity – surpassing traditionally dominant sectors like Health Tech, Fintech, and Consumer. A year ago, in 2023, a JP Morgan survey revealed that only 5% of enterprises had AI applications in production. Today, that percentage has more than tripled. While many implementations remain in the proof-of-concept (POC) stage, some use cases – like software code generation and customer support – have seen widespread adoption. At Google, more than a quarter of all new code is generated by AI, and Klarna's AI customer support agent can do the work of 700 human reps. The pace of AI innovation and adoption this year was unprecedented. Few years in modern history have seen such a concentrated burst of technological progress and investment as 2024. We think AI is more than a technological revolution; it is a broader societal revolution. And we are not merely spectators of this revolution but active participants – a rare opportunity we must responsibly embrace.
INNOVATION IN LAYER-1 OF AI STACK – INFRASTRUCTURE LAYER
Key Developments
• [New Infrastructure Paradigm] We are witnessing the dawn of a new infrastructure paradigm. In modern history, only twice before has a completely new IT infrastructure and computing stack been built from the ground up – the Internet and telecom boom of the late 1990s and 2000s, and the mobile and cloud computing wave of the 2010s. Now, with AI, we are entering a third phase. During the internet infrastructure buildout, over $1T of capital was invested between 1996 and 2002. The current AI – and specifically Generative AI – buildout has seen only ~$300B invested over the past two years. By this measure, we are still in the early innings. Generative AI adoption by enterprises is likewise in its infancy but accelerating rapidly. OpenAI acts as a useful market indicator: despite a >10x reduction in per-token API costs over the past year, OpenAI's revenue increased from a $1B to a $4B run rate – effectively a ~40x increase in total usage.
• [Cloud Infra for AI] We are noticing that the market for AI-specific cloud computing infrastructure is becoming fragmented. While large cloud providers such as Amazon, Google, Microsoft, and Meta continue to dominate the cloud-for-AI market, emerging players like CoreWeave, Lambda Labs, and Tensorwave are offering cost-effective, specialized cloud infrastructure for AI workloads. Pitchbook estimates this new AI-specific cloud market at $4B today, growing to $32B by 2027. Chipmakers like Nvidia and AMD are investing in these specialized providers as well. One reason is that these chipmakers are seeking to reduce their reliance on the large cloud providers, who are themselves developing in-house AI chips. For example, Google's TPU chips are now being adopted by companies like Apple. Further fragmentation in the AI cloud market appears likely.
• [AI Chip Startups] AI hardware startups face high CapEx requirements. A growing number of startups are designing custom ASIC chips for AI workloads (e.g., Groq, Cerebras). These companies face significant capital requirements not only for chip development but also for building the software ecosystem around the chips and for data center buildouts. The data center buildout is necessary because large cloud providers, with their own AI chip R&D efforts, are unlikely to adopt third-party startup chips in their own data centers. That is why AI chip startup Groq recently announced it is building its own inference data center in Saudi Arabia in partnership with Aramco Digital. It remains to be seen whether these AI chip startups can meaningfully capture market share from incumbents. Among startups, Cerebras appears furthest ahead with $136M of revenue in 1H 2024, but that is still only 0.1% of Nvidia's data center revenue.
• [AI Inferencing] While demand for training may be maturing as we reach scaling limits, demand for inference is just getting started. Most GenAI applications today are text-based. Multimodal use cases (e.g., text-to-video, text-to-3D) remain largely untapped but are far more compute intensive. For example, generating an AI video requires roughly 100 times the energy of an equivalent text-based document. If large industries such as advertising, media, and entertainment adopt GenAI, that would create exponential demand for inference. With the advancements in multimodal AI, this may be coming soon. Model architectures are also evolving: newer models, such as OpenAI's o1, are shifting to more inference-time reasoning, incorporating chain-of-thought and reinforcement learning. This architecture essentially gives models extra processing time to think and complete tasks, but it also means higher computational requirements. For example, the new o1 model costs 3-4x more per token than GPT-4o. As more workloads transition to this model architecture, the demand for inference will continue to rise.
Future Trends We Are Watching
• [Nvidia’s Central Role] Nvidia is now the world’s most valuable company, with 40% of the gains on the Nasdaq this year attributable to this one company alone. It will likely continue to dominate for the foreseeable future, but competition is building. One contender is AMD, whose data center business today is ~10% the size of Nvidia’s ($3.5B vs. $31B) but growing 122% year-over-year. The company is also gaining traction with large enterprises – for example, OpenAI recently announced it will start using AMD’s MI300 chips, and Lenovo says demand for the MI300 is at a record high. Another major source of competition is the large cloud providers themselves. One advantage they have is their massive in-house demand for AI training and inference. Of the CSPs, Google has the biggest lead, with the new TPU v5p delivering 2x more FLOPs and 3x more high-bandwidth memory (HBM) than the previous generation.
• [Nextgen AI Data Center] AI-specific data centers differ significantly from traditional cloud data centers due to their higher power density, which drives the need for innovations like next-gen liquid cooling. The idiosyncratic demands of AI training and inference also require high-bandwidth, low-latency networking, driving the need for next-gen networking and interconnects to reduce bottlenecks in multi-GPU clusters.
• [Edge AI and Cloud-to-Edge Architecture] Edge AI refers to running AI computing tasks on many dispersed individual devices instead of inside a data center with massive servers, storage, and networking. These devices can be individually owned mobile phones, laptops, other electronic devices, or small modular data centers located in far-flung locations. These so-called ‘edge devices’ have less processing power individually, but collectively they have more processing power than all of the world’s cloud data centers combined. It is estimated that less than 1% of global compute capacity (measured in FLOPs) sits in these large data centers. While this might seem surprising at first, it makes sense when you consider the sheer number of edge devices, including laptops and smartphones. Unlocking this latent capacity could be a gamechanger. Already, we are witnessing an explosion of small language models (SLMs) that can be deployed on edge devices. Experts we have interacted with believe that up to 50% of all AI workloads could eventually move from the cloud to the edge. More attention is now being paid to such edge-based AI workloads and to an efficient overall edge-to-cloud internet architecture. Startups are working on cloud-to-edge routers, which can dynamically route AI workloads between the cloud and edge devices based on criteria such as power, cost, and latency requirements; a simplified sketch of this routing logic follows below. Edge AI can also help with speculative decoding by reducing the volume of data sent to the cloud for processing, and it has the added benefits of improved privacy and security, lower latency, and lower costs. This is especially valuable for use cases like real-time speech recognition or offline settings.
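To make the cloud-to-edge routing idea concrete, here is a simplified sketch in Python of how such a router might decide where to run a request based on latency, cost, and privacy constraints. The thresholds, device profile, and run_* functions are illustrative assumptions, not any specific vendor's implementation.

```python
# Illustrative cloud-to-edge router: keep private or latency-critical, small
# requests on the device; send large, non-sensitive workloads to the cloud.
from dataclasses import dataclass

@dataclass
class Request:
    tokens: int
    max_latency_ms: int
    contains_private_data: bool

EDGE_MAX_TOKENS = 2_000           # assumed capacity of an on-device SLM
CLOUD_COST_PER_1K_TOKENS = 0.002  # assumed cloud price per 1K tokens, USD

def run_on_edge(req: Request) -> str:
    return "handled by on-device SLM"

def run_in_cloud(req: Request) -> str:
    cost = req.tokens / 1000 * CLOUD_COST_PER_1K_TOKENS
    return f"handled in cloud (~${cost:.4f})"

def route(req: Request) -> str:
    if req.contains_private_data or (
        req.max_latency_ms < 100 and req.tokens <= EDGE_MAX_TOKENS
    ):
        return run_on_edge(req)
    return run_in_cloud(req)

print(route(Request(tokens=500, max_latency_ms=50, contains_private_data=False)))
print(route(Request(tokens=8000, max_latency_ms=2000, contains_private_data=False)))
```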
• [Power Needs and Sustainability] Currently, data centers account for approximately 1-2% of global electricity usage, but this figure is projected to rise to 3-4% by 2030, driven largely by AI. Overall, McKinsey estimates data center capacity will grow at a 22% CAGR between now and 2030. AI itself can be used to optimize data centers, for example in predictive maintenance, dynamic workload allocation, and energy efficiency. Phaidra, for instance, is a startup working on autonomous control of cooling systems in data centers using RL. Large cloud providers have made climate commitments for 2030; Microsoft, for instance, has set an ambitious goal to become carbon negative by 2030. But the rapid growth in AI energy consumption is pushing these commitments in the wrong direction. Microsoft recently reported a nearly 30% increase in CO2 emissions since 2020, driven primarily by AI-related data center expansion. Similarly, Google’s greenhouse gas emissions in 2023 were 50% higher than in 2019, largely due to AI data centers as well. We think this trend will force enterprise decision-makers to pay more attention to sustainability.
Noteworthy Startups
AI Cloud Computing Infra: Coreweave, Crusoe, FoundryML, Lambda labs, Rescale, SF Compute, Shadeform, Tensorwave, Together AI
AI Processing Chips: Blaize, Cerebras, D-Matrix, Etched, Groq, Graphcore, Lightmatter, Rebellions, SambaNova, Tenstorrent, Hailo
Data Center Peripherals: Celestial AI, Corintis, Liquidstack, Jetcool, Phaidra, Thintronics, Xconn Technologies
INNOVATION IN LAYER-2 OF AI STACK – FOUNDATIONAL AI MODEL LAYER
Key Developments
• [Role of OpenAI] Over the past year, the performance gap between OpenAI and other AI research labs has narrowed. While OpenAI still leads, its dominance is less pronounced. Among startups, Anthropic stands out for its impressive progress in model upgrades, product launches, and talent acquisition, and is rumored to be approaching a $1B revenue run rate. It appears to us that the first-mover advantage may be less enduring in the age of AI. Our hypothesis is that in today’s interconnected world, mature communication infrastructure such as the internet and social media enables proprietary techniques and knowledge to disseminate faster than ever, eroding technological moats. As a result, model performance increasingly hinges on access to capital, computing power, and high-quality data, rather than on any proprietary model architecture or technique. In this vein, one startup to watch in the coming year is Elon Musk’s xAI – it already owns one of the largest supercomputers in the world, with over 100,000 Nvidia H100 GPUs. Musk has hinted that the upcoming model, Grok 3, could already be comparable to GPT-4o.
• [Meta has Bet on Open-source AI] Meta continues its bold open-source approach, with Zuckerberg committing billions to its open-sourced Llama AI models and Meta's broader generative AI initiatives. This has positioned Meta as the leader in open-source AI, with Llama models approaching 350 million downloads this year – more than a 10x increase from last year. On the consumer front, Meta is integrating its LLMs into existing consumer applications like Facebook and Instagram to prevent competitors from building standalone LLM interfaces like ChatGPT or Perplexity. Similarly, on the enterprise front, Meta is working with large enterprises such as AT&T, DoorDash, and Goldman Sachs. The gap between closed-source proprietary AI models and open-source AI models appears to have narrowed significantly, largely thanks to Meta’s industry-wide efforts.
• [Small Language Models (SLMs)] Growing attention is being paid to small language models (SLMs), and their rapid progress in 2024 has surpassed our expectations. Today, a 3B-parameter AI model can match the performance of the original 175B-parameter ChatGPT model – a more than 50x improvement in parameter efficiency in just 24 months. This remarkable progress stems from better compression techniques such as distillation, quantization, and pruning, and from the use of high-quality synthetic data. Researchers have shown that by carefully curating AI training datasets and removing low-quality data, it is possible to achieve better model performance with a much smaller footprint. This is the core idea behind SLMs. As a result, edge AI is becoming more viable: as SLM performance improves and edge hardware (CPUs, NPUs) becomes more powerful, deploying AI workloads at the edge becomes increasingly feasible. Today, a 7B model can run efficiently on a laptop. Notable examples of SLMs include Microsoft's Phi-3 models, Google's Gemini Flash, and the Llama 1B and 3B models. A minimal sketch of one of these compression techniques follows below.
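As a concrete example of one compression technique mentioned above, the sketch below applies post-training dynamic quantization in PyTorch to a toy model. The layer sizes are illustrative stand-ins, not a production SLM recipe.

```python
# Minimal post-training dynamic quantization sketch: Linear weights are stored
# as int8 (roughly 4x smaller than fp32) and activations are quantized on the
# fly at inference time.
import torch
import torch.nn as nn

# Stand-in for a small transformer feed-forward block; real SLMs are far larger.
model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.GELU(),
    nn.Linear(3072, 768),
)

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 768)
print(quantized(x).shape)  # torch.Size([1, 768])
```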
• [Network Based AI Models] The industry is shifting towards networks of AI models; that is, the architecture of Large Language Models (LLMs) is evolving from one large monolithic system to distributed networks of smaller, specialized models, with a parent model orchestrating specific tasks across these smaller, targeted models. Recent research from Meta demonstrates that using multiple smaller AI models in parallel can consistently outperform a single large AI model. A toy sketch of this routing pattern follows below.
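The sketch below illustrates the parent-orchestrator pattern in its simplest form: a router classifies a request and delegates it to a smaller specialist model. The model names and the call_model() helper are hypothetical placeholders; in practice the router is itself usually a (small) model rather than keyword rules.

```python
# Toy "network of models": a parent router delegates to specialized small models.
from typing import Dict

def call_model(model_name: str, prompt: str) -> str:
    """Placeholder for a real inference call (API or local runtime)."""
    return f"[{model_name}] response to: {prompt}"

SPECIALISTS: Dict[str, str] = {
    "code": "small-code-model",
    "math": "small-math-model",
    "general": "small-general-model",
}

def route(prompt: str) -> str:
    """Crude routing heuristic standing in for a learned parent model."""
    lowered = prompt.lower()
    if any(k in lowered for k in ("def ", "function", "bug")):
        task = "code"
    elif any(ch.isdigit() for ch in prompt):
        task = "math"
    else:
        task = "general"
    return call_model(SPECIALISTS[task], prompt)

print(route("Fix the bug in this function: def add(a, b): return a - b"))
```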
• [Inference Time Reasoning] Models are moving towards more inference-time reasoning. OpenAI's latest o1 model signals the shift toward inference-time reasoning using techniques like chain-of-thought and reinforcement learning. The o1 model learns optimal paths through trial and error, much like human problem-solving with self-reflection and error correction. This allows the model to excel at complex reasoning tasks such as math, coding, and scientific queries. However, this capability comes at a cost, with o1's per-token price being 3-4 times higher than GPT-4o's. This growing emphasis on inference-time reasoning will likely increase demand for latency-optimized computing infrastructure. An illustrative sketch of trading extra inference compute for reliability follows below.
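One simple, generic way to spend more compute at inference time is to sample several chain-of-thought solutions and take a majority vote on the final answer (often called self-consistency). This is not OpenAI's actual o1 training or inference mechanism, just a hedged illustration of the compute-for-reliability trade-off; sample_chain_of_thought() is a placeholder.

```python
# Self-consistency sketch: more samples cost more inference compute but the
# majority answer is usually more reliable than a single sample.
import random
from collections import Counter

def sample_chain_of_thought(question: str) -> str:
    """Placeholder: one sampled reasoning chain, reduced to its final answer."""
    return random.choice(["42", "42", "41"])  # stand-in for parsed final answers

def answer_with_more_compute(question: str, n_samples: int = 16) -> str:
    votes = Counter(sample_chain_of_thought(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(answer_with_more_compute("What is 6 * 7?"))
```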
• [AI Model Profitability] At Service Ventures, we think the capital-intensive foundational model startups are likely to remain unprofitable in the short run. According to The Information, OpenAI’s total annual AI computing expenses for training and inference are expected to be $5B, exceeding its annual revenue of $4B. Assuming inference and hosting make up 50% of that cost, OpenAI operates at a gross margin of around 50%, significantly lower than the margins typical of software businesses. One reason for the low margins is the ongoing token price war among model providers, which has driven prices down by more than 10x this year. As model architectures evolve towards more inference-time reasoning, the cost structure may shift, reducing CapEx for training but increasing OpEx for inference. It will be interesting to see whether foundational AI model startups can eventually achieve software-like margins as the industry matures. We believe break-even will not be achieved anytime soon.
• [Vertically Integrated Business Model] As AI token prices keep dropping and selling access to the foundational model layer becomes increasingly commoditized, AI companies may need to diversify their business models. Rising adoption and revenue growth are helping to offset this decline for now – despite token prices dropping significantly, OpenAI is still projected to 4x its revenue this year, and Anthropic is rumored to be growing 10x. In the longer term, however, model companies may need to consider vertical integration – from AI processors to AI models to AI applications – to offset commoditization at the model layer. We already see this in the infrastructure layer, where OpenAI is developing its first in-house chip in partnership with Broadcom and TSMC. In addition, it is collaborating with Microsoft on "Project Stargate," a $100B, 5GW data center initiative. In the application and tooling layer, OpenAI is expanding into new products such as ChatGPT search, a Perplexity-like search tool, and OpenAI Swarm, a framework for building agents. To ensure long-term growth, foundational model companies may need to shift from purely providing models to developing tools and end-user applications.
Future Trends We Are Watching
• [Improvements of Foundational Models] Scaling laws are plateauing; the approaches that have historically driven model improvements through larger pre-training datasets are reaching their limits. OpenAI’s next flagship model, Orion, is reportedly not achieving the dramatic performance leaps seen in earlier model upgrades, signaling that straightforward scaling with more data or more parameters may be giving way to a new paradigm of model innovation, such as better reasoning: model performance will hinge less on the sheer amount of pre-training data and more on advanced reasoning capabilities at inference time. Models like o1 use chain-of-thought and reinforcement learning to achieve these more advanced reasoning capabilities. This approach also allows end users to tailor model selection to specific tasks, such as building a copilot for M&A negotiations, giving them more flexibility and helping to reduce overall costs.
• [Role of Synthetic Data] As the supply of publicly available free training data for AI models runs out, synthetic data will play a more prominent role. Synthetic data is being used more in both pre-training and post-training; both SLMs and the LLMs of OpenAI and Anthropic incorporate synthetic data into their training. The challenge is preserving the distribution and entropy of the synthetic dataset to ensure model robustness.
• [Bigger Role of Unsupervised / Reinforcement Learning] With LLMs now having some base intelligence built in, we believe RL will play a bigger role in their further development. Google’s Genie model demonstrated this by creating an 11B-parameter foundational model that could generate interactive environments from unsupervised training on unlabeled internet videos. Unsupervised training without baseline truth or labeled data will be central to the future of large model development. That could allow AI agents to train themselves in synthetic but realistic environments, forming an important steppingstone on the way to artificial general intelligence (AGI). Over time, models will start to self-improve in a positive feedback loop, unlocking new possibilities that didn’t exist before. The evolution of LLMs mirrors the stages of human cognitive development. The first phase relied mostly on scaling and pre-training. Now, we are entering a second phase, where the model’s "toddler-like" brain expands beyond its foundational base and grows smarter through trial-and-error learning, observation, and active exploration of its environment – effectively Reinforcement Learning (RL).
• [Advancement in Multimodal AI] Multimodal AI could become the next major driver of compute demand. Recent advances in multimodal AI include the ability to process and integrate multiple data types – text, images, and audio – simultaneously, leading to more natural and contextually aware AI systems. With notable developments like OpenAI's GPT-4 Vision, Google's Gemini, and advanced attention mechanisms that enable better data fusion across modalities, multimodal AI is opening up applications in areas like customer service, healthcare, and autonomous driving, with enhanced user interactions and richer understanding of complex information. Platforms like Hugging Face provide open-source tools that facilitate wider adoption and development of multimodal AI. OpenAI’s Sora captured the industry’s attention when it was released earlier this year, and startups like Runway, Luma, Pika Labs, and Genmo have come up with their own text-to-video models. Voice and text-to-speech applications are beginning to gain significant adoption among enterprises, for uses such as content creation and customer support. ElevenLabs, one of the leaders in this space, is rumored to be approaching a $100M revenue run-rate, while SoundHound, a public company, is forecast to generate $150M in 2025.
• [Rise of Super Specialized AI Models] As general-purpose AI models get larger and bulkier, we expect a parallel rise in specialized AI models, which are not always text-based. Some of the more interesting areas include time-series models, physics/world models, and life-science and biology models. Foundational models for time-series analysis, like Amazon Chronos, treat time-series data as a "language" that can be modeled with a transformer architecture (see the sketch below). One of the key strengths of time-series models is their ability to apply transfer learning – leveraging diverse datasets from different domains to improve generalizability. We believe foundational models for time-series forecasting represent an exciting frontier yet to be fully explored. Their main challenge lies in obtaining sufficient training data: unlike general-purpose LLMs, which can rely on the public web for pre-training data, most time-series data is locked up within enterprises. LLMs also have limited inductive biases and a limited understanding of real-world physics. To address this, companies like World Labs are developing physics/world models that can understand and interact with the 3D physical world, akin to human spatial intelligence. This is important in areas like robotics. The undertaking is challenging because we believe the world of physics operates in a far higher-dimensional space than the world of language. Another example of a world model is Microsoft’s Aurora – a foundational model for the atmosphere pre-trained on over 1M hours of weather and climate data. In under a minute, Aurora produces 5-day global air pollution predictions and 10-day high-resolution weather forecasts that outperform state-of-the-art classical simulation tools. While its most obvious impact is better weather forecasting, there are further potential applications in insurance and risk assessment, financial trading, and agriculture management. Biology-focused foundational models, such as AlphaFold 3, represent a groundbreaking advancement in structural biology. These models can accurately predict the joint structures of complex biological molecules, including proteins, nucleic acids, and small molecules, enabling scientists to generate entirely novel protein sequences. Before models like AlphaFold, determining a single protein structure could take a PhD student the entirety of their research years – typically 4-5 years. In contrast, AlphaFold has predicted over 200 million protein structures in under a year, an achievement that has fundamentally transformed our ability to understand the building blocks of life. As these models evolve, they will revolutionize fields such as drug discovery and personalized medicine.
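To illustrate the "time series as a language" idea, the sketch below scales a numeric series and quantizes it into discrete tokens that a standard transformer could model, in the spirit of Chronos. The bin count and scaling scheme are illustrative assumptions, not the actual Chronos recipe.

```python
# Toy time-series tokenizer: scale the series, then map values to bin ids so a
# transformer can treat them like text tokens.
import numpy as np

def tokenize_series(values: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Mean-scale the series, then map each value to a discrete bin id."""
    scaled = values / (np.mean(np.abs(values)) + 1e-8)
    edges = np.linspace(-3.0, 3.0, n_bins - 1)  # fixed bin edges after scaling
    return np.digitize(scaled, edges)           # token ids in [0, n_bins - 1]

series = np.array([102.0, 105.5, 101.2, 98.7, 110.3, 115.0])
tokens = tokenize_series(series)
print(tokens)  # discrete token ids, ready to be modeled like a text sequence
```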
• [The AI Scientist] Longer term, foundational AI models could be used to conduct basic scientific research and discover new knowledge. One example of this potential is "The AI Scientist," developed by Tokyo-based Sakana AI – a comprehensive system for automating scientific discovery. It automates the entire research lifecycle, from generating novel research ideas and writing any necessary code to executing experiments and presenting the findings in an academic report. Just as AI is starting to learn coding and generate software by itself, we believe AI’s emergent capabilities will expand to the broader domain of scientific knowledge and discovery. Such advances could fundamentally reshape the trajectory of human progress by accelerating the pace of discovery across multiple scientific disciplines.
Noteworthy Startups
•Foundational Models Builders: 01.AI, Anthropic, Deepseek, H Company, Imbue, Minimax AI, Mistral, Moonshot AI, OpenAI, Reka AI, Safe SuperIntelligence, Sakana AI, Stability AI, xAI, Zhipu
•Small Language Models (SLMs): Arcee.ai, Bespoke labs, Nexa AI, Predibase
•Multi-modal AI: Black Forest Labs, Genmo, Higgsfield, Luma AI, Midjourney, Pika labs, Runway ML, Stability AI, Tavus, Twelve Labs, AssemblyAI, DeepL, Deepgram, Elevenlabs, PlayHT, Poly AI, Resemble AI, Suno, Symbl.ai, Udio
•Specialized Foundational AI Models: Archetype AI, Cradle, CuspAI, EvolutionaryScale, Formation Bio, Generate:Biomedicines, Hume AI, Illoca, Luminance, Nabla Bio, Orbital Materials, Pantheon AI, Physical Intelligence, Silurian AI, Synthefy, World Labs
INNOVATION IN LAYER-3 OF AI STACK – AI MANAGEMENT TOOLING LAYER
Key Developments
• [RAG is making AI More Accurate and Contextual] The rapid advancements in AI have led to the development of powerful large language models (LLMs) that can generate human-like text and code with remarkable accuracy. However, these AI models often struggle to incorporate domain-specific knowledge and real-time data, limiting their applicability in various industries. Retrieval Augmented Generation (RAG) has emerged as a promising solution to this challenge, enabling AI systems to access and utilize any proprietary information alongside the vast knowledge available on the internet. The impact of RAG technology is far-reaching, with potential applications across various industries. In healthcare, RAG models have the potential to assist doctors in making informed decisions by retrieving the latest medical literature, patient histories and treatment guidelines. Legal professionals can leverage RAG to efficiently access relevant case law, statutes and legal articles, streamlining their research process and improving the accuracy of their arguments. RAG-powered chatbots and recommendation systems are revolutionizing customer support and e-commerce platforms by providing personalized solutions based on real-time product information and user data. In the financial sector, analysts and investors can utilize RAG models to quickly retrieve live market data, internal organization data, news articles, and economic reports, facilitating data-driven investment decisions and generating valuable market insights. The main advantage of RAG lies in separating the model's reasoning plane from the data plane, allowing responses to be grounded in live, real-world data and minimizing the risk of hallucinations. While public RAG offerings are available, we are seeing many organizations opt for private RAG deployments within their secure cloud environments. This approach ensures the protection of sensitive data and proprietary knowledge while allowing for customization to meet an organization's specific needs. Several robust frameworks and platforms, such as LangChain, LlamaIndex and ZBrain, have emerged to support the development and deployment of private RAG solutions. These tools simplify the integration of an organization's proprietary data with advanced language models, enabling the creation of powerful RAG applications without extensive custom development. But when implementing Retrieval Augmented Generation (RAG) solutions, CEOs/CTOs may face several potential roadblocks. One significant challenge is data privacy and security concerns, particularly when dealing with sensitive or proprietary information. Ensuring that RAG systems comply with data protection regulations and maintain the confidentiality of sensitive data is crucial. Leaders must invest in robust cybersecurity measures and establish clear data governance frameworks to mitigate these risks and build trust among stakeholders. Another roadblock is the lack of standardized data formats and interoperability among different systems and data sources. This can hinder the seamless integration of RAG solutions and limit their effectiveness. Leaders may need to invest in data cleaning, normalization, and integration efforts to ensure that the RAG system can access and utilize data from various sources efficiently. By combining the power of private cloud deployment, cutting-edge frameworks, and state-of-the-art language models, RAG is unlocking new frontiers of AI-driven intelligence. 
According to a recent report by Menlo VC, retrieval-augmented generation (RAG) adoption has increased to 51%, up from 31% last year. Meanwhile, fine-tuning remains uncommon, with only 9% of production models being fine-tuned. Optimizing chunking and retrieval remains more art than science at this point, with the last mile proving especially difficult to get right. To address some of these challenges, many companies are incorporating deterministic structures or ontologies alongside RAG to enhance performance. Knowledge graphs, for example, add a layer of structured semantic relationships between data points, making it easier to retrieve precise and relevant information for a given query. This stands in contrast to traditional RAG, where similarity is often based solely on the distance between data points, which lacks deeper semantic understanding. Incorporating these structures can improve overall retrieval accuracy, which is particularly important in domains where accuracy is critical, such as healthcare, financial services, and legal. As underlying foundational models continue to improve, we anticipate this trend will continue to favor RAG. A minimal sketch of the basic RAG pattern follows below.
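For readers newer to the pattern, here is a minimal, framework-free sketch of RAG as described above: embed proprietary documents, retrieve the most similar chunks for a query, and ground the model's answer in them. The embed() and generate() functions are hypothetical stand-ins for a real embedding model and LLM API; production systems would use a vector database and a framework such as LangChain or LlamaIndex.

```python
# Minimal RAG sketch: index (data plane) -> retrieve -> grounded generation
# (reasoning plane).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    """Placeholder for an LLM call."""
    return f"LLM answer grounded in:\n{prompt[:200]}..."

# 1) Index proprietary documents.
docs = [
    "Q3 revenue grew 12% driven by enterprise contracts.",
    "The 2024 treatment guideline recommends dose adjustments for renal patients.",
]
index = [(d, embed(d)) for d in docs]

# 2) Retrieve top-k chunks by cosine similarity for a user query.
def retrieve(query: str, k: int = 1):
    q = embed(query)
    scored = sorted(index, key=lambda pair: float(q @ pair[1]), reverse=True)
    return [d for d, _ in scored[:k]]

# 3) Ground the generation in the retrieved context.
query = "How did revenue trend last quarter?"
context = "\n".join(retrieve(query))
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```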
• [In-house AI Solutions] More enterprises are opting to build AI solutions in-house. The Menlo report shows that nearly half of all GenAI solutions are now developed internally, up from just 20% last year. One of the most common reasons we’ve heard for why enterprises opt to build in-house is that they are reluctant to hand their data to third parties; another is the desire for greater customization to meet specific business needs. With open-source models closing the gap against closed-source models, we expect the trend towards in-house development to continue. The biggest bottleneck for enterprises is often the data curation and preparation stage. It is well known that AI teams spend most of their time on data preparation, leaving only a minority of their time for actual model development and deployment. Most of the data available to organizations is unstructured, making up ~80% of total data today – emails, documents, contracts, websites, social media, logs, and so on. Transforming this data into a usable format for machine learning requires extensive cleaning and standardization. Once the data is collected and cleaned, it must then be vectorized, typically by leveraging a vector database. These steps are not trivial and require deep technical and domain expertise.
• [Monetization and Exits in the Tooling Layer are Challenging] Monetization is often difficult due to intense competition, the availability of open-source alternatives, and established players entering the space. For instance, although Pinecone is widely regarded as the leader in vector databases, it faces competition from open-source projects like Milvus (Zilliz), Weaviate, Chroma, and Qdrant. Additionally, major database companies such as MongoDB and Elastic have introduced their own vector search capabilities, adding further competitive pressure. Large cloud providers are also in this space: products like AWS SageMaker, Azure Machine Learning, and Google Vertex AI all offer fully managed, end-to-end solutions – a one-stop shop for building, training, and deploying ML models. Similarly, inference optimization has been a popular but highly competitive space. In the past year, we’ve seen four inference optimization players get acquired by larger companies – Run:ai, Deci, and OctoAI by Nvidia, and Neural Magic by Red Hat. Deci’s investors likely fared well (an acquisition price of $300M against $57M total raised), whereas OctoAI’s investors likely saw limited returns (an acquisition price of $250M against $133M total raised). Other notable players in this space include BentoML, Baseten, Fireworks, Lamini, and Together AI. Some of these companies have opted for a more integrated approach, acquiring GPUs themselves and offering a more end-to-end solution. It remains to be seen whether these companies can thrive as standalone public companies or will ultimately be acquired, much like their peers.
Future Trends We Are Watching
• [Model Evaluation Remains Important in GenAI] A primary challenge in deploying LLMs is ensuring they consistently deliver accurate and reliable outputs. Even though 90% accuracy might seem good, in most real-world use cases that is simply not good enough for production – a model that is wrong 10% of the time can prove "dangerous." Implementing effective evaluation methods for LLMs is therefore crucial to ensuring their reliability and performance. Unlike earlier machine learning models, which were easily accessible and could be closely monitored, today’s LLMs are often accessed through APIs, making them far less transparent. There are no standard metrics for evaluating LLMs. Evaluation is significantly more complex because LLM applications – such as Q&A, summarization, code generation, and creative writing – are diverse and often industry-specific; healthcare, for instance, requires different evaluation metrics than legal. Consequently, there is no "one-size-fits-all" metric that can effectively assess LLM performance across all contexts, unlike the standardized approach used by credit agencies. Startups and AI labs are actively working to address the evaluation challenge. Some startups, like Braintrust, are trying to build a more domain-agnostic, end-to-end platform equipped with automated evals. OpenAI also recently released SimpleQA, a simple benchmark for checking the factuality of responses. Despite these efforts, no universally accepted framework has yet been established for evaluating LLMs. One popular strategy for enhancing the reliability of LLMs is using one model to evaluate another: the judge model scrutinizes the output of another model, providing a layer of review that can enhance confidence in the results. However, the success of this strategy largely depends on the quality of both the judging and judged models, as well as the ability of the overseeing human to interpret and act on the evaluations correctly. A minimal sketch of this LLM-as-judge pattern follows below.
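The sketch below shows the LLM-as-judge pattern in its simplest form: one model scores another model's output against a rubric and returns structured feedback. The call_llm() helper, rubric, and 1-5 scale are assumptions for illustration, not any vendor's actual evaluation API.

```python
# Minimal LLM-as-judge sketch: a judge model scores an answer and returns JSON.
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion API call."""
    return json.dumps({"score": 1, "rationale": "The answer names the wrong city."})

JUDGE_PROMPT = """You are a strict evaluator. Score the ANSWER to the QUESTION
for factual accuracy on a 1-5 scale. Respond as JSON: {{"score": int, "rationale": str}}.

QUESTION: {question}
ANSWER: {answer}
"""

def judge(question: str, answer: str) -> dict:
    raw = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return json.loads(raw)

result = judge(
    question="What is the capital of Australia?",
    answer="Sydney is the capital of Australia.",
)
print(result)  # e.g. {"score": 1, "rationale": "..."}
```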
• [AI Tooling will Revolve Around Agents] The future of AI tooling is likely to revolve around agents and the next stage of gen AI is likely to be more transformative. We are beginning an evolution from knowledge-based, gen-AI-powered tools—say, chatbots that answer questions and generate content—to gen AI–enabled “agents” that use foundation models to execute complex, multistep workflows across a digital world. In short, technology is moving from thought to action. Broadly speaking, “agentic” systems refer to digital systems that can independently interact in a dynamic world. While versions of these software systems have existed for years, the natural-language capabilities of gen AI unveil new possibilities, enabling systems that can plan their actions, use online tools to complete those tasks, collaborate with other agents and people, and learn to improve their performance. Gen AI agents eventually could act as skilled virtual coworkers, working with humans in a seamless and natural manner. A virtual assistant, for example, could plan and book a complex personalized travel itinerary, handling logistics across multiple travel platforms. Using everyday language, an engineer could describe a new software feature to a programmer agent, which would then code, test, iterate, and deploy the tool it helped create. Agentic systems traditionally have been difficult to implement, requiring laborious, rule-based programming or highly specific training of machine-learning models. Gen AI changes that. When agentic systems are built using foundation models (which have been trained on extremely large and varied unstructured data sets) rather than predefined rules, they have the potential to adapt to different scenarios in the same way that LLMs can respond intelligibly to prompts on which they have not been explicitly trained. Furthermore, using natural language rather than programming code, a human user could direct a gen AI–enabled agent system to accomplish a complex workflow. A multiagent system could then interpret and organize this workflow into actionable tasks, assign work to specialized agents, execute these refined tasks using a digital ecosystem of tools, and collaborate with other agents and humans to iteratively improve the quality of its actions. Building the foundational scaffolding to support a world full of agents is the first step. How do agents securely provide credentials? How can one ensure proper authentication to verify that an agent is acting legitimately on a person’s behalf? Agent-to-agent collaboration will also take center stage in the future. An agent orchestration layer is needed to coordinate agent-to-agent communication. It can also be responsible for assessing agent performance, setting guardrails, and ensuring compliance with their intended mandates. As agents gain more autonomy, ensuring data security and privacy will become increasingly important. New standards and regulations will be needed to govern agent behavior, much like GDPR and similar frameworks that guide human data usage. Finally, for agents to succeed at scale, we must develop effective feedback and learning mechanisms. Agents will need to learn from successes and failures to continuously improve. This will require a robust feedback system that enables agents to improve dynamically while maintaining safety and compliance. We believe the supporting infrastructure around agents, rather than the agents themselves, will be the biggest bottleneck to widespread adoption. 
The infrastructure scaffolding will need to be developed before agents can truly proliferate. The orchestration layer will likely be owned by a CSP or a well-funded startup like Emergence AI. Smaller startups can carve out niches by providing ancillary tools like performance monitoring or security, which will feed into this orchestration layer. A toy sketch of the orchestration pattern follows below.
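As a concrete illustration of the orchestration layer described above, the toy sketch below decomposes a workflow, assigns each step to a specialized agent, and applies a guardrail check before accepting results. The agent behaviors and guardrail rule are hypothetical stubs, not a real framework.

```python
# Toy agent orchestration: assign tasks to specialized agents and enforce a
# simple guardrail on their outputs.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Agent:
    name: str
    run: Callable[[str], str]

def guardrail_ok(output: str) -> bool:
    """Placeholder compliance check (PII, policy, scope)."""
    return "password" not in output.lower()

AGENTS = {
    "research": Agent("research", lambda task: f"research notes for: {task}"),
    "booking": Agent("booking", lambda task: f"booked itinerary for: {task}"),
}

def orchestrate(workflow: List[Tuple[str, str]]) -> List[str]:
    results = []
    for agent_name, task in workflow:
        out = AGENTS[agent_name].run(task)
        if not guardrail_ok(out):
            raise RuntimeError(f"Guardrail rejected output from {agent_name}")
        results.append(out)
    return results

print(orchestrate([
    ("research", "flight options SFO->NYC next Tuesday"),
    ("booking", "cheapest nonstop under $400"),
]))
```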
Noteworthy Startups
• Data: Cleanlabs, DatalogyAI, Hugging Face, Marqo, Scale AI, Shakudo, Snorkel AI, SuperAnnotate, Unstructured.io, Weka, Zilliz
• RAG & AI Model Customization: Cohere, Contextual AI, Upstage AI, Vectara
• VectorDBs / Embedding Models: Chroma, Milvus, Pinecone, Qdrant, Voyage AI, Weaviate
• Model Serving: Anyscale, Baseten, BentoML, CentML, Clika AI, Fireworks, Lamini, Lightning AI, Modular, OpenPipe, Replicate, TensorOpera, Together AI
• Evaluation & Observability: Arize AI, Braintrust, DynamoAI, Fiddler AI, Galileo, Galileo AI, Observe, Weights and Biases, WhyLabs
• AI Security: CalypsoAI, Grey Swan, Hidden Layer, Protect AI, Robust Intelligence (acquired by Cisco), Troj.ai
• Agent Orchestration & Tooling: Emergence AI, Langchain, MemGPT, Tiny Fish, UnifyApps
INNOVATION IN LAYER-4 OF AI STACK – AI APPLICATIONS LAYER
Key Developments
• [GenAI Adoption is Growing Rapidly] If 2023 was the year the world discovered generative AI (gen AI), 2024 was the year organizations truly began using – and deriving business value from – this new technology. In the latest McKinsey Global Survey on AI, 65% of respondents report that their organizations are regularly using gen AI, nearly double the percentage from the previous survey just ten months earlier. Respondents’ expectations for gen AI’s impact remain as high as they were last year, with three-quarters predicting that gen AI will lead to significant or disruptive change in their industries in the years ahead. Organizations are already seeing material benefits from gen AI use, reporting both cost decreases and revenue jumps in the business units deploying the technology. Enterprises are realizing real ROI from GenAI. For instance, Klarna reported that its AI-powered agents performed the equivalent work of 700 customer service representatives. Despite a 27% increase in revenue, Klarna reduced its headcount from 5,000 to 3,800, with plans to cut further to 2,000. These reductions were driven not by declining growth but by efficiency gains from AI. Menlo estimates that $4.6B has been spent on generative AI applications in 2024, an almost 8x increase from $600M last year. We expect growth to continue next year as many current projects in the POC stage convert to full deployment. This makes sense given that when ChatGPT launched in November 2022, most corporate budgets for 2023 were already finalized, making 2024 the first year in which companies truly had budgets for AI experimentation. Looking ahead, we anticipate AI-related budgets will expand further in 2025 and beyond. A KPMG survey of over 225 C-suite executives revealed that 83% of respondents plan to increase their investments in GenAI over the next three years.
• [AI for Code Generation] During Google's latest earnings call, it was revealed that 25% of the company's new code is now generated by AI. Microsoft's GitHub Copilot, the leading player in this space, has reached ~$300M ARR and now accounts for 40% of GitHub’s revenue growth. Several startups, like Cursor, Poolside, Codeium, and Cognition, have also entered the market, collectively raising over $1B to challenge GitHub’s dominance. While some startups have shown impressive traction (~$50M+ ARR), most remain in the early stages of commercialization.
• [AI for Internet Search] Perplexity has emerged as one of the most recognized startups in GenAI and has been innovating quickly with features such as Perplexity Finance and the Perplexity Shopping Assistant. However, competition is expected to intensify as OpenAI rolls out ChatGPT Search and Meta builds its own AI-powered search engine. Perplexity’s rumored $50M ARR is an impressive milestone for a startup of its age, but it still represents only 0.025% of Google’s $200B search revenue, highlighting both the immense scale of the competition and the significant growth potential ahead. It remains to be seen whether Perplexity’s push into advertising can help it further chip away at Google’s share in search.
• [AI for Task Agents] Agent startups have been gaining traction, particularly those in customer support and sales & marketing. Companies such as Sierra, Maven AGI, and Ema are targeting enterprises, while others like 11x, Artisan, and StyleAI focus more on SMB and mid-market. Larger players are also entering the space, typically with a more horizontal platform approach: Google has Vertex AI Agent Builder, Microsoft has Copilot Studio Agent Builder, Amazon has Amazon Bedrock, and Salesforce has Agentforce. Currently, the agents we’ve seen can handle only relatively simple tasks, but as foundational models improve, especially with advanced reasoning capabilities like those in OpenAI’s o1 model, we expect them to become more capable.
• [GenAI is Transforming SaaS Models] GenAI is leading companies to adopt more usage-based pricing. For example, Salesforce is charging $2 per conversation for its Agentforce agents, while Intercom is charging $0.99 per ticket resolution. Just as SaaS revolutionized the software industry by combining technological innovation with business model innovation (the shift from license & maintenance to recurring subscriptions), we believe AI has the potential to drive another wave of business model innovation. But improvements to foundational AI models can leave an AI application startup better or worse off. Applications that are thin UI wrappers relying mostly on the strength of the underlying models are likely to lose out; we believe startups that begin with a deep understanding of end users' journeys and associated workflows are likely to benefit.
• [AI in Robotics] The robotics space is seeing renewed interest, fueled by the potential to develop LLM-like foundational models for robotics. The latest generation of robotics startups is moving away from heuristic, rule-based programming and focusing instead on end-to-end neural networks. Tesla’s latest FSD software is an example, relying mostly on vision and data rather than explicitly coded controls. However, robotics continues to face a significant data bottleneck, and various techniques are being explored to address this challenge. While imitation learning and teleoperation offer high-quality data, they may not be scalable on their own. Recently, the use of videos and simulations for training has emerged as another promising avenue, with Nvidia Isaac Sim and several startups working on this. Conceptually, Google’s RT-2 model demonstrated the potential for generalized robotics performance by leveraging a large model trained on internet-scale vision and language data and fine-tuning it with a smaller set of robotics data. The primary challenge in simulation lies in creating realistic representations of ground truth to minimize the sim2real gap. This is particularly difficult because robots come in diverse embodiments and form factors, making data collection and standardization challenging. Ultimately, we believe no single method is likely to solve all these challenges; an ensemble of techniques involving teleoperation, simulation, and video will be required.
Future Trends We Are Watching
• [AI Services-as-a-software] AI presents an opportunity to disrupt not just the $400B global SaaS market but also the $4T services market, key areas being sales & marketing, software engineering, legal, HR & recruiting, and customer support. Many jobs today involve repetitive tasks, making them ideal candidates for AI-driven automation. Together these sectors represent ~120M workers globally and almost $4T worth of salaries. Considering the sheer scale of these markets, the potential impact is clear.
• [“Computer Use” is a Major Inflection Point] Anthropic recently introduced “computer use,” which allows developers to direct Claude to use a computer in an agentic manner – clicking buttons, typing text, and so on. Claude looks at screenshots of what is visible to the user, then calculates how many pixels it needs to move the cursor to click on the correct location. Broadly speaking, there are two approaches to enabling AI agents to perform tasks. The first is API-based, where tasks are broken down into subtasks and executed by chaining together API calls. The second, as seen with Anthropic’s “computer use,” is the UI-based approach, which leverages vision and reinforcement learning to interact directly with browsers and applications to execute tasks – in other words, it teaches the model to use computers the way humans do. While the latter approach is conceptually simpler in its lower-level, end-to-end design, it requires more training data and may be computationally more expensive. The end state may indeed be vision-based, but for now a hybrid approach combining both methods may be required to optimize performance and cost. This is akin to what happened in the full self-driving (FSD) world, where end-to-end neural nets relying solely on vision slowly replaced rule-based controls. Computer use is a significant breakthrough for robotic process automation (RPA). Traditional RPA tools have long faced challenges due to their fragility, as workflows often broke when interfaces changed, requiring constant maintenance. With Anthropic’s computer use, AI models can now adapt to diverse interfaces, reducing the dependency on hard-coded scripts. This breakthrough has already had an impact: UiPath quickly integrated Claude 3.5 Sonnet into three of its key products following Anthropic’s announcement. This swift adoption underscores how transformative computer use could be in driving the next wave of RPA and intelligent automation. However, we think it is still early – to use the FSD analogy again, we believe intelligent automation is still in the L1/L2 phase. A schematic sketch of the UI-based agent loop follows below.
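The sketch below shows the UI-based agent loop in schematic form: capture a screenshot, ask a vision-capable model for the next action, execute it, and repeat. The model call and action executor are hypothetical placeholders and do not reflect Anthropic's actual API.

```python
# Schematic UI-based ("computer use") agent loop with placeholder components.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def take_screenshot() -> bytes:
    """Placeholder: capture the current screen as an image."""
    return b"<png bytes>"

def model_next_action(goal: str, screenshot: bytes) -> Action:
    """Placeholder: a vision model returns pixel coordinates or keystrokes."""
    return Action(kind="done")

def execute(action: Action) -> None:
    """Placeholder: drive the OS (mouse/keyboard) with the chosen action."""
    print(f"executing {action}")

def run_agent(goal: str, max_steps: int = 10) -> None:
    for _ in range(max_steps):
        action = model_next_action(goal, take_screenshot())
        if action.kind == "done":
            return
        execute(action)

run_agent("Open the expense report form and fill in last month's totals")
```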
• [New Hardware Form Factors for AI] 2024 has seen the rise of novel hardware form factors aimed at complementing or even replacing smartphones. Despite the excitement, success has been limited so far. For instance, Humane’s AI-powered wearable pin has generated just over $9M in lifetime sales, despite raising $200M in funding. Worse, daily return rates are reportedly outpacing new sales. Similarly, Rabbit R1 has received extremely poor reviews. The one notable success story has been the second-generation Meta + Ray-Ban smart glasses. These devices have outsold the previous generation’s two-year sales figures within just a few months, earning generally positive reviews. Meanwhile, major research labs and tech incumbents are also exploring this area. OpenAI, for example, recently hired Caitlin Kalinowski, a former AR hardware leader at Meta, to oversee their robotics and consumer hardware. In addition, Apple design icon Jony Ive has partnered with Sam Altman on a new AI hardware project. The potential to combine large GenAI with new hardware form factors represents an exciting frontier.
• [GenAI Consumer Company is Yet to Break Out] Many of the leading consumer AI startups ended up being acquired this year. For example, we observed two acqui-hires in the consumer GenAI space – Google’s deal with Character.ai and Microsoft’s deal with Inflection. We believe consumer GenAI applications have yet to break out for two primary reasons. First, there hasn’t been a killer consumer use case beyond chatbots like ChatGPT and Perplexity. While Character.ai has arguably achieved product-market fit, its relatively narrow demographic appeal – over half of its users are aged 18 to 24 – limits its broader potential. We believe the next transformative consumer application will be a highly capable personal assistant (a smarter Siri), with the longer-term vision being personalized digital twins for everyone. Second, successful consumer applications often require viral adoption early on, sometimes driven by upfront usage subsidies from the company. However, the current cost of AI tokens – especially for multimodal models – remains too high for such mass subsidization to be economically feasible. As token costs decrease and/or more workloads move to the edge, we anticipate a new wave of GenAI consumer companies emerging.
• [Inference Time Reasoning] Inference-time reasoning allows an AI model to think more like a human – breaking down problems, considering multiple solutions, and refining its approach iteratively as it processes information. Instead of relying on the sheer size of model parameters and training data, models can use smarter, more efficient strategies to solve complex tasks. This shift marks a major paradigm change in how AI systems are developed and deployed. AI models with enhanced inference-time reasoning capabilities can tackle increasingly complex scientific challenges; some of the most promising opportunities lie in drug discovery/biomedicine, materials science, and physics/robotics. In a major nod to open source, Google DeepMind recently released the code and weights for AlphaFold 3. This surprise announcement followed just weeks after the system’s creators, Demis Hassabis and John Jumper, were awarded the 2024 Nobel Prize in Chemistry for their contributions. In the scientific domain, startups may pursue various monetization paths: some choose to offer their tools as SaaS platforms, others pursue a licensing model, and some may even act as principals, directly bringing their solutions to market to capture a greater portion of the TAM.
• [Vertical AI Applications] Just as SaaS evolved from horizontal to vertical solutions, we anticipate a similar transition in the AI space. Early in a market's lifecycle, horizontal tools gain traction quickly due to their broad appeal to a large market. However, as the market matures and competition increases, startups often move towards specialized, vertical or domain-specific solutions to differentiate themselves. In AI, this shift to verticalization appears to be happening more quickly than in SaaS for several key reasons. AI thrives on domain-specific data: AI performs best when trained on data specific to an industry or use case. Many industries have highly domain-specific data sets, making specialized or verticalized training more effective. For instance, in RAG, understanding domain context is critical for retrieval accuracy. Crowded horizontal market: Unlike the early days of SaaS, many well-established incumbents are already heavily investing in GenAI and launching horizontal solutions. These incumbents are typically already the system of record for whatever use case they are targeting, be it Salesforce for CRM, SAP for ERP, or others. This gives incumbents a substantial edge in distribution and integration. For startups, targeting verticalized or specialized markets may allow them to carve out a more defensible position that can lead to a better chance of success. Regulatory stringency in key industries: Regulated industries like healthcare, legal, and finance have stringent regulatory requirements. These requirements are most effectively met through verticalized approaches tailored to the unique requirements of each industry.
Noteworthy Startups
• SW Development: Augment Code, CodeComplete, Codeium, Cognition, Cursor, Magic.dev, Poolside, TabbyML, Tabnine, Tessl
• Enterprise Productivity: Consensus, Dust.ai, Exa, Fireflies.ai, Glean, Highlight, Mem, Otter.ai, Read.ai, Taskade, Wokelo AI, Beautiful.ai, Gamma, Tome
• Consumer AI: Genspark, MultiOn, Liner, Ninjatech.ai, Perplexity, Simple AI, You.com
• Multi Modal AI: Black Forest Labs, Captions, Coactive, Creatify, Deepbrain, Descript, Heygen, Ideogram, Luma, Openart.ai, Opus Clip, PhotoRoom, Runway, Synthesia, Viggle AI
• Next-gen RPA: Automat, Caddi, HappyRobot, Orby, Sola, Tektonic AI
• Traditional Robotics: ANYbotics, Bright Machines, Field AI, Hillbot, Path Robotics, Physical Intelligence, Skild AI, Swiss-Mile, World Labs
• Humanoid Robotics: 1X Technologies, Figure AI
• General Agents/Copilots: DeepOpinion, Ema, Factory AI, Gumloop, Jasper, Lyzr, Relevance AI, Sierra, SquidAI, Stack AI, Tektonic AI, Wordware, Writer
• HR/Recruiting: ConverzAI, Eightfold, Jobright.ai, Mercor, Micro1, Moonhub
• Customer Support: AptEdge, Cresta, Decagon, Maven AGI
• Sales and Marketing: 11x, Adsgency, Artisan AI, Bounti.ai, Connectly AI, Typeface, StyleAI, Mutiny, Nectar AI, Nooks, Omneky, Rox, Simplified
• Product Design and Engineering: Ambr, Skippr, Uizard, Vizcom
• Chip Design: Astrus, Mooreslab
• Vertical Specific- Healthcare: Abridge, Ambience Healthcare, Atropos Health, Cair Health, Hippocratic AI, Hyro, Nabla, Scribenote, Segmed.ai, Slingshot AI, Suki AI, Tennr
• Vertical Specific - Finance: AskLio, Auditoria.ai, Finpilot, Hebbia, Klarity, Kipoparts, Linq Alpha, Menos AI, Rogo, Spine AI
• Vertical Specific - Legal: Casetext (Thomson Reuters), Cicero, EvenUp, Genie AI, Harvey AI, Leya, Robin AI, Solomon AI, Solve Intelligence*, Spellbook/Rally
• Vertical Specific - Education and Language: Elsa, Eureka Labs, MagicSchool AI, Pace AI, Praktika, Riiid, Sana, Speak, Uplimit
• Vertical Specific - Gaming and Entertainment: Altera, Inworld AI
• Vertical Specific - Compliance: Greenlite, Norm AI
• Vertical Specific - Real Estate: Elise AI
• Vertical Specific - Mobility: Carvis.AI, Revv
VENTURE CAPITAL INVESTMENTS AND M&A
• [VC Investments] Year-to-date (YTD) AI investments have exceeded $60B, now representing more than one-third of all venture funding. The largest rounds continue to be in the infrastructure and model layers, with major raises from OpenAI ($6.6B), xAI ($5B), Anthropic ($4B), SSI ($1B), and CoreWeave ($1B). Growing compute demands remain a key driver, with OpenAI alone expected to spend $3B on training this year. Some application companies, particularly those in the code generation space, have also raised substantial rounds as they pre-train their own code generation models. Other areas that received significant investor attention include AI chips, AI clouds, robotics foundation models, and enterprise AI. Investor demand for AI startups remains strong, with many companies securing significant funding early in their lifecycle. Startups like SSI and World Labs have already reached unicorn status, driven by the strong pedigree of their founding teams. While overall AI valuations remain high – with an average revenue multiple of 26x – investors have become more selective given the sheer volume of new startups building in this space; for instance, AI startups represented 75% of the most recent YC Summer batch. Strategic investors like Nvidia and the CSPs continue to show strong interest in AI startups, driving up overall valuations and increasing competition for financial VCs. The CSPs are sitting on all-time-high cash balances that need to be reinvested into growth; for example, Amazon recently announced another $4B investment in Anthropic.
• [M&A] This year witnessed the rise of “reverse acquihires,” in which an incumbent hires a substantial portion of a startup’s team and sometimes licenses its technology, bypassing the complexities of a full acquisition. This strategy allows major tech companies to bolster their AI capabilities while avoiding regulatory scrutiny. Microsoft recruited key personnel from Inflection AI, most notably CEO Mustafa Suleyman, who now oversees Microsoft’s AI portfolio, including Copilot, Bing, and Edge, and reports directly to CEO Satya Nadella. Given the scope of Suleyman’s role, the price tag may have been worth it. Amazon hired roughly two-thirds of Adept AI’s workforce, including CEO David Luan, and secured a non-exclusive license to the company’s foundation model. Adept received $25M for the licensing deal, and its investors, who put $414M into the company, will roughly recoup their investment. With the reverse acquihire of Character.ai, Google brought on CEO Noam Shazeer, President Daniel De Freitas, and about 30 of Character.ai’s 130 employees in a deal reportedly valued at $2.7B – more than 2.5x the company’s last valuation. Total M&A activity (estimated at ~$2-3B) was relatively subdued this year, similar to last year, when there were only three major acquisitions (MosaicML, Casetext, Neeva). Interestingly, five of the eight notable acquisitions this year were in the tooling layer, and four of them were in the inference optimization space (OctoAI, Deci, Run:ai, Neural Magic). Notable deals include Snowflake’s acquisition of Datavolo (undisclosed), Red Hat’s acquisition of Neural Magic (undisclosed), Nvidia’s acquisitions of OctoAI (~$250M), Deci (~$300M), and Run:ai (~$700M), Docusign’s acquisition of Lexion (~$165M), Cisco’s acquisition of Robust Intelligence (undisclosed), and Canva’s acquisition of Leonardo.ai (undisclosed).
OTHER AI RELATED TRENDS IN 2024
• [Sovereign AI] As AI proliferates, the concept of Sovereign AI is drawing increasing attention. A central concern for many governments is whether they are comfortable with sensitive data being processed on platforms like ChatGPT that are controlled by other countries. The broader geopolitical rift is increasingly mirrored within the microcosm of the AI world, resulting in the emergence of separate AI ecosystems across different regions. A key consideration is at which layers of the AI stack sovereign AI will emerge; current developments suggest it will primarily manifest in the infrastructure and model layers. For investors, this presents unique opportunities to back region-specific startups as distinct ecosystems arise globally. Success may not require backing the global leader; regional champions could thrive in local markets as well. The U.S. continues to lead innovation in GenAI across all layers of the stack. Major AI labs like OpenAI, Anthropic, and Meta dominate advancements, thanks to a deep talent pool and world-class academic institutions. On the infrastructure side, U.S. hyperscalers provide unmatched compute power, while Nvidia maintains its lead in hardware. This integrated ecosystem gives the U.S. a considerable advantage in the short to medium term. In response to semiconductor export controls, China is prioritizing its domestic chip industry; in May, the government announced a $47.5B state semiconductor investment fund. Although its hardware lags, Chinese LLMs, such as Alibaba's Qwen and DeepSeek, remain highly competitive. Surprisingly, China leads in generative AI adoption, with 83% of companies testing or implementing the technology – surpassing the U.S. (65%) and the global average (54%). Europe’s stringent regulations, like the EU AI Act, may stifle AI innovation. Regulations have already caused U.S. tech giants, including Meta and X, to delay AI rollouts in the region, and Apple opted not to launch Apple Intelligence on its latest iPhone in Europe for similar reasons. While Europe boasts standout labs like Mistral AI, it remains unclear whether they can compete independently with U.S. CSPs and AI labs, especially given the regulatory handicap. Japan is experiencing significant growth in data center infrastructure: Oracle recently announced an $8B investment in new data centers, following Microsoft’s $3B commitment. At the model layer, Sakana AI has emerged as a key player, recently closing a $200M Series A round. Japan’s government is fostering AI innovation with light regulation and strong support, recognizing AI as a critical technology to address its aging population and growing need for automation. The large market opportunity combined with government support positions Japan as a promising market for AI innovation and startups.
• [AI and Copyright] The intersection of AI and copyright law is becoming a critical issue as GenAI content becomes more pervasive. Recent controversies, such as claims that Apple, Nvidia, and Anthropic used YouTube videos without permission to train AI models, have highlighted concerns about intellectual property violations. Similarly, major record labels like Universal, Sony, and Warner are suing AI startups such as Suno and Udio for using copyrighted music to generate content. These disputes underscore the tension between innovation and the safeguarding of creative assets. The news industry is grappling with the same challenges: News Corp recently sued Perplexity AI over alleged false attributions and hallucinations tied to its publications, while simultaneously striking a $250M partnership with OpenAI to provide access to its archives. These contrasting moves reflect both the risks and the opportunities AI presents to traditional media. Despite the disputes, some platforms are leveraging AI to enhance creativity while staying compliant; Spotify, for example, uses generative AI to personalize user experiences while adhering to copyright law, suggesting that AI and intellectual property can coexist given clear frameworks. Some startups are working to address this gap directly: Tollbit is helping to bridge publishers and AI developers by enabling publishers to monetize their content, and ProRata.ai is building attribution technology to enable fair compensation for content owners. Striking a balance between innovation and protecting creators' rights will be a crucial topic in the years ahead.
• [AI Regulations] The EU AI Act represents the first comprehensive framework of its kind, categorizing AI systems by risk level: unacceptable, high, limited, and minimal. Systems deemed an "unacceptable risk," such as government social scoring or manipulative AI, are banned outright, while "high-risk" systems in critical sectors face stringent requirements. Key provisions include mandatory conformity assessments, robust documentation, and oversight by a European AI Board. Non-compliance with the most serious provisions can result in fines of up to €35M or 7% of global annual turnover. Following his election victory in November 2024, President-elect Donald Trump announced plans to repeal Biden's AI executive order upon taking office. The incoming administration argues that the existing rules stifle innovation and impose unnecessary burdens on businesses, and instead advocates a more industry-friendly approach emphasizing voluntary guidelines and reduced federal oversight to promote AI development.
CONCLUSION
If we had to distill everything into five key takeaways from AI developments in 2024, they would be:
• The entire infrastructure stack is undergoing a significant overhaul, reminiscent of the internet and cloud buildouts. The demand for inference is only beginning to accelerate and will be driven by increasing adoption of GenAI, new multimodal applications, and evolving model architectures.
• As scaling laws begin to plateau, model development is shifting away from ever-larger pre-training datasets towards inference-time reasoning. This shift enables models to tackle more complex reasoning tasks. Concurrently, the rise of smaller, specialized language models (SLMs) promises greater efficiency and flexibility for users.
• For the first time, AI is delivering tangible ROI in enterprise settings, with use cases like code generation, customer support, and search driving measurable impact. The next frontier lies in the proliferation of AI agents, but their true potential will only be realized after we build the underlying scaffolding required to enable multi-agent interactions.
• Investment in AI continues to grow, particularly in the infrastructure and foundational model layers. Most exits will be through M&A, but high investor expectations could clash with market realities, potentially impacting future valuations.
• The rapid adoption of AI has outpaced regulatory frameworks, sparking debates over topics like copyright and intellectual property. Meanwhile, nations are increasingly framing AI as a matter of sovereignty, leading to greater focus on the regionalization of AI ecosystems.
It’s been only 24 months since ChatGPT took the world by storm – an event Jensen Huang of Nvidia aptly described as the "iPhone moment" for AI. In this short time, we've witnessed some of the fastest innovations in modern history. Massive infrastructure investments, daily breakthroughs in foundational models, and an insatiable appetite for enterprise adoption have converged to reshape not only technology but the very way our society operates.
As we move into 2025, one thing is clear: this is only the beginning. There’s still so much left to build and discover. If history has taught us anything, it’s that progress is rarely linear – surprise breakthroughs will always occur alongside unexpected setbacks. For everyone living through these pivotal years of AI – entrepreneurs, technologists, students, and investors alike – our mission is clear: to engage deeply, innovate responsibly, and create a future where humans and technology coexist in harmony. In doing so, we leave a better world for future generations, just as our predecessors aspired to do for us.
/ Service Ventures Team