Sarvam AI: Building generative AI a billion Indians can use

With the belief that India should have its own sovereign AI stack, one the country has the agency to run and the expertise to build, Pratyush Kumar and Vivek Raghavan are building Sarvam AI

Harichandan Arakali
Published: Sep 25, 2024 12:59:25 PM IST
Updated: Sep 25, 2024 01:30:25 PM IST

Vivek Raghavan (left) and Pratyush Kumar, Cofounders, Sarvam AI
Image: Selvaprakash Lakshmanan for Forbes India; Digital imaging: Kapil Kashyap

Sarvam AI, which is seen in India’s tech circles as the country’s torchbearer in the world of GenAI (generative artificial intelligence), is only about a year old if one goes by the date it was incorporated as an Indian company.

However, Axonwise Private Limited, the company’s legal entity, has already raised $53 million—via a seed round followed by a Series A investment last year—from investors, including Lightspeed Venture Partners, Peak XV Partners, Khosla Ventures and Venture Highway. A select few corporate and angel investors are also on board.

One important reason such high-powered backers have decided to fund Sarvam is that it is among a precious few ventures in the country working on their own foundation-level AI tech. That means they are building the core elements that make up their own GenAI systems, including the algorithms, architectures, training techniques and curated datasets, all leading to GenAI technologies on which they, as well as others, can build useful applications.

The company has released a raft of voice-based AI tools for enterprise customers, addressing some of the most common use cases, and an open source text-based language model with an emphasis on Indian languages, named Sarvam 2B because it has 2 billion parameters. “The intention was that India should have its own sovereign stack of AI. By that we mean not just the agency to run it, but also the expertise to build it,” says Pratyush Kumar, one of the two co-founders of Sarvam.

The aspiration at Sarvam, as the Sanskrit word meaning ‘all’ suggests, is to achieve in AI something akin to what the Unified Payments Interface (UPI) has done for payments and fintech in the country: “We want to build GenAI that a billion Indians can use,” explains Vivek Raghavan, the other co-founder.

Kumar and he are adherents of an idea that Nandan Nilekani, the founding chairman of India’s Unique ID Authority, articulated with co-author Tanuj Bhojwani in a paper for the International Monetary Fund on how India might play an important role in developing AI for the world.

The idea is that, just as with UPI and payments, India will find its own approach to AI, one focused on use cases that address the needs of a population nearly twice as large as that of all the European nations combined. And just as UPI is now being exported to the global south, Sarvam’s first AI tools are already drawing interest not only in Africa, but also in rich nations like Singapore.

Sarvam’s approach to achieving such scale rests on a few components. First, the company aims to create a comprehensive suite of technologies, the “full stack” in industry jargon, starting with foundational AI models and extending to applications that are directly useful in everyday life. The focus is on ensuring that the technology serves practical purposes rather than existing as a standalone offering.

Second, Sarvam caters to the linguistic diversity of India, where it is common for people to communicate in a mix of local languages, dialects and English. The company has therefore built its solutions around the importance of voice.

Small is Beautiful

The models it has been building fall along two directions. The first is small, efficient models for Indian languages. For comparison, Sarvam 2B has 2 billion parameters, while the smaller version of Llama 3, the latest open source AI model released by Meta Platforms, has 8 billion parameters, and its bigger variant has 70 billion.

Parameters are the numerical values an AI model learns during training; they define how the model transforms inputs into outputs such as text completions or predictions. Tokens, on the other hand, are discrete units of text: words, sub-words, characters and so on. Tokens are the data the models take as input.
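
To make that distinction concrete, here is a minimal, hypothetical sketch in Python (not Sarvam’s code): tokens are the integer IDs a tokeniser produces from text, while parameters are the learned numbers a model applies to those tokens.

```python
# Minimal, hypothetical sketch: tokens vs parameters (not Sarvam's code).
vocab = {"नमस्ते": 0, "दुनिया": 1, "<unk>": 2}   # tiny toy vocabulary

def tokenise(text):
    """Turn text into token IDs, the discrete units a model reads."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]

tokens = tokenise("नमस्ते दुनिया")               # -> [0, 1]

# Parameters are the learned numbers that transform tokens into outputs.
# A 2-billion-parameter model like Sarvam 2B has roughly 2e9 of them; this
# toy "model" has just 6 (a 3-entry embedding table, 2 numbers per entry).
embedding_table = [[0.1, -0.3], [0.7, 0.2], [0.0, 0.0]]
embeddings = [embedding_table[t] for t in tokens]
print(tokens, embeddings)
```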

In terms of tokens, Sarvam’s model was trained on 2 trillion out of a total of 4 trillion tokens, on 1,000 Nvidia H100 GPUs (graphics processing units) rented at Yotta’s data centre in Mumbai. Llama 3’s training involved 15 trillion tokens and 16,000 GPUs, according to Meta’s announcement of its release in April.

Alongside efficiency, Sarvam’s models put Indian languages front and centre: “They have more real estate for Indian languages within what is called a tokeniser, which is the interface of the model to the external world,” Kumar says.

“If you look at say something like Common Crawl, less than 0.1 percent of the data is in Hindi, let alone any other language in India,” Raghavan points out. Common Crawl is a non-profit organisation that provides a regularly updated archive of web data. “So, then what ends up happening is that correspondingly the token real estate given to Indian language data becomes much less,” he adds.

In the case of Sarvam 2B, about 40 percent of the tokens are Indian language tokens, he continues, which is an important reason it’s a lot more efficient with Indian languages.

When more of the tokeniser’s vocabulary is devoted to Indian languages, the model reads and writes those languages more efficiently, needing fewer tokens to represent the same text. An important part of how these AI models work involves breaking words down into tokens and putting them back together to produce GenAI responses.
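
To see why that “token real estate” matters, here is a toy, self-contained comparison (not Sarvam’s actual tokeniser): when a vocabulary has no dedicated Devanagari entries, Hindi text falls back to byte-level pieces and the token count balloons; a vocabulary with Indic entries covers the same sentence in a handful of tokens.

```python
# Toy comparison, not Sarvam's tokeniser: vocabulary coverage vs token count.
hindi = "भारत में भाषाओं की विविधता है"   # "India has a diversity of languages"

# Byte-level fallback: with no Devanagari entries in the vocabulary,
# every UTF-8 byte becomes its own token.
byte_fallback = list(hindi.encode("utf-8"))

# Hypothetical Indic-aware vocabulary: whole words are single tokens.
indic_aware = hindi.split()

print(len(byte_fallback))   # 77 byte tokens for this short sentence
print(len(indic_aware))     # 6 tokens
```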

Kumar and Raghavan say Sarvam’s models are efficient with Indian languages, and can be used for a variety of tasks such as speech models, translation, colloquialising a given input, and other narrow but specific use cases that have practical value in the real world.

Another model Sarvam has developed, named Shuka, is an 8-billion-parameter model built on top of Llama, “where we have added an Indian language voice to it”, Kumar says. Most models are text in, text out, but the challenge is that most Indians are not going to type in Bengali or Malayalam or Oriya and so on.

Team Sarvam has found a way to send voice “directly” to Llama, instead of the conventional route of first transcribing the speech into text and then sending the text to the model. This direct method has made the process six times faster, Kumar says. “This is probably a global first in terms of Indian languages at the scale of 10 languages,” he says. “So our focus is small models for Indian languages that are efficient, and second, voice-enhanced models for Indian languages.”
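
In broad strokes, such a “direct” voice path runs an audio encoder and projects its output frames straight into the language model’s embedding space, skipping the transcription step. The sketch below illustrates that general pattern under assumed, placeholder names and dimensions; it is not Sarvam’s implementation of Shuka.

```python
# Illustrative sketch under assumed names and sizes; not Sarvam's Shuka code.
import torch
import torch.nn as nn

class AudioToLLMAdapter(nn.Module):
    """Projects audio-encoder frames into an LLM's token-embedding space,
    so speech can be fed to the model without first converting it to text."""
    def __init__(self, audio_dim: int = 1280, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(audio_dim, llm_dim)

    def forward(self, audio_features: torch.Tensor) -> torch.Tensor:
        # audio_features: (batch, frames, audio_dim) from a speech encoder
        return self.proj(audio_features)   # (batch, frames, llm_dim)

adapter = AudioToLLMAdapter()
fake_frames = torch.randn(1, 50, 1280)     # placeholder audio features
audio_embeddings = adapter(fake_frames)
# Downstream, these frames would be concatenated with ordinary text embeddings
# and passed to the decoder, so the LLM "hears" the audio directly.
print(audio_embeddings.shape)              # torch.Size([1, 50, 4096])
```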

While the large models from the big tech companies are valuable for occasional use or proof-of-concept scenarios, Sarvam’s founders believe that smaller models can better handle the high-frequency, narrow tasks required at scale, Raghavan says.

Agents that Serve

Another principle guiding Sarvam’s development is creating ‘Agentic AI’, Raghavan says. Unlike conventional chatbots, Sarvam’s models are intended to perform specific tasks, such as booking a doctor’s appointment. These agents are engineered to interact directly with an organisation’s or business’s systems of record, rather than merely engaging in conversation. Systems of record could include, for example, all the data on HR or on financial transactions.
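
In practice, an agent of this kind maps a user’s request to an action against a system of record rather than returning only a chat reply. The snippet below is a deliberately simplified, hypothetical sketch of that flow; the intent handling and the appointments backend are stand-ins, not Sarvam’s APIs.

```python
# Hypothetical sketch of an agentic flow: user utterance -> intent -> action
# against a system of record (here a fake appointments backend), not just chat.

APPOINTMENTS = []   # stand-in for a hospital's system of record

def book_appointment(doctor: str, time: str) -> str:
    """Write to the system of record and confirm back to the user."""
    APPOINTMENTS.append({"doctor": doctor, "time": time})
    return f"Booked {doctor} at {time}."

def handle_utterance(utterance: str) -> str:
    # A real agent would use an LLM to extract intent and details from speech;
    # here a trivial rule stands in for that step, purely for illustration.
    if "appointment" in utterance.lower():
        return book_appointment(doctor="Dr. Rao", time="10:30 tomorrow")
    return "Sorry, I can only book appointments in this sketch."

print(handle_utterance("Please book me a doctor's appointment"))
```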

Sarvam has introduced a suite of products designed for voice-led user interaction. The main offering here is currently Sarvam Agents, a set of tools that lets users hold voice-driven conversations with a bot to execute various actions.

Available on multiple channels, including WhatsApp (which is widely used in India for voice notes, as Raghavan points out) and integration within apps, Sarvam Agents can communicate in 10 languages. These AI agents can streamline user experiences in customer support, transaction assistance, gathering feedback on new products, or following up with potential ecommerce customers who have abandoned their carts.

Businesses can deploy Sarvam Agents within their apps to do all this.

Sarvam is today a 40-member team, mostly comprising young graduates. The company has released something akin to its first “full stack” solutions, Kumar says. It has also released its AI models on an API (application programming interface) platform that people can consume as individual building blocks. “This is also interesting, because though we can say just buy AI agents from us, we also make the individual components available,” he says. The platform works in 10 Indian languages: Hindi, Gujarati, Marathi, Bengali, Oriya, Punjabi, Kannada, Telugu, Malayalam and Tamil.

Then there is the in-house “orchestration fabric”, which allows users to deploy the various components that Sarvam offers for their own purposes. On top of that sits the application layer, which interacts with end users over voice and WhatsApp.

One can talk to a Sarvam bot on the company’s website today. “It will try to sell you Sarvam for now, but you get a sense of what it can do,” Kumar says. In 10 languages, it can book appointments, send emails and PowerPoint decks, and so on. Pricing for Sarvam Agents is set at a competitive ₹1 per minute, making it an accessible option for enterprises looking to integrate voice interaction into their services.

Another notable aspect of Sarvam’s offering is its focus on sovereignty and control. While their services are available as a SaaS (software as a service) product, Sarvam also provides an appliance option. This allows organisations to deploy the technology on-premises, giving them complete control over their data and operations. This appliance model is designed for large-scale applications where extensive control and data security are critical.

“We are working very closely with Nvidia to develop an appliance like VUE, where you can actually buy something and that powers your GenAI use cases so that you completely take aside issues of privacy etc,” Kumar says. “So, it is your data, your compute and your model.”  

VUE refers to Virtual Unified Environment, a technology offered by Nvidia that uses its GPUs to provide high-performance virtual desktops. An appliance, in the IT context, is a purpose-built hardware device or system fine-tuned for specific functions.

Raghavan and Kumar wouldn’t give specific names of customers that are evaluating Sarvam’s tech, but think Nifty 50, they say. One investor in the company adds that “the biggest FMCG companies, the largest paints business, one of the biggest real estate companies are some of the potential customers”, evaluating Sarvam’s AI.

Another product Sarvam has unveiled, named Sarvam A1 Legal, is tailored specifically for legal professionals. The tool uses GenAI to streamline tasks commonly performed by lawyers, offering a comprehensive suite of features designed to simplify legal work, including research memo generation, contract drafting and document redaction.

What sets A1 apart is its integration with up-to-date information from key Indian legal sources such as the Reserve Bank of India, the Securities and Exchange Board of India, courts and tax tribunals, Raghavan says. This ensures that lawyers have access to the latest legal data and resources. In addition to A1, Sarvam is developing similar “work benches” for other professional fields to enhance their efficiency through AI, he says.

The experience of having built their own language model from scratch, with their own datasets, “puts us in a position to become builders of models for other companies”, Kumar says. Such customers might have proprietary data on which they could build models, and Sarvam could offer its expertise in that area.

“It’s a high-end services model”, which could potentially be a lucrative source of revenue, as Sarvam seeks to scale its operations. “So that’s where we are as a company. We have put together the first set of components and looking now to scale up,” he says.

“The ‘aha’ moment for me when I first saw the demo [of Sarvam] was that given the technology and the policy angle that the company brings to bear, you could suck in all the NCERT books into the engine working with the government,” says Hemant Mohapatra, partner at Lightspeed who led the venture capital (VC) firm’s investment in Sarvam.

He was speaking alongside Raghavan at a meetup organised recently by the Indian unit of Antler, a VC firm that combines early-stage investments with a startup generator, or accelerator, business model. Parts of that conversation were published by Antler on YouTube on August 9.

“You could then have a QR code, which everyone in India understands, slapped on every chapter and the way a child enters the world of that chapter is by scanning that code and asking some questions,” he continues.

Some of this has been attempted before; it is not new. But the difference a large language model (LLM) makes is that it brings to bear all the knowledge of the world present on the internet and offers the student an engaging story with rich context, rather than rote learning for tomorrow’s quiz in class. That, of course, is already possible, as students across India have discovered with ChatGPT and Google’s Gemini.

“This richer [user] experience could not be possible in the absence of the LLM,” Mohapatra adds, “which is why some of this stuff is so powerful for us.”

An Evangelist, a Revolutionary

When Raghavan decided to volunteer at the Unique ID Authority of India (UIDAI), which developed Aadhaar, he thought he would give it six months. The association has lasted for 15 years now. He worked on technologies and solutions that today we know as India’s digital public infrastructure.

Earlier on, he also worked as chief AI evangelist at EkStep Foundation, the non-profit organisation backed by Nandan Nilekani and his wife Rohini. He remains an advisor on technology to UIDAI.

In more recent years, he mentored AI4Bharat, a research lab at IIT-Madras which works on developing open source datasets, tools, models and applications for Indian languages. He also got deeply involved in Bhashini, the government of India-backed language translation technology platform project.

Meanwhile, roughly between 2016 and 2023, Kumar, a computer scientist, built first Padhai and then AI4Bharat with Mitesh Khapra, an associate professor in the computer science and engineering department of IIT-Madras. Padhai was a small for-profit venture, but also “a revolt against all these people who were charging ₹50,000 or ₹1 lakh to teach such courses”, Kumar says. “We charged ₹1,000 each and taught deep learning to 50,000 students.”

After a degree in electrical engineering from IIT-Delhi, Raghavan earned a PhD in 1993 from Carnegie Mellon University, renowned for its computer science and engineering research. He then spent 20 years in the electronics design and manufacturing industry. Along the way, in early 2009, he sold Mojave Networks, a startup he’d co-founded, to Magma Design Automation.

He returned to India in 2007 and his previous corporate tech role was as managing director, India, at Synopsys, which in 2012 acquired Magma Design. Around 2010, Raghavan had his first brush with UIDAI, and the relationship has endured.

Kumar, an IIT-Bombay electrical engineering graduate, has been a computer science researcher for over 15 years. He earned his PhD in computer engineering from ETH Zurich, a university renowned for its research in the natural sciences and its impact on the development of science and technology. Before Sarvam, he worked as a researcher at IBM and Microsoft Research, and was also an adjunct faculty member at IIT-Madras.

Two years ago, the Nilekanis donated ₹36 crore ($4.6 million) to set up a centre of excellence, which the institute named the Nilekani Centre at AI4Bharat. This allowed Khapra, Kumar and their team to build several open-source AI tools that are now also finding takers in Singapore and other countries, and to plunge into foundation-level work.

Khapra remains head of AI4Bharat, ensuring continuity of its mission, while collaborating closely with Sarvam.

It was sometime in 2019 that Manish Gupta, director of Google Research India, brought Sarvam’s co-founders together, Raghavan recalls, and so, through the Covid-19 pandemic, was kindled a conversation that, four years later, would lead to the founding of the company.

One important trigger that precipitated the decision to go all in and raise venture capital to build Sarvam was the “GPT moment”, Kumar recalls. When OpenAI released GPT-3.5, it was clear that the model was “too much of a step change from what we thought we would do in AI”, Kumar says.

“We didn’t see it coming,” Raghavan, too, recalled at the Antler meetup. Kumar tells Forbes India: “And that meant we had to significantly scale up our efforts.” Raghavan and he were on the same page on that, and so the duo incorporated Sarvam in July 2023.

Despite the rapid evolution and growth of GenAI, overall, “we are still at the very beginning of this trend”, Raghavan says. “This technology has the potential to fundamentally change things and the way things are done and make many things that were simply not possible before, possible now.”

“There is the possibility because of these technologies that every child can actually have quality education, or every person would have access to good health care,” he says. Such applications are still a few years out, but he feels the potential is real and the technologies transformational. They are game changers, and India can’t afford to be left behind. Sarvam is showing the way, and we need many more like it.

(This story appears in the 04 October, 2024 issue of Forbes India.)
