Episode 2:

The Importance of Data During a Global Pandemic with Utkarsh Gupta of 1mg

August 19, 2020

In this episode, Kostas Pardalis sits down with Utkarsh Gupta, senior engineer of data science at 1mg, India’s largest online healthcare platform. Together they discuss 1mg’s data infrastructure, its response to the global pandemic, and how data drives its product and business.


  • Utkarsh and 1mg’s background (1:32)
  • 1mg being based on a bedrock of data (4:25)
  • Business analytics (5:33)
  • Effects of COVID-19 pandemic on business (11:40)
  • Description of 1mg’s data stack (16:53)
  • Biggest challenges faced and managing collaboration (27:08)
  • Opinions on open source technology (40:31)


Utkarsh’s career path led him from electrical engineering to data science, with stints in big data consulting, algorithmic trading, and mobile advertising before he joined 1mg in 2017.

1mg is India’s largest healthcare platform, with over 150 million visitors served and 25 million orders delivered in over 1,000 cities. Its integrated health app combines pharmacies, diagnostics, digital health tools, records, reminders, and more to streamline health care and make it more efficient for patients. It has also made the costs of different brands of medicine transparent, which has helped lower the prices of certain medicines by raising consumer awareness of market rates.

Data Infrastructure

With millions of users on the platform each day, Utkarsh noted that 1mg extracts and transforms a few terabytes of data every day in its data pipeline. The data comes from multiple sources, such as API logs, transactional data, third-party point of sale systems, marketing platforms, and more.

One unique challenge is that much of 1mg’s data, such as prescription images, lab reports, and audio messages with customer service, is unstructured and initially not computer-readable. “Converting [this data] to a computer readable form is our Holy Grail,” he said.

Utkarsh notes that 1mg is completely cloud-based, running on AWS with S3 as its data lake. “Once extraction is done and everything resides in S3, the next step is to convert all the data into a usable format and to create single tables with all the data joined from multiple source tables,” he said. “All of that happens through SQL queries on Athena.” The resulting data is written back to S3 and pushed to Redshift and Cassandra.
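To make the "single tables joined from multiple source tables" step concrete, here is a minimal sketch of the idea. It is not 1mg's actual pipeline: it uses Python's built-in sqlite3 module as a local stand-in for Athena, and the table names (users, orders, deliveries, order_summary) and columns are hypothetical, chosen only for illustration.

```python
import sqlite3


def build_order_summary(conn):
    """Join hypothetical source tables into one denormalized table.

    Assumption: on Athena this would be a similar CREATE TABLE AS SELECT
    over tables backed by S3; sqlite3 is used here only so the sketch runs
    locally.
    """
    conn.executescript(
        """
        DROP TABLE IF EXISTS order_summary;
        CREATE TABLE order_summary AS
        SELECT o.order_id, u.city, o.amount, d.delivered_at
        FROM orders AS o
        JOIN users AS u ON u.user_id = o.user_id
        -- LEFT JOIN keeps orders that have not been delivered yet
        LEFT JOIN deliveries AS d ON d.order_id = o.order_id;
        """
    )
    return conn.execute(
        "SELECT order_id, city, amount, delivered_at "
        "FROM order_summary ORDER BY order_id"
    ).fetchall()


def demo():
    """Create tiny made-up source tables and build the joined table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
        CREATE TABLE deliveries (order_id INTEGER, delivered_at TEXT);
        INSERT INTO users VALUES (1, 'Delhi'), (2, 'Mumbai');
        INSERT INTO orders VALUES (10, 1, 250.0), (11, 2, 99.5);
        INSERT INTO deliveries VALUES (10, '2020-08-01');
        """
    )
    rows = build_order_summary(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The point of the wide table is that downstream consumers can query one place instead of repeating the same joins; the undelivered order simply carries an empty delivery date.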

Event data is critical to 1mg, and they’ve built an in-house data collection infrastructure powered by RudderStack. Before that, they processed events only once per day in batches; with RudderStack, they now have access to real-time event streams.

Adjusting to the pandemic

As with many businesses, 1mg was impacted by the pandemic. On the positive side, they were able to help more people with digital healthcare solutions, but disruption in supply chains meant they had to adjust several data-driven models around delivery.

They had built an in-house order delivery prediction engine that used deep learning to estimate delivery and arrival times for customer orders, but those models didn’t work well once supply chains were constantly changing. “Internally, the data that we captured has changed in a lot of dynamics,” he said of the lockdown’s impact. “Every company’s operations were affected.”

1mg’s solution was to retrain their models on data sets from different situations, comparing a set classified as a normal distribution with one drawn from periods of disrupted operations, such as localized cyclones or festivals. “Within two weeks of the lockdown starting we were ready to deploy a new revamped model for order time predictions.”

Data-driven business decisions

“Data is omnipresent,” Utkarsh remarked. It is involved in customer service, doctors talking with patients, online consultations and more. In addition to using data to estimate delivery times, data is used to inform a variety of initiatives and evaluate what has worked and what hasn’t.

The product development team uses A/B testing for all new features, from the placement of objects on web pages to the pricing of subscription plans. Data also helps create a single unified health repository that gives a complete view of a patient’s health.
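As a rough illustration of how an A/B test result might be evaluated, the sketch below compares conversion rates between two variants with a two-proportion z-test. The episode does not say which statistical method 1mg uses, so this is an assumed, textbook approach, and the function name and the numbers in the usage example are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts and n_a / n_b are visitor counts
    for variants A and B. Uses the pooled-proportion standard error.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: variant B converts 150 of 1000 vs. 100 of 1000 for A
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be noise; in practice a team would also fix the sample size and significance threshold before running the test.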

To help users manage their health more directly, they’ve used longitudinal data to model disease progression for chronic conditions, which makes personalized recommendations, education, and interventions possible.

Marketing uses data to decide who should receive each email or push notification. The sales and supply chain teams use data to track inventory and fill rates for orders. “Everybody’s using it,” Utkarsh said. “The analytics team and data science team are at the core of it, understanding and maintaining the data and helping out everybody, enabling their use cases in a data-driven manner.”

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.