Episode 145:

What is Synthetic Data? Featuring Omar Maher of Parallel Domain

July 5, 2023

This week on The Data Stack Show, Eric and Kostas chat with Omar Maher, the Director of Product Marketing at Parallel Domain. During the episode, the group discusses synthetic data in the context of computer vision and autonomous vehicle development. Omar shares his background in data and machine learning and explains how synthetic data generation can produce labeled data that is fresh, clean, and useful for training and testing machine learning models. The conversation also covers the challenges of obtaining high-quality labeled data for computer vision projects, the importance of addressing edge cases, the ethical implications of using synthetic data to train AI models, and more.

Notes:

Highlights from this week’s conversation include:

  • Omar’s Journey into Machine Learning and Current Work at Parallel Domain (3:25)
  • Interest in Data Analytics (6:27)
  • Challenges with Labeled Data (8:02)
  • Introduction to Synthetic Data (11:27)
  • Challenges with Real World Data (16:28)
  • Parallel Domain’s Background (19:44)
  • Improving Machine Learning Models with Synthetic Data (21:41)
  • Using Synthetic Data to Improve Performance (24:56)
  • Combining Synthetic and Real Data (27:34)
  • Pipeline for Synthetic Data Generation (29:46)
  • Simulating Realistic Environments and Sensors (32:44)
  • Building a Realistic Simulated World (35:48)
  • Complexity of Synthetic Data for Machine Learning (38:36)
  • Advancements in Gaming Industry and AI (42:27)
  • Synthetic Data Across Different Domains (46:03)
  • Ethical Implications of Synthetic Data (48:34)
  • Final Thoughts and Takeaways (52:29)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.