Episode 161:

The Intersection of Generative AI and Data Infrastructure with Chang She of LanceDB

October 25, 2023

This week on The Data Stack Show, Eric and Kostas chat with Chang She, the CEO and Co-Founder of Eto Labs. During the episode, Chang discusses LanceDB, the history and success of Pandas, and the challenges of working with new technologies in the data industry. He also shares his journey and the challenges of open sourcing Pandas, the need for new data infrastructure optimized for AI and ML, the future of AI and other data avenues, and more.

Notes:

Highlights from this week’s conversation include:

  • Chang’s background and journey with Pandas (6:26)
  • The persisting challenges in data collection and preparation (10:37)
  • The resistance to change in using Python for data workflows (13:05)
  • AI hype and its impact (14:09)
  • The success and evolution of Pandas as a data framework (20:04)
  • The vision for a next-generation data infrastructure (26:48)
  • LanceDB’s file and table format (34:35)
  • Trade-offs in the Lance format (42:45)
  • Introducing the vector database (46:30)
  • The split between production and serving databases (51:14)
  • The importance of unstructured data and multimodal use cases (57:01)
  • The potential of generative AI and the balance between value and hype (1:01:34)
  • Changing expectations of interacting with information systems (1:13:53)
  • Final thoughts and takeaways (1:15:32)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.