Episode 137:

Data Collection Secrets & The Search Data Problem with Josh Wills

May 10, 2023

This week on The Data Stack Show, Eric chats with Josh Wills, an experienced data scientist with experience at places like IBM, Google, Slack, DuckDB, and others. During this conversation, Josh shares his journey at these companies, including his work in data engineering, data science, and other fields. Eric and Josh also discuss high-quality data and the process to get it, auction code, usage patterns and complexities in search, and more.

Notes:

Highlights from this week’s conversation include:

  • Josh’s background in data working at Google, Slack, and other companies (1:21)
  • The need and process for high-quality data (4:33)
  • Digging into auction code (14:03)
  • Joining Slack and working in the early days of the company (18:00)
  • Not fighting the last war in data (25:42)
  • Building a product, while using the product (30:35)
  • Transitioning to the search team at Slack (36:50)
  • Usage patterns of search (41:21)
  • Josh’s work in helping build DuckDB (46:20)
  • Having the right toolset to increase precision and efficiency (52:42)
  • Final thoughts and takeaways (56:03)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.