Folding Data #9: Open-Source A/B Testing

Last week, I shared my attempt to assemble an entirely open-source modern data stack, a follow-up to the Modern Data Stack for Analytics blog post, which also includes SaaS products. The post went viral and ignited a passionate discussion of what should be considered part of the open-source data stack. That discussion also highlighted some of the challenges open-source technologies face when they are created at big companies that may have their own agendas (e.g. PrestoDB > PrestoSQL > Trino). Huge thanks to everyone who proposed promising and mature products (some are featured below) for the next revision of the blog!

Tool of the Week: GrowthBook

When I interviewed at Lyft for a data science position back in 2016, George Xing, who ran Analytics, asked me, "If you had a budget of $2M to grow rides in a given city, how would you spend it?" Aside from checking market health metrics to see whether ride growth is constrained by demand (passengers) or supply (drivers), you need to run a bunch of experiments (A/B tests) on a small budget to see which stimulus (e.g. a driver bonus or a passenger discount structure) has the biggest impact on the target metric, and then pour the bulk of the budget into it. Companies at Lyft/Uber/Airbnb scale run thousands of experiments simultaneously on sophisticated homegrown platforms to find ever better ways to improve the product and grow the business. But what about experimentation for the rest of us?

That's why I am excited about GrowthBook – a modern open-source experimentation platform. "The top 1% of companies spend thousands of hours building their own A/B testing platforms in-house. The other 99% are left paying for expensive 3rd party SaaS tools or hacking together unmaintained open source libraries."
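To make the "which stimulus has the biggest impact" comparison concrete, here is a minimal sketch of how two experiment variants could be compared using a classic two-proportion z-test. This is not GrowthBook's API or its statistics engine; the function name and the example counts are hypothetical and chosen purely for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and treatment (b).

    Returns (absolute lift, z-score, two-sided p-value).
    Hypothetical helper for illustration, not part of any library.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Made-up numbers: 2,000 users per variant, e.g. no discount vs. a discount.
lift, z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.4f}")
```

In practice you would run one such comparison per stimulus and weigh the measured lift against what each stimulus costs. An experimentation platform handles the assignment, metric collection, and statistics for you.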

An Interesting Read

As much as we all love great tools, you don't always need self-serve BI or interactive D3 charts to make data-informed decisions that help save lives. Consider the work of Florence Nightingale, a nurse during the Crimean War in the mid-19th century. She used a couple of intuitive hand-drawn charts to convince the British Army to implement better sanitation measures, which ultimately saved hundreds of lives.

Proactive Data Quality at Data Engineering Podcast: Transcript Available

Last month I had a great conversation with the Data Engineering Podcast host, Tobias Macey, about proactive approaches to data quality, how to build a culture of data quality, and the $500K mistake I made as a Data Engineer that inspired me to start Datafold. If you prefer reading to listening (or you want to do both), this blog is for you!

Read Along to the Data Engineering Podcast

Last Chance to RSVP for the Next Data Quality Meetup

Our next Data Quality Meetup is August 26th - TOMORROW! If you haven't registered yet, this is your last reminder. Why should you join? At our meetups, we run a series of quick 7-minute lightning talks by data leaders from top companies, followed by a panel discussion. This time we have invited data leaders from DoorDash, Truebill, AppFolio, Narrator.ai, and Evident.ly to share their insights on how to do data better.

Save my spot on August 26th 🗓

Before You Go

If you also feel that your stakeholders see your Data team as wizards who can produce any analysis with a few magic keystrokes, this awesome pic by Emilie Schario, Director of Data and BI at Netlify and co-founder of the Locally Optimistic community, will certainly resonate. 🎯