Buy not build

Which comma do you erase on the whiteboard above?

This dilemma is very common in engineering, and it can be particularly challenging when choosing data infrastructure.

One of the costliest mistakes a team can make is attempting to build something in-house that can be bought off the shelf: databases, BI tools, data integration connectors – you name it. If you’re tempted (or pushed by your Engineering partners) to do so, ask yourself:

  • How confident are you that you can do better than a vendor with a strong product vision and a solid technical team that has been working heads-down on the problem for 3-7 years?
  • Are you willing to wait however many months it takes to hire the team, build the solution, and make it reliable enough?
  • Would the internal team create more value for your business by re-creating something your competitors buy off the shelf, or by working on your actual customer-facing product?

“Paying $50K for a tool to pump data between databases?!” The objection is understandable, but it’s not that intuitive to take all of the expenses into account (see the rough comparison after the list below):

  • Cost of labor, including hiring, management & other overhead (multiply the salary by 2.7)
  • Opportunity cost of time & resources – the cost of not building something else your business needs
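
For a sense of scale, here is a minimal back-of-envelope sketch in Python; the salary, headcount, and vendor price are illustrative assumptions, not figures from any real project:

```python
# Rough buy-vs-build comparison for a data integration tool.
# All numbers are hypothetical placeholders – plug in your own estimates.

engineer_base_salary = 150_000   # annual base salary of one data engineer (assumed)
fully_loaded_multiplier = 2.7    # hiring, management & other overhead (rule of thumb above)
engineer_years_per_year = 1.5    # engineering time spent building and maintaining the pipeline
vendor_price = 50_000            # annual price of the off-the-shelf tool

build_cost = engineer_base_salary * fully_loaded_multiplier * engineer_years_per_year
buy_cost = vendor_price

print(f"Build (labor only): ${build_cost:,.0f} per year")  # ~$607,500
print(f"Buy:                ${buy_cost:,.0f} per year")    # $50,000

# The build estimate still excludes the opportunity cost of not shipping
# customer-facing features with that engineering time.
```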

The pressure from Engineering to build data infrastructure can be immense (it’s interesting, it’s a hard problem, you can open-source it, etc.). My experience says: buy what can be purchased, and get your engineers excited about higher-value-add work such as building user-facing features and automating decision-making with ML – things that really differentiate your business.

Counterintuitively, “throwing money at the problem” – buying tools and infrastructure from a vendor – is almost always cheaper, faster, and often more effective than building it in-house.

As Nelson Auner neatly puts it in “Building Analytical stack in 2020”, in the context of buying vs. building data integration solutions:

From the engineering side, you may get “Don’t waste money - we could do this ourselves, it is so easy”. Be prepared to ask if any of your well-intentioned teammates:
  1. Have actually implemented and maintained a data pipeline for several years
  2. Are personally volunteering to do so, for you [the internal customer], in a timely manner
  3. Are excited to be on-call 24/7 to fix issues

That is not to say you can’t shoot yourself in the foot by picking a bad vendor: there are plenty of ineffective and expensive solutions with aggressive sales teams out there, so choose wisely.

And stay tuned for the upcoming posts about frameworks for choosing data infrastructure vendors!


Datafold is the fastest way to validate dbt model changes during development, deployment & migrations. Datafold allows data engineers to audit their work in minutes without writing tests or custom queries. Integrated into CI, Datafold enables data teams to deploy with full confidence, ship faster, and leave tedious QA and firefighting behind.
