Cloud Infrastructure Engineer

Engineering – San Francisco/Remote

What we are building

Datafold is a data observability platform that helps Data teams move faster and with higher confidence by automating data quality analysis, monitoring, and common repetitive tasks in data engineering workflows. It’s a category-defining product in a rapidly growing data domain.

As businesses increasingly rely on Analytics and Machine Learning, Data teams are facing unprecedented complexity of data and infrastructure. Lack of visibility into data quality and performance of data pipelines causes massive inefficiencies and costly business mistakes. You can read more about our vision and our view of the data ecosystem in our blog.

If you are not very familiar with the data domain, the nature of our product is similar to APM tools such as Prometheus, Datadog and New Relic: we collect and integrate metadata about our customers’ data infrastructure, pipelines and datasets and provide valuable insights.

Our product has been rapidly evolving and is already regularly used by over 100 data professionals from companies including Patreon and Thumbtack.

Team

We are a highly technical team with deep expertise in the Data domain: our founders built real-time telemetry systems and scaled some of the world’s largest data platforms at Autodesk and Lyft. Our company is backed by top-tier investors including Y Combinator and NEA.

Now is a unique time to join our team: on the one hand, we already have great momentum behind us – product, customers, investors, funding – and on the other hand, there is so much ahead and so many opportunities for early team members to shape our product, technical stack, and culture.

Since the very beginning, our team has been remote, working from multiple continents and time zones. We value a strong work ethic, honesty, and a growth mindset, and we are looking for mature, well-organized professionals who are excited about building a new and innovative product that will redefine how organizations use data.

Stack

Our stack is based on Docker and is deployed on AWS. Besides our SaaS deployment, we also have isolated deployments for some customers on either AWS or GCP. Our frontend code is mostly implemented in React and TypeScript. The backend is implemented in Python and uses Redis, Neo4j and Postgres for storage.

We are looking for an experienced cloud engineer to join our team to build the definitive product in the data observability space.

What You’ll Do…

  • Instrument codebases to provide telemetry into the application
  • Develop and support deployments for clients on AWS and GCP
  • Build and maintain cloud and Linux DevOps tooling
  • Work with our development teams on architecture and design of the deployment
  • Lead complex migrations of our product
  • Take responsibility for cloud security

What You Bring…

  • 3+ years of experience working as an SRE in a software engineering team
  • Experience scaling complex systems
  • 3+ years of experience with AWS, Google Cloud Platform or Terraform
  • Deep knowledge of security and best practices for cloud environments
  • Experience with modern telemetry and monitoring systems for time-series metrics and tracing

Responsibilities

  • Own the SaaS stack and support the team in evolving it
  • Support our customer engineers with custom isolated deployments on either AWS or GCP
  • Write clean, maintainable code to manage our infrastructure, following best practices
  • Improve the current deployments and cloud infrastructure to reduce administration toil
  • Communicate clearly with the team. We are fully remote and work across multiple time zones
  • Bring an "always learning" approach to your work and keep up to date with new technology and best practices