Founding Data Engineer


San Francisco
Permanent
USD 200,000 - USD 400,000
Cloud and Infrastructure
PR/592750_1782270235

A team I'm working with is building AI systems to improve decision-making in drug development, specifically around predicting safety risks earlier in the process.

One of the biggest challenges in the space is that teams still rely on a mix of fragmented experimental data, prior research, and manual judgment to decide whether something is safe to advance. The focus here is on bringing that into a more unified, data-driven approach.

They've put together large, complex datasets across a range of scientific domains and are using those to power models that help researchers better understand risk and underlying mechanisms.

They're looking to add a Data Engineer to help build out the core data layer. The role is pretty end-to-end: designing and scaling pipelines, building internal APIs and tooling, and turning messy raw data into clean, structured datasets that can be used for modeling and analysis. You'd also work on automating data ingestion from different sources, improving data quality and reliability, and supporting model-related workflows. There's some exposure to LLM-driven workflows as well, particularly around extracting and structuring information.

It's an early, high-impact role where the data systems you build will directly influence how the broader team operates and scales.


What you'll do

* Own and scale core data infrastructure across research, ML, and product systems
* Build and maintain data pipelines for ingesting, processing, validating, and serving large, complex datasets from multiple sources
* Develop internal platforms that connect experimental workflows, data capture, processing, and downstream usage
* Transform raw, messy data into clean, versioned, and ML-ready datasets
* Design and build APIs and data tools that make it easy for different teams to access and use data
* Work on systems that automate data ingestion, cleaning, normalization, and structuring across a variety of inputs
* Support and scale model-related workflows, including batch processing and inference pipelines
* Implement data quality systems (validation, testing, monitoring, lineage, observability) to ensure reliability
* Partner closely with domain experts to understand workflows and translate them into scalable infrastructure
* Help support internal and external data delivery, including datasets, outputs, and derived insights
* Build systems that improve speed and efficiency across the organization


What they're looking for

* Experience building and maintaining large-scale data platforms used by multiple teams
* Comfortable working with messy, unstructured, or heterogeneous datasets
* Ability to operate across backend engineering, data systems, and infrastructure
* Strong focus on data quality, correctness, and reproducibility
* Experience working with cross-functional teams and translating real-world needs into technical solutions
* Familiarity with AI/ML or LLM-related data workflows is a plus
* Comfortable operating in ambiguous, fast-moving environments
* Interested in owning critical systems and having broad impact
* Curiosity across technical and applied problem spaces


Technical background (nice to have)

* Strong Python and SQL fundamentals
* Experience with distributed systems or large-scale data processing frameworks
* Cloud infrastructure (any major provider)
* Infrastructure as code and modern deployment practices
* Data platforms (warehouses, lakes, or similar storage systems)
* Workflow orchestration tools
* Experience building APIs or internal data tooling
* Exposure to ML or LLM-related infrastructure
* Experience handling large-scale datasets


What tends to work well in this environment

* Moves quickly and takes ownership
* Strong engineering judgment and attention to detail
* Proactively identifies problems and builds solutions
* Balances speed with reliability
* Thinks in systems rather than one-off fixes
* Comfortable navigating complexity and changing requirements
* Enjoys working with technical and research-oriented teams
* Focuses on building scalable, long-term solutions
* Motivated by high-impact work in an early-stage environment
