You’ll be a generalist responsible for building and running large-scale data, machine learning, and agentic systems. The focus is operational ML/AI, including agentic systems and geospatial data pipelines. You should be comfortable owning the full lifecycle: from data ingestion and distributed processing to model development, deployment, and monitoring. This role requires the ability to iterate quickly from initial concept to a robust, production-ready solution.
**Key Responsibilities**
* Take ownership of the end-to-end AI/ML lifecycle, with a strong focus on handling complex, messy data; thoroughly evaluating different approaches; deploying robust models; and managing cost-versus-performance tradeoffs.
* Build large-scale, agent-based systems with access to external systems, designing them from the ground up and integrating them with our existing infrastructure.
* Establish observability for pipelines, models, and agents (metrics, tracing, alerting).
* Collaborate with product and customer teams to drive revenue.
**Requirements**
* Strong experience with distributed data processing, particularly Spark and SQL.
* Proven expertise in building production machine learning systems, including working with large, wide datasets and handling training, deployment, and monitoring effectively.
* Experience designing and deploying task-oriented AI agents and working with coding agents.
* Experience working with cloud services across data, compute, and ML.
* Strong communication skills, extending to clear code architecture and documentation, so that any technical team member can easily troubleshoot and contribute.
**Languages:** Scala, Python
**Tools / Frameworks:** Spark, AWS SageMaker / Bedrock, Kubernetes
**Nice to Haves**
* Startup experience, or experience growing projects from zero to production in a larger organization.
* Experience with large geospatial datasets, formats, and indexing strategies.
* Experience building operational AI agents that work at scale (millions of separate, complex tasks, including web research).
* Experience with fine-tuning, distilling, and self-hosting LLMs.
* Experience in traditional ML, with a focus on working with messy data and robust evaluation of model approaches.
* Proficiency with CI/CD, infrastructure as code, and containerization.
**What Success Looks Like**
* ML/AI models deployed with robust monitoring and significant customer impact.
* Agentic workflows improving internal/external operations.
* Infrastructure that is stable, observable, and automated.
* Successful iteration and delivery of new ML/AI products from concept to production.
* Ability to contribute to existing geospatial pipelines, either directly or through the use of AI.