AI Evaluation Engineer

RyzLabs

Argentinafull time - contractPosted 10 day(s) ago$0-$0 / yr

Apply Now

$0-$0 / yr

Salary

argentina

Region

ASAP

Start Date

About RyzLabs

No description provided.

About this Role.

RYZ Labs is looking for an experienced **AI Evaluation Engineer** to join one of our clients’ teams. ### Responsibilities * Design and implement evaluation pipelines to measure the performance and reliability of AI models. * Develop automated testing frameworks to assess model outputs at scale. * Analyze model performance using both traditional statistical metrics and AI-specific evaluation methods. * Evaluate AI systems built on modern architectures such as **LLM-based applications and Retrieval-Augmented Generation (RAG)**. * Identify potential issues related to **accuracy, hallucinations, bias, safety, and model drift**. * Conduct adversarial testing to uncover vulnerabilities and ensure safe model behavior. * Collaborate with engineering and AI teams to improve prompt design, model outputs, and system performance. * Monitor model performance in production and help define best practices for AI evaluation and observability. ### Requirements * Proficiency in **Python** and experience building scripts or pipelines to evaluate model outputs. * Experience working with **AI/ML systems**, particularly **large language models (LLMs)** or generative AI applications. * Familiarity with concepts such as **prompt engineering, prompt optimization, and LLM evaluation**. * Understanding of evaluation metrics such as **precision, recall, F1-score**, and AI-specific metrics related to model quality and safety. * Experience evaluating **RAG systems or knowledge retrieval pipelines** is a plus. * Experience with modern **AI evaluation or observability tools** is a plus (e.g., DeepEval, Promptfoo, RAGAS, LangSmith, Arize, Weights & Biases). * Strong analytical mindset with the ability to interpret model behavior and propose improvements. ### Nice to Have * Experience performing **adversarial testing or red-teaming** of AI systems. * Familiarity with **AI safety, bias detection, and model alignment practices**. * Experience working in production environments deploying or monitoring AI systems.

Skills Required

Python R

Ready to Apply?

Apply Now

Similar jobs

No similar jobs found.