AI Evaluation Engineer (Data Analysis & Multi-Agent Systems)

Gramian Consulting Group

Brazil | Contractor | Posted 0 day(s) ago

Region: Brazil

Start Date: ASAP

About Gramian Consulting Group

Gramian Consultancy brings together the perspective of a software engineer, the knowledge of a technical recruiter, and the vision of a business builder. This unique experience is our signature advantage in delivering top-quality services in recruiting, staff augmentation, and outsourcing.

About this Role

About Us

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role overview

We are looking for an AI Evaluation Engineer specialized in data analysis to design benchmark tasks that simulate real-world analytical workflows.

You will create scenarios where AI systems must analyze large, messy, multi-source datasets, decompose tasks across multiple agents, and produce clear, verifiable conclusions.

Commitment required: 8 hours per day, with 4 hours of overlap with PST.

Employment type: Contractor assignment (no medical/paid leave)

Duration of contract: 4 weeks+

Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam

Interview: Take-home assessment (60 min)

Responsibilities

Design and develop multi-agent benchmark tasks focused on complex data analysis workflows
Create or curate realistic datasets (CSV, JSON, logs, reports, financial or operational data)
Build tasks requiring cross-referencing across multiple data sources, anomaly detection and contradiction identification, and statistical analysis and interpretation
Define task decomposition strategies across specialized sub-agents (e.g., financial, technical, operational analysis)
Develop verification logic to validate precise analytical outputs (not generic summaries)
Implement evaluation pipelines using Python and SQL
Create reproducible environments using Docker
Analyze task performance and refine for clarity, difficulty, and scoring accuracy
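
To illustrate what "verification logic" for precise analytical outputs could look like, here is a minimal Python sketch. It is not taken from any actual benchmark used in this role: the task, the field names (`total_revenue`, `anomalous_order_ids`), the expected values, and the tolerance are all hypothetical.

```python
import math

# Hypothetical expected answer for an illustrative task.
# All field names and values are invented for this sketch.
EXPECTED = {
    "total_revenue": 1250.75,
    "anomalous_order_ids": {"A-17", "A-42"},
}


def _same_ids(ids, expected):
    """Order-insensitive comparison; malformed input counts as a mismatch."""
    try:
        return set(ids) == expected
    except TypeError:  # None, non-iterable, or unhashable items
        return False


def verify(submission: dict, rel_tol: float = 1e-6) -> dict:
    """Return a per-field pass/fail report for a submitted answer."""
    report = {}

    # Numeric field: must be a real number within a relative tolerance.
    revenue = submission.get("total_revenue")
    report["total_revenue"] = (
        isinstance(revenue, (int, float))
        and not isinstance(revenue, bool)
        and math.isclose(revenue, EXPECTED["total_revenue"], rel_tol=rel_tol)
    )

    # Set-valued field: must match exactly, regardless of order.
    report["anomalous_order_ids"] = _same_ids(
        submission.get("anomalous_order_ids"), EXPECTED["anomalous_order_ids"]
    )

    # Overall result: every individual check must pass.
    report["passed"] = all(report.values())
    return report
```

A check like this accepts only the exact figures and identifiers the task asks for, so a generic summary such as "revenue was about 1250" would fail rather than receive partial credit.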

Requirements

5+ years of experience in data analysis or analytics-heavy roles
Strong proficiency in Python (pandas, NumPy) and SQL
Experience working with real-world, messy datasets (CSV, JSON, logs, reports)
Ability to design analytical problems with clear, verifiable answers
Solid understanding of statistics (distributions, correlations, outliers)
Familiarity with AI benchmarks or evaluation environments (e.g., SWE-bench or similar)
Hands-on experience with Docker (Dockerfiles, image builds, debugging)

Nice to Have

Experience in financial analysis, operations analytics, or risk analysis
Exposure to data pipelines or ETL workflows
Experience with data quality validation or anomaly detection systems
Familiarity with AI/ML data workflows or evaluation frameworks
