
Salary: $0-$0 / yr
Region: Colombia
Start Date: ASAP
Gramian Consultancy brings together the perspective of a software engineer, the knowledge of a technical recruiter, and the vision of a business builder. This combination is our signature advantage in delivering top-quality recruiting, staff augmentation, and outsourcing services.
Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.
Role Overview
We are looking for experienced CUDA Developers to work on advanced AI and machine learning initiatives focused on improving the capabilities of large language models (LLMs). In this role, you will solve complex GPU programming challenges, optimize high-performance CUDA workloads, review AI-generated code, and contribute to the development of more capable AI systems.
Duration: 3 months
Commitment: 40 h/week, with 4 h/day of overlap with PST
Model: Contract, time and material
Location: 100% remote (Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Pakistan, Indonesia, Kenya, Nigeria, Turkey, Vietnam)
Interview: 1 technical interview
Key Responsibilities
Solve advanced CUDA and GPU programming problems involving parallel computing and performance optimization
Review, evaluate, and improve AI-generated CUDA, C++, and Python code
Optimize GPU kernels for throughput, latency, memory efficiency, and resource utilization
Work with CUDA libraries and frameworks such as Thrust, cuBLAS, and cuDNN
Debug and resolve issues related to CUDA kernels, synchronization, and memory management
Develop high-quality technical prompts, solutions, explanations, and evaluations for AI model training
Collaborate with AI researchers, engineers, and evaluation teams
Stay up to date with the latest developments in CUDA, GPU architectures, and performance optimization techniques
Requirements
5+ years of professional software development experience with a strong focus on CUDA development
Strong proficiency in C/C++
Strong hands-on experience with Python and scientific computing ecosystems
Experience working with PyTorch and NumPy
Experience with CUDA 12.3 or newer
Strong understanding of GPU programming, parallel computing, and performance optimization
Experience optimizing workloads for high-performance execution and efficient resource utilization
Experience with CUDA libraries such as Thrust, cuBLAS, and cuDNN