
Machine Learning Systems Engineer at RelationalAI, Remote (Global)
As a Machine Learning Engineer, you will contribute directly to our machine learning infrastructure and to the ScalarLM open source codebase, and you will build large-scale language model applications on top of it. You’ll operate at the intersection of high-performance computing, distributed systems, and cutting-edge machine learning research, developing the fundamental infrastructure that enables researchers and organizations worldwide to train and deploy large language models at scale.
This is an opportunity to take on technically demanding projects, contribute to foundational systems, and help shape the next generation of intelligent computing.
You Will:
- Contribute code and performance improvements to the ScalarLM open source project.
- Develop and optimize distributed training algorithms for large language models.
- Implement high-performance inference engines and optimization techniques.
- Work on integration between vLLM, Megatron-LM, and HuggingFace ecosystems.
- Build tools for seamless model training, fine-tuning, and deployment.
- Optimize performance on advanced GPU architectures.
- Collaborate with the open source community on feature development and bug fixes.
- Research and implement new techniques for self-improving AI agents.
Who You Are
Technical Skills:
- Programming Languages: Proficiency in both C/C++ and Python
- High Performance Computing: Deep understanding of HPC concepts, including:
  - MPI (Message Passing Interface) programming and optimization
  - Bulk Synchronous Parallel (BSP) computing models
  - Multi-GPU and multi-node distributed computing
- CUDA/ROCm programming experience preferred
Machine Learning Foundations:
- Solid understanding of gradient descent and backpropagation algorithms
- Experience with transformer architectures and the ability to explain their mechanics
- Knowledge of deep learning training and its applications
- Understanding of distributed training techniques (data parallelism, model parallelism, pipeline parallelism, large batch training, optimization)
Research and Development
- Publications: Experience with machine learning research and publications preferred
- Research Skills: Ability to read, understand, and implement techniques from recent ML research papers
- Open Source: Demonstrated commitment to open source development and community collaboration
Experience
- 3+ years of experience in machine learning engineering or research.
- Experience with large-scale distributed training frameworks (Megatron-LM, DeepSpeed, FairScale, etc.).
- Familiarity with inference optimization frameworks (vLLM, TensorRT, etc.).
- Experience with containerization (Docker, Kubernetes) and cluster management.
- Background in systems programming and performance optimization.
Bonus points if you have:
- PhD or MS in Computer Science, Computer Engineering, Machine Learning, or related field.
- Experience with SLURM, Kubernetes, or other cluster orchestration systems.
- Knowledge of mixed precision training, data parallel training, and scaling laws.
- Experience with transformer architectures, PyTorch, and decoding algorithms.
- Familiarity with the high-performance GPU programming ecosystem.
- Previous contributions to major open source ML projects.
- Experience with MLOps and model deployment at scale.
- Understanding of modern attention mechanisms (multi-head attention, grouped query attention, etc.).
Why RelationalAI
RelationalAI is committed to an open, transparent, and inclusive workplace. We value the unique backgrounds of our team. We are driven by curiosity, value innovation, and help each other to succeed and to grow. We take the well-being of our colleagues seriously, and offer flexible working hours so each individual can find a healthy balance that affords them a productive, happy life wherever they choose to live.
Global Benefits at RelationalAI
At RelationalAI, we believe that people do their best work when they feel supported, empowered, and balanced. Our benefits prioritize well-being, flexibility, and growth, ensuring you have the resources to thrive both professionally and personally.
- We are all owners in the company, and we reward you with a competitive salary and equity.
- Work from anywhere in the world.
- Comprehensive benefits coverage, including global mental health support.
- Open PTO – Take the time you need, when you need it.
- Company Holidays, Your Regional Holidays, and RAI Holidays—where we take one Monday off each month, followed by a week without recurring meetings, giving you the time and space to recharge.
- Paid parental leave – Supporting new parents as they grow their families.
- We invest in your learning & development.
- Regular team offsites and global events – Building strong connections while working remotely by bringing everyone together.
- A culture of transparency & knowledge-sharing – Open communication through team standups, fireside chats, and open meetings.
Country Hiring Guidelines:
RelationalAI hires around the world. All of our roles are remote; however, some locations might carry specific eligibility requirements.
Because of this, understanding your location and visa support needs helps us better prepare to onboard our colleagues.
Our People Operations team can help answer any questions about location once you have started the recruitment process.
Privacy Policy: EU residents applying for positions at RelationalAI can see our Privacy Policy here.
California residents applying for positions at RelationalAI can see our Privacy Policy here.
RelationalAI is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, color, gender identity or expression, marital status, national origin, disability, protected veteran status, race, religion, pregnancy, sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.
To apply for this job please visit job-boards.greenhouse.io.