Luc McCutcheon
lucmccutcheon.home@gmail.com | GitHub | LinkedIn | Google Scholar
Research Scientist & PhD Candidate (completing Summer 2026) specialising in Reinforcement Learning for distributed VLA post-training. Co-Founder/CTO with experience deploying full-stack AI solutions and leading technical teams. Expert in JAX/PyTorch for policy optimisation, with published research (ICRA, NeurIPS, IROS) focusing on world models and optimisation algorithms.
Experience
Research Scientist (Fixed-Term)
- Led the Reinforcement Learning workstream for the Unitree G1 humanoid, delivering robust policies for dynamic environments.
- Engineered an RL fine-tuning pipeline for VLA models (GR00T 1.6) using PPO/GRPO within custom Isaac Sim environments.
- Designed MuJoCo and Isaac Sim environments supporting curriculum and continual learning to minimise the sim-to-real gap.
- Coupled individual foot control with MPC planners to achieve precise, stability-aware foot placement in 3D space.
- Executed sim-to-real transfer via ONNX to deploy blind stair-traversal locomotion on physical hardware.
- Trained cooperative multi-agent RL policies for a safety-critical defence application, using Kubernetes for distributed training.
Lead Research Scientist (Part-Time)
- Led a team of 10 researchers developing autonomous AI agents, driving technical recruitment and presenting to VC firms.
- Architected agentic pipelines from scratch and steered the internal research strategy, ranging from fundamental deep learning architecture to RL and VLM LoRA fine-tuning.
- Presented our full-stack approach to the Google Cloud team during their workshop for AI agents.
Research Scientist (Part-Time)
- Pioneered RL environments for software and web apps, using computer vision for icon recognition, and built systems for distributed multi-task training.
- Developed software and infrastructure enabling hybrid edge/cloud model routing based on task complexity for a Lenovo proof of concept (POC).
Research Assistant (Volunteer)
- Improved exploration using Noisy Neural Networks and tackled partial observability using LSTMs and Deep Q-Learning.
Software Engineer Intern
- Engineered tactical voice cryptography (C++/Python) and automated red team attacks across two placements, bridging offensive and defensive security operations.
Education
PhD Reinforcement Learning (sponsored by Veolia Nuclear Solutions)
- Designed and implemented a time-delay-mitigating controller for Veolia Nuclear Solutions’ local-remote manipulator (DEXTER).
- Learned a probabilistic world model to “undelay” RL environments, providing adaptive PD gains in highly stochastic conditions.
- Improved stability analysis by learning a Lyapunov function, using RL to generate counter-examples that improve robustness.
- Co-designed and lectured the “Intelligent Vehicle Design” university course, teaching planning methods for robotic control.
- Implemented papers and algorithms from scratch in JAX, including PCGrad, PPO, ReDo, CBP, VAE and ResNet.
- Implemented numerous custom RL environments, including complex wrappers for randomly time-delayed environments and local-remote system parallelism.
BSc Computer Science (Hons) — First Class
- Recipient of the prestigious CyberFirst Scholarship, sponsored by QinetiQ.
- Key modules: Artificial Intelligence (92%), Data Structures & Algorithms (86%), Computational Intelligence (83%).
- Dissertation in time-series financial forecasting using an LSTM Neural Network with various financial indicators.
- Key takeaways: CUDA programming, evolutionary algorithms and fundamental mathematics.
Skills
Frameworks
JAX PyTorch Gymnasium vLLM NumPy
Code
Python C++ Rust JavaScript Lua Bash
Methods
PPO GRPO LoRA VLA SAC Rainbow ResNet
Compute / Infrastructure
Unitree G1 SLURM GCP AWS Docker
Languages
English (Native) French (C1)
Publications
Reviewing
ICLR 2026 | NeurIPS 2025 | TNNLS 2025 | IROS 2025
Awards & Courses
- Honourable Mention — (xAI) Grokathon 2026
- Bronze — Mathematics Olympiad
- Bronze — British Informatics Olympiad
- Grace Hopper Award — Computer Science
- Foundership Award — Student Enterprise
- NVIDIA Fundamentals of Deep Learning
- Coursera Reinforcement Learning Specialisation
- EPQ in Mathematics
Speaking
- Guest Speaker & Panel Discussion (Agents Workshop) — Google
- Model-based Reinforcement Learning — UoS School of Engineering
- JAX vs PyTorch — UoS School of Engineering
- Model Optimisation & Compilers — Cambridge Consultants
- Safe Reinforcement Learning — Cambridge Consultants