Luc McCutcheon

lucmccutcheon.home@gmail.com | GitHub | LinkedIn | Google Scholar

Research Scientist & PhD Candidate (completing Summer 2026) specialising in Reinforcement Learning for distributed VLA post-training. Co-Founder/CTO with experience deploying full-stack AI solutions and leading technical teams. Expert in JAX/PyTorch for policy optimisation, with published research (ICRA, NeurIPS, IROS) focusing on world models and optimisation algorithms.

Experience

Research Scientist (Fixed-Term)

Cambridge Consultants — Oct 2025 – Apr 2026

  • Led the Reinforcement Learning workstream for the Unitree G1 humanoid, delivering robust policies for dynamic environments.
  • Engineered an RL fine-tuning pipeline for VLA models (GR00T 1.6) using PPO/GRPO within custom Isaac Sim environments.
  • Designed MuJoCo and Isaac Sim environments supporting curriculum and continual learning to minimise the sim-to-real gap.
  • Coupled individual foot control with MPC planners to achieve precise, stability-aware foot placement in 3D space.
  • Executed sim-to-real transfer via ONNX to deploy blind stair traversal locomotion on physical hardware.
  • Trained cooperative multi-agent RL policies for a safety-critical defence application, using Kubernetes for distributed training.

Lead Research Scientist (Part-Time)

Agile Loop — Oct 2023 – Apr 2025

  • Led a team of 10 researchers developing autonomous AI agents, driving technical recruitment and presenting to VC firms.
  • Architected agentic pipelines from scratch and steered the internal research strategy, ranging from fundamental deep learning architecture to RL and VLM LoRA fine-tuning.
  • Presented our full-stack approach to the Google Cloud team during their workshop for AI agents.

Research Scientist (Part-Time)

Agile Loop — Jun 2023 – Oct 2023

  • Pioneered RL environments for software and web apps, using computer vision for icon recognition, and created systems for distributed multi-task training.
  • Developed software and infrastructure for hybrid edge/cloud model routing based on task complexity, as part of a Lenovo proof of concept.

Research Assistant (Volunteer)

Connected & Autonomous Vehicle Lab — Jun 2022 – Aug 2022

  • Improved exploration using Noisy Neural Networks and tackled partial observability using LSTMs and Deep Q-Learning.

Software Engineer Intern

QinetiQ — Jul 2018 – Sep 2018 & Jul 2019 – Sep 2019

  • Engineered tactical voice cryptography (C++/Python) and automated red team attacks across two placements, bridging offensive and defensive security operations.

Education

PhD Reinforcement Learning (sponsored by Veolia Nuclear Solutions)

University of Surrey — 2021 – present

  • Designed and implemented a time-delay-mitigating controller for Veolia Nuclear Solutions’ local-remote manipulator (DEXTER).
  • Learned a probabilistic world model to “undelay” RL environments and provide adaptive PD gains in highly stochastic conditions.
  • Improved stability analysis by learning a Lyapunov function, using RL to generate counter-examples that improve robustness.
  • Co-designed and lectured the “Intelligent Vehicle Design” university course, teaching planning methods for robotic control.
  • Implemented papers and algorithms from scratch in JAX, including PCGrad, PPO, ReDo, CBP, VAE and ResNet.
  • Implemented numerous custom RL environments, including complex wrappers for randomly time-delayed environments and local-remote system parallelism.

BSc Computer Science (Hons) — First Class

University of Surrey — 2018 – 2021

  • Recipient of the prestigious CyberFirst Scholarship, sponsored by QinetiQ.
  • Key modules: Artificial Intelligence (92%), Data Structures & Algorithms (86%), Computational Intelligence (83%).
  • Dissertation on time-series financial forecasting using an LSTM neural network with various financial indicators.
  • Key takeaways: CUDA programming, evolutionary algorithms and fundamental mathematics.

Skills

Frameworks

JAX | PyTorch | Gymnasium | vLLM | NumPy

Code

Python | C++ | Rust | JavaScript | Lua | Bash

Methods

PPO | GRPO | LoRA | VLA | SAC | Rainbow | ResNet

Compute / Infrastructure

Unitree G1 | SLURM | GCP | AWS | Docker

Languages

English (Native) | French (C1)

Publications

First Author

  • Preventing Policy Collapse in Continual Reinforcement Learning — Under Review. Implemented PPO, Continual Backpropagation (CBP), Recycling Dormant Neurons (ReDo) and a custom optimiser (CPR). Identifies parameter resets as a source of policy collapse in continual RL and improves stability through soft parameter transformations.
  • Neural Lyapunov Function Approximation with Self-Supervised Reinforcement Learning — ICRA 2025. Efficient probabilistic world model for stable RL under stochastic conditions, jointly learning a SAC control policy, world model and Lyapunov function and exploiting geometric properties for stability analysis.
  • Adaptive PD Control using Deep RL for Local-Remote Teleoperation with Stochastic Time Delays — IROS 2023. Proposed PMDC, a model-based RL framework using a learned world model ensemble to mitigate stochastic delays of up to 290 ms.

Co-Author

  • Meta-World+: An Improved, Standardized, RL Benchmark — ICML 2025 Workshop (Spotlight) & NeurIPS 2025. Implemented optimiser algorithms such as PCGrad and GradNorm for multi-task learning.
  • Exploring In-Context Ensemble with Video-Language Models for Low-Level Workflow Understanding — NeurIPS 2024 Workshop. Defined the project and integrated Gemini models into a novel in-context ensemble learning framework.
  • Prediction Based Decision Making for Autonomous Highway Driving — ITSC 2022. Implemented the “Noisy Networks” baseline and part of the Rainbow algorithm.

All implementations were written from scratch in PyTorch or JAX/Flax and trained on a SLURM cluster or Google Cloud, with each demonstrating state-of-the-art performance against baselines on challenging benchmarks.

Reviewing

ICLR 2026 | NeurIPS 2025 | TNNLS 2025 | IROS 2025

Awards & Courses

  • Honourable Mention — (xAI) Grokathon 2026
  • Bronze — Mathematics Olympiad
  • Bronze — British Informatics Olympiad
  • Grace Hopper Award — Computer Science
  • Foundership Award — Student Enterprise
  • NVIDIA Fundamentals of Deep Learning
  • Coursera Reinforcement Learning Specialisation
  • EPQ in Mathematics

Speaking

  • Guest Speaker & Panel Discussion (Agents Workshop) — Google
  • Model-based Reinforcement Learning — UoS School of Engineering
  • JAX vs PyTorch — UoS School of Engineering
  • Model Optimisation & Compilers — Cambridge Consultants
  • Safe Reinforcement Learning — Cambridge Consultants