Founding RL Researcher (San Francisco Bay Area) Job at Lanturn, San Francisco, CA

  • Lanturn
  • San Francisco, CA

Job Description

Founding Research Scientist (Long-Horizon RL) at Lanturn

Location: San Francisco (preferred) / Remote (US)

Compensation: $300K base + 0.5–1% equity

Type: Full-time · Founding Team

At Lanturn, we are building the next generation of reinforcement learning systems for real-world agents. Our focus is on enabling AI systems to learn from behavioral data and long-horizon workflows, through:

  • High-fidelity RL environments
  • Synthetic data generation
  • Closed-loop training systems

We are looking for a Founding RL Researcher to push the frontier of:

  • Long-horizon RL
  • Environment design
  • Post-training for agents

About us:

Lanturn is building the end-to-end behavioral learning stack for AI systems. We believe current approaches to RL and post-training are limited by short-horizon optimization, weak or proxy reward signals, and a lack of grounded environments. Our approach is to build closed-loop RL systems in which environments, data, training, and evaluation are tightly integrated and grounded in real-world behavioral data.

The role:

As a Founding RL Researcher, you will lead efforts to develop novel reinforcement learning algorithms and environments for training autonomous agents. You will work across:

  • Algorithm design
  • Environment modelling
  • Training systems
  • Evaluation frameworks

This role sits at the intersection of:

  • Frontier-lab-style RL research (environments + algorithms)
  • Modern LLM post-training (RLHF, preference optimisation)

Key responsibilities:

  • Design and implement RL systems for long-horizon tasks (10–100+ steps)
  • Develop and extend modern post-training methods:
      • PPO, DPO, ORPO
      • GRPO / GRPO++ and ranking-based optimization methods
  • Build RL environments grounded in real-world workflows
  • Work on meta-RL and adaptive learning systems:
      • Generalization across tasks
      • Rapid adaptation to new environments
  • Design reward systems for:
      • Behavioural correctness
      • Efficiency and robustness
  • Develop evaluation frameworks aligned with real-world outcomes
  • Collaborate with engineering teams to scale training systems

Ideal candidate:

You are a researcher with strong theoretical grounding and real-world system intuition, capable of working on open-ended problems in RL. You thrive in environments where:

  • Problems are not well-defined
  • Systems must be built from first principles
  • Research directly translates into deployed systems

Minimum qualifications:

  • Experience at a top-tier AI lab or company: OpenAI, DeepMind, Anthropic, FAIR, or equivalent
  • Strong background in reinforcement learning and post-training systems
  • Experience training large-scale models (LLMs or similar)
  • Strong programming skills (Python, PyTorch/JAX)

Preferred qualifications:

  • Experience with long-horizon RL or sequential decision-making systems
  • Experience designing or working with RL environments
  • Familiarity with preference optimization (DPO, ORPO), RLHF pipelines, and automated RL environment generation
  • Experience with meta-RL / adaptive learning systems
  • Strong publication record in top-tier ML conferences

Core technical skills:

  • Deep understanding of policy gradient methods (PPO and beyond), KL-regularized optimization, and credit assignment in long-horizon settings
  • Experience with cascading RL pipelines (SFT → RL → evaluation), distributed training systems, and the stability and scaling challenges they raise
  • Strong intuition for exploration vs exploitation, reward shaping vs reward learning, and trajectory-level optimization

What makes this role unique?

  • Focus on long-horizon behavioral learning, not short-form RLHF
  • Treats environment design and generation as a first-class problem
  • Opportunity to define GRPO++-style next-generation algorithms and publish at NeurIPS

Why join Lanturn?

  • Founding ownership (0.5–1% equity)
  • Work on unsolved problems in RL and agent systems
  • High autonomy and research freedom
  • Direct impact on how real-world AI systems are trained
  • Work directly with second-time founders who have worked with a range of big tech companies and enterprises

If you’ve worked on RL at a top lab or have production RL experience, and you want to push beyond current paradigms into real-world, long-horizon intelligence, this is your opportunity.

Job Tags

Full time, Part time
