Turkish Aerospace Industries

AI Dogfight Simulation

AI Engineer
Apr 2024 – May 2025

Led development of deep reinforcement learning systems for autonomous flight combat simulations, achieving an 85%+ win rate against human pilots in dogfight scenarios.

  • 85%+ Win Rate vs Humans
  • 5+ Engineers Led
  • 60Hz Control Frequency
  • <5ms Inference Latency

The Challenge

Training fighter pilots is extraordinarily expensive and risky. Simulation-based training helps, but current AI opponents follow predictable patterns that experienced pilots quickly learn to exploit. The defense industry needed AI adversaries that could adapt, learn, and challenge human pilots in realistic combat scenarios.

Key Challenges

  • Creating AI that exhibits unpredictable, human-like tactical behavior
  • Handling the high-dimensional continuous action space of aircraft control
  • Ensuring real-time decision making at 60+ Hz control frequency
  • Developing AI that can generalize across different aircraft and scenarios
  • Balancing aggressive tactics with survivability

The Solution

I led a team of 5 engineers to develop a multi-agent deep reinforcement learning system using hierarchical policies. The high-level tactical agent makes strategic decisions (engage, evade, pursue), while low-level controllers handle aircraft maneuvering. This separation allowed for both sophisticated tactics and precise control.

  1. Implemented the Soft Actor-Critic (SAC) algorithm for continuous control with entropy regularization (see the training sketch after this list)
  2. Designed a hierarchical policy architecture separating tactical decisions from flight control
  3. Built a custom simulation environment with realistic flight dynamics and weapon systems
  4. Developed a curriculum learning pipeline progressing from basic maneuvers to full combat
  5. Created a self-play training regime for continuous improvement against evolving opponents
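As a rough illustration of the first step, the sketch below shows how SAC with automatic entropy tuning can be wired up in Stable-Baselines3 against a continuous-control flight environment. The `DogfightEnv` class, its observation and action dimensions, and all hyperparameters are illustrative placeholders rather than the production setup.

```python
# Minimal SAC training sketch with Stable-Baselines3.
# DogfightEnv is a toy stand-in for the real JSBSim-backed simulation.
import gymnasium as gym
import numpy as np
from stable_baselines3 import SAC

class DogfightEnv(gym.Env):
    """Toy continuous-control environment standing in for the real sim."""
    def __init__(self):
        # Continuous aircraft controls: pitch, roll, yaw, throttle.
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        # Flattened situational-awareness state (dimension is illustrative).
        self.observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(64,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        obs = self.observation_space.sample()
        reward = 0.0  # the real reward scores tracking, energy state, and firing solutions
        return obs, reward, False, False, {}

model = SAC(
    "MlpPolicy",
    DogfightEnv(),
    ent_coef="auto",        # automatic entropy tuning (the regularization noted above)
    learning_rate=3e-4,
    buffer_size=100_000,
    verbose=1,
)
model.learn(total_timesteps=10_000)
```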

System Architecture

The system uses a hierarchical multi-agent architecture with centralized training and decentralized execution, enabling complex tactical behavior while maintaining real-time performance.

Tactical Decision Network

Transformer-based policy network processing situational awareness data. Outputs high-level tactical primitives (attack, defend, disengage) with associated parameters.

Maneuver Controller

MLP-based low-level controller translating tactical commands into aircraft control inputs (pitch, roll, yaw, throttle). Trained via imitation learning on expert trajectories.

Situational Awareness Module

Processes sensor data (radar, RWR, visual) into a unified state representation. Handles uncertainty and sensor limitations realistically.

Opponent Modeling

Online learning component that builds models of opponent behavior during engagement. Enables adaptive counter-tactics.

Training Infrastructure

Distributed training across 64 GPU workers with synchronized parameter updates. Custom simulation parallelization for high throughput.
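To make the component descriptions above more concrete, here is a minimal PyTorch sketch of how the three learned modules could compose. Module names, layer sizes, the pooling scheme, and the tactical primitive set are assumptions for illustration only.

```python
# Illustrative composition of the learned components described above.
# All names, dimensions, and the primitive set are assumptions.
import torch
import torch.nn as nn

TACTICAL_PRIMITIVES = ["attack", "defend", "disengage"]  # illustrative set

class SituationalAwarenessEncoder(nn.Module):
    """Projects per-contact sensor features (radar, RWR, visual) into token embeddings."""
    def __init__(self, feature_dim=32, embed_dim=128):
        super().__init__()
        self.proj = nn.Linear(feature_dim, embed_dim)

    def forward(self, contacts):              # contacts: (batch, n_contacts, feature_dim)
        return self.proj(contacts)            # (batch, n_contacts, embed_dim)

class TacticalPolicy(nn.Module):
    """Transformer over contact tokens; outputs a tactical primitive and its parameters."""
    def __init__(self, embed_dim=128, n_params=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.primitive_head = nn.Linear(embed_dim, len(TACTICAL_PRIMITIVES))
        self.param_head = nn.Linear(embed_dim, n_params)

    def forward(self, tokens):
        pooled = self.encoder(tokens).mean(dim=1)   # simple mean pooling over contacts
        return self.primitive_head(pooled), self.param_head(pooled)

class ManeuverController(nn.Module):
    """MLP mapping (own-ship state, tactical command) to pitch/roll/yaw/throttle."""
    def __init__(self, state_dim=24, command_dim=4 + len(TACTICAL_PRIMITIVES)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + command_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 4), nn.Tanh(),            # bounded control surface commands
        )

    def forward(self, own_state, command):
        return self.net(torch.cat([own_state, command], dim=-1))
```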

Key Implementation Details

Hierarchical Policy Design

The two-level hierarchy was crucial for success. The tactical network operates at 2Hz, making strategic decisions based on the big picture. The maneuver controller runs at 60Hz, executing precise aircraft control. This separation mirrors how human pilots think—strategy at one level, reflexive flying at another.
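A minimal sketch of that two-rate loop, assuming a `decide`/`act` interface on the two policies; the interface, step budget, and command encoding are illustrative, not the actual codebase.

```python
# Two-rate execution: the tactical policy re-plans at 2 Hz,
# the maneuver controller acts at 60 Hz between decisions.
CONTROL_HZ = 60
TACTICAL_HZ = 2
STEPS_PER_DECISION = CONTROL_HZ // TACTICAL_HZ   # 30 control steps per tactical decision

def run_episode(env, tactical_policy, maneuver_controller, max_steps=6000):
    obs = env.reset()
    command = None
    for step in range(max_steps):
        # Re-plan at 2 Hz; reuse the last command in between.
        if step % STEPS_PER_DECISION == 0:
            command = tactical_policy.decide(obs)          # e.g. ("pursue", target_id, params)
        # Execute precise control at 60 Hz conditioned on the current command.
        controls = maneuver_controller.act(obs, command)   # pitch, roll, yaw, throttle
        obs, done = env.step(controls)
        if done:
            break
```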

Curriculum Learning for Combat

Training an agent from scratch on full combat is impractical—the reward signal is too sparse. I designed a curriculum starting with basic flight (maintain altitude, heading), progressing to pursuit (track target), then to weapons employment (valid firing solutions), and finally full engagement. Each stage bootstrapped from the previous.
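One way such a curriculum could be expressed is as a staged configuration that bootstraps each stage from the previous policy. The stage names, thresholds, and training/evaluation hooks below are assumptions, not the actual pipeline.

```python
# Illustrative curriculum mirroring the stages described above.
CURRICULUM = [
    {"stage": "basic_flight",    "task": "hold altitude and heading",      "advance_at": 0.90},
    {"stage": "pursuit",         "task": "keep target within sensor cone", "advance_at": 0.80},
    {"stage": "weapons",         "task": "achieve valid firing solutions", "advance_at": 0.70},
    {"stage": "full_engagement", "task": "win 1v1 dogfights",              "advance_at": None},
]

def train_with_curriculum(make_env, train_fn, evaluate_fn):
    policy = None
    for cfg in CURRICULUM:
        env = make_env(cfg["stage"])
        policy = train_fn(env, init_policy=policy)   # bootstrap from the previous stage
        # Keep training the current stage until its success criterion is met.
        while cfg["advance_at"] is not None and evaluate_fn(policy, env) < cfg["advance_at"]:
            policy = train_fn(env, init_policy=policy)
    return policy
```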

Self-Play with Diversity

Pure self-play tends to converge to narrow strategies. I introduced population-based training where multiple agent variants train simultaneously, maintaining strategic diversity. A league system ensured agents couldn't over-specialize against specific opponents.
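A simplified sketch of league-style opponent sampling that mixes frozen historical snapshots with current population peers; the mixing ratio, snapshot mechanism, and `frozen_copy` helper are illustrative assumptions.

```python
# League-style opponent sampling to keep self-play diverse.
import random

class League:
    def __init__(self, population):
        self.population = population     # current learning agent variants
        self.past_snapshots = []         # frozen historical policies

    def snapshot(self, agent):
        """Freeze a copy of an agent so future learners must still beat old strategies."""
        self.past_snapshots.append(agent.frozen_copy())   # frozen_copy() is a hypothetical helper

    def sample_opponent(self, learner):
        # Half the time, replay against a historical snapshot to prevent forgetting.
        if self.past_snapshots and random.random() < 0.5:
            return random.choice(self.past_snapshots)
        # Otherwise, face a different current variant to maintain strategic diversity.
        peers = [a for a in self.population if a is not learner]
        return random.choice(peers)
```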

Real-Time Inference Optimization

Combat requires split-second decisions. I optimized the inference pipeline using ONNX Runtime with TensorRT backend, achieving sub-5ms inference latency. Careful attention to memory allocation eliminated garbage collection pauses during combat.
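The low-latency path roughly follows the standard ONNX Runtime pattern: export the policy once, then create a session that prefers the TensorRT execution provider and falls back to CUDA or CPU. Paths, tensor names, and shapes below are placeholders, and the stand-in linear layer only keeps the example self-contained.

```python
# Export a trained policy to ONNX and serve it with ONNX Runtime + TensorRT.
import numpy as np
import onnxruntime as ort
import torch

# 1) One-time export of the trained PyTorch policy to ONNX.
policy = torch.nn.Linear(64, 4)                      # stand-in for the real policy network
dummy_obs = torch.zeros(1, 64)
torch.onnx.export(policy, dummy_obs, "policy.onnx",
                  input_names=["obs"], output_names=["action"])

# 2) Create a session preferring TensorRT, falling back to CUDA, then CPU.
session = ort.InferenceSession(
    "policy.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)

# 3) Pre-allocate the input buffer once so the 60 Hz loop avoids per-step allocations.
obs_buffer = np.zeros((1, 64), dtype=np.float32)
action = session.run(["action"], {"obs": obs_buffer})[0]
```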

Tech Stack

Reinforcement Learning

PyTorch, Stable-Baselines3, SAC, PPO, Self-Play

Simulation

JSBSim, Custom Physics Engine, OpenGL

Optimization

ONNX Runtime, TensorRT, CUDA, Ray/RLlib

Infrastructure

Kubernetes, MLflow, Weights & Biases, NVIDIA DGX

Key Learnings

  • Hierarchical policies are essential for complex control problems with multiple time scales
  • Curriculum learning transforms impossible problems into tractable ones
  • Self-play requires careful diversity maintenance to avoid strategy collapse
  • Defense applications demand rigorous testing—AI must fail gracefully when uncertain

Interested in Similar Solutions?

Let's discuss how I can help bring your AI ideas to production.

Get in Touch