Dynamic Reinforcement Learning Firewall with Adaptive Threat Detection

Techs: Python, PHP, JavaScript, FastIron, Linux (Mint / Fedora), Laravel, Vue.js, Docker, Ruckus ICX 7150 Switch, Dell PowerEdge R730, Laptop, PyTorch, TensorFlow, Netmiko, Paramiko, Vite, Nginx, Git.
Department: Computer Science
MS Team URL: URL not found

The Dynamic Reinforcement Learning Firewall is an adaptive network security system designed to overcome the limitations of static, rule-based packet filtering. It uses a Deep Q-Network (DQN) agent written in Python that monitors network traffic out-of-band, using packets mirrored from a physical switch. The agent extracts flow features and evaluates packet behavior in real time, learning mitigation actions without sitting inline and becoming a network bottleneck. The system has a three-tier architecture: the Python agent handles remote enforcement by provisioning FastIron Access Control Lists (ACLs) on a Ruckus ICX 7150 switch over SSH and RESTCONF; a Laravel backend manages threat logging and event broadcasting; and a Vue.js frontend dashboard provides real-time threat visualization for administrators.
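
The enforcement step can be illustrated with a minimal sketch. It assumes the agent blocks by source IP, that the ACL is a named extended ACL, and that Netmiko's ruckus_fastiron driver is used for the SSH session. The function names, the ACL name, and the deny-then-permit layout are hypothetical and are not taken from the project.

```python
import ipaddress


def build_block_commands(acl_name, blocked_ips):
    """Return FastIron-style config lines that deny each source IP.

    The ACL name and the deny-then-permit layout are assumptions for
    illustration; the project's real rule format is not given in the text.
    """
    commands = [f"ip access-list extended {acl_name}"]
    for ip in blocked_ips:
        # Validate first so malformed input never reaches the switch CLI.
        addr = ipaddress.IPv4Address(ip)
        commands.append(f"deny ip host {addr} any")
    # Without a trailing permit, the ACL's implicit deny would drop all traffic.
    commands.append("permit ip any any")
    return commands


def push_commands(host, username, key_file, commands):
    """Send the lines over SSH with Netmiko (requires a reachable switch)."""
    from netmiko import ConnectHandler  # third-party; imported lazily

    conn = ConnectHandler(
        device_type="ruckus_fastiron",
        host=host,
        username=username,
        use_keys=True,
        key_file=key_file,  # path to the private key used for SSH login
    )
    try:
        return conn.send_config_set(commands)
    finally:
        conn.disconnect()
```

Building the command list separately from the SSH call keeps the rule logic testable without hardware. Binding the ACL to an interface and ordering new entries against rules already on the switch are not shown here.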

Objectives

To develop a Dynamic Reinforcement Learning Firewall using a Deep Q-Network for adaptive, real-time network threat detection and mitigation.

Socio-Economic Benefit

This project minimizes administrative overhead by automating the maintenance of complex firewall rulesets, moving network security from a reactive, manual process to a proactive, automated defense mechanism. Because the system operates out-of-band, inspection adds no latency to the forwarding path, even as traffic volume scales, so enterprise performance standards are maintained without degrading physical network throughput. Economically, this protects enterprise assets by shortening the window of vulnerability to novel cyber threats, polymorphic malware, and zero-day attacks. Ultimately, it serves as a foundational step toward a fully autonomous, high-throughput network security infrastructure that reduces the manual workload of security analysts.

Methodologies

Development followed an iterative, Agile-based methodology to accommodate the experimental nature of training a Reinforcement Learning (RL) model alongside complex hardware integration. The core of the method is a Deep Q-Network (DQN) that filters traffic out-of-band. The system captures incoming packets in real time from a mirror port on the switch. The Python-based agent extracts features from each packet (such as source IP, destination IP, protocol, and length) to define the current state of the RL environment. Decision-making uses epsilon-greedy action selection, which lets the agent balance exploration (random actions) against exploitation (the action the Q-network currently rates highest). When the agent detects an anomaly, it enforces mitigation through a Rule Manager that authenticates with Ed25519 SSH keys and provisions FastIron Access Control Lists (ACLs) on the hardware switch. The agent continuously updates its neural network weights: it receives a reward based on the accuracy of its threat mitigation, stores each transition in a replay memory, and periodically performs gradient-descent updates on batches sampled from that memory. The architecture is split across three Docker containers: a Python agent for DQN decision-making, a Laravel backend for threat logging and WebSocket event broadcasting, and a Vue.js frontend for real-time monitoring.
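
The decision loop described above can be sketched with the standard library alone. This is not the project's implementation: the neural network is omitted, so Q-values are passed in as plain lists. The action set, the feature scaling, and all function names are assumptions made for illustration.

```python
import random
from collections import deque, namedtuple

# Hypothetical action set; the text does not list the agent's actions.
ACTIONS = ("allow", "block")

Transition = namedtuple("Transition", "state action reward next_state done")


def encode_state(src_ip, dst_ip, protocol, length):
    """Map the four flow features named in the text to a numeric vector."""
    def ip_to_unit(ip):
        # Pack the dotted quad into one integer, then scale to [0, 1].
        a, b, c, d = (int(part) for part in ip.split("."))
        return ((a << 24) | (b << 16) | (c << 8) | d) / 0xFFFFFFFF
    # protocol is the IP protocol number (0-255); length is capped at 65535.
    return (ip_to_unit(src_ip), ip_to_unit(dst_ip),
            protocol / 255, min(length, 65535) / 65535)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


class ReplayMemory:
    """Fixed-size buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Bellman target: r if terminal, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full DQN, the gradient-descent step would minimize the squared difference between the network's Q(s, a) and td_target over each sampled batch, and epsilon would typically be decayed over time so that the agent explores less as its estimates improve.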

Outcome

The project delivers a proactive, self-optimizing network security solution for automated threat mitigation. It demonstrates the viability of using reinforcement learning to manage enterprise hardware rules autonomously and to mitigate zero-day network anomalies. By integrating a Deep Q-Network with a Ruckus ICX 7150 switch over SSH and RESTCONF, the system sniffs mirrored traffic and provisions dynamic ACLs in real time without creating an inline bottleneck, leaving network throughput unaffected. The project also establishes a containerized three-tier architecture and demonstrates a shift from traditional static software firewalls to an out-of-band, AI-driven hardware defense mechanism.