Dataset: GridStratLLM: Agent Framework for Coordinated Cyberattacks on the Smart Grid with Large Language Models

Kellerer, Nicolai; Hagenmeyer, Veit

doi:10.35097/bx5337kcykte438h

Kellerer, Nicolai¹; Hagenmeyer, Veit¹

¹ Institute for Automation and Applied Informatics (IAI), Karlsruhe Institute of Technology (KIT)

Abstract (English):

A new cybersecurity threat emerges: Recent Large Language Models (LLMs) with advanced reasoning and tool calling enable even attackers lacking expert knowledge to coordinate large-scale attacks on Smart Grids (SG).
These LLMs can orchestrate multiple malware instances, select appropriate signals and deltas, and execute data-modification attacks on the S7 and Modbus protocols.
Thereby, the automatically generated attack progresses towards the targeted unsafe state and evades detection by the Intrusion Detection System (IDS).
To assess this emerging threat, we introduce GridStratLLM, a novel agent framework for coordinated attacks on industrial networks.
Furthermore, we evaluate attack plans generated by four frontier Large Language Models using the open-source Network Security Monitor (NSM) Zeek and a commercial NSM.
Finally, we contribute a dataset recorded in a Hardware-in-the-Loop (HIL) testbed to support the training of IDS solutions against these attacks.
The dataset spans 24 hours and 11 minutes and contains 436 attacks, 212 of which are coordinated.


Affiliated institution(s) at KIT	Institute for Automation and Applied Informatics (IAI)
Publication type	Research data
Publication date	2026-05-19
Creation date	2026-02-15 to 2026-03-03
Identifier	DOI: 10.35097/bx5337kcykte438h KITopen-ID: 1000193145
HGF program	46.23.02 (POF IV, LK 01) Engineering Security for Energy Systems
Embargo period	The research data will be freely accessible from 2026-06-22.
License	Creative Commons Attribution 4.0 International
Keywords	Attack Plan, LLM, Smart Grid, Modbus, S7, Data Modification
README

GridStratLLM Dataset

This dataset contains coordinated cyberattacks generated using the GridStratLLM agent framework against a hardware-in-the-loop testbed of a distributed generation environment. It covers one normal-operation dataset and five attack datasets, each using a different LLM. Every dataset captures network traffic, process data from SCADA, log messages, and metadata from the attack scripts.

Paper: https://doi.org/10.1145/3765611.3815147

GridStratLLM source code: https://github.com/nbke/GridStratLLM

Each dataset directory contains:

- `attack_session_llm.parquet`: LLM prompts, plans, chain-of-thought reasoning, token usage
- `attack_worker.parquet`: Network interface info (MAC, IP, interface name)
- `packet_metadata.parquet`: Packet metadata (timestamps, addresses, ports, protocol)
- `packets.pcap`: Raw packet capture
- `attack_datamod_history.parquet`: Packet modification log with delta values
- `attack_exec_steps.parquet`: Attack execution timeline per worker
- `process_data.parquet`: WinCC SCADA data
- `logs.parquet`: Logs from PLC 1512 and PLC 1516

See `network.json` for a list of network devices. `modbus.json` contains a mapping of Modbus registers to signal names. `s7_connections.json` contains all signals transmitted via S7.

Parquet Files

attack_session_llm.parquet

The structure of the `plan` column is explained in Appendix E of the paper. `coordinated` is true if an attack session uses multiple attack workers. The column `all_messages` may be NULL due to a data capture issue.

packet_metadata.parquet

The entries in `packet_metadata.parquet` are in the same order as the packets in `packets.pcap`. If the UUID of a packet (`id` in `packet_metadata.parquet`) is contained in the `packet_id` column of `attack_datamod_history.parquet`, then the packet originates from an attack script.
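The labeling rule for `packet_metadata.parquet` can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset tooling: the function name `label_attack_packets` and the toy rows are hypothetical, and only the column names `id` and `packet_id` come from the README. Loading the Parquet files (for example with a Parquet reader of your choice) is left out, so the rows are passed in as plain mappings.

```python
from typing import Iterable, Mapping


def label_attack_packets(
    packet_rows: Iterable[Mapping[str, object]],
    datamod_rows: Iterable[Mapping[str, object]],
) -> list[bool]:
    """Return one flag per packet row: True if the packet came from an attack script."""
    # Collect every packet UUID that appears in the modification log.
    attack_ids = {row["packet_id"] for row in datamod_rows}
    # Keep the input order, which matches the order of packets in packets.pcap.
    return [row["id"] in attack_ids for row in packet_rows]


# Toy rows standing in for the two Parquet files (IDs are made up).
packets = [{"id": "a1"}, {"id": "b2"}, {"id": "c3"}]
datamod = [{"packet_id": "b2"}]
flags = label_attack_packets(packets, datamod)
print(flags)  # [False, True, False]
```

Because the flags keep the row order of `packet_metadata.parquet`, they can also be aligned by position with the packets in `packets.pcap`. In the Parquet files the IDs are stored as blobs, so real rows would carry `bytes` values rather than the short strings used here; the set lookup works the same way for both.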
SCADA process data: `process_data.parquet`

PV:

- Control Signals: on_off
- Monitor Signals: temp_air, poa_direct, wind_speed, poa_diffuse, cell_temperature, inverter_ac_power, inverter_dc_power

Wind:

- Control Signals: blade_rotation, rotation_speed
- Monitor Signals: power, height, pressure, wind_speed_a, wind_speed_b, temperature_a, temperature_b

Battery:

- Control Signals: on_off, target_power
- Monitor Signals: current, voltage, temperature, state_of_charge, actual_charge_power

Log messages: `logs.parquet`

Log messages are in German and are only sent when a signal value changes. Example:

`Wertänderung "SysLogDaten".Inverter_ac_power Altwert: 239,0 aktueller Wert: 20,0 CPU:SECCPU16`

Here "Wertänderung" means value change, "Altwert" means old value, and "aktueller Wert" means current value; numbers use a decimal comma.

DuckDB File: `merged_datasets.duckdb`

The combined `merged_datasets.duckdb` file contains all data from the Parquet files plus raw packet data. Differences from the Parquet files:

- `process_data` is split into four tables: `wind_process_data`, `pv_process_data`, `battery_process_data`, `demand_process_data` (one column per signal instead of JSON `values`).
- `attack_session_llm` is renamed to `attack_session`.
- `attack_exec_steps` is renamed to `exec_steps`, and `setup_duration` is stored as an `interval` instead of a bigint.
- `packet_metadata` is renamed to `packets`. The packets from the PCAP files are stored in the `raw_packet` BLOB column. The `l2_flow_id`, `l3_flow_id`, and `l4_flow_id` columns, which are always null in the Parquet files, are omitted.
- `id` columns use the `uuid` type instead of `blob`.
- Categorical columns (`model_name`, `transport`, `state`, `kind`, etc.) use DuckDB `enum` types instead of `string`.

Column name prefix for process data tables:

- `C_`: Control signals (commands sent to power plants)
- `M_`: Monitor signals (measured values sent to SCADA)

Funding

This research is supported in part by funding from the topic Engineering Secure Systems of the Helmholtz Association (HGF) and by KASTEL Security Research Labs (structure 46.23.02).
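The German log message format shown under `logs.parquet` can be parsed with a regular expression. The sketch below is an illustration only: it assumes every message follows the single example given in the README, and the names `LOG_PATTERN`, `parse_log_message`, and the output keys (`block`, `signal`, `old`, `new`, `cpu`) are hypothetical. Messages in the actual dataset may deviate from this shape, in which case the function returns `None`.

```python
import re
from typing import Optional

# Pattern derived from the one example message in the README (an assumption).
LOG_PATTERN = re.compile(
    r'Wertänderung "(?P<block>[^"]+)"\.(?P<signal>\S+) '
    r"Altwert: (?P<old>\S+) aktueller Wert: (?P<new>\S+) "
    r"CPU:(?P<cpu>\S+)"
)


def _to_number(text: str):
    """Convert a German decimal-comma number to float; keep the raw text otherwise."""
    try:
        return float(text.replace(",", "."))
    except ValueError:
        return text


def parse_log_message(message: str) -> Optional[dict]:
    """Extract signal name, old value, new value, and CPU from a value-change message."""
    match = LOG_PATTERN.search(message)
    if match is None:
        return None  # Message does not follow the assumed format.
    fields = match.groupdict()
    fields["old"] = _to_number(fields["old"])
    fields["new"] = _to_number(fields["new"])
    return fields


example = (
    'Wertänderung "SysLogDaten".Inverter_ac_power '
    "Altwert: 239,0 aktueller Wert: 20,0 CPU:SECCPU16"
)
parsed = parse_log_message(example)
print(parsed["signal"], parsed["old"], parsed["new"], parsed["cpu"])
# Inverter_ac_power 239.0 20.0 SECCPU16
```

Values that are not numeric are returned as raw strings instead of raising an error, which keeps the parser usable if some signals log non-numeric states.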
Type of research data	Dataset
