
Evaluating Large Language Models in Cybersecurity Knowledge with Cisco Certificates

Keppler, Gustav 1; Kunz, Jeremy 2; Hagenmeyer, Veit 1; Elbez, Ghada 1
1 Institut für Automation und angewandte Informatik (IAI), Karlsruher Institut für Technologie (KIT)
2 Karlsruher Institut für Technologie (KIT)

Abstract:

As generative artificial intelligence evolves, understanding its capabilities in the cybersecurity domain becomes crucial. This paper examines the capability of Large Language Models (LLMs) to solve cybersecurity certification Multiple Choice Question Answering (MCQA) exams, comparing proprietary and open-weights models. Challenges related to test-set leakage, notably on the widely used MMLU benchmark, emphasize the need for continuous validation of benchmarking results. Open-weights models, namely Mistral Large 2, Qwen 2, and Phi 3, appear to overfit to the MMLU Computer Security subset, indicating limited suitability for cybersecurity knowledge tasks. The study also introduces the first visual cybersecurity MCQA benchmark, assessing the capability of Large Multimodal Models (LMMs) to interpret and answer visual questions. Among the tested models, the proprietary Anthropic Claude 3.5 Sonnet and OpenAI GPT-4o outperformed the others in both the language and the vision-language setting. However, the Llama 3.1 model series demonstrated significant advancement in the open-weights domain, signaling potential parity in cybersecurity knowledge with proprietary models in the near future.
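The MCQA evaluation described in the abstract can be illustrated with a minimal sketch: format each question with lettered options, collect the model's letter answer, and report accuracy. The `ask_model` function below is a hypothetical stand-in (it always answers "A"); a real benchmark run would call an LLM API instead, and the sample questions are illustrative, not taken from the paper's benchmark.

```python
def format_question(q: dict) -> str:
    """Render a multiple-choice question as a single prompt string."""
    options = "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCD", q["options"])
    )
    return f"{q['question']}\n{options}\nAnswer with a single letter."

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; always answers "A" here.
    return "A"

def mcqa_accuracy(questions: list[dict]) -> float:
    """Fraction of questions where the model's first letter matches the key."""
    correct = 0
    for q in questions:
        reply = ask_model(format_question(q)).strip()
        if reply[:1].upper() == q["answer"]:
            correct += 1
    return correct / len(questions)

sample = [
    {"question": "Which protocol secures HTTP traffic?",
     "options": ["TLS", "FTP", "SNMP", "Telnet"], "answer": "A"},
    {"question": "Which port does SSH use by default?",
     "options": ["22", "80", "443", "25"], "answer": "A"},
]
print(mcqa_accuracy(sample))  # 1.0 with the stub model
```

Parsing only the first letter of the reply is a common, if simple, answer-extraction choice; published benchmarks often use stricter prompt templates or log-likelihood scoring over the options instead.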


Postprint
DOI: 10.5445/IR/1000178633
Published on 03.02.2025
Associated KIT institute(s): Institut für Automation und angewandte Informatik (IAI)
Publication type: Proceedings contribution
Publication year: 2025
Language: English
Identifier: ISBN: 978-3-031-79006-5
ISSN: 0302-9743
KITopen-ID: 1000178633
HGF program: 46.23.02 (POF IV, LK 01) Engineering Security for Energy Systems
Published in: Secure IT Systems. Ed.: L.H. Iwaya
Event: 29th Nordic Conference on Secure IT Systems (NordSec 2024), Karlstad, Sweden, 06.11.2024 – 07.11.2024
Publisher: Springer Nature Switzerland
Pages: 219-238
Series: Lecture Notes in Computer Science; 15396
Published online ahead of print on 29.01.2025
Keywords: Large Language Models (LLMs) · MCQA · Benchmarking · Cybersecurity · Large Multimodal Models (LMMs) · Visual Question Answering
Indexed in: Dimensions, Scopus, OpenAlex