Abstract (English):
We assess ChatGPT's ability to identify actors in news media articles and categorize them into societal groups. We conducted three experiments to evaluate different models and prompting strategies. In Experiment 1, testing gpt-3.5-turbo, we found that using the original codebooks created for manual content analysis is insufficient. However, combining named entity recognition with an optimized classification prompt (NERC pipeline) yielded an acceptable macro-averaged F1-score of .79. Experiment 2 compared gpt-3.5-turbo, gpt-4o, and gpt-4-turbo: gpt-4-turbo achieved the highest macro-averaged F1-score of .82 using the NERC pipeline. Challenges remained in classifying nuanced actor categories. Experiment 3 demonstrated high retest reliability across different gpt-4o releases.
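To illustrate the two-step setup the abstract refers to, the following is a minimal sketch of what a NERC-style pipeline might look like: a named entity recognition step that extracts candidate actors, followed by an LLM call that assigns each actor to a societal group. The use of spaCy as the NER component, the category labels, and the prompt wording are all illustrative assumptions; the study's actual prompts and codebook are not reproduced here.

```python
# Minimal NERC-pipeline sketch: NER to find actors, then an LLM call
# to classify each actor. Labels and prompt text are placeholders,
# not the study's actual codebook.
import spacy
from openai import OpenAI

nlp = spacy.load("en_core_web_sm")  # assumed NER component
client = OpenAI()                   # requires OPENAI_API_KEY in the environment

# Hypothetical societal-group labels; the paper's scheme differs.
CATEGORIES = ["politician", "expert", "civil society", "business", "other"]

def classify_actors(article: str, model: str = "gpt-4-turbo") -> dict[str, str]:
    """Extract person/organization entities from the article, then ask
    the model to assign each entity to one societal group."""
    doc = nlp(article)
    actors = sorted({ent.text for ent in doc.ents
                     if ent.label_ in ("PERSON", "ORG")})
    results = {}
    for actor in actors:
        response = client.chat.completions.create(
            model=model,
            temperature=0,  # deterministic-ish output for coding tasks
            messages=[
                {"role": "system",
                 "content": ("Classify the actor into exactly one of: "
                             + ", ".join(CATEGORIES)
                             + ". Answer with the label only.")},
                {"role": "user",
                 "content": f"Article:\n{article}\n\nActor: {actor}"},
            ],
        )
        results[actor] = response.choices[0].message.content.strip()
    return results
```

Separating extraction from classification, as in this sketch, lets each step be validated on its own (e.g., NER recall versus classification F1), which is consistent with the abstract's finding that the combined pipeline outperformed prompting with the manual codebook alone.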