Separating Hate Speech and Offensive Language Classes via Adversarial Debiasing

Yuan, S.; Maronikolakis, A.; Schütze, H.

doi:10.18653/v1/2022.woah-1.1

Yuan, S. ¹; Maronikolakis, A.; Schütze, H.
¹ Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB), Karlsruher Institut für Technologie (KIT)

Abstract:

Research to tackle hate speech plaguing online media has made strides in providing solutions, analyzing bias and curating data. A challenging problem is ambiguity between hate speech and offensive language, causing low performance both overall and specifically for the hate speech class. It can be argued that misclassifying actual hate speech content as merely offensive can lead to further harm against targeted groups. In our work, we mitigate this potentially harmful phenomenon by proposing an adversarial debiasing method to separate the two classes. We show that our method works for English, Arabic, German and Hindi, as well as in a multilingual setting, improving performance over baselines.

Affiliated institution(s) at KIT	Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type	Proceedings paper
Publication year	2022
Language	English
Identifier	ISBN: 978-1-955917-84-1; KITopen-ID: 1000151691
Published in	Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH). Ed.: K. Narang
Event	6th Workshop on Online Abuse and Harms (WOAH 2022), Seattle, WA, USA, 14.07.2022
Publisher	Association for Computational Linguistics (ACL)
Pages	1-10
Keywords	Computational linguistics, De-biasing, Improving performance, Offensive languages, Online medium, Performance, Speech content, Speech
Indexed in	Dimensions, OpenAlex, Scopus

KITopen download

Publisher's version

DOI: 10.5445/IR/1000151691

Published on 21.10.2022

External links

Original publication
DOI: 10.18653/v1/2022.woah-1.1

Scopus
Citations: 3

Dimensions
Citations: 2
