Evaluating Computational Approaches for Harmful Content Analysis: Promise, Pitfalls, and Tools for Responsible Research

Itai Himelboim & Mudit Baid (UGA graduate student), “Evaluating Computational Approaches for Harmful Content Analysis: Promise, Pitfalls, and Tools for Responsible Research,” accepted for publication in Big Data and Cognitive Computing.

Abstract: This manuscript develops and demonstrates a practical framework for evaluating automated classifiers used in communication research, using harmful language detection as an illustrative case. We combine (a) a structured review of documentation practices for 27 publicly available classifiers and their associated annotation processes with (b) a cross-dataset evaluation that retests each model beyond its original training context. Across 27 datasets, we extract and compare reporting on construct definitions, annotator instructions, and inter-annotator agreement, and we quantify generalization by applying each model to multiple out-of-domain test sets. We also benchmark a contemporary large language model (GPT-5) under a consistent prompting protocol to illustrate how LLM-based classification compares to fine-tuned classifiers. Results show that documentation is uneven and often insufficient for theory-driven measurement, inter-annotator agreement varies widely across datasets, and cross-dataset performance frequently drops substantially relative to within-dataset evaluations. Building on these findings and existing validation guidance, we provide a reusable checklist and decision flow to help researchers select, justify, and report classifier-based measures in ways that support transparency and cumulative science. Recommendations for researchers, reviewers, and journal editors stress aligning model selection with standards of validity, reliability, and transparency.
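For readers unfamiliar with the two quantities the abstract refers to, the following is a minimal sketch of how inter-annotator agreement and a cross-dataset performance drop might be computed. It is an illustration only, not the authors' code: the choice of Cohen's kappa and binary F1 as metrics, the function names, and the toy labels are all assumptions, and none of the numbers come from the manuscript.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Share of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def f1_binary(y_true, y_pred, positive=1):
    """F1 score for the positive (e.g. 'harmful') class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def generalization_drop(f1_within, f1_cross):
    """Hypothetical helper: within-dataset F1 minus cross-dataset F1."""
    return f1_within - f1_cross


# Toy labels, invented for illustration (1 = harmful, 0 = not harmful).
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)

within = f1_binary([1, 1, 0, 0], [1, 1, 0, 0])  # model on its own test set
cross = f1_binary([1, 1, 0, 0], [1, 0, 0, 0])   # same model, another dataset
drop = generalization_drop(within, cross)
```

A positive `drop` indicates that the classifier performs worse outside its original dataset, which is the pattern the abstract reports as frequent.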

Related Research