Advancing the use of information compression distances in authorship attribution.


Muñoz, S. P., Oliva, C., Lago-Fernández, L. F., & Arroyo, D.


Detecting unreliable information in social media is an open challenge, in part because of the difficulty of associating a piece of information with known and trustworthy actors. Identifying the origin of sources can help society deal with unverified, incomplete, or even false information. In this work we tackle the problem of associating a piece of information with a particular politician. The use of inaccurate information is of great relevance in the case of politicians, since it affects social perception and voting behavior. Moreover, misquotation can be weaponized to damage an adversary’s reputation. We consider the task of applying a compression-based metric to conduct authorship attribution in social media, namely on Twitter. Specifically, we leverage the Normalized Compression Distance (NCD) to compare an author’s text with other authors’ texts. We show that this methodology performs well, obtaining 80.3% accuracy in a scenario with 6 different politicians.
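
To make the NCD idea concrete, here is a minimal sketch of how a compression distance could drive authorship attribution. It uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. The abstract does not say which compressor or decision rule the authors use, so zlib, the nearest-author rule, and the helper names (`compressed_size`, `ncd`, `attribute`) are assumptions made for illustration only.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two texts.

    Values near 0 suggest the texts share a lot of structure;
    values near 1 suggest they share very little.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def attribute(text: str, corpora: dict[str, str]) -> str:
    """Return the author whose reference corpus is closest to `text` under NCD.

    `corpora` maps a (hypothetical) author label to that author's concatenated
    reference texts. Picking the nearest author is an assumed decision rule.
    """
    return min(corpora, key=lambda author: ncd(text, corpora[author]))


# Toy reference texts, invented purely for illustration.
corpora = {
    "author_a": (
        "We must lower taxes and cut spending to grow the economy. "
        "Small businesses need fewer regulations and more freedom. "
        "Lower taxes create jobs for working families."
    ),
    "author_b": (
        "Climate change demands urgent action on renewable energy. "
        "We will invest in solar, wind, and public transit. "
        "Clean air and water are rights for every community."
    ),
}

query = "Small businesses need fewer regulations and more freedom."
predicted = attribute(query, corpora)
```

With real tweets, each author's corpus would be much larger and the query would be an unseen message rather than a sentence copied from the corpus. The toy data only shows the mechanics of the comparison.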


doi = {10.1007/978-3-031-18253-2_8},
year = {2022},
publisher = {Springer International Publishing},
pages = {114–122},
author = {Santiago Palmero Mu{\~{n}}oz and Christian Oliva and Luis F. Lago-Fern{\'{a}}ndez and David Arroyo},
title = {Advancing the~Use of~Information Compression Distances in~Authorship Attribution},
booktitle = {Disinformation in Open Online Media}