Discriminating Gender on Twitter
May 2011
John Burger, The MITRE Corporation
John Henderson, The MITRE Corporation
George Kim, The MITRE Corporation
Guido Zarrella, The MITRE Corporation
ABSTRACT
Accurate prediction of demographic attributes from
social media and other informal online content is
valuable for marketing, personalization, and legal investigation.
This paper describes the construction of
a large, multilingual dataset labeled with gender, and
investigates statistical models for determining the
gender of uncharacterized Twitter users. We explore
several different classifier types on this dataset. We
show how classifier accuracy varies with the volume
of tweets available and with the kinds of profile
metadata included in the models.
We also perform a large-scale human assessment using
Amazon Mechanical Turk. Our methods significantly
outperform both baseline models and almost
all humans on the same task.
