A component of MITRE's Social Radar suite of technologies, Author DNA characterizes unknown authors along several attributes, including gender, age, and location. We use text classification to identify authors' latent demographic attributes, representing each author as a set of identifying features derived from the content and metadata of that user's social media posts. These features include the words and characters used in a tweet's text, as well as the user's self-description, screen name, time zone, posting times, text length, emoticons, capitalization and punctuation density, and numerous other attributes of the user's social activity. We build a statistical profile over these features and compare it to known samples by measuring the similarity of the feature distributions. This approach is agnostic to language and writing system, which makes it attractive for many potential applications. Our focus has been on machine learning algorithms that can be trained quickly on very large amounts of data. This has allowed us to leverage enormous amounts of historical data and to apply the results in near real time to large streams of tweets.
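As a rough illustration of the profile-and-compare idea described above, the following Python sketch builds character n-gram count profiles from text and assigns an unknown author to the labeled group whose profile is most similar. It is a minimal sketch under stated assumptions, not Author DNA's implementation: the description does not name the similarity measure or the learning algorithm, so cosine similarity over character trigrams is assumed here, metadata features are omitted, and all function names and sample strings are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text.

    Working on raw characters (an assumption for this sketch) avoids
    tokenization, so the same code runs on any language or script.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def build_profile(posts, n=3):
    """Sum n-gram counts over a collection of posts."""
    profile = Counter()
    for post in posts:
        profile.update(char_ngrams(post, n))
    return profile


def cosine_similarity(a, b):
    """Cosine similarity between two count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(posts, labeled_profiles, n=3):
    """Return the label whose known profile is most similar to the posts."""
    unknown = build_profile(posts, n)
    return max(
        labeled_profiles,
        key=lambda label: cosine_similarity(unknown, labeled_profiles[label]),
    )


# Hypothetical usage with made-up sample text and placeholder labels.
known = {
    "group_a": build_profile(["hello world hello", "hello again"]),
    "group_b": build_profile(["zzz qqq xxx", "qqq zzz"]),
}
predicted = classify(["hello there"], known)
```

Because profiles are plain counts that can be summed, new training data can be folded in incrementally, which is one simple way to keep training fast on large volumes of historical posts. A production system would presumably combine many more feature types than this sketch does.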
To discuss licensing or collaboration activities, please contact MITRE's TTO.