Source/Author: Lizzie Buchen (Wired)
You are how you e-mail: A new technique can tell people apart using only the timestamps in their Sent folders.
In the interactive, real-time world of Twitter, blogs and World of Warcraft, timing is one of the most salient aspects of social behavior. Now, researchers at Northwestern University and Yahoo Research in New York show that they can distinguish and categorize people based solely on the timestamps of their e-mails, paving the way for smarter advertisements, spam filters and social networking sites.
“You can’t track everything an individual is doing at every hour of the day,” said Dean Malmgren of Northwestern University, lead author of the study posted May 11 on the pre-publication physics repository, arXiv. “But this shows that with just a snapshot of what they’re doing — knowing what time they send their e-mails — you can actually get meaningful information.”
Of particular interest to Yahoo is a more effective way to catch spammers. Between 80 and 90 percent of all e-mail in the world is spam. Spam isn’t just obnoxious, it also uses up bandwidth, storage space and time. In 2009, spam may cost $42 billion in the United States and $130 billion worldwide — and that doesn’t include the money scammed from gullible internet users like Citigroup.
Spam filters and spammers are engaged in a perpetual arms race, with spammers constantly changing their domains and IP addresses and disguising dirty words. But spammers have a major limitation: In order to send their millions of e-mails, they need bots. If a temporal model of e-mail behavior can distinguish between different people, it can also distinguish people from nonpeople.
“Any novel way to identify spammers makes a huge contribution,” said Jake Hofman of Yahoo Research. “Even if you just reduce it by a small percent, that’s a big win.”
Malmgren and Hofman tested their model using data from two groups of college students: European students from a few years ago, when home internet access was rare, and American students from a period when home internet access was much more common. They focused on how frequently the students sent e-mails and when their e-mail sessions began and ended.
Despite the dramatic chronological differences between these students — at least in the e-mail world — Malmgren found they fell into one of two categories: “day laborers,” who sent the bulk of their e-mails during the working day, or “e-mailaholics,” who sent e-mails from morning deep into the night.
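To make the two categories concrete, here is a minimal Python sketch that labels a sender from nothing but a list of send times. It is not the researchers' statistical model, which the article does not describe in detail. The 9-to-5 working day, the 0.8 cutoff and the function names are all assumptions chosen for illustration.

```python
from datetime import datetime

# Assumptions for illustration only (not from the study):
# a 9:00-17:00 working day and an 80 percent cutoff.
WORK_START_HOUR = 9
WORK_END_HOUR = 17
DAY_LABORER_THRESHOLD = 0.8


def working_day_fraction(timestamps):
    """Return the share of e-mails sent during the assumed working day."""
    if not timestamps:
        raise ValueError("need at least one timestamp")
    in_hours = sum(
        1 for t in timestamps if WORK_START_HOUR <= t.hour < WORK_END_HOUR
    )
    return in_hours / len(timestamps)


def categorize(timestamps):
    """Label a sender using only the timestamps of their sent e-mails."""
    if working_day_fraction(timestamps) >= DAY_LABORER_THRESHOLD:
        return "day laborer"
    return "e-mailaholic"
```

Called with `datetime` objects taken from a Sent folder, `categorize` returns one of the two labels. A hard cutoff like this forces every sender into a category, so it cannot show whether the split occurs naturally in the data, which is the finding Malmgren describes.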
“It was pretty amazing,” said Malmgren. “It didn’t have to be two categories. There could have been a continuum.”
The researchers also found that e-mail behavior was stable within individuals, with fewer than 20 percent of American students deviating from their e-mailer categories over two years. This stability could allow an e-mail service to recognize when an account is being commandeered by a spambot, at which point it can alert the user or freeze the account.
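One way such a check might work, sketched in Python under stated assumptions: compare the hour-of-day profile of an account's recent e-mails with its long-run profile and flag a large shift. The hourly histogram, the total variation distance, the 0.5 threshold and the function names are illustrative choices, not the method used in the study or by any e-mail service.

```python
from collections import Counter
from datetime import datetime


def hourly_profile(timestamps):
    """Return a 24-entry list: the fraction of e-mails sent in each hour."""
    counts = Counter(t.hour for t in timestamps)
    total = len(timestamps)
    return [counts.get(hour, 0) / total for hour in range(24)]


def profile_distance(a, b):
    """Total variation distance: 0 means identical, 1 means no overlap."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def looks_hijacked(history, recent, threshold=0.5):
    """Flag an account whose recent send times differ sharply from its past.

    The threshold is a hypothetical value; a real system would have to
    tune it against known cases.
    """
    if not history or not recent:
        raise ValueError("need timestamps for both periods")
    distance = profile_distance(hourly_profile(history), hourly_profile(recent))
    return distance > threshold
```

A flag from a check like this would only be a prompt for the alert or freeze described above, since a traveler in a new time zone would also shift the profile.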
Hofman imagines numerous applications for analyzing time-related aspects of internet usage, beyond e-mail, and says this ability to robustly categorize people shows how powerful their model can be.
“This is just our toy demonstration,” he said. “There’s a lot of temporal data from e-mails and website visits out there, but they haven’t been leveraged for any meaningful analysis. The argument we’re making here is that these data can be a surprisingly useful source of information about individuals.”
Hofman says the technique could also allow websites to tailor their services to individuals, as the activity pattern of website visits may be indicative of a user’s taste.
“It might turn out that I should market Blackberries and iPhones to users who visit sites more frequently, scattered throughout the day, like you and me,” he said, “while I should market books and newspapers to users with lighter usage patterns, like my dad. This could influence what display or text ads I show these users when they’re on my site.”
A detailed description of activity patterns could also be useful for heavily trafficked sites, like Twitter, which could optimize how their servers allocate resources, and for internet services that depend on real-time interactions, like Aardvark.
Citation: “Characterizing Individual Communication Patterns” by R. Dean Malmgren, Jake M. Hofman, Luís A. N. Amaral, and Duncan J. Watts. arXiv:0905.0106v1