New technique developed to identify authors of anonymous emails

By Darren Quick

March 8, 2011

Concordia University professor Benjamin Fung has developed an effective new technique to determine the authorship of anonymous emails (Image: Concordia University)

There might be many harmless reasons for sending anonymous emails – confessing your undying love for someone, seeking anonymous advice, or simply playing a joke on a friend – but there are also plenty of harmful reasons – making threats against someone, distributing child pornography or sending viruses, just to name a few. While police can often use the IP address to locate where an email originated, it may be harder to nail down exactly who sent it. A team of researchers claims to have developed an effective new technique to determine the authorship of anonymous emails that can provide presentable evidence in courts of law.

In an attempt to combat the increase in cybercrimes involving anonymous emails, Benjamin Fung, a professor of Information Systems Engineering at Quebec's Concordia University and an expert in data mining, set about developing a novel method of authorship attribution with his colleagues. The method is based on techniques used in speech recognition and in data mining, which involves extracting useful, previously unknown knowledge from a large volume of raw data. Their approach relies on identifying frequent patterns and unique combinations of features that recur in a suspect's emails.

The technique works by first identifying the frequent patterns found in emails written by the suspect. Any of these patterns that are also found in the emails of other suspects are then filtered out, leaving patterns that are unique to the author of the emails being analyzed. These remaining frequent patterns then constitute what the researchers call the suspect's 'write-print' – a distinctive identifier akin to a fingerprint.
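The two steps described above (find a suspect's frequent patterns, then discard any that other authors share) can be illustrated with a minimal Python sketch. This is not the researchers' implementation: the stylistic features, the function names `extract_features`, `frequent_patterns` and `write_print`, and the 60 percent support threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_features(email):
    """Map one email to a set of coarse stylistic features.

    The features below are hypothetical examples, not the ones
    used in the research described in the article.
    """
    words = email.split()
    features = set()
    if email == email.lower():
        features.add("all_lowercase")
    if "!" in email:
        features.add("uses_exclamation")
    if words and sum(len(w) for w in words) / len(words) < 4.5:
        features.add("short_words")
    if not email.rstrip().endswith((".", "!", "?")):
        features.add("no_final_punctuation")
    return frozenset(features)


def frequent_patterns(emails, min_support=0.6, max_size=2):
    """Return feature combinations that occur in at least
    min_support (a fraction) of the given emails."""
    counts = Counter()
    for email in emails:
        features = sorted(extract_features(email))
        for size in range(1, max_size + 1):
            for combo in combinations(features, size):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(emails)
    return {pattern for pattern, n in counts.items() if n >= threshold}


def write_print(suspect_emails, other_authors_emails, **kwargs):
    """Keep the suspect's frequent patterns that are not also
    frequent for any other author (the filtering step)."""
    patterns = frequent_patterns(suspect_emails, **kwargs)
    for emails in other_authors_emails:
        patterns -= frequent_patterns(emails, **kwargs)
    return patterns
```

Calling `write_print(suspect_emails, [author_b_emails, author_c_emails])` returns a set of feature combinations. An anonymous email could then be compared against each suspect's set, although how the researchers score such a match is not described in the article.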

"Let's say the anonymous email contains typos or grammatical mistakes, or is written entirely in lowercase letters," says Fung. "We use those special characteristics to create a write-print. Using this method, we can even determine with a high degree of accuracy who wrote a given email, and infer the gender, nationality and education level of the author."

Fung and his colleagues tested their technique by examining the Enron Email Dataset – a collection containing over 200,000 real-life emails from 158 employees of the Enron Corporation. Using a sample of 10 emails written by each of 10 subjects – 100 emails in all – they were able to identify authorship with an accuracy of 80 to 90 percent.

"Our technique was designed to provide credible evidence that can be presented in a court of law," says Fung. "For evidence to be admissible, investigators need to explain how they have reached their conclusions. Our method allows them to do this."

About the Author
Darren Quick
Darren's love of technology started in primary school with a Nintendo Game & Watch Donkey Kong (still functioning) and a Commodore VIC 20 computer (not still functioning). In high school he upgraded to a 286 PC, and he's been following Moore's law ever since. This love of technology continued through a number of university courses and crappy jobs until 2008, when his interests found a home at Gizmag.
11 Comments

All lower case. Grammatical or spelling errors. Oh, boy. Whoopee. This sounds like what anyone with half a brain would spot in about 30 seconds. To call this a technique is like putting lipstick on a pig. I certainly wouldn't want some prosecutor to use this technique to "provide credible evidence" against me or anyone else. And any jurist who admits this as credible evidence should have their head examined.

teeduke
9th March, 2011 @ 06:22 am PST

"There might be many harmless reasons for sending anonymous emails - confessing your undying love for someone, seeking anonymous advice, or simply playing a joke on a friend - but there are also plenty of harmful reasons - making threats against someone, distributing child pornography or sending viruses, just to name a few."...

The author of the article sort of forgot to explore the principal reason a society needs anonymous speech: the expression of ideas unpopular with those in power, who might just take action against folks questioning the way things are. It's not about love notes or sending viruses or "the children!". Even if this were to prove effective on plain text, I doubt the author of a virus will be sending any amount of text to be analyzed.

Who funded this research, the PRC or Iran?

Seriously, the Federalist Papers were published anonymously. Much of political speech in repressive nations is done anonymously, to avoid being 'disappeared'.

Oi. Vey.

Venril
9th March, 2011 @ 08:08 am PST

I agree with Venril. Also, someone could easily mimic the writing style of someone they wanted to frame if they knew that these analysis techniques were being used. And WHO is going to make a database of e-mails, presumably purloined by ISPs? That itself is scary.

Privacy is a major issue, especially with governments acting the way they have been, with excuses of "terrorism" or "protecting the children". Society needs anonymous speech.

mred
9th March, 2011 @ 10:42 am PST

Any judge that would prosecute based on something like this should be fired. This is bullshit!

-Anonymous

Facebook User
9th March, 2011 @ 11:10 am PST

This is nothing new.

wsa999
9th March, 2011 @ 02:20 pm PST

It's not BS, anonymous, it's the near magical power of data mining and machine learning. Those rubbishing the effectiveness are ignoring the fact presented that out of 158 authors of the Enron e-mail collection, the algorithm identified either 8 or 9 out of ten authors correctly from just 10 e-mail samples.

The all lowercase, typos, etc. are just easy-to-understand examples, but given the claim also made in the article that the algorithm can infer nationality, gender, education level, etc., it's most likely looking at a far wider range of factors... specific words, average syllables per word, words per sentence, specific grammar rules followed or disobeyed like starting a sentence with a preposition, etc.

Is anyone going to suggest that if I gave them ten writing/speech samples from Sarah Palin, Barack Obama and Charlie Sheen they wouldn't be able to pick out which came from which? People have unique speaking styles and data mining can pick those out the same way our own brains can, except the algorithms can explain their decisions better.

alcalde
9th March, 2011 @ 02:37 pm PST

Here's a better idea: www.SelfDestructingEmail.com

You can't get much more anonymous than completely invisible :-)

christopher
9th March, 2011 @ 06:37 pm PST

Here's the thing... these researchers had a list in hand of email authors, and they matched the authors to their list, as expected.

But the key is, the list was in-hand. In their examples of 'need' for this technology, there is no list of authors from which to choose a culprit. There is only a wide world full of people sending emails to each other.

Now, if the spookies have already collected all of everyone's emails, then they DO have a list of authors and samples of their emails... now THAT is the scary part, isn't it?

seekertom
9th March, 2011 @ 07:51 pm PST

Hopefully they can use it to track all the automated/robot spam e-mails back to their owners and free up the web a little! Although I believe very strongly in the principles of free speech, there are always those who will abuse the rights. Our world is not perfect and until we can find effective ways to prevent the abuse, we either have to accept it or accept certain limits to freedom. I guess it all depends on what you think is the greater evil.

Hmm_OK
9th March, 2011 @ 08:42 pm PST

Any defense lawyer could knock this out of court by presenting a sampling of posts from Craigslist, picked out using the same techniques.

Any sort of horrendous writing in any language can be found on Craigslist, surely with at least hundreds of "matches" to any given person's writing style.

Facebook User
9th March, 2011 @ 09:34 pm PST

This is so funny. The accuracy is just 80% to 90% and they expect this to be used in court! Thanks for the good laugh.

With a 10 to 20% margin of error, this is a toy, not a product. I suggest the makers provide their conclusions through an anonymous email -- it would save them a lot of ridicule and shame.

sidred
16th March, 2011 @ 10:38 am PDT