When Everyone Can Mine Your Data

Roelof Temmingh has a knack for stirring up trouble. The 35-year-old South African electronic engineer has fought legal battles with financial institutions, developed theoretical models for cyberterrorism and served as a technical adviser for a book about how hackers could take over the continent of Africa.

But Temmingh's latest exploit could make the most lasting impact. He has created a tool he calls Maltego that lets just about anybody do the kind of data mining that in the past only fraud investigators, government specialists and hackers typically could do. Since Temmingh released the first commercial version of Maltego this past summer, even several national intelligence agencies have made use of the software, he says.

Temmingh's software scans open data repositories on the Web and allows users to match the results with their own data. (He calls this approach "open-source intelligence.") The data are then graphically depicted. The commercial version of Maltego lets users save these visualizations in popular data formats like XML so the information can be used by other programs.

The commercial product isn't cheap--$430 a year--but the Pretoria-based Paterva, the company that Temmingh founded as he developed Maltego, offers a watered-down version free. Law enforcement, government and intelligence agencies can apply for a 10% discount.

"There's some interest from law enforcement, and there's been some interest from intelligence," says Temmingh. "There's also been some interest from large corporations that want to visualize some of the internal data that they get."

Worried about information leaks at your company? Input lists of employees from your rival companies, and Maltego can graphically depict how they might be related to your employees. It can also provide likely e-mail addresses, phone numbers and personal Web sites--and then use this information to add new layers to the investigation.

The magic behind Maltego is that it parses information from all kinds of sources using three simple principles. First, any piece of information can be reduced to its most basic characteristic, such as "individual," "place" or "address." Second, every "entity" can be linked to other entities--people can be linked to addresses, for example. And third, different entities can be matched or grouped according to rules.
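The three principles can be illustrated with a minimal Python sketch. This is not Maltego's code or data model; the `Entity` and `Graph` classes, the `group_by` rule and the sample names are all hypothetical, chosen only to show typed entities, links between them, and one grouping rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Principle 1: a piece of information reduced to a basic kind plus a value."""
    kind: str   # e.g. "individual", "place" or "address"
    value: str


class Graph:
    """Principle 2: any entity can be linked to any other entity."""

    def __init__(self):
        self.links = defaultdict(set)

    def link(self, a, b):
        # Links are stored in both directions.
        self.links[a].add(b)
        self.links[b].add(a)

    def group_by(self, kind_from, kind_shared):
        """Principle 3 (one hypothetical rule): group entities of kind_from
        that are linked to the same entity of kind_shared."""
        groups = defaultdict(set)
        for entity, neighbors in self.links.items():
            if entity.kind != kind_from:
                continue
            for neighbor in neighbors:
                if neighbor.kind == kind_shared:
                    groups[neighbor].add(entity)
        # Keep only groups where at least two entities share the link.
        return {k: v for k, v in groups.items() if len(v) > 1}


# Made-up sample data.
alice = Entity("individual", "Alice Example")
bob = Entity("individual", "Bob Example")
carol = Entity("individual", "Carol Example")
home = Entity("address", "1 Sample Street")

g = Graph()
g.link(alice, home)
g.link(bob, home)
g.link(carol, Entity("address", "2 Other Road"))

shared = g.group_by("individual", "address")
```

Here `shared` maps the one address that two people have in common to those two people; Carol, who shares nothing, is left out.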

Although user license agreements at social networking sites like Facebook and LinkedIn prevent Paterva from offering data searches involving those sites, the free version still has plenty of tools to give users a head start at finding interesting and "hidden" relationships. Beyond those built into the program, users can create their own rules about groups and relationships, dubbed "transforms," and share them with other users.
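As a rough illustration of the idea, a transform can be thought of as a function that takes one entity and returns related entities. The sketch below is an assumption about the general shape of such a rule, not Maltego's actual transform interface; the registry, the two sample transforms and the `expand` helper are hypothetical, and no network lookups are performed.

```python
from typing import Callable, Dict, List, Tuple

Entity = Tuple[str, str]                      # (kind, value)
Transform = Callable[[Entity], List[Entity]]

# Hypothetical registry so that user-written transforms can be shared by name.
TRANSFORMS: Dict[str, Transform] = {}


def transform(name: str):
    """Decorator that registers a transform under a name."""
    def register(fn: Transform) -> Transform:
        TRANSFORMS[name] = fn
        return fn
    return register


@transform("email_to_domain")
def email_to_domain(entity: Entity) -> List[Entity]:
    kind, value = entity
    if kind != "email" or "@" not in value:
        return []
    return [("domain", value.split("@", 1)[1].lower())]


@transform("domain_to_website")
def domain_to_website(entity: Entity) -> List[Entity]:
    kind, value = entity
    if kind != "domain":
        return []
    # A guess at a likely site name, not a verified lookup.
    return [("website", "www." + value)]


def expand(seed: Entity, depth: int = 2):
    """Apply every registered transform, layer by layer, starting from seed."""
    edges, frontier, seen = [], [seed], {seed}
    for _ in range(depth):
        next_frontier = []
        for entity in frontier:
            for name, fn in TRANSFORMS.items():
                for found in fn(entity):
                    edges.append((entity, name, found))
                    if found not in seen:
                        seen.add(found)
                        next_frontier.append(found)
        frontier = next_frontier
    return edges


edges = expand(("email", "Jane@Example.com"))
```

Each pass feeds the previous pass's results back in as new starting points, so one seed entity grows into a small graph of labeled relationships.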

Curious what's being written about your company on blogs? Try the Technorati.com transform, and parse out all the most common related tags and keywords. Or try the Spock.com transform, which queries a database billed as "the world's leading people search engine." Search yourself or your neighbors; Maltego's approach is agnostic.

To be sure, Temmingh's software is a result of his experiences as a hacker.

But Maltego is much more than that, he asserts. "When you look at hacking, it's about either getting control of something or getting information. The information part is always more interesting."
