
Beyond the keywords: search engines getting smarter

By Loz Blain

22:00 May 21, 2007 PDT

Frédérique Segond, manager of parsing and semantics research at Xerox Research Centre Europe, which is responsible for the new FactSpotter search engine.

Keyword-based search engines are a huge compromise; think for a moment about the tricks you need to use to get a good specific result from Google. The next generation of search is contextually and linguistically smarter, thinking more like a human and able to chase the meaning of a search term through a document instead of just looking for a handful of words. Xerox's new enterprise FactSpotter engine uses smart semantic and concept parsing to deliver quality search results from huge text databases.

Text mining researchers at Xerox Innovation recently unveiled FactSpotter, a new document search engine that goes beyond conventional "keyword" search to present results based on meaning and context.

Developed in Grenoble, France, by researchers at the Xerox Research Centre Europe, the new text mining software combines a powerful linguistic engine with an interface that rewards natural language queries. Unlike traditional enterprise search tools, FactSpotter looks not only for the keywords contained in a query but also at the context of the document that contains those words. For example, given the query "when did Angelina Jolie visit France", the engine will recognize and find the name string in a document, then use context to return passages that refer to her by pronoun, for example "she visited France" in reference to Jolie.
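To make the pronoun example concrete, here is a minimal Python sketch of the general idea. It is a toy illustration, not Xerox's method: the KNOWN_NAMES list, the PRONOUNS set and both function names are assumptions invented for this example, and the rule it applies (a pronoun refers to the most recently mentioned known name) is far cruder than real linguistic parsing.

```python
import re

# Hypothetical inputs: a real system would detect names and pronouns itself.
KNOWN_NAMES = ["Angelina Jolie"]
PRONOUNS = {"she", "he", "they"}

def resolve_pronouns(text):
    """Split text into sentences and replace each pronoun with the most
    recently mentioned known name (a deliberately naive rule)."""
    last_name = None
    resolved = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for name in KNOWN_NAMES:
            if name in sentence:
                last_name = name
        words = sentence.split()
        if last_name:
            words = [last_name if w.lower() in PRONOUNS else w for w in words]
        resolved.append(" ".join(words))
    return resolved

def find_facts(name, phrase, text):
    """Return the resolved sentences that mention both the name and the phrase."""
    return [s for s in resolve_pronouns(text) if name in s and phrase in s]

text = "Angelina Jolie arrived in Europe on Monday. She visited France the next day."
print(find_facts("Angelina Jolie", "visited France", text))
```

A plain keyword search for "Jolie" would miss the second sentence, because the name never appears in it; resolving the pronoun first is what makes that sentence findable.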

The "smart" search engine can comb through almost any document regardless of the language, location, format or type; take advantage of the way humans think, speak and ask questions; and discriminate among the results, highlighting just a handful of relevant answers instead of returning thousands of unrelated responses.

"Our advanced search engine goes beyond today's typical 'keyword' search or current data-mining programs, which typically end up searching only 40 percent of all the documents that are relevant because the keywords are too limiting," said Frédérique Segond, manager of parsing and semantics research at XRCE. "Xerox's tool is more accurate because it delves into documents, extracting the concepts and the relationships among them. By 'understanding' the context, it returns the right information to the searcher, and it even highlights the exact location of the answer within the document."

FactSpotter is part of Xerox's ongoing intelligent document technology research that complements its growing portfolio of services-related innovations. The technology helps customers better manage data and document-intensive work processes in industries like banking, finance and legal. Xerox plans to launch FactSpotter next year as part of its Xerox Litigation Services offerings, which include electronic discovery services that primarily support legal and regulatory compliance.

Next Generation of Searching

The new software goes beyond traditional search engines in several ways:
  • FactSpotter's novel interface means users can express their queries naturally instead of forcing them to adapt their questions to the logic of computers. Traditional systems, on the other hand, split a query into isolated words and return only documents that contain exactly those words.
  • Unlike traditional search engines that return the entire document, forcing the user to find the relevant information manually, FactSpotter returns the specific portion of a searched document that is relevant to the query.
  • FactSpotter takes into account the context of the entire document instead of just a cluster of nearby words. It introduces the concept of "relation," searching within and across sentences and paragraphs.
  • FactSpotter recognizes abstract concepts, like "people" or "building," and will retrieve all the words that fit within that category.
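The contrast between exact keyword matching and concept matching described in the list above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not FactSpotter's algorithm: the CONCEPTS table is a hypothetical, hand-made stand-in for whatever linguistic resources the real engine uses, and the function names are invented for the example.

```python
# Hypothetical category table; a real engine would derive this linguistically.
CONCEPTS = {
    "people": {"lawyer", "judge", "witness"},
    "building": {"courthouse", "office", "warehouse"},
}

def tokens(text):
    """Lowercase the text and split it into words, trimming punctuation."""
    return [w.strip(".,?!\"'").lower() for w in text.split()]

def keyword_match(query, document):
    """Traditional approach: every query word must appear verbatim."""
    doc_words = set(tokens(document))
    return all(word in doc_words for word in tokens(query))

def concept_match(query, document):
    """Concept approach: a category name also matches any word in that category."""
    doc_words = set(tokens(document))
    for word in tokens(query):
        candidates = CONCEPTS.get(word, set()) | {word}
        if not (candidates & doc_words):
            return False
    return True

doc = "The witness entered the courthouse."
print(keyword_match("people building", doc))
print(concept_match("people building", doc))
```

The keyword version rejects the document because neither "people" nor "building" appears literally, while the concept version accepts it because "witness" and "courthouse" fall within those categories.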

By analyzing the meaning of both the query and the searched document, FactSpotter can dramatically simplify and speed up time-consuming activities. For example, during the electronic discovery phase of a legal trial, FactSpotter will allow specific facts to be found quickly and easily among thousands (and often millions) of different documents. By delivering complete and relevant answers, FactSpotter could revolutionize the operations of data-intensive businesses such as electronic legal discovery, risk management, pharmaceutical research, competitive and market intelligence, security intelligence and fraud detection.

With smart text mining technologies like FactSpotter already in operation in enterprise document management, security and surveillance, academic applications and even spam filtering, it will be interesting to see how the technology transfers to the Web. How long until Google gains linguistic awareness? An interesting thought.
