Searching the web using text mining

What if you could get a search engine to summarize all the information found for you? Take a look at the iResearch Reporter!

When you search the Web using a traditional search engine like Google, it presents the results as a list of web pages that should (but may not) contain the information you are looking for.

But what if you could get the results presented as a text summary of the content found on those pages? This is what text mining is about.

The iResearch Reporter

Power Text Solutions has two demo text mining search engines out that find and summarize web content for you.

The iResearch Reporter (previously known as Sloth-Reader) works much like a general-purpose search engine: you may ask for information on any topic.

You enter your search query in the traditional manner. iResearch then takes one to two minutes to gather, digest and present its summary.

At the top of the search engine result page you get a few snippets of text pointing to more extensive extracts of text further down the page.

Then there are three alternatives for viewing: In focus (a relatively short summary based on a few text extracts), In-depth (a much longer text) and Selected Sources (a list of web pages, i.e. the sites iResearch has mined for information).

An example of a text mining summary

A search for “what is text mining” brings up the following summary at the top:

There is a field called computational linguistics (also known as natural language processing) which is making a lot of progress in doing small subtasks in text analysis. It is relatively easy to write a program to extract phrases from an article or book that, when shown to a human reader, seem to summarize its contents. (The most frequent words and phrases in this article, minus the really common words like “the” are: text mining, information, programs, and example, which is not a bad five-word summary of its contents.) [1]

The fundamental limitations of text mining are first, that we will not be able to write programs that fully interpret text for a very long time, and second, that the information one needs is often not recorded in textual form.[1]

I distinguish between what I call “real” text mining, that discovers new pieces of knowledge, from approaches that find overall trends in textual data.[1]

The footnote points to Marti Hearst’s article What is text mining?

Other parts of the summary refer to other information sources.
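The quoted summary mentions that a program can pull out the most frequent words of a text, minus really common words like “the”, as a rough summary of its contents. A minimal sketch of that idea in Python follows. The stop-word list and the function name `top_terms` are assumptions made for illustration; this is not how iResearch Reporter itself works.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption; real tools use much longer lists).
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "it",
    "that", "for", "on", "as", "are", "with", "this",
}


def top_terms(text, n=5):
    """Return the n most frequent words in text, ignoring stop words."""
    # Lowercase the text and split it into simple alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


sample = (
    "Text mining programs extract information from text. "
    "Mining text for information is what these programs do."
)
print(top_terms(sample, 3))
```

Counting single words is the simplest variant. Counting phrases such as “text mining” would require tracking pairs of adjacent words as well.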

News summarizing

Power Text’s other offering is NewsFeed Researcher, which processes a large number of news stories published on the same topic.

At present, the NewsFeed Researcher presents daily digests of major events in Business, Technology, U.S., World, Sports and Entertainment, fetched via Google News.

There is no search field, but if you click on one of the main categories you will get a list of news stories, each of which leads to a summary of the kind described above. Since these summaries are preprocessed, you do not have to wait for them.

We have tried to find other free services out there, but have not succeeded so far. There are quite a few text extractor services online, i.e. sites that let you enter a URL or a block of text for analysis. These sites will give you a summary or a list of terms that tell you what the text is about. They do not search the Web, however.
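To give an idea of what a text extractor of this kind might do with a block of text, here is a minimal sketch of frequency-based extractive summarization: each sentence is scored by how common its words are in the whole text, and the highest-scoring sentences are kept in their original order. The function name `summarize`, the stop-word list and the scoring rule are assumptions for illustration, not a description of any particular service.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not taken from any real service).
STOP = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "for", "on"}


def content_words(sentence):
    """Lowercased words of a sentence with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences of text, in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


article = (
    "Text mining finds patterns in text. "
    "Text mining summarizes text. "
    "The weather was nice yesterday."
)
print(summarize(article, max_sentences=1))
```

A sketch like this only reuses sentences that already exist in the input. It does not gather material from several web pages, which is the part that sets the search-based tools apart.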

Cheshire3-Termine Demonstration using Medline Abstracts is an example of a vertical text mining search engine that lets you search a medical database for summaries.

For more information, see Ernest Perez’ article Managing the Information Explosion in Online’s September/October issue (printed edition).
For a list of commercial software and applications, see Wikipedia.
