Searching the web using text mining
What if you could get a search engine to summarize all the information found for you? Take a look at the iResearch Reporter!
When you search the Web using a traditional search engine like Google, it presents the results as a list of web pages that should (but may not) contain the information you are looking for.
But what if you could get the results presented as a text summary of the content found on those pages? This is what text mining is about.
The iResearch Reporter
Power Text Solutions has released two demo text mining search engines that find and summarize web content for you.
The iResearch Reporter (previously known as Sloth-Reader) works much like a generic search engine, and you may ask for information on any topic.
You enter your search query in the traditional manner. iResearch then takes one to two minutes to gather, digest and present its summary.
At the top of the search engine result page you get a few snippets of text pointing to more extensive extracts of text further down the page.
Then there are three viewing options: In focus (a relatively short summary based on a few text extracts), In-depth (a much longer text) and Selected Sources (a list of the web pages iResearch has mined for information).
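Power Text Solutions does not say how iResearch builds its summaries. As a rough illustration only, here is a minimal sketch of one common approach, frequency-based extractive summarization, assuming the pages have already been fetched and stripped to plain text. The function names and the small stopword list are hypothetical, not part of iResearch.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "for", "on", "as", "are", "be", "by", "with", "this"}

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(text: str) -> list[str]:
    # Lowercased words with the stopwords removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(documents: list[str], n_sentences: int = 3) -> list[str]:
    """Pick the n highest-scoring sentences across all documents.

    A sentence scores the average corpus frequency of its content words,
    so sentences built from the most common topic words rank highest.
    Selected sentences are returned in their original order.
    """
    sentences = [s for doc in documents for s in split_sentences(doc)]
    freq = Counter(w for doc in documents for w in content_words(doc))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return [sentences[i] for i in keep]
```

For example, `summarize(pages, 2)` over a list of page texts would return the two sentences whose words best reflect the pages' shared vocabulary. A short "In focus" view and a longer "In-depth" view could, in this sketch, simply correspond to a small and a large `n_sentences`; whether iResearch works that way is not stated.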
An example of a text mining summary
A search for “what is text mining” brings up the following summary at the top:
There is a field called computational linguistics (also known as natural language processing) which is making a lot of progress in doing small subtasks in text analysis. It is relatively easy to write a program to extract phrases from an article or book that, when shown to a human reader, seem to summarize its contents. (The most frequent words and phrases in this article, minus the really common words like “the” are: text mining, information, programs, and example, which is not a bad five-word summary of its contents.) [1]
The fundamental limitations of text mining are first, that we will not be able to write programs that fully interpret text for a very long time, and second, that the information one needs is often not recorded in textual form.[1]
I distinguish between what I call “real” text mining, that discovers new pieces of knowledge, from approaches that find overall trends in textual data.[1]
The footnote points to Marti Hearst’s article What is text mining?
Other parts of the summary refer to other information sources.
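The quoted summary mentions a simple technique: list the most frequent words and phrases in a text, minus very common words like “the”. A minimal Python sketch of that idea follows, assuming plain-text input; the stopword list and the `top_terms` name are illustrative assumptions, and only single words and two-word phrases are counted.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real tools use larger lists).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "for", "on", "as", "are", "be", "by", "with", "this"}

def is_content(token: str) -> bool:
    # Punctuation tokens and stopwords are not content words.
    return token[0].isalpha() and token not in STOPWORDS

def top_terms(text: str, k: int = 5) -> list[str]:
    """Return the k most frequent words and two-word phrases, ignoring stopwords."""
    # Keep sentence punctuation as tokens so phrases never span a boundary.
    tokens = re.findall(r"[a-z']+|[.!?,;:]", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if not is_content(tok):
            continue
        counts[tok] += 1
        # Count a two-word phrase only when the next token is also a content word.
        if i + 1 < len(tokens) and is_content(tokens[i + 1]):
            counts[f"{tok} {tokens[i + 1]}"] += 1
    # Ties keep first-seen order (Counter.most_common guarantees this).
    return [term for term, _ in counts.most_common(k)]
```

Note that a single word always occurs at least as often as any phrase containing it, so in this sketch “text” and “mining” will rank alongside “text mining”. A practical tool would need some rule for merging or weighting phrases, which the quoted summary does not describe.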
News summarizing
Power Text’s other offering is NewsFeed Researcher, which processes a large number of news stories published on the same topic.
At present, NewsFeed Researcher presents daily digests of major events in Business, Technology, U.S., World, Sports and Entertainment, fetched via Google News.
There is no search field, but if you click on one of the main categories you get a list of news stories, each of which leads to a summary of the kind described above. Since these summaries are preprocessed, you do not have to wait for them to be generated.
We have tried to find other free services like these, but have not succeeded so far. There are quite a few text extractor services online, that is, sites that let you enter a URL or a block of text for analysis. These sites give you a summary or a set of terms that tell you what the text is about. They do not search the Web, however.
Cheshire3-Termine Demonstration using Medline Abstracts is an example of a vertical text mining search engine that lets you search a medical database for summaries.
For more information, see Ernest Perez’ article Managing the Information Explosion in the September/October issue of Online (printed edition).
For a list of commercial software and applications, see Wikipedia.