AKos
Consulting & Solutions GmbH
a chemoinformatics company

 

 


 

Miner3D WebPlayer

Claim:

WebMiner3D is a project idea for making Internet searches more efficient. The claim covers clustering of hits during the search, post-processing of search results with mathematical and visualization techniques, and the underlying clustering techniques, such as assigning each document a similarity number.

Background:

Internet search engines can increase the relevance of their hits. Nevertheless, the number of relevant hits will keep growing. These hits need to be processed so that the searcher is offered a manageably small number of answers.

Method:

The idea is to cluster the hits, which is a data reduction method. The searcher only has to look at one answer from each cluster and can then decide whether the hits in that cluster are interesting or not.

Clustering Method:

To cluster, one needs descriptors for each hit, and the descriptors must be able to distinguish the hits. For example, to distinguish Japanese, American and European cars, the number of wheels is a bad descriptor, whereas the ratio of height to length to width makes it possible to distinguish Japanese and American cars.

One can cluster with mathematical methods or with visualization techniques. For this project we chose both. Visualization is very flexible and fast. The visualization software is taken from DataMiner3D (see www.dimension5.sk). One would offer a pre-clustering according to some algorithm and display the hits hierarchically. This first clustering would be done on the fly. If it is not sufficient, the user can switch to special software such as WebMiner3D to optimize the clustering. Visualization has another advantage: it is easy to see how well the clusters are separated.

In principle, several clustering methods will be used. The first is to assign a similarity number to each document; similar documents can then be clustered simply by sorting on the similarity number. Jarvis-Patrick or similar clustering methods will also be examined for clustering the hit list.
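The two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WebMiner3D implementation: the hit names and word sets are made up, the similarity number is taken to be the Jaccard coefficient between word sets, the `gap` threshold is an arbitrary choice, and the Jarvis-Patrick function is a simplified variant in which two hits are merged when each is among the other's `k` nearest neighbours and their neighbour lists share at least `k_min` entries.

```python
from itertools import combinations

# Hypothetical hits: name -> set of indexed words (made-up data).
HITS = {
    "cars-1": {"jaguar", "car", "engine", "speed"},
    "cars-2": {"jaguar", "car", "engine", "dealer"},
    "cars-3": {"jaguar", "car", "speed", "dealer"},
    "cats-1": {"jaguar", "cat", "jungle", "prey"},
    "cats-2": {"jaguar", "cat", "jungle", "habitat"},
    "cats-3": {"jaguar", "cat", "prey", "habitat"},
}


def jaccard(a, b):
    """Similarity number in [0, 1]: shared words divided by all words."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_sorting(hits, reference, gap=0.2):
    """Sort hits by similarity to a reference word set and start a new
    cluster wherever the similarity drops by more than `gap`."""
    scored = sorted(
        ((jaccard(words, reference), name) for name, words in hits.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    clusters, previous = [], None
    for score, name in scored:
        if previous is None or previous - score > gap:
            clusters.append([])
        clusters[-1].append(name)
        previous = score
    return clusters


def jarvis_patrick(hits, k=2, k_min=1):
    """Simplified Jarvis-Patrick clustering on Jaccard similarity."""
    names = sorted(hits)
    # k nearest neighbours of every hit (ties broken by name).
    neighbours = {}
    for n in names:
        others = sorted(
            (m for m in names if m != n),
            key=lambda m: (-jaccard(hits[n], hits[m]), m),
        )
        neighbours[n] = set(others[:k])

    # Union-find structure to merge hits into clusters.
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        mutual = a in neighbours[b] and b in neighbours[a]
        shared = len(neighbours[a] & neighbours[b])
        if mutual and shared >= k_min:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


print(cluster_by_sorting(HITS, reference={"jaguar", "car"}))
print(jarvis_patrick(HITS))
```

On this toy data both functions separate the car pages from the cat pages. Sorting needs only one similarity number per hit, whereas Jarvis-Patrick compares every pair of hits, so it costs more but does not depend on the choice of a reference.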

Descriptors:

In Internet searches we could imagine the following descriptors:

  • Domain name

  • Categories like arts & humanities, arts & humanities > literature

  • Keywords

  • Words

  • Chemical structures, reactions, etc.

  • Pictures

  • Graphs and spectra

  • Tables

  • Meta-information

  • Other
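As a rough illustration, the text-based descriptors in this list could be flattened into one set of features per hit, which is a convenient input for set-based similarity measures. The `Hit` class, its field names, the feature prefixes and the example URL are hypothetical; non-textual descriptors such as chemical structures, pictures or spectra would need their own encodings and are left out of this sketch.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Hit:
    """Hypothetical container for the text-based descriptors of one hit."""
    url: str
    categories: list = field(default_factory=list)  # e.g. "arts & humanities > literature"
    keywords: set = field(default_factory=set)
    words: set = field(default_factory=set)


def features(hit):
    """Flatten the descriptors of a hit into a set of prefixed strings."""
    feats = set()
    host = urlparse(hit.url).hostname
    if host:
        feats.add(f"domain:{host}")
    # A category path contributes every level of its hierarchy, so that
    # hits sharing only the top-level category still overlap.
    for path in hit.categories:
        parts = [p.strip() for p in path.split(">")]
        for depth in range(1, len(parts) + 1):
            feats.add("category:" + " > ".join(parts[:depth]))
    feats.update(f"keyword:{k.lower()}" for k in hit.keywords)
    feats.update(f"word:{w.lower()}" for w in hit.words)
    return feats


example = Hit(
    url="http://www.example.org/poetry/index.html",
    categories=["arts & humanities > literature"],
    keywords={"Poetry"},
    words={"sonnet"},
)
print(sorted(features(example)))
```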

 

The search engines could provide these descriptors. However, they must be able to work backwards. When searching, the search engine looks for documents containing the query words. For post-processing, the search engine has to deliver the indexed words, and the categories (if they exist), that belong to each document.
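"Working backwards" can be pictured as turning a word-to-documents index around into a document-to-words mapping for the hits of one search. The sketch below assumes the index is a plain dictionary from words to sets of document identifiers; the index contents and the function name are invented for illustration, and a real search engine would store and access its index differently.

```python
from collections import defaultdict

# Hypothetical inverted index: word -> documents containing it.
INVERTED = {
    "jaguar": {"doc1", "doc2"},
    "car": {"doc1"},
    "cat": {"doc2"},
}


def words_per_document(inverted_index, hit_list):
    """Return, for each hit, the set of indexed words that belong to it."""
    wanted = set(hit_list)
    forward = defaultdict(set)
    for word, documents in inverted_index.items():
        # Only documents that are in the hit list are of interest.
        for doc in documents & wanted:
            forward[doc].add(word)
    # Hits without any indexed word get an empty set.
    return {doc: forward[doc] for doc in hit_list}


print(words_per_document(INVERTED, ["doc1", "doc2"]))
```

Scanning the whole index for every hit list, as done here, would be far too slow at Internet scale, which is one reason the descriptors would have to come from the search engine itself.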

Business Model:

To get access to reasonable descriptors (words) with which one could hope to cluster the hits, one either needs to work closely with a search engine or has to rely on the small amount of information that search engines provide as standard.

Advantages:

A search engine that offers post-processing would have a competitive edge, which could be vital. At present, the index is used only for searching; post-processing would extract additional value from it. A lot of effort is under way to improve searching, and many of these methods are very elaborate. Clustering, by comparison, is a fairly simple method.

 

Execution:

First we have to investigate whether our idea is technically feasible. Then we must see whether the owner of a search engine is interested in supporting the project financially. We have the programming and project management skills needed to write the software.

Contacts:

Dr. Alexander Kos

AKos Consulting & Solutions GmbH

 

 

For technical questions:

Dusan Toman

Hurbanova 36
SK-92001 Hlohovec

SLOVAKIA 

Tel: +421 905 409604

Email: info@dimension5.sk 

Internet: www.dimension5.sk

  

 

 

 
