Please send questions, suggestions and comments related to the tool to:

matej.mihelcic@irb.hr
matmih1@gmail.com

Redescription mining [10] is a field of knowledge discovery that aims to find different characterizations of the same or similar subsets of entities. It also allows users to find connections between attributes from different attribute sets, called views. This is useful in many fields of science, such as biology, economics and medicine.

The main purpose of the tool InterSet [9] is to allow interactive, comprehensive exploration of redescription sets. On this page you can test the features of the tool by exploring redescriptions created on three different datasets. The first dataset contains attributes describing world countries through general country information and country trading patterns for the year 2012 [4,11,12]. The second dataset contains attributes describing the co-authorship graph and the author-conference bipartite graph [2,3]. The third dataset contains information about the presence of phenotype properties and Clusters of Orthologous Genes (COGs) in different bacterial species [1]. The tool is described in more detail in the paper "InterSet: Interactive redescription set exploration", published in the proceedings of the Discovery Science Conference (DS'16) [9].

General data information

Country data contains 199 world countries. The data contains two views:

General country information: contains 49 numerical attributes
Country trade information: contains 312 numerical attributes

DBLP data contains 6455 authors. The data contains two views:

Author-conference bipartite graph: contains 304 Boolean attributes
Co-authorship graph: contains 6455 Boolean attributes

Phenotype data contains 1336 bacterial species. The data contains two views:

Phenotypes: contains 333 Boolean attributes
COGs: contains 4602 Boolean attributes

We created 4150 redescriptions on the Country data, 3674 redescriptions on the DBLP data and 6200 redescriptions on the Phenotype data. Redescriptions were created with the CLUS-RM redescription mining algorithm [6,7], which is based on the Predictive Clustering Trees algorithm [5]. We used the random forest extensions incorporated into CLUS-RM (see [8]) to obtain redescriptions on the Phenotype data.

Redescription set exploration

The tool offers interactive redescription set exploration based on entities described by redescriptions in the redescription set, attributes contained in the redescription queries and general redescription properties. The properties used in the literature are: the Jaccard index, the p-value and redescription support [3].
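To make two of these properties concrete, the following minimal Python sketch computes the support and the Jaccard index of a redescription from the sets of entities described by its two queries. It is only an illustration and not code from InterSet or CLUS-RM: the function names and the example country sets are hypothetical, and the sketch assumes the usual definitions, where the support is the set of entities described by both queries and the Jaccard index is the size of that set divided by the number of entities described by at least one query. The p-value depends on a statistical model of the data and is not sketched here.

```python
def support(q1_entities, q2_entities):
    """Entities described by both queries (assumed definition of support)."""
    return set(q1_entities) & set(q2_entities)


def jaccard_index(q1_entities, q2_entities):
    """Size of the intersection divided by the size of the union."""
    a, b = set(q1_entities), set(q2_entities)
    union = a | b
    if not union:
        # Neither query describes any entity; treat the index as 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical entity sets, not taken from the tool's results.
q1 = {"Austria", "Belgium", "Croatia", "Denmark"}
q2 = {"Belgium", "Croatia", "Denmark", "Estonia"}

shared = support(q1, q2)
j = jaccard_index(q1, q2)
print(sorted(shared), j)  # ['Belgium', 'Croatia', 'Denmark'] 0.6
```

A Jaccard index of 1 means both queries describe exactly the same entities, while values near 0 mean the two queries barely overlap.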

Information and resources

The information and resources section contains instructions on how to set up and use the tool. It also provides resources that should help in understanding what redescription mining is, why it is used and how redescriptions are created.

Tool for interactive redescription set exploration
