How to set up the tool

This page contains all the information needed to set up the INTERSET tool so that you can use it with your own data.
The INTERSET tool is a web application. The client is built with standard web technologies (HTML, CSS and JavaScript), and its main part uses angular.js. The server side runs on node.js. For the tool to work, an appropriate SQLite database must be created and filled with the required data. The full code of the tool is available online.

Requirements

The tool requires several packages to work. Links to these packages are provided on the "References and links" page.

Database structure

The database that provides full functionality contains 23 tables. We provide the SQLite code to create each of them.

The DataTable contains the element values for all attributes used in the redescription mining process. The ElementTable contains additional information about elements, such as element descriptions or longer labels, if available.

The RedescriptionAttributeTable contains information about the redescription attributes of each redescription in the redescription set. For now, the RedescriptionTable must contain all fields up to and including redescriptionSupport. Fields occurring after that are optional and represent additional redescription quality measures. Numeric attributes need to have both the attributeMinValue and attributeMaxValue fields defined, whereas categorical attributes need to have one of those two values equal to null. The RedescriptionElementTable contains the support set of each redescription in the redescription set, and the RedescriptionTable contains the values of the redescription measures for each redescription.

The AttributeTable allows defining additional attribute labels or descriptions. The SOMClusters table contains the elements belonging to each SOM cluster. The ElementCoverage table contains, for each element, the number of redescriptions that contain it in their support sets. The SOMDimensions table contains the dimensions used to display the SOM.

The MeasuresNames table allows entering various information about the redescription measures used in the process. The displayName field is used to display the measure name in the global redescription information view, and the shortName field is used in the corresponding list. The shortName should be an abbreviation of up to 3 letters so that it fits in the table header.

The AttributeCoocurenceTable contains the co-occurrence frequencies of attributes in the redescription queries, while the AttributeFrequencyTable contains the occurrence frequencies of attributes in the redescription queries. The CategoryTable contains information about the categories of attributes with categorical values. Each categorical value of such attributes must be assigned an integer code, which is used in the application. The GraphTable contains the pairwise entity Jaccard index for all redescriptions in the redescription set. Similarly, the GraphTableAttr contains the pairwise attribute Jaccard index for all redescriptions in the redescription set.

The UserTable contains usernames and passwords. All tables with the suffix Back are used to save the exact information about a pre-trained SOM (layout, number of clusters, entity membership etc.). Such a SOM is loaded with the Load SOM Layout option, which can also be used to load a SOM created by any external tool.
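
To make the derived tables more concrete, the following minimal Python sketch shows one way the values stored in ElementCoverage and GraphTable could be computed from redescription support sets. The input layout (a dictionary from redescription id to a set of element ids) and the function names are assumptions made for illustration; they are not taken from the tool's code, and the exact row format expected by the database is defined by the provided SQLite code.

```python
from itertools import combinations


def element_coverage(supports):
    """Count, for each element, how many redescriptions contain it."""
    coverage = {}
    for elements in supports.values():
        for element in elements:
            coverage[element] = coverage.get(element, 0) + 1
    return coverage


def pairwise_jaccard(supports):
    """Jaccard index |A & B| / |A | B| for every pair of support sets."""
    result = {}
    for r1, r2 in combinations(sorted(supports), 2):
        s1, s2 = supports[r1], supports[r2]
        union = s1 | s2
        result[(r1, r2)] = len(s1 & s2) / len(union) if union else 0.0
    return result


# Hypothetical support sets: redescription id -> set of element ids.
supports = {
    1: {"e1", "e2", "e3"},
    2: {"e2", "e3", "e4"},
    3: {"e5"},
}

coverage = element_coverage(supports)
jaccard = pairwise_jaccard(supports)
print(coverage["e2"])    # element e2 occurs in redescriptions 1 and 2
print(jaccard[(1, 2)])   # 2 shared elements out of 4 in the union
```

The attribute-based index in GraphTableAttr could presumably be obtained the same way by passing sets of attributes instead of sets of elements.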

The Self Organising Map

To obtain the external SOM, we used the Kohonen package for R (link available on the "References and links" page). As input to the SOM, we used the occurrence of elements in the redescriptions contained in the redescription set. The user has to specify the layout to be used to train the SOM. As output, the SOM returns the clusters embedded in the layout and the cluster membership of each element. The tool also contains an integrated SOM that can be computed on the set of all redescriptions or on any selected subset of redescriptions.
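
As an illustration of the SOM input described above, the sketch below builds an element-by-redescription occurrence matrix from support sets. It assumes a binary encoding (1 if the element is in the support set of the redescription, 0 otherwise) and a dictionary-based input layout; both are assumptions for this example, as is the function name, and the matrix used in practice may be encoded differently.

```python
def occurrence_matrix(supports):
    """Return (elements, redescriptions, matrix) with one row per element.

    matrix[i][j] is 1 if elements[i] is in the support set of
    redescriptions[j], and 0 otherwise (assumed binary encoding).
    """
    redescriptions = sorted(supports)
    elements = sorted(set().union(*supports.values()))
    matrix = [
        [1 if element in supports[r] else 0 for r in redescriptions]
        for element in elements
    ]
    return elements, redescriptions, matrix


# Hypothetical support sets: redescription id -> set of element ids.
supports = {
    1: {"e1", "e2", "e3"},
    2: {"e2", "e3", "e4"},
    3: {"e5"},
}

elements, redescriptions, matrix = occurrence_matrix(supports)
for element, row in zip(elements, matrix):
    print(element, row)
```

Each row can then serve as the feature vector of one element when training a SOM with an external tool.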

Attribute cross view co-occurrence heatmap

The attribute heatmap uses information about attributes in redescription queries. This map is automatically computed from within the tool. Users need to define the dimension of the heatmap that will be displayed and used for exploration.
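
The tool computes this map itself, so the following Python sketch is only meant to illustrate the kind of counting involved. It assumes that each redescription consists of two queries, one per view, and that each query is represented by the set of attributes it uses; the data layout, attribute names and function name are hypothetical.

```python
from collections import Counter
from itertools import product


def cross_view_cooccurrence(redescriptions):
    """Count how often an attribute of view 1 appears together with an
    attribute of view 2 in the queries of the same redescription."""
    counts = Counter()
    for view1_attrs, view2_attrs in redescriptions:
        counts.update(product(set(view1_attrs), set(view2_attrs)))
    return counts


# Hypothetical redescriptions: (attributes of query 1, attributes of query 2).
redescriptions = [
    ({"a1", "a2"}, {"b1"}),
    ({"a1"}, {"b1", "b2"}),
]

counts = cross_view_cooccurrence(redescriptions)
print(counts[("a1", "b1")])  # the pair occurs in both redescriptions
```

Each (view 1 attribute, view 2 attribute) pair would correspond to one cell of a heatmap, with the count determining its colour.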

Crossfilter

The Crossfilter uses information about redescriptions provided in the RedescriptionTable. The measure displayNames and shortNames should be defined in the MeasuresNames table. This information is displayed at the top of each measure filter and in the redescription table list.
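
The sketch below shows how measure names might be entered, using Python's built-in sqlite3 module and an in-memory database. Only the table name MeasuresNames and the fields displayName and shortName come from this page; the measureId column, the CHECK constraint on the abbreviation length, the helper function and the example measure are assumptions for illustration. The SQLite code provided with the tool remains the authoritative definition of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE MeasuresNames (
        measureId   INTEGER PRIMARY KEY,  -- hypothetical column
        displayName TEXT NOT NULL,        -- shown above each measure filter
        shortName   TEXT NOT NULL
                    CHECK (length(shortName) <= 3)  -- assumed constraint
    )
    """
)


def add_measure(connection, display_name, short_name):
    """Insert one measure; fails if short_name is longer than 3 characters."""
    connection.execute(
        "INSERT INTO MeasuresNames (displayName, shortName) VALUES (?, ?)",
        (display_name, short_name),
    )


# Hypothetical example measure.
add_measure(conn, "Jaccard index", "JS")

rows = conn.execute(
    "SELECT displayName, shortName FROM MeasuresNames"
).fetchall()
print(rows)
```

Enforcing the length limit in the database is a design choice of this sketch; it catches overly long abbreviations at insertion time instead of when the table header is rendered.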

Utilities

We have created a set of Java functions that automatically create the input files for the RedescriptionAttributeTable, RedescriptionElementTable, RedescriptionTable, SOMClusters and ElementCoverage tables. These functions are designed to work with the output of the CLUS-RM redescription mining algorithm and will be extended to allow automatic database construction.