This page explains how to set up the INTERSET tool so that you can use it with your own data.
The INTERSET tool is a web application. The client side is built with standard web technologies (HTML, CSS and JavaScript), with AngularJS as its main framework.
The server side of the application is built on Node.js. For the tool to work, one has to
create the appropriate SQLite database and fill it with the required data. The full code of the tool is available online.
The tool requires several packages in order to work. Links to these packages are provided on the "References and links" page.
The database allowing full functionality contains 23 tables. We provide the SQLite code to create each of them.
CREATE TABLE DataTable(elementID int, attributeID int, elementValue double);
CREATE INDEX attrIndex ON DataTable (attributeID);
CREATE TABLE ElementTable(elementID int, elementName VARCHAR(255), elementDescription VARCHAR(255));
CREATE TABLE RedescriptionAttributeTable(redescriptionID int, clauseID int, attributeID int, attributeMinValue double, attributeMaxValue double, negated int DEFAULT 0);
CREATE INDEX redIndx ON RedescriptionAttributeTable (redescriptionID);
CREATE TABLE RedescriptionElementTable(redescriptionID int, elemetID int);
CREATE INDEX redID on RedescriptionElementTable (redescriptionID);
CREATE TABLE RedescriptionTable(redescriptionID int, redescriptionLR VARCHAR(255), redescriptionRR VARCHAR(255), redescriptionJS double, redescriptionSupport int, redescriptionAM1 double, redescriptionAM2 double,...);
CREATE TABLE AttributeTable(attributeID int, attributeName VARCHAR(255), attributeDescription VARCHAR(255), view int);
CREATE TABLE SOMClusters(userId INTEGER, elementID INTEGER, SOMClusterID INTEGER);
CREATE TABLE ElementCoverage(userId INTEGER, elementID int, redescriptionCount int);
CREATE TABLE SOMDimensions(userId INTEGER, NumRows INTEGER, NumColumns INTEGER);
CREATE TABLE MeasuresNames(measureID int, name VARCHAR(255), type VARCHAR(255), displayName VARCHAR(255), shortName VARCHAR(255));
CREATE TABLE AttributeCoocurenceTable(userId INTEGER, coocurence INTEGER, attributeID1 INTEGER, attributeID2 INTEGER);
CREATE TABLE AttributeFrequencyTable(userId INTEGER, frequency INTEGER, attributeID INTEGER, attributeName VARCHAR(255));
CREATE TABLE CategoryTable(attributeID int, categoryValue double, categoryName VARCHAR(255));
CREATE TABLE GraphTable (redId1 Integer, redId2 Integer, overlap Float);
CREATE TABLE GraphTableAttr (redId1 Integer, redId2 Integer, overlAttrap Float);
CREATE TABLE UserTable(userId INTEGER, userName VARCHAR(100), password VARCHAR(250));
CREATE TABLE ElementCoverageBack(elementID int, redescriptionCount int);
CREATE TABLE SOMClustersBack(elementID int, SOMClusterID int);
CREATE TABLE SomDimensionsBack(NumRows INTEGER, NumColumns INTEGER);
CREATE TABLE SelectedRedescriptionsElemBack(redescriptionID INTEGER);
CREATE TABLE SelectedRedescriptions(userId INTEGER, redescriptionID INTEGER);
CREATE TABLE SelectedRedescriptionsAttr(userId INTEGER, redescriptionID INTEGER);
CREATE TABLE SelectedRedescriptionsElem(userId INTEGER, redescriptionID INTEGER);
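As a minimal sketch of how these statements could be loaded, the following Python snippet uses the standard sqlite3 module to run a few of the CREATE statements above and list the resulting tables. The use of Python, the in-memory database and the helper names create_database and list_tables are assumptions made for illustration; the tool itself does not prescribe how the database file is created.

```python
import sqlite3

# A few of the CREATE statements listed above, copied verbatim.
SCHEMA = """
CREATE TABLE DataTable(elementID int, attributeID int, elementValue double);
CREATE INDEX attrIndex ON DataTable (attributeID);
CREATE TABLE ElementTable(elementID int, elementName VARCHAR(255), elementDescription VARCHAR(255));
CREATE TABLE UserTable(userId INTEGER, userName VARCHAR(100), password VARCHAR(250));
"""


def create_database(path=":memory:"):
    """Run the statements in SCHEMA and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


conn = create_database()
tables = list_tables(conn)
print(tables)
```

Passing a file path instead of ":memory:" would write the database to disk. The remaining statements can be added to SCHEMA in the same way.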
The DataTable contains the element values for all attributes used in the redescription mining process. The ElementTable contains additional information about elements, such as element descriptions or longer labels, if available.
The RedescriptionAttributeTable contains information about the attributes of each redescription in the redescription set. Numeric attributes need to have both the attributeMinValue and attributeMaxValue fields defined, whereas categorical attributes need to have one of those two values equal to null.
The RedescriptionElementTable contains the support set of each redescription from the redescription set, and the RedescriptionTable contains the values of the redescription measures for each redescription. For now, the RedescriptionTable must contain all fields up to and including redescriptionSupport. Fields occurring after that are optional and represent additional redescription quality measures.
The AttributeTable allows defining additional attribute labels or descriptions. The SOMClusters table lists the elements contained in each SOM cluster.
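The rule for numeric and categorical attributes in the RedescriptionAttributeTable can be illustrated with a short Python sketch. The rows are hypothetical, and storing the category code in attributeMinValue (with attributeMaxValue left null) is an assumption; the text only requires that one of the two values is null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RedescriptionAttributeTable(redescriptionID int, clauseID int, "
    "attributeID int, attributeMinValue double, attributeMaxValue double, "
    "negated int DEFAULT 0)"
)

# Hypothetical rows: attribute 10 is numeric with the interval [0.5, 2.0];
# attribute 11 is categorical with category code 3 and a NULL second bound
# (which of the two fields holds the code is an assumption).
rows = [
    (1, 1, 10, 0.5, 2.0),
    (1, 1, 11, 3.0, None),
]
conn.executemany(
    "INSERT INTO RedescriptionAttributeTable"
    "(redescriptionID, clauseID, attributeID, attributeMinValue, attributeMaxValue) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Classify each row: categorical when either bound is NULL, numeric otherwise.
kinds = dict(conn.execute(
    "SELECT attributeID, "
    "CASE WHEN attributeMinValue IS NULL OR attributeMaxValue IS NULL "
    "THEN 'categorical' ELSE 'numeric' END "
    "FROM RedescriptionAttributeTable"
))
print(kinds)
```

The negated column is omitted from the INSERT, so it takes its default value of 0.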
The ElementCoverage table contains, for each element, the number of redescriptions whose support sets contain it.
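One way the ElementCoverage counts could be derived from the RedescriptionElementTable is sketched below in Python. The support sets and the USER_ID value are hypothetical, and the query is an illustration rather than the procedure used by the tool. The column name elemetID is spelled as in the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RedescriptionElementTable(redescriptionID int, elemetID int);
CREATE TABLE ElementCoverage(userId INTEGER, elementID int, redescriptionCount int);
""")

# Hypothetical support sets: redescription 1 covers elements {1, 2},
# redescription 2 covers elements {2, 3}.
conn.executemany(
    "INSERT INTO RedescriptionElementTable VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 2), (2, 3)],
)

USER_ID = 0  # assumed placeholder user id

# Count, for each element, the redescriptions whose support set contains it.
conn.execute(
    "INSERT INTO ElementCoverage(userId, elementID, redescriptionCount) "
    "SELECT ?, elemetID, COUNT(DISTINCT redescriptionID) "
    "FROM RedescriptionElementTable GROUP BY elemetID",
    (USER_ID,),
)

coverage = conn.execute(
    "SELECT elementID, redescriptionCount FROM ElementCoverage ORDER BY elementID"
).fetchall()
print(coverage)
```

Elements that appear in no support set get no row in this sketch; whether the tool expects an explicit zero count for them is not stated here.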
The SOMDimensions table contains the dimensions used to display the SOM. The MeasuresNames table allows entering various information about the redescription measures used in the process. The displayName field is used
to display the measure name in the global redescription information view, and the shortName field is used in the corresponding list. The shortName should be an abbreviation of up to 3 letters, as it is placed in the table header.
The AttributeCoocurenceTable contains the co-occurrence frequencies of attributes in the redescription queries, while the AttributeFrequencyTable contains the occurrence frequencies of attributes in the redescription queries.
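The occurrence and co-occurrence frequencies could be computed from the RedescriptionAttributeTable roughly as follows. This Python sketch counts the number of distinct redescriptions in which an attribute (or a pair of attributes) appears; this counting rule and the sample rows are assumptions, and the tool may count differently (for example per clause).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RedescriptionAttributeTable(redescriptionID int, clauseID int, "
    "attributeID int, attributeMinValue double, attributeMaxValue double, "
    "negated int DEFAULT 0)"
)

# Hypothetical data: redescription 1 uses attributes 10 and 11,
# redescription 2 uses attributes 10, 11 and 12.
conn.executemany(
    "INSERT INTO RedescriptionAttributeTable"
    "(redescriptionID, clauseID, attributeID) VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 1, 11), (2, 1, 10), (2, 1, 11), (2, 2, 12)],
)

# Occurrence: number of distinct redescriptions that use each attribute.
frequency = conn.execute(
    "SELECT attributeID, COUNT(DISTINCT redescriptionID) "
    "FROM RedescriptionAttributeTable "
    "GROUP BY attributeID ORDER BY attributeID"
).fetchall()

# Co-occurrence: number of distinct redescriptions that use both attributes
# of a pair; a.attributeID < b.attributeID keeps each unordered pair once.
cooccurrence = conn.execute(
    "SELECT a.attributeID, b.attributeID, COUNT(DISTINCT a.redescriptionID) "
    "FROM RedescriptionAttributeTable a "
    "JOIN RedescriptionAttributeTable b "
    "ON a.redescriptionID = b.redescriptionID AND a.attributeID < b.attributeID "
    "GROUP BY a.attributeID, b.attributeID "
    "ORDER BY a.attributeID, b.attributeID"
).fetchall()

print(frequency)
print(cooccurrence)
```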
The CategoryTable contains information about the categories of attributes containing categorical values. Each categorical value of such an attribute must be assigned an integer code that is used in the application.
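Assigning integer codes to categorical values and storing them in the CategoryTable might look like the following Python sketch. The helper encode_categories, the attribute id and the category names are hypothetical, and numbering the codes from 0 in alphabetical order is an assumption; any consistent integer coding should serve the same purpose.

```python
import sqlite3


def encode_categories(names):
    """Assign consecutive integer codes, starting at 0, to the sorted unique names."""
    return {name: code for code, name in enumerate(sorted(set(names)))}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CategoryTable(attributeID int, categoryValue double, "
    "categoryName VARCHAR(255))"
)

ATTRIBUTE_ID = 11  # hypothetical id of a categorical attribute
codes = encode_categories(["low", "high", "medium", "low"])

# One row per category: (attributeID, categoryValue, categoryName).
conn.executemany(
    "INSERT INTO CategoryTable VALUES (?, ?, ?)",
    [(ATTRIBUTE_ID, code, name) for name, code in codes.items()],
)

stored = conn.execute(
    "SELECT categoryName, categoryValue FROM CategoryTable ORDER BY categoryValue"
).fetchall()
print(codes)
print(stored)
```

Because categoryValue is declared as double, SQLite returns the codes as floating-point values when they are read back.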
The GraphTable contains the pairwise entity Jaccard index for all redescriptions contained in the redescription set. Similarly, the GraphTableAttr contains the pairwise attribute Jaccard index for all redescriptions contained in the
redescription set. The UserTable contains usernames and passwords. All tables with the suffix Back are used to save the exact information about a pre-trained SOM (layout, number of clusters, entity membership, etc.).
Such a SOM is loaded using the Load SOM Layout option, which can load a SOM created by any external tool.
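The pairwise Jaccard index stored in the GraphTable can be sketched in Python as follows. The support sets are hypothetical, and producing one row per unordered pair of redescriptions is an assumption, since the expected row layout beyond the three columns is not described here. The same function applied to the attribute sets of the redescriptions would give values of the kind stored in the GraphTableAttr.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(support_sets):
    """Return (redId1, redId2, overlap) rows for every unordered pair of ids."""
    return [
        (r1, r2, jaccard(support_sets[r1], support_sets[r2]))
        for r1, r2 in combinations(sorted(support_sets), 2)
    ]


# Hypothetical support sets, keyed by redescription id.
supports = {1: {1, 2, 3}, 2: {2, 3, 4}, 3: {5}}
rows = pairwise_jaccard(supports)
print(rows)
```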
To obtain the external SOM, we used the Kohonen package for R (link available on the "References and links" page). As input to the SOM, we used the element occurrence in the redescriptions contained in the redescription set. The user has to specify the layout used to train the SOM. As output, the SOM returns the clusters embedded in the layout and the cluster membership of each element. The tool also contains an integrated SOM that can be computed on the set of all redescriptions or on any selected subset of redescriptions.
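The SOM input described above can be pictured as a table with one row per element and one column per redescription. The Python sketch below builds such a table, interpreting "element occurrence in redescriptions" as a binary 0/1 matrix; this interpretation, the function name occurrence_matrix and the sample support sets are assumptions. Training the SOM itself (for example in R) is not shown.

```python
def occurrence_matrix(support_sets, element_ids):
    """Return one 0/1 row per element; columns follow the sorted redescription ids."""
    red_ids = sorted(support_sets)
    return [
        [1 if element in support_sets[red] else 0 for red in red_ids]
        for element in element_ids
    ]


# Hypothetical support sets, keyed by redescription id.
supports = {1: {1, 2}, 2: {2, 3}}
matrix = occurrence_matrix(supports, [1, 2, 3])
for row in matrix:
    print(row)
```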
The attribute heatmap uses information about the attributes in the redescription queries. This map is computed automatically within the tool. Users need to define the dimensions of the heatmap that will be displayed and used for exploration.
The Crossfilter uses information about redescriptions provided in the RedescriptionTable. The measure displayNames and shortNames should be defined in the MeasuresNames table. This information is displayed at the top of each measure filter and in the redescription table list.
We have created a set of Java functions that automatically create the input files for the RedescriptionAttributeTable, RedescriptionElementTable, RedescriptionTable, SOMClusters and ElementCoverage tables. These functions are designed to work with the output of the CLUS-RM redescription mining algorithm and will be extended to allow automatic database construction.