FCALGS: Data Formats

This document describes basic input and output data formats that are used by FCALGS.

Input Data (Formal Context)

A formal context is a triplet ⟨X,Y,I⟩ such that X is a finite nonempty set of objects (transactions), Y is a finite nonempty set of attributes (features), and I⊆X×Y, i.e., I is a binary relation between objects from X and attributes from Y. The fact ⟨x,y⟩∈I is interpreted as “object x has attribute y.” Thus, ⟨X,Y,I⟩ corresponds to a two-dimensional data table with rows corresponding to objects from X, columns corresponding to attributes from Y, and table entries being either crosses or blanks, indicating whether an object has or does not have an attribute, respectively.

We adopt the following conventions. The sets of objects and attributes consist of consecutive integers starting from zero, i.e., we can write X={0,…,m} and Y={0,…,n}. Furthermore, we assume that each object has at least one attribute (i.e., there is no “blank line” in the corresponding data table).
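As an illustration of these conventions, the following minimal Python sketch represents the relation I as a set of (object, attribute) pairs and renders it as a data table of crosses and blanks. The names render_table and has_blank_row are hypothetical helpers introduced only for this example; they are not part of FCALGS.

```python
def render_table(relation, num_objects, num_attributes):
    """Return one string per object: '×' where (x, y) is in the relation, ' ' otherwise."""
    rows = []
    for x in range(num_objects):
        row = "".join("×" if (x, y) in relation else " " for y in range(num_attributes))
        rows.append(row)
    return rows


def has_blank_row(relation, num_objects):
    """Return True if some object has no attribute (violating the convention above)."""
    objects_with_attributes = {x for x, _ in relation}
    return any(x not in objects_with_attributes for x in range(num_objects))


# A small context with X = {0, 1} and Y = {0, 1, 2}.
relation = {(0, 0), (0, 2), (1, 1), (1, 2)}
for row in render_table(relation, 2, 3):
    print(row)
```

Here each object has at least one attribute, so has_blank_row(relation, 2) returns False.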

An input file corresponding to ⟨X,Y,I⟩ is an ASCII text file which consists of lines representing attributes of objects from X. A line is ignored if it consists solely of white space (spaces and tabs). Otherwise, a line encodes a list of attributes. A list of attributes is a white-space separated sequence of nonnegative integers listed in ascending order. The integers can take values 0,…,n and each of them denotes the corresponding attribute. For each object x∈X, the input file contains exactly one line which is a list of all its attributes. Moreover, the (lists of attributes of) objects appear in the file in ascending order. That is, the first (nonempty) line in the file is a list of attributes of object 0, the second (nonempty) line in the file is a list of attributes of object 1, and so on; the last (nonempty) line in the file is a list of attributes of object m.
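The rules above can be sketched as a small reader. This is a minimal Python illustration, not the parser used by FCALGS; the function name parse_context is hypothetical, and treating a repeated attribute on one line as an error (strictly ascending order) is an assumption of this sketch.

```python
def parse_context(text):
    """Parse input-file text into a list of attribute lists, one per object.

    The list for object k is at index k of the result.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()  # splits on any run of white space
        if not fields:
            continue  # a line consisting solely of white space is ignored
        attributes = [int(field) for field in fields]
        if any(a < 0 for a in attributes):
            raise ValueError("attributes must be nonnegative integers")
        # Assumption of this sketch: ascending means strictly ascending.
        if any(a >= b for a, b in zip(attributes, attributes[1:])):
            raise ValueError("attributes must be listed in ascending order")
        rows.append(attributes)
    return rows


example = "0 2 4\n2 3 4\n\n0\t1  4\n0 2\n0 1 3\n"
print(parse_context(example))
```

The blank line and the mixed tab and space separators in the example do not affect the result, since only the sequence of integers on each nonempty line matters.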

The following examples show formal contexts (as data tables) on the left and the corresponding input files on the right.

× × ×    0 2 4
  ×××    2 3 4
××  ×    0 1 4
× ×      0 2
×× ×     0 1 3

×××××    0 1 2 3 4
 ××××    1 2 3 4
  ×××    2 3 4
   ××    3 4
    ×    4

×        0
 ×       1
  ×      2
   ×     3
    ×    4

 ××××    1 2 3 4
× ×××    0 2 3 4
×× ××    0 1 3 4
××× ×    0 1 2 4
××××     0 1 2 3

Note that the amount of white space on each line is not important. That is, the integers denoting attributes must be separated by one or more white-space characters. This allows the input data to be written in a more readable way (if convenient). For instance, the first context from the previous example can be equivalently specified as follows:

× × ×    0   2   4
  ×××        2 3 4
××  ×    0 1     4
× ×      0   2
×× ×     0 1   3

Given an input file of the previous form, the number of objects is the number of nonempty lines in the file, and the number of attributes is determined from the greatest integer present in the file: since attributes are numbered from 0, it is that integer plus one.
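A minimal Python sketch of this computation follows. The function name context_dimensions is hypothetical, and the sketch assumes that the attribute count equals the greatest integer plus one, which follows from attributes being numbered from 0.

```python
def context_dimensions(text):
    """Return (number of objects, number of attributes) for input-file text."""
    # Keep only nonempty lines; each one describes exactly one object.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    num_objects = len(rows)
    # Attributes are numbered from 0, so the count is the greatest integer plus one.
    # max() raises ValueError for a file with no nonempty lines, which cannot
    # describe a formal context since the set of objects must be nonempty.
    greatest = max(int(field) for row in rows for field in row)
    return num_objects, greatest + 1


print(context_dimensions("0 2 4\n2 3 4\n0 1 4\n0 2\n0 1 3\n"))
```

Note that an attribute which no object has is counted only if its number is smaller than the greatest integer in the file.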

Output Data (List of Concept Intents)

The basic output of formal concept analysis is a list of formal concepts. Since each formal concept is uniquely given by its extent and/or intent, our tools produce a list of all concept intents. Each concept intent is encoded as a list of attributes, as explained in Input Data (Formal Context).
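To make the notion of a list of concept intents concrete, the following Python sketch enumerates all intents of a small context by a naive method: the intents are exactly the full attribute set Y together with all intersections of the attribute lists of objects. This is an illustration only and not necessarily how the FCALGS tools compute intents; the function name concept_intents is hypothetical, and the order in which intents are returned is a choice of this sketch, not something the format prescribes.

```python
def concept_intents(rows, num_attributes):
    """Return all concept intents as sorted attribute lists.

    rows[k] is the list of attributes of object k.
    """
    all_attributes = frozenset(range(num_attributes))
    intents = {all_attributes}  # Y itself is always an intent
    for row in rows:
        object_intent = frozenset(row)
        # Close the collection under intersection with this object's attributes.
        intents |= {object_intent & intent for intent in intents}
    return sorted(sorted(intent) for intent in intents)


# The diagonal context on three objects and three attributes.
for intent in concept_intents([[0], [1], [2]], 3):
    print(intent)
```

For the diagonal context, the intents are the empty set, the three single attributes, and the full attribute set. The naive method can take time exponential in the size of the context, so it is suitable only for small examples.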