Analysis of rules (DA). Comparison with other approaches
The main idea behind DA is that a rule can be found based on the frequencies of joint occurrence or non-occurrence of events. Such a rule is called a “determinacy”, and the mathematical theory of such rules is called “determinacy analysis”, or DA.
Another key idea: DA is the basis for a convenient and useful data analysis technology with wide applications. This idea is implemented in DALSolution.
What is a rule?
People find rules (determinacies) by observing the joint or disjoint occurrence of events. For example, if one notices that an occurrence of A is always followed by an occurrence of B, this means that there exists a rule “If A then B”, or A → B. If you draw A as a circle and B as another circle, then the circle A is completely inside the circle B, as shown in Figure 1. This means that there exists an accurate rule A → B.

Figure 1. The case of accurate rule A → B. Circle A (red) is completely inside circle B. The bounding rectangle symbolizes the entire set of observations.
Everybody needs rules.
The concept of a rule as a determinacy is closely related to the ideas of prediction and explanation. The primary reason for interest in rules is that knowledge of rules allows one to act while foreseeing the results. Doctors are interested in rules like “If a patient with a particular disease under particular conditions is treated with a particular drug, then he or she will recover or get significantly better without side effects”. Such rules help doctors to do their job. For a marketing specialist, rules such as “If the image of the product is changed in a particular way, its attractiveness for particular consumers will increase” are important. Knowledge of such rules allows for better planning of market behavior. An example of a rule that may interest an electoral campaign consultant is: “If a candidate makes a particular statement under particular conditions, his or her rating will increase among particular groups of voters and decrease among others”. Rules are the most natural form of knowledge, so everybody can benefit from them.
Each rule has two fundamental characteristics – accuracy and completeness
The accuracy of rule A → B is defined as the proportion of occurrences of B among the occurrences of A. In figure 1 this proportion is equal to 1 (100%), which makes the rule A → B accurate. Completeness is another fundamental characteristic of a rule. In figure 1 you can see that with the rule A → B one can predict only about one fourth of all occurrences of B. The area of circle A is approximately one fourth of the area of circle B. The rule A → B is accurate but not complete; its completeness is approximately equal to one fourth (25%).
The completeness of rule A → B is generally defined as the proportion of occurrences of A among the occurrences of B. The completeness of the rule A → B is equal to the accuracy of the reverse rule B → A, while the accuracy of the rule A → B equals the completeness of the reverse rule. Whether the estimate represents accuracy or completeness, reversing the arrow switches the parameter represented.
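The two definitions can be illustrated with a minimal Python sketch. The observation ids and the sets A and B below are hypothetical and were chosen only to mimic the situation in figure 1, where A lies entirely inside B and covers about a quarter of it.

```python
def accuracy(a_cases, b_cases):
    """Accuracy of A -> B: share of A's occurrences that are also occurrences of B."""
    return len(a_cases & b_cases) / len(a_cases)


def completeness(a_cases, b_cases):
    """Completeness of A -> B: share of B's occurrences that are also occurrences of A."""
    return len(a_cases & b_cases) / len(b_cases)


# Hypothetical observation ids in which each event occurred.
B = set(range(40))  # B occurs in observations 0..39
A = set(range(10))  # A occurs in observations 0..9, entirely inside B

print(accuracy(A, B))      # accuracy of A -> B
print(completeness(A, B))  # completeness of A -> B
# Reversing the arrow swaps the two characteristics.
print(accuracy(B, A) == completeness(A, B))
```

Representing events as sets of observation ids keeps both characteristics to a single intersection count each.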
An inaccurate rule can be made more accurate
Accurate rules are difficult to find. Most rules are inaccurate. If the rule A → B is not accurate, circle A is not completely within circle B, as shown in figure 2.

Figure 2. The case of inaccurate rule A → B. Only a portion (colored in red) of circle A is inside circle B.
If one adds a certain factor C to the rule A → B, the resultant rule AC → B may turn out to be accurate. An example of this situation is shown in figure 3.

Figure 3. A factor C is added to the inaccurate rule A → B. The rule AC → B is accurate. All joint occurrences of A and C (colored in red) are inside circle B.
It may certainly happen that rule AC → B is even less accurate than the initial rule A → B. In figure 4, the accuracy of rule AC → B equals 0.

Figure 4. A factor C is added to the inaccurate rule A → B. The resultant rule AC → B has zero accuracy. All cases of joint occurrence of A and C (colored gray) appear beyond circle B.
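The effect of adding a factor can be checked directly on data. The sketch below uses made-up records (the counts are assumptions, not taken from the figures) and compares the accuracy of A → B with the accuracy of the rules obtained by adding C or “not C”.

```python
# Each record is a hypothetical observation: (A occurred, C occurred, B occurred).
records = (
    [(True, True, True)] * 30       # A and C together, B occurs
    + [(True, False, True)] * 10    # A without C, B occurs
    + [(True, False, False)] * 20   # A without C, B does not occur
    + [(False, True, False)] * 15   # C without A
    + [(False, False, True)] * 25   # B without A
)


def rule_accuracy(records, condition):
    """Share of records with B among the records satisfying `condition`."""
    matched = [r for r in records if condition(r)]
    if not matched:
        return None  # the rule never fires, so its accuracy is undefined
    return sum(1 for r in matched if r[2]) / len(matched)


acc_a = rule_accuracy(records, lambda r: r[0])                     # A -> B
acc_ac = rule_accuracy(records, lambda r: r[0] and r[1])           # AC -> B
acc_a_not_c = rule_accuracy(records, lambda r: r[0] and not r[1])  # A(not C) -> B

print(acc_a, acc_ac, acc_a_not_c)
```

With these counts the factor C raises the accuracy of the rule, while “not C” lowers it, which corresponds to the two outcomes described above.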
DALSolution finds and analyzes rules
DALSolution allows one to find rules and provides information about factors which make these rules more or less accurate (in the Rules Tables mode).
If you specify an inaccurate rule A → B and some text variable, DALSolution will find those values of the variable that increase the rule accuracy (or decrease it, or leave it unchanged). In this manner, DALSolution works with qualitative factors in rules (in the Rules Tables mode).
If you specify an inaccurate rule A → B and some numeric variable x, DALSolution will find the bounds q, r describing a factor C = {q ≤ x ≤ r} such that the accuracy of rule AC → B is not lower than a given threshold. In this manner, DALSolution helps you explore quantitative factors in rules (Optimization in the Rules Tables mode).
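The search for interval bounds can be pictured with a brute-force sketch. This is not DALSolution's algorithm, which is not described here; the function name `best_interval`, the tie-breaking choice (prefer the interval covering the most occurrences of B), and the sample data are all assumptions made for illustration.

```python
def best_interval(cases, threshold):
    """Find bounds q <= x <= r for a numeric factor C.

    cases: list of (x, b) pairs for observations where A holds;
           b is True when B also occurred.
    Returns (q, r, accuracy, hits) for the interval whose rule AC -> B has
    accuracy >= threshold and covers the most occurrences of B,
    or None if no interval qualifies.
    """
    values = sorted({x for x, _ in cases})
    best = None
    for i, q in enumerate(values):
        for r in values[i:]:
            inside = [b for x, b in cases if q <= x <= r]
            hits = sum(inside)            # occurrences of B inside the interval
            acc = hits / len(inside)      # accuracy of AC -> B
            if acc >= threshold and (best is None or hits > best[3]):
                best = (q, r, acc, hits)
    return best


# Hypothetical observations of A: value of x and whether B occurred.
cases = [(1, False), (2, False), (3, True), (4, True), (5, True),
         (6, False), (7, True), (8, False), (9, False)]

result = best_interval(cases, threshold=0.8)
print(result)
```

The exhaustive scan is quadratic in the number of distinct values of x, which is acceptable for a small illustration but not for large samples.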
How is DA related to the practice of medical testing?
Answer. DA expands medical testing capabilities. By applying DA, a physician can identify tests based on a composition of several diagnostic factors.
Techniques for testing diseases, drugs, treatment methods, etc., are widely used in medicine. Let A be a condition used as a test and B a disease (or a result of treatment, a result of drug application, etc.). The relationship between A and B is commonly described by a table whose cells contain the number of cases in which “A” or “not A” is observed jointly with “B” or “not B” (Table 1):
| | B | Not B | Sum |
| A | a | b | a+b |
| Not A | c | d | c+d |
| Sum | a+c | b+d | a+b+c+d |
Table 1. Data used to compute the characteristics of a test for the presence of disease B based on the condition A.
The relationship between medical testing terminology and DA terminology
| Terminology accepted in the practice of testing | Formula | DA terminology |
| Predictive value of the positive test | a / (a + b) | Accuracy of positive diagnostic rule A → B |
| Test sensitivity | a / (a + c) | Completeness of positive diagnostic rule A → B |
| Predictive value of the negative test | d / (c + d) | Accuracy of negative diagnostic rule “Not A” → “Not B” |
| Test specificity | d / (b + d) | Completeness of negative diagnostic rule “Not A” → “Not B” |
| The diagnostic accuracy of the method | (a + d) / (a + b + c + d) | Weighted average of the accuracies of rules A → B and “Not A” → “Not B” |
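The correspondence can be checked numerically. The sketch below computes the five test characteristics from the counts a, b, c, d of Table 1 using their standard definitions; the function name and the sample counts are assumptions chosen for illustration.

```python
def diagnostic_characteristics(a, b, c, d):
    """Test characteristics from the 2x2 counts of Table 1.

    a: A and B, b: A and not B, c: not A and B, d: not A and not B.
    """
    n = a + b + c + d
    return {
        "positive_predictive_value": a / (a + b),  # accuracy of A -> B
        "sensitivity": a / (a + c),                # completeness of A -> B
        "negative_predictive_value": d / (c + d),  # accuracy of "Not A" -> "Not B"
        "specificity": d / (b + d),                # completeness of "Not A" -> "Not B"
        "diagnostic_accuracy": (a + d) / n,
    }


# Hypothetical counts for a test applied to 200 patients.
a, b, c, d = 80, 20, 10, 90
stats = diagnostic_characteristics(a, b, c, d)

# Diagnostic accuracy as a weighted average of the two rule accuracies,
# weighted by how often each rule fires.
n = a + b + c + d
weighted = ((a + b) / n) * stats["positive_predictive_value"] \
    + ((c + d) / n) * stats["negative_predictive_value"]

print(stats)
print(weighted)
```

The weights are the shares of cases with “A” and with “Not A”, so the weighted average reduces to (a + d) / (a + b + c + d).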
It is more convenient to manipulate accuracy and completeness of rules than to use five test characteristics.
Argument 1. All test characteristics can be reduced to the accuracy and completeness of the positive and negative diagnostic rules.
Argument 2. It is easier to operate with two characteristics than with five.
Argument 3. Both accuracy and completeness have simple interpretations related to medical practice. The accuracy shows to what extent both the patient and the physician should trust the diagnosis made on the basis of the diagnostic rule. Completeness shows what proportion of patients suffering from the disease the physician can diagnose correctly based on the found diagnostic rule.
Argument 4. Using DA terminology expands testing possibilities: the analysis of rules implemented in DALSolution can be applied to the search for new tests.
How is the DA approach different from classic factor analysis?
“Classic factor analysis” is understood here as any factor analysis scheme where relationships between variables are described by a matrix of pair-wise statistical coefficients, and each coefficient represents a relationship between a pair of variables.
Answer.
Example. Comparison between the Analysis of Rules and the classic Factor Analysis.
The example below illustrates the inapplicability of the classic scheme of factor analysis for the analysis of rules in the case when a matrix of pairwise statistical coefficients provides an incomplete description of relationships between the variables (S.V. Chesnokov, 1975).
Consider three binary variables: x with values a and ¬a, y with values b and ¬b, and z with values c and ¬c (here ¬ marks the complementary value). Suppose that values of these variables are measured in a sample of 1000 cases, and they are distributed as follows (Table 2):
| x \ yz | bc | b¬c | ¬bc | ¬b¬c |
| a | 250 | 0 | 0 | 250 |
| ¬a | 0 | 250 | 250 | 0 |
Table 2. Joint distribution by variables x, y, and z.
Problem formulation in the scheme of factor analysis
Given is a set of variables x, y, z. Find the minimal number of factors (variables from the given set) such that you could “restore” values of all the variables based on the factor values.
Solving the problem by means of the Analysis of Rules
As you can observe in the table, the values of variable x are completely determined by the values of variables y and z, according to the following rules:
Rule 1. bc → a (if b and c, then a)
Rule 2. b¬c → ¬a (if b and not c, then not a)
Rule 3. ¬bc → ¬a (if not b and c, then not a)
Rule 4. ¬b¬c → a (if not b and not c, then a)
From the perspective of DA, this set of rules represents a solution of the problem.
Variables y, z are perfect factors. They can be used to restore the values of variable x without uncertainty. The algorithm restoring the values is expressed as the function x = f(y, z), which is completely defined by rules 1-4.
Therefore, the solution obtained by means of DA states that the number of factors equals two. Variables y, z can be selected as factors.
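The claim that rules 1-4 are all accurate can be verified by direct counting. The sketch below encodes the joint distribution of Table 2 as a dictionary (the string labels and the "~" prefix for the complementary value are notational assumptions) and computes the accuracy of each rule.

```python
# Joint distribution of Table 2 as (x, y, z) -> number of cases.
# "~" marks the complementary value of a binary variable.
counts = {
    ("a", "b", "c"): 250, ("a", "b", "~c"): 0,
    ("a", "~b", "c"): 0, ("a", "~b", "~c"): 250,
    ("~a", "b", "c"): 0, ("~a", "b", "~c"): 250,
    ("~a", "~b", "c"): 250, ("~a", "~b", "~c"): 0,
}

# Rules 1-4: each combination of (y, z) values determines a value of x.
rules = {
    ("b", "c"): "a",
    ("b", "~c"): "~a",
    ("~b", "c"): "~a",
    ("~b", "~c"): "a",
}


def rule_accuracy(y, z, x):
    """Accuracy of the rule 'yz -> x': share of x among the cases with (y, z)."""
    fired = sum(n for (_, yi, zi), n in counts.items() if (yi, zi) == (y, z))
    return counts[(x, y, z)] / fired


accuracies = {yz: rule_accuracy(yz[0], yz[1], x) for yz, x in rules.items()}


def f(y, z):
    """Restore x from y and z using rules 1-4."""
    return rules[(y, z)]


print(accuracies)
print(f("b", "~c"))
```

Because every rule has accuracy 1, the lookup `f` restores x from y and z without error on the whole sample.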
Solving the same problem with classic factor analysis
Examine the same situation using classic factor analysis. Following the standard recipe, you first need to compute the coefficients of statistical relation between pairs of variables and compose a matrix of these coefficients. Suppose that as a coefficient of statistical relation between pairs of variables you use any measure which equals zero in the case of statistical independence between variables, and assumes a maximal value (say, one) if the variables are exactly the same.
From the distribution in Table 2 you therefore obtain three pair-wise distributions shown in Table 3.
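| x \ y | b | ¬b |
| a | 250 | 250 |
| ¬a | 250 | 250 |

| y \ z | c | ¬c |
| b | 250 | 250 |
| ¬b | 250 | 250 |

| z \ x | a | ¬a |
| c | 250 | 250 |
| ¬c | 250 | 250 |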
Table 3. Pair-wise distributions by variables (x, y), (y, z), (z, x), obtained from the distribution in Table 2. All pairs of variables are statistically independent.
You can observe in this table that variables x, y, z, considered in pairs, are statistically independent. The values of the relationship measure for the pairs of variables (x, y), (y, z), (z, x) all equal zero, while its values for each variable paired with itself, (x, x), (y, y), (z, z), all equal 1. As a result, the matrix of pair-wise relationship measures is the 3×3 identity matrix (rows and columns ordered x, y, z):

$$
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
$$
In multivariate analysis, this result is conventionally formulated as: variables x, y, z are in no way related to each other, they form three mutually independent factors.
Therefore, the solution obtained by means of Factor Analysis states that the minimal number of factors equals three.
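The pairwise independence behind this result, and the fact that it hides a three-way dependence, can be confirmed by counting. The sketch below re-encodes the Table 2 distribution (the labels and the "~" prefix are notational assumptions) and tests whether each joint count equals the product of its marginals.

```python
from itertools import product

# Joint distribution of Table 2 as (x, y, z) -> number of cases.
counts = {
    ("a", "b", "c"): 250, ("a", "b", "~c"): 0,
    ("a", "~b", "c"): 0, ("a", "~b", "~c"): 250,
    ("~a", "b", "c"): 0, ("~a", "b", "~c"): 250,
    ("~a", "~b", "c"): 250, ("~a", "~b", "~c"): 0,
}
total = sum(counts.values())


def marginal(axes):
    """Sum the joint counts over all variables not listed in `axes`."""
    result = {}
    for key, n in counts.items():
        sub = tuple(key[i] for i in axes)
        result[sub] = result.get(sub, 0) + n
    return result


def pair_is_independent(i, j):
    """True when every joint count of variables i, j equals the product of marginals."""
    joint = marginal((i, j))
    first, second = marginal((i,)), marginal((j,))
    return all(
        joint[(u, v)] * total == first[(u,)] * second[(v,)]
        for (u,), (v,) in product(first, second)
    )


pairs = {"xy": (0, 1), "yz": (1, 2), "zx": (2, 0)}
pairwise = {name: pair_is_independent(i, j) for name, (i, j) in pairs.items()}

# The three variables taken together are not independent.
mx, my, mz = marginal((0,)), marginal((1,)), marginal((2,))
triple_is_independent = all(
    n * total ** 2 == mx[(k[0],)] * my[(k[1],)] * mz[(k[2],)]
    for k, n in counts.items()
)

print(pairwise)
print(triple_is_independent)
```

Integer arithmetic is used for the comparison so that the independence check is exact rather than subject to rounding.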
Comparison of results
Classic factor analysis leads to the wrong result. The correct result is obtained by means of DA. The minimal number of factors equals two, not three. This can be directly observed in Table 2.
The reasons for the wrong result produced by factor analysis are:
How is DA different from regression analysis?
Answer.
1. The task of analyzing relationships is formulated differently in regression analysis than in DA.
The problem formulation in the scheme of regression analysis. Given are a variable y and a set of variables x1, x2, x3, …, xn. The task is to find a regression equation y = F(x1, x2, x3, …, xn) showing how the variable y depends on the variables xi, i = 1, 2, 3, …, n.
The problem formulation in the scheme of DA. Given are a value B and a set of variables x1, x2, x3, …, xn. The task is to find all rules in the form
A1A2A3…Ak → B
so that their accuracy and completeness are within a specified range. Here Ai is a value or a group of values of one of the specified variables xi (i = 1, 2, 3, …, n).
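The DA formulation can be pictured as an exhaustive search over combinations of values. The sketch below is only an illustration under simplifying assumptions: each Ai is a single value rather than a group of values, the “specified range” is reduced to two lower thresholds, and the function name, variable names, and data are hypothetical.

```python
from itertools import combinations


def find_rules(rows, factors, is_b, min_accuracy, min_completeness):
    """Enumerate rules A1...Ak -> B over single values of the given factors.

    rows: list of dicts mapping variable name -> value.
    factors: names of the variables x1..xn allowed in the left-hand side.
    is_b: predicate telling whether B occurred in a row.
    Returns a list of (conditions, accuracy, completeness).
    """
    total_b = sum(1 for row in rows if is_b(row))
    found = []
    for k in range(1, len(factors) + 1):
        for names in combinations(factors, k):
            seen = sorted({tuple(row[n] for n in names) for row in rows})
            for values in seen:
                matched = [row for row in rows
                           if tuple(row[n] for n in names) == values]
                hits = sum(1 for row in matched if is_b(row))
                acc = hits / len(matched)
                comp = hits / total_b if total_b else 0.0
                if acc >= min_accuracy and comp >= min_completeness:
                    found.append((dict(zip(names, values)), acc, comp))
    return found


# Hypothetical observations.
rows = (
    [{"drug": "X", "age": "young", "outcome": "recovered"}] * 3
    + [{"drug": "X", "age": "old", "outcome": "recovered"}]
    + [{"drug": "X", "age": "old", "outcome": "not recovered"}]
    + [{"drug": "Y", "age": "young", "outcome": "not recovered"}] * 2
    + [{"drug": "Y", "age": "old", "outcome": "recovered"}]
)

rules = find_rules(
    rows,
    factors=["drug", "age"],
    is_b=lambda row: row["outcome"] == "recovered",
    min_accuracy=0.8,
    min_completeness=0.5,
)
for conditions, acc, comp in rules:
    print(conditions, acc, comp)
```

The number of candidate rules grows quickly with the number of variables, so a practical implementation would need pruning; the sketch ignores this.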
2. Regression analysis also manipulates rules.
Regression analysis manipulates regression functions. A regression function y = f(x) consists of rules in the form “If x then y = f(x)”, or “x → y = f(x)”. For each x = x0, the value y0 = f(x0) is computed as a mean of variable y for a given x0. Thus, regression analysis can be considered one of the methods for finding and analyzing rules, and hence is comparable to DA.
3. DA is applicable for the analysis of non-numeric variables, while regression analysis is not.
In order to apply regression analysis and find rules composing a regression function y = f(x), the variable y needs to be numeric. Regression analysis is not applicable if variable y is categorical. DA does not have such a restriction. The method for finding rules within DA does not require the variables to be numeric.
4. Regression analysis provides an approximate solution for the task of forecasting, while DA provides an accurate one.
Suppose that rules composing a regression function are used for prediction. For this task, the rules should be as accurate as possible, to provide for reliable explanation and a small forecasting error. However, regression analysis algorithms do not guarantee high accuracy of forecasting rules. Consider the following example (S. Chesnokov, Determinacy Analysis of Social-Economic Data, 1982, pp. 148-149). Let variables x, y have integer values, and let the probability P(y|x) of each y be defined as:

$$
P(y \mid x) =
\begin{cases}
1 - \varepsilon, & y = \varphi(x), \\
\varepsilon, & y = h(x), \\
0, & \text{otherwise},
\end{cases}
$$

where ε is a positive number not exceeding 1, and φ(x), h(x) are some functions such that their difference is not equal to zero for any x. Note that P(y|x) is, by definition, the accuracy of rule x → y.
If y = φ(x), then P(y|x) = 1 - ε. For small ε all rules composing the function y = φ(x) have a high accuracy, which equals 1 - ε. The accuracy of such rules approaches 1 when ε approaches 0.
If y = h(x), then P(y|x) = ε. For small ε all rules composing the function y = h(x) have a low accuracy, which equals ε. The accuracy of such rules approaches 0 when ε approaches 0.
If y ≠ h(x) and y ≠ φ(x), then P(y|x) = 0. This means that in all other cases except for those described above, the accuracy of rule x → y is equal to zero.
If ε is sufficiently small (for example, smaller than 0.5), the function y = φ(x) is found as the best solution for the task of predicting values of variable y based on values of variable x when DA is applied.
The value ε is the share of cases in which forecasting y by the function y = φ(x) is erroneous. With a very small ε, the forecast will be almost perfect with the solution provided by DA. Note that this property of the solution obtained with DA does not depend on the form of the function y = h(x). It holds for any function y = h(x).
Now, compute the regression function y = f(x) and compare it with the best solution y = φ(x). The regression function is computed as:

$$
f(x) = \sum_{y} y \, P(y \mid x)
$$

where the summation is performed over all valid values of y. For the specified form of P(y|x), the sought regression function is:

$$
f(x) = \varphi(x) + \varepsilon \, [h(x) - \varphi(x)]
$$

From this formula, one can see that the regression function y = f(x) does not coincide with the best forecasting solution y = φ(x). The difference between the regression function f(x) and the best solution φ(x) is equal to ε[h(x) - φ(x)]. By properly selecting a function h(x), one can make this difference as large as necessary for any small ε. This is a very unfortunate property of regression analysis, and because of it regression functions generally do not provide the best solution in forecasting problems.
In computing regression functions, the principle of least squares (computing the least mean of squared deviations from the regression function to the points in the correlation field) is used. It is commonly believed that an application of this approach guarantees minimum forecasting error. This belief is erroneous. The minimization of mean squared deviations of the points in the correlation field from the regression function does not guarantee the maximum accuracy of rules composing the regression function. In the previous example, for any specified x the mean squared deviation Q(x) of the points in the correlation field from the exact regression function is:

$$
Q(x) = \varepsilon \, (1 - \varepsilon) \, [h(x) - \varphi(x)]^2
$$

By properly selecting a function h(x), one can make this value as large as necessary for any small ε.
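The example can be reproduced numerically. In the sketch below the concrete functions standing in for the best solution and for h(x), the error share `eps`, and the point x are arbitrary assumptions; the code computes the regression value and the mean squared deviation by direct summation over the two-point conditional distribution and compares the regression value with the most probable value of y.

```python
def conditional_distribution(x, best, h, eps):
    """P(y | x) of the example: 1 - eps at y = best(x), eps at y = h(x), 0 elsewhere.

    Assumes best(x) != h(x), as required in the example.
    """
    return {best(x): 1.0 - eps, h(x): eps}


def regression_value(dist):
    """f(x): the conditional mean of y."""
    return sum(y * p for y, p in dist.items())


def mean_squared_deviation(dist):
    """Q(x): mean squared deviation of y from the regression value."""
    f = regression_value(dist)
    return sum((y - f) ** 2 * p for y, p in dist.items())


def most_accurate_value(dist):
    """The y for which the rule x -> y has the highest accuracy."""
    return max(dist, key=dist.get)


def best(x):
    # Hypothetical function playing the role of the best forecasting solution.
    return 2 * x


def h(x):
    # Hypothetical rare alternative, placed far away from best(x).
    return 2 * x + 1000


eps = 0.01
x = 5
dist = conditional_distribution(x, best, h, eps)

f_x = regression_value(dist)
q_x = mean_squared_deviation(dist)
y_best = most_accurate_value(dist)

print(y_best, dist[y_best])            # value chosen by maximal rule accuracy
print(f_x, dist.get(round(f_x), 0.0))  # regression value and the accuracy of x -> f(x)
print(q_x)
```

In this configuration the regression value falls on a y that never occurs for the given x, so the rule x → f(x) has zero accuracy even though the error share is only 1%.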
Let us stress that the noted drawbacks of regression analysis are fundamental in nature. They are derived from the notion of regression and cannot be refuted by selecting a particular regression analysis model.
Let us turn to DA now. DA can point to the function y = φ(x) as the best function, regardless of the form of function h(x). DA does not have the drawbacks of regression analysis. The reason DA is effective in finding rules is not related to the particular distribution considered in our example. It lies in the fact that DA algorithms are based on direct calculations of the accuracy and completeness of rules. This is another principal difference between DA and regression analysis.
How is DA different from the fuzzy sets approach?
Answer. DA and fuzzy sets theory are not compatible in either a mathematical or a methodological sense.
DA assumes that the lack of certainty for notions used in everyday language is described by rules whose accuracy is different from that maximally possible. Theory of fuzzy sets assumes that the lack of certainty of such notions is described by a fuzzy membership relation between a set and an element of that set.
How are DA and Data Mining methods related?
Answer. DA complements techniques implemented in Data Mining packages.
How are DA and “Decision trees” related?
Answer. DA techniques can be used for building “decision trees”. Rules tables in DALSolution are one of the forms used to represent “decision trees”.
Where can information about DA be found?
Answer.
General and introductory information:
Bibliography
Below you will find references to the main publications on the mathematical basis of determinacy analysis, selected works on applications in sociology, linguistics, logic of natural language, medicine, and geographic information systems, as well as works on philosophic issues in determinacy theory and logic.
Main works on determinacy theory
Determinacy analysis in sociology and the social sciences
Determinacy analysis in linguistics and studies of logic in natural language
Determinacy analysis in medicine
Determinacy analysis in geographic information systems
Philosophic issues in determinacy theory and logic
DALSolution implements Determinacy Analysis (abbreviated as DA). DA is a technology for data processing and analysis focused on the search for and examination of rules (determinacies).
The mathematical foundations of the method were developed during the 1970s by Sergei Chesnokov at the Institute of Systems Studies (VNIISI, Moscow), in the department headed by Academician S. S. Shatalin (1934-1997).
The first software systems implementing DA were developed in the late 1970s. In 1982, a comprehensive volume with a detailed treatment of the method’s foundations was published (Chesnokov, 1982).
In the 1980s, several fundamental mathematical results were obtained in determinacy theory. Determinacy logic was developed during this period, elaborating on the views of Aristotle, which have been largely ignored in mathematical logic of the 20th century. In particular, determinacy theory led to a radical generalization of Aristotelian syllogistics, which in turn resulted in a new understanding of the role of syllogistics for the fundamentals of logic and arithmetic, for analysis of data and for natural language (Chesnokov, works of 1983-1994).
Since the 1980s and 1990s, DA has been widely used in medicine, sociology, linguistics, artificial intelligence studies, and geographic information systems (see Bibliography). A strong impetus for the application of DA in various areas of science, management, and business came from the activities of Context Co., Ltd. and its research extension, the private Institute for Physics of Logos. Starting in 1989, the Company and the Institute took over the development of the method and its applications, as well as the realization of selected collaborative applied projects. New versions of DA software developed at Context Co. opened this technology to wide groups of researchers, managers, and business people.
Copyright © 1997-2004 Context Co., Ltd. All rights reserved.