
Analysis of rules (DA). Comparison with other approaches


What is the theory behind DA?

The main idea behind DA is that a rule can be found based on the frequencies of joint occurrence or non-occurrence of events. Such a rule is called a “determinacy”, and the mathematical theory of such rules is called “determinacy analysis”, or DA.

Another key idea: DA is the basis for convenient and useful data analysis technology with wide applications. This idea is implemented in DALSolution.

What is a rule?

People find rules (determinacies) by observing the joint or disjoint occurrence of events. For example, if one notices that an occurrence of A is always followed by an occurrence of B, this means that there exists a rule “If A then B”, or A → B. If you draw A as a circle and B as another circle, then circle A lies completely inside circle B, as shown in Figure 1. This means that there exists an accurate rule A → B.

Figure 1. The case of the accurate rule A → B. Circle A (red) is completely inside circle B. The bounding rectangle symbolizes the entire set of observations.

Everybody needs rules

The concept of a rule as a determinacy is closely related to the ideas of prediction and explanation. The primary reason for interest in rules is that knowledge of rules allows one to act while foreseeing the results. Doctors are interested in rules like “If a patient with a particular disease under particular conditions is treated with a particular drug, then he or she will recover or get significantly better without side effects”. Such rules help doctors do their job. For a marketing specialist, rules such as “If the image of the product is changed in a particular way, its attractiveness for particular consumers will increase” are important. Knowledge of such rules allows for better planning of market behavior. An example of a rule that may interest an electoral campaign consultant is: “If a candidate makes a particular statement under particular conditions, his or her rating will increase among particular groups of voters and decrease among others”. Rules are the most natural form of knowledge, so everybody can benefit from them.

Each rule has two fundamental characteristics – accuracy and completeness

The accuracy of the rule A → B is defined as the proportion of occurrences of B among the occurrences of A. In Figure 1 this proportion is equal to 1 (100%), which makes the rule A → B accurate. Completeness is the other fundamental characteristic of a rule. In Figure 1 you can see that the rule A → B predicts only about one fourth of all occurrences of B, because the area of circle A is approximately one fourth of the area of circle B. The rule A → B is accurate but not complete: its completeness is approximately one fourth (25%).

The completeness of the rule A → B is generally defined as the proportion of occurrences of A among the occurrences of B. The completeness of the rule A → B is equal to the accuracy of the reverse rule B → A, while the accuracy of the rule A → B equals the completeness of the reverse rule. In other words, reversing the arrow swaps accuracy and completeness.
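The two definitions reduce to simple ratios of counts. The following is a minimal sketch, not DALSolution code; the function names and the counts (chosen to mimic Figure 1) are assumptions made for illustration.

```python
def accuracy(n_ab: int, n_a: int) -> float:
    """Accuracy of A -> B: share of occurrences of A in which B also occurs."""
    return n_ab / n_a


def completeness(n_ab: int, n_b: int) -> float:
    """Completeness of A -> B: share of occurrences of B in which A also occurs."""
    return n_ab / n_b


# Hypothetical counts mimicking Figure 1: A lies wholly inside B
# and covers a quarter of it.
n_a, n_b, n_ab = 25, 100, 25

acc = accuracy(n_ab, n_a)        # accuracy of A -> B
comp = completeness(n_ab, n_b)   # completeness of A -> B

# Reversing the arrow swaps the two characteristics:
# for B -> A the roles of n_a and n_b are exchanged.
acc_reverse = accuracy(n_ab, n_b)        # accuracy of B -> A
comp_reverse = completeness(n_ab, n_a)   # completeness of B -> A

print(acc, comp, acc_reverse, comp_reverse)
```

With these counts the rule A → B is accurate (1.0) but only 25% complete, and the reverse rule B → A has the two values exchanged.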

An inaccurate rule can be made more accurate

Accurate rules are difficult to find. Most rules are inaccurate. If the rule A → B is not accurate, circle A is not completely within circle B, as shown in Figure 2.

Figure 2. The case of the inaccurate rule A → B. Only a portion (colored in red) of circle A is inside circle B.

If one adds a certain factor C to the rule A → B, the resultant rule AC → B may turn out to be accurate. An example of this situation is shown in Figure 3.

Figure 3. A factor C is added to the inaccurate rule A → B. The rule AC → B is accurate. All joint occurrences of A and C (colored in red) are inside circle B.

It may also happen that the rule AC → B is even less accurate than the initial rule A → B. In Figure 4, the accuracy of the rule AC → B equals 0.

Figure 4. A factor C is added to the inaccurate rule A → B. The resultant rule AC → B has zero accuracy. All cases of joint occurrence of A and C (colored gray) lie outside circle B.

DALSolution finds and analyzes rules

DALSolution allows one to find rules and provides information about factors which make these rules more or less accurate (in the Rules Tables mode).

If you specify an inaccurate rule A → B and some text variable, DALSolution will find those values of the variable that increase the rule accuracy (or decrease it, or leave it unchanged). In this manner, DALSolution works with qualitative factors in rules (in the Rules Tables mode).
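The effect of a qualitative factor can be illustrated by comparing the accuracy of A → B with its accuracy within each value of a text variable. This is only a sketch of the idea: the records, the variable values, and the function name are hypothetical, and nothing here reflects how DALSolution is actually implemented.

```python
from collections import defaultdict

# Hypothetical observations: (A occurred, value of a text variable, B occurred)
records = [
    (True, "north", True), (True, "north", True),
    (True, "south", True), (True, "south", False),
    (True, "west", False), (True, "west", False),
    (False, "north", True), (False, "south", False),
]


def rule_accuracy(rows):
    """Accuracy of A -> B over rows of (a, value, b); None if A never occurs."""
    hits = [b for a, _, b in rows if a]
    return sum(hits) / len(hits) if hits else None


base = rule_accuracy(records)  # accuracy of the initial rule A -> B

# Group the observations by the value of the text variable.
by_value = defaultdict(list)
for row in records:
    by_value[row[1]].append(row)

# For each value C, compare the accuracy of AC -> B with the base accuracy.
effect = {}
for value, rows in by_value.items():
    acc = rule_accuracy(rows)
    if acc is None:
        continue
    if acc > base:
        label = "increases"
    elif acc < base:
        label = "decreases"
    else:
        label = "unchanged"
    effect[value] = (acc, label)

print(base, effect)
```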

If you specify an inaccurate rule A → B and some numeric variable x, DALSolution will find the bounds q, r describing a factor C = {q ≤ x ≤ r} such that the accuracy of the rule AC → B is not lower than a given threshold. In this manner, DALSolution helps you explore quantitative factors in rules (Optimization in the Rules Tables mode).
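The text does not describe how the bounds are actually found. As an illustration only, the sketch below uses exhaustive search over observed values of x and, among all intervals that meet the accuracy threshold, keeps the one covering the most occurrences of B. The data, the function name, and the tie-breaking choice are assumptions.

```python
from itertools import combinations_with_replacement

# Hypothetical observations in which A holds: (value of x, B occurred)
rows = [(1, False), (2, False), (3, True), (4, True),
        (5, True), (6, False), (7, True), (8, False)]


def best_interval(rows, threshold):
    """Return (q, r, accuracy, covered) for the interval q <= x <= r that
    covers the most occurrences of B while keeping accuracy >= threshold.
    Returns None if no interval reaches the threshold."""
    xs = sorted({x for x, _ in rows})
    best = None
    # Candidate bounds are observed values, so every interval is non-empty.
    for q, r in combinations_with_replacement(xs, 2):
        inside = [b for x, b in rows if q <= x <= r]
        covered = sum(inside)
        acc = covered / len(inside)
        if acc >= threshold and (best is None or covered > best[3]):
            best = (q, r, acc, covered)
    return best


print(best_interval(rows, 0.8))
print(best_interval(rows, 1.0))
```

Raising the threshold narrows the interval: a stricter accuracy requirement is paid for with lower completeness.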


How is DA related to the practice of medical testing?

Answer. DA expands medical testing capabilities. By applying DA, a physician can identify tests based on a composition of several diagnostic factors.

Techniques for testing diseases, drugs, treatment methods, etc., are widely used in medicine. Let A be a condition used as a test and B a disease (or a result of treatment, a result of drug application, etc.). The relationship between A and B is commonly described by a table with cells containing the number of cases when “A” or “not A” is observed jointly with “B” or “not B” (Table 1):

         B      Not B   Sum
A        a      b       a+b
Not A    c      d       c+d
Sum      a+c    b+d

Table 1. Data used to compute the characteristics of a test for the presence of disease B based on the condition A.

The relationship between medical testing terminology and DA terminology

Terminology accepted in the practice of testing   Formula (counts from Table 1)   DA terminology
Predictive value of the positive test             a / (a + b)                     Accuracy of the positive diagnostic rule A → B
Test sensitivity                                  a / (a + c)                     Completeness of the positive diagnostic rule A → B
Predictive value of the negative test             d / (c + d)                     Accuracy of the negative diagnostic rule “Not A” → “Not B”
Test specificity                                  d / (b + d)                     Completeness of the negative diagnostic rule “Not A” → “Not B”
The diagnostic accuracy of the method             (a + d) / (a + b + c + d)       Weighted average of the accuracies of the rules A → B and “Not A” → “Not B”
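The correspondence can be checked numerically from the four cell counts of Table 1. The sketch below uses the standard definitions of the five test characteristics; the function name and the counts are hypothetical.

```python
def diagnostic_characteristics(a, b, c, d):
    """Test characteristics from the cells of Table 1:
    a = A and B, b = A and not B, c = not A and B, d = not A and not B."""
    n = a + b + c + d
    ppv = a / (a + b)            # accuracy of A -> B
    sensitivity = a / (a + c)    # completeness of A -> B
    npv = d / (c + d)            # accuracy of "Not A" -> "Not B"
    specificity = d / (b + d)    # completeness of "Not A" -> "Not B"
    diagnostic_accuracy = (a + d) / n
    # Average of the two rule accuracies, weighted by how often each rule applies.
    weighted = (a + b) / n * ppv + (c + d) / n * npv
    return {
        "ppv": ppv,
        "sensitivity": sensitivity,
        "npv": npv,
        "specificity": specificity,
        "diagnostic_accuracy": diagnostic_accuracy,
        "weighted_rule_accuracy": weighted,
    }


# Hypothetical counts for 200 patients.
result = diagnostic_characteristics(a=40, b=10, c=20, d=130)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The last two values coincide, which is the sense in which diagnostic accuracy is a weighted average of the accuracies of the positive and negative rules.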

It is more convenient to manipulate accuracy and completeness of rules than to use five test characteristics.

Argument 1. All test characteristics can be reduced to the accuracy and completeness of the positive and negative diagnostic rules.

Argument 2. It is easier to operate with two characteristics than with five.

Argument 3. Both accuracy and completeness have simple interpretations related to medical practice. The accuracy shows to what extent both the patient and the physician should trust the diagnosis made on the basis of the diagnostic rule. Completeness shows what proportion of patients suffering from the disease the physician can diagnose correctly based on the found diagnostic rule.

Argument 4. Using DA terminology expands testing possibilities, because the analysis of rules implemented in DALSolution can be applied to the search for tests.


How is the DA approach different from classic factor analysis?

“Classic factor analysis” is understood here as any factor analysis scheme where relationships between variables are described by a matrix of pair-wise statistical coefficients, and each coefficient represents a relationship between a pair of variables.

Answer.

  1. Let us consider n variables x1, x2, x3,…, xn, which characterize an analytical situation. The best results are produced by classic factor analysis in the case when variables x1, x2, x3,…, xn are numeric and their values are normally distributed. With DA, your analysis is not limited to numeric variables. There are also no restrictions on the distribution of values for each of the variables x1, x2, x3,…, xn .
  2. If the matrix of pair-wise statistical coefficients provides an incomplete description of relationships among variables x1, x2, x3,…, xn, classic factor analysis may lead to erroneous results. Under the same conditions, DA leads to correct results.

Example. Comparison between the Analysis of Rules and the classic Factor Analysis.

The example below illustrates the inapplicability of the classic scheme of factor analysis for the analysis of rules in the case when a matrix of pairwise statistical coefficients provides an incomplete description of relationships between the variables (S.V. Chesnokov, 1975).

Consider three binary variables: x with values a and ¬a, y with values b and ¬b, and z with values c and ¬c (where ¬ denotes negation). Suppose that values of these variables are measured in a sample of 1000 cases, and they are distributed as follows (Table 2):

x \ yz   bc    b¬c   ¬bc   ¬b¬c
a        250   0     0     250
¬a       0     250   250   0

Table 2. Joint distribution by variables x, y, and z.

Problem formulation in the scheme of factor analysis

Given is a set of variables x, y, z. Find the minimal number of factors (variables from the given set) such that you could “restore” values of all the variables based on the factor values.

Solving the problem by means of the Analysis of Rules

As you can observe in the table, the values of variable x are completely determined by the values of variables y and z, according to the following rules:

Rule 1. (If bc, then a)

Rule 2. (If b¬c, then ¬a)

Rule 3. (If ¬bc, then ¬a)

Rule 4. (If ¬b¬c, then a)

From the perspective of DA, this set of rules represents a solution of the problem. Variables y, z are perfect factors. They can be used to restore the values of variable x without uncertainty. The algorithm restoring the values is expressed as the function x = f(y, z), which is completely defined by rules 1 - 4.

Therefore, the solution obtained by means of DA states that the number of factors equals two. Variables y, z can be selected as factors.
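The four rules can be verified directly from the counts in Table 2. In this sketch the strings "na", "nb", "nc" stand for the negated values, and the function name is an assumption made for illustration.

```python
# Table 2 as counts keyed by (x, y, z); "na" stands for "not a", and so on.
table2 = {
    ("a", "b", "c"): 250, ("a", "b", "nc"): 0,
    ("a", "nb", "c"): 0, ("a", "nb", "nc"): 250,
    ("na", "b", "c"): 0, ("na", "b", "nc"): 250,
    ("na", "nb", "c"): 250, ("na", "nb", "nc"): 0,
}


def rule_accuracy(table, y, z, x):
    """Accuracy of the rule (y, z) -> x: share of (y, z) cases in which x occurs."""
    total = sum(n for (_, yi, zi), n in table.items() if (yi, zi) == (y, z))
    return table[(x, y, z)] / total


# Rules 1 - 4: each combination of y and z points to one value of x.
rules = {
    ("b", "c"): "a",
    ("b", "nc"): "na",
    ("nb", "c"): "na",
    ("nb", "nc"): "a",
}

accuracies = {cond: rule_accuracy(table2, cond[0], cond[1], x)
              for cond, x in rules.items()}
print(accuracies)
```

Every rule has accuracy 1, so x is restored from y and z without error.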

Solving the same problem with classic factor analysis

Examine the same situation using classic factor analysis. Following the standard recipe, you first need to compute the coefficients of statistical relation between pairs of variables and compose a matrix of these coefficients. Suppose that as a coefficient of statistical relation between pairs of variables you use any measure which equals zero in the case of statistical independence between variables, and assumes a maximal value (say, one) if the variables are exactly the same.

From the distribution in Table 2 you therefore obtain three pair-wise distributions shown in Table 3.

(x, y)   b     ¬b
a        250   250
¬a       250   250

(y, z)   c     ¬c
b        250   250
¬b       250   250

(z, x)   a     ¬a
c        250   250
¬c       250   250

Table 3. Pair-wise distributions by variables (x, y), (y, z), (z, x), obtained from the distribution in Table 2. All pairs of variables are statistically independent.

You can observe in this table that variables x, y, z, considered in pairs, are statistically independent. The values of the relationship measure for the pairs of variables (x, y), (y, z), (z, x) all equal zero, while its values for the pairs (x, x), (y, y), (z, z) all equal 1. As a result, the matrix of pair-wise relationship measures is the 3×3 identity matrix:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

In multivariate analysis, this result is conventionally formulated as: variables x, y, z are in no way related to each other, they form three mutually independent factors.

Therefore, the solution obtained by means of Factor Analysis states that the minimal number of factors equals three.
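The pair-wise independence that misleads the classic scheme can be confirmed from the joint counts of Table 2. The sketch below encodes a, b, c as 1 and their negations as 0; this encoding and the helper names are assumptions made for illustration.

```python
from itertools import product

# Table 2 with 1 for a, b, c and 0 for their negations:
# x = a exactly when y and z agree (both positive or both negated).
joint = {(x, y, z): (250 if x == int(y == z) else 0)
         for x, y, z in product((0, 1), repeat=3)}
n = sum(joint.values())


def single(i):
    """Counts for one variable (position 0 = x, 1 = y, 2 = z)."""
    out = {}
    for key, cnt in joint.items():
        out[key[i]] = out.get(key[i], 0) + cnt
    return out


def marginal(i, j):
    """Counts for the pair of variables at positions i and j."""
    out = {}
    for key, cnt in joint.items():
        pair = (key[i], key[j])
        out[pair] = out.get(pair, 0) + cnt
    return out


def independent(i, j):
    """True if every pair count equals the product of the single counts over n."""
    pair, si, sj = marginal(i, j), single(i), single(j)
    return all(pair[(u, v)] * n == si[u] * sj[v]
               for u, v in product((0, 1), repeat=2))


pairs_independent = all(independent(i, j) for i, j in ((0, 1), (1, 2), (2, 0)))

# Yet y and z together leave exactly one possible value of x.
x_determined = all(sum(1 for x in (0, 1) if joint[(x, y, z)] > 0) == 1
                   for y, z in product((0, 1), repeat=2))

print(pairs_independent, x_determined)
```

Both checks succeed at once: every pair of variables is independent, while the pair (y, z) determines x completely.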

Comparison of results

Classic factor analysis leads to the wrong result. The correct result is obtained by means of DA. The minimal number of factors equals two, not three. This can be directly observed in Table 2.

The reasons for the wrong result produced by factor analysis are:

  1. The matrix of statistical coefficients describing relations between pairs of variables provides an incomplete expression of relationship among the variables.
  2. The mathematical scheme of factor analysis employs the understanding of statistical relations as a measure of deviation from statistical independence, which leads to errors during the analysis of rules.

How is DA different from regression analysis?

Answer.

1. The task of analyzing relationships is formulated differently in regression analysis than in DA.

The problem formulation in the scheme of regression analysis. Given are a variable y and a set of variables x1, x2, x3,…, xn. The task is to find a regression equation y = F(x1, x2, x3,…, xn) showing how the variable y depends on the variables xi, i=1,2,3,…,n.

The problem formulation in the scheme of DA. Given are a value B and a set of variables x1, x2, x3,…, xn. The task is to find all rules in the form

A1 A2 A3 … Ak → B

so that their accuracy and completeness are within a specified range. Here each Ai is a value, or a group of values, of one of the specified variables xi (i = 1,2,3,…,n).
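The DA formulation can be illustrated with a brute-force search over combinations of variable values. This is a sketch under stated assumptions, not the algorithm used in DALSolution: the records, the function name, and the use of lower bounds only (rather than a general range) are all hypothetical.

```python
from itertools import combinations

# Hypothetical categorical records; "outcome" = "yes" plays the role of B.
records = [
    {"sex": "f", "age": "young", "outcome": "yes"},
    {"sex": "f", "age": "young", "outcome": "yes"},
    {"sex": "f", "age": "old", "outcome": "no"},
    {"sex": "m", "age": "young", "outcome": "yes"},
    {"sex": "m", "age": "old", "outcome": "no"},
    {"sex": "m", "age": "old", "outcome": "no"},
]


def find_rules(records, target, target_value, min_acc, min_comp):
    """List rules (conditions -> target_value) whose accuracy and completeness
    reach the given lower bounds. Assumes target_value occurs at least once."""
    variables = [v for v in records[0] if v != target]
    n_b = sum(r[target] == target_value for r in records)
    found = []
    for k in range(1, len(variables) + 1):
        for combo in combinations(variables, k):
            # Only value combinations that actually occur are candidates.
            seen = sorted({tuple(r[v] for v in combo) for r in records})
            for vals in seen:
                matching = [r for r in records
                            if all(r[v] == val for v, val in zip(combo, vals))]
                n_ab = sum(r[target] == target_value for r in matching)
                acc = n_ab / len(matching)
                comp = n_ab / n_b
                if acc >= min_acc and comp >= min_comp:
                    found.append((dict(zip(combo, vals)), acc, comp))
    return found


found = find_rules(records, "outcome", "yes", min_acc=0.9, min_comp=0.5)
for conditions, acc, comp in found:
    print(conditions, round(acc, 2), round(comp, 2))
```

Exhaustive enumeration grows quickly with the number of variables, so it is suitable only for small illustrations like this one.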

2. Regression analysis also manipulates rules.

Regression analysis manipulates regression functions. A regression function y = f(x) consists of rules in the form “If x then y = f(x)”, or “x → y = f(x)”. For each x = x0, the value y0 = f(x0) is computed as the mean of variable y for the given x0. Thus, regression analysis can be considered one of the methods for finding and analyzing rules, and hence is comparable to DA.

3. DA is applicable for the analysis of non-numeric variables, while regression analysis is not.

In order to apply regression analysis and find rules composing a regression function y = f(x), the variable y needs to be numeric. Regression analysis is not applicable if variable y is categorical. DA does not have such a restriction. The method for finding rules within DA does not require the variables to be numeric.

4. Regression analysis provides an approximate solution for the task of forecasting, while DA provides an accurate one.

Suppose that rules composing a regression function are used for prediction. For this task, the rules should be as accurate as possible, to provide for reliable explanation and a small forecasting error. However, regression analysis algorithms do not guarantee high accuracy of forecasting rules. Consider the following example (S. Chesnokov, Determinacy Analysis of Social-Economic Data, 1982, pp. 148-149). Let variables x, y have integer values, and let the probability P(y|x) of each y to occur be defined as:

$$P(y \mid x) = \begin{cases} 1 - \varepsilon, & y = \varphi(x), \\ \varepsilon, & y = \eta(x), \\ 0, & \text{otherwise,} \end{cases}$$

where ε is a positive number not exceeding 1, and φ(x), η(x) are some functions such that their difference is not equal to zero for any x. Note that P(y|x) is, by definition, the accuracy of the rule x → y.

If y = φ(x), then P(y|x) = 1 - ε. For small ε all rules composing the function y = φ(x) have a high accuracy which equals 1 - ε. The accuracy of such rules approaches 1 when ε approaches 0.

If y = η(x), then P(y|x) = ε. For small ε all rules composing the function y = η(x) have a low accuracy which equals ε. The accuracy of such rules approaches 0 when ε approaches 0.

If y ≠ η(x) and y ≠ φ(x), then P(y|x) = 0. This means that in all other cases except for those described above, the accuracy of the rule x → y is equal to zero.

If ε is sufficiently small (for example, smaller than 0.5), the function y = φ(x) is found as the best solution for the task of predicting values of variable y based on values of variable x when DA is applied.

The value ε is the probability of an erroneous forecast of y based on the function y = φ(x). With a very small ε, the forecast will be almost perfect with the solution provided by DA. Note that this property of the solution obtained with DA does not depend on the form of the function y = η(x). It holds for any function y = η(x).

Now, compute the regression function y = f(x) and compare it with the best solution y = φ(x). The regression function is computed as:

$$f(x) = \sum_{y} y \, P(y \mid x)$$

where the summation is performed over all valid values of y. For the specified form of P(y|x), the sought regression function is:

$$f(x) = \varphi(x) + \varepsilon\,[\eta(x) - \varphi(x)]$$

From this formula, one can see that the regression function y = f(x) does not coincide with the best forecasting solution y = φ(x). The difference between the regression function f(x) and the best solution φ(x) is equal to ε[η(x) - φ(x)]. By properly selecting a function η(x), one can make this difference as large as desired for any small ε. This is a very unfortunate property of regression analysis, and because of it regression functions generally do not provide the best solution in forecasting problems.

In computing regression functions, the principle of least squares (minimizing the mean of squared deviations of the points in the correlation field from the regression function) is used. It is commonly believed that applying this approach guarantees minimum forecasting error. This belief is erroneous. The minimization of mean squared deviations of the points in the correlation field from the regression function does not guarantee the maximum accuracy of the rules composing the regression function. In the previous example, for any specified x the mean squared deviation Q(x) of the points in the correlation field from the exact regression function is

$$Q(x) = \varepsilon\,(1 - \varepsilon)\,[\eta(x) - \varphi(x)]^2$$

By properly selecting a function η(x), one can make this value as large as desired for any small ε.
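The example can be reproduced numerically for the distribution described above, in which y takes one value with probability 1 - ε and a second value with probability ε. In this sketch the two functions (named phi and eta in the code), the value of ε, and the point x0 are hypothetical choices; the second function is deliberately placed far from the first.

```python
EPS = 0.01  # hypothetical small probability of the "rare" value


def phi(x):
    """Hypothetical function giving the value of y that occurs with probability 1 - EPS."""
    return x


def eta(x):
    """Hypothetical function giving the rare value of y, chosen far from phi(x)."""
    return x + 1000


def p(y, x):
    """P(y|x), i.e. the accuracy of the rule x -> y."""
    if y == phi(x):
        return 1 - EPS
    if y == eta(x):
        return EPS
    return 0.0


def regression(x):
    """f(x) = sum of y * P(y|x); only phi(x) and eta(x) have non-zero probability."""
    return sum(y * p(y, x) for y in (phi(x), eta(x)))


def mean_sq_dev(x):
    """Q(x): mean squared deviation of y from the regression value f(x)."""
    f = regression(x)
    return sum((y - f) ** 2 * p(y, x) for y in (phi(x), eta(x)))


x0 = 5
f0 = regression(x0)
q0 = mean_sq_dev(x0)
most_probable = max((phi(x0), eta(x0)), key=lambda y: p(y, x0))

print("most accurate rule predicts y =", most_probable, "with accuracy", p(most_probable, x0))
print("regression predicts y =", f0, "with accuracy", p(round(f0), x0))
print("mean squared deviation Q =", q0)
```

Here the regression value lies between the two possible values of y and is never observed, so the rule it defines has zero accuracy, while the rule with the highest accuracy is correct in 99% of cases.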

Let us stress that the noted drawbacks of regression analysis are fundamental in nature. They are derived from the notion of regression and cannot be refuted by selecting a particular regression analysis model.

Let us turn to DA now. DA can point to the function y = φ(x) as the best function, regardless of the form of the function η(x). DA does not have the drawbacks of regression analysis. The reason DA is effective in finding rules is not related to the particular distribution considered in our example. It lies in the fact that DA algorithms are based on direct calculations of the accuracy and completeness of rules. This is another principal difference between DA and regression analysis.


How is DA different from the fuzzy sets approach?

Answer. DA and fuzzy sets theory are not compatible in either a mathematical or a methodological sense.

  1. DA manipulates observed event frequencies which are empirically computed. The theory of fuzzy sets manipulates non-observed values (membership functions) which are not empirical.
  2. The axiomatic foundation of DA is compatible with the axioms of probability theory. Axioms of the fuzzy sets theory are not compatible with the axioms of probability theory.

DA assumes that the lack of certainty for notions used in everyday language is described by rules whose accuracy is lower than the maximum possible. The theory of fuzzy sets assumes that the lack of certainty of such notions is described by a fuzzy membership relation between a set and an element of that set.


How are DA and Data Mining methods related?

Answer. DA complements techniques implemented in Data Mining packages.


How are DA and “Decision trees” related?

Answer. DA techniques can be used for building “decision trees”. Rules tables in DALSolution are one of the forms used to represent “decision trees”.


Where can information about DA be found?

Answer.

General and introductory information:

  1. User’s Guide, DA-System, Release 4.0 (in Russian)
  2. User’s Guide, DALSolution, Release 4.0 (in English)
  3. http://www.context.ru (in Russian)
  4. http://www.dalsolution.com (in English)
  5. http://www.itsv.com/itss (in English and Swedish)
  6. http://www.dca.at (in German)

Bibliography

Below you will find references to the main publications on the mathematical basis of determinacy analysis, selected works on applications in sociology, linguistics, logic of natural language, medicine, and geographic information systems, as well as works on philosophic issues in determinacy theory and logic.

Main works on determinacy theory

  1. S. V. Chesnokov. "Determinacy analysis of social-economic data". Moscow, Nauka, 1982 (in Russian).
  2. S. V. Chesnokov. "Syllogisms in determinacy analysis". Izvestija AN SSSR, the Technical Cybernetics series, 1984, #5, pp. 55-83 (in Russian, English translation appeared in Engineering Cybernetics 22, no. 6, 1985, pp. 96-120).
  3. S. V. Chesnokov. "Computation of the accuracy of D-syllogisms in the statistics of contingency tables". Izvestija AN SSSR, the Technical Cybernetics series, 1985, #1, pp. 141-144 (in Russian).
  4. S. V. Chesnokov. "Determinacy binary syllogistics". Izvestija AN SSSR, the Technical Cybernetics series, 1990, #5, pp. 3-21 (in Russian; English translation appeared in Soviet Journal of Computer and Systems Sciences, vol. 29, #3, 1991, pp. 67-84).

Determinacy analysis in sociology and the social sciences

  1. S. V. Chesnokov. "Determinacy analysis of social-economic data". Sociological Studies, 1980, #3, pp. 179-189 (in Russian).
  2. S. V. Chesnokov. "Foundation of humanitarian measurement". Preprint of the Institute of Systems Studies, AN SSSR, Moscow, 1985 (in Russian).

Determinacy analysis in linguistics and studies of logic in natural language

  1. V. S. Rotenberg, S.V. Chesnokov. Virtuality of names in the course of a dialog in natural language. Izvestija AN SSSR, the Technical Cybernetics series, 1986, #5, pp. 115-127 (in Russian).
  2. S. V. Chesnokov. The Effect of Semantic Freedom in the Logic of Natural Language. Fuzzy Sets and Systems, 1987, v. 22, #1-2, pp. 121-154.
  3. S. V. Chesnokov, P. A. Luelsdorff. Determinacy Analysis and Theoretical Orthography. Theoretical Linguistics, Walter de Gruyter, Berlin - New York, 1991, v. 17 1/2/3, pp. 231-262.
  4. P. A. Luelsdorff, S. V. Chesnokov. Determinacy - Experience. In "Writing vs Speaking (Language, Text, Discourse, Communication)", S. Chmejrkova, F. Danesh, E. Havlova (eds.), Gunter Narr Verlag Tubingen, pp. 407-413.
  5. P. A. Luelsdorff, S. V. Chesnokov. Determinacy Form as the Essence of Language. In "Prague Linguistic Circle Papers", 1996, v. 2, pp. 205-234.

Determinacy analysis in medicine

  1. S. V. Chesnokov. "Application of Determinacy Analysis for diagnostic criteria searching and data processing in comprehensive ultrasonography". Chapter XVII in the book "Clinical Manual of Ultrasonic Diagnostics", Moscow, 1997, Vol. 4, pp. 362-376 (in Russian).
  2. S. V. Chesnokov. "Determinacy Analysis and the search for diagnostic criteria in medicine (the case of comprehensive ultrasonography)". Ultrasonic Diagnostics, 1996, #4, pp. 42-47 (in Russian).
  3. A. N. Hitrova. "Differential diagnostics of kidney sinus cysts and hydronephrosis by means of comprehensive ultrasonography" M.D. dissertation, Moscow, 1996 (in Russian).
  4. M. D. Musaeva. The application of dopplerography techniques in the diagnosis of gall bladder diseases. M.D. Dissertation, Moscow, 1996 (in Russian).
  5. I. Yu. Nasnikova. The use of dopplerography in the evaluation of urodynamics disorders. Materials for M.D. dissertation, Moscow, 1997 (in Russian).

Determinacy analysis in geographic information systems

  1. I. N. Zaslavsky. Logical Inference about Categorical Coverages in Multi-Layer GIS. Ph.D. dissertation, University of Washington, 1995.

Philosophic issues in determinacy theory and logic

  1. S. V. Chesnokov. "Physics of Logos", New York, Telex, 1991 (in Russian).
  2. S. V. Chesnokov. "People, science, logos", Moscow, 1987-1993 (unpublished).
  3. S. V. Chesnokov. "Is physics of Logos possible?", Moscow, 1994 (unpublished).

Brief historical reference

DALSolution implements Determinacy Analysis (abbreviated as DA). DA is a technology for data processing and analysis focused on the search for and examination of rules (determinacies).

The mathematical foundations of the method were developed during the 1970s by Sergei Chesnokov at the Institute of Systems Studies (VNIISI, Moscow), in the department headed by Academician S. S. Shatalin (1934-1997).

The first software systems implementing DA were developed in the late 1970s. In 1982, a comprehensive volume with a detailed treatment of the method’s foundations was published (Chesnokov, 1982).

In the 1980s, several fundamental mathematical results were obtained in determinacy theory. Determinacy logic was developed during this period, elaborating on the views of Aristotle, which have been largely ignored in mathematical logic of the 20th century. In particular, determinacy theory led to a radical generalization of Aristotelian syllogistics, which in turn resulted in a new understanding of the role of syllogistics for the fundamentals of logic and arithmetic, for analysis of data and for natural language (Chesnokov, works of 1983-1994).

Since the 1980s and 1990s, DA has been widely used in medicine, sociology, linguistics, artificial intelligence studies, and geographic information systems (see Bibliography). A strong impetus for the application of DA in various areas of science, management, and business resulted from the activities of Context Co., Ltd. and its research extension, the private Institute for Physics of Logos. Starting in 1989, the Company and the Institute took over the development of the method and its applications, as well as the realization of selected collaborative applied projects. New versions of DA software developed at Context Co. opened this technology to wide groups of researchers, managers, and business people.

 



Copyright © 1997-2004 Context Co., Ltd. All rights reserved. Terms and Conditions.