FuzME
Fuzzy k-Means with Extragrades Program

See FuzMe poster


Introduction
 

The main purpose of classification in a geographical context such as soil survey is to enable a concise description of the spatial variation of soil as a three-dimensional multivariate system. In this field of application classification rather than ordination is generally preferred for data reduction, because the relationships between the properties are often highly nonlinear.
However, where spatial variation is gradual instead of abrupt, disjoint classes poorly fit, the reality to be described. Therefore an approach with fuzzy classes seems more appropriate. In the type of applications we are dealing with, description is not the only purpose. Class memberships as, for instance, presented on soil maps are used for prediction of properties. Such properties may be those used to define the classification, or they may be related properties. Ideally, the classification system is designed in such a way that it provides an optimal basis for spatial interpolation as well as
prediction of proper ties from class memberships.
 

Fuzzy k-Means
 

One approach to fuzzy classification, is fuzzy c-means (Bezdek, 1981), or fuzzy k-means (De Gruijter & McBratney, 1988). Fuzzy k-means minimises the within-class sum square errors functional under the following conditions:

i=1,2,.....,n

k = 1,2,.....,c

mikÎ {0,1} i = 1,2,.....,n; k =1,.....,c      [1]

It is defined by the following objective function:

                          [2]

where n is the number of data, c is the number of classes, ck is the vector representing the centroid of class k, xi is the vector representing individual data i and d2(xi,ck) is the squared distance between xi and ck according to a chosen definition of distance, which for simplicity further denoted by d2ik. f is the fuzzy exponent and ranges from (1, ¥). It determines the degree of fuzziness of the final solution, that is the degree of overlap between groups. With f =1, the solution is a hard partition. As f approaches infinity the solution approaches its highest degree of fuzziness.

The minimisation of the objective function J provide the solution for the membership function (Bezdek, 1981):

i=1,2,.....,n; k =1,.....,c           [3]

    k = 1,2,....., c                        [4]
 

The fuzzy k-means algorithm (see Figure 1) is as follows:
[1] Choose the number of classes k, with 1<k<n.
[2] Choose a value for the fuzziness exponent f, with f>1.
[3] Choose a definition of distance in the variable-space.
[4] Choose a value for the stopping criterion e (e= 0.001 gives reasonable convergence. )
[5] Initialize M = M(0)  , e.g. with random memberships or with memberships from a hard k-means partition.
[6] At iteration it =1,2,3
    (re) calculate C=C(it)   using equation [4] and M(it-1)
[7] Re-calculate M=M(it)    using equation [3] and C(it).
    If numerical overflow would occur with dic close or equal to 0, mic  is set to 1.
[8] Compare M(it) to M(it-1) in a convenient matrix norm.
    If ||M(it) - M(it-1) || < e, then stop; otherwise return to step [6].
 

Read FuzME theory

Fuzzy k-means with extragrades
 

By their nature, continuous classes should provide better representations of outliers or atypical individuals than discontinuous classes. This is especially the case with outliers located between clusters in property space. ( we refer to this type of individuals as intragrades) Fuzzy k-means, for instance, will indeed give intermediate memberships to intragrades. However, outliers outside the main body of data points, referred to as extragrades, are still not suitably represented by fuzzy k-means.

De Gruijter & McBratney (1988) to modify the objective function J to account for extragrades. This improvement makes the memberships directly depend upon the distances to the class centroids as:

   [5]

where m* denotes the membership to a fuzzy class of outliers and a is a parameter that determines the mean value of m*. The aim is to accommodate the outlier entia in a special class to decrease the effect of them on classification. The members of this particular class are not concentrated in a fuzzy hypersphere around a defined class centre, as with regular classes. Instead, they are spread across and over regions of larger distances between an individual and the class centres. Minimisation of this objective similar to that used in fuzzy k-means. The solution for the membership:

i=1,2,...,n; k=1,...,          [6]

i=1,2,...,n

k = 1,2,...,c
The algorithm for solving the above equations can be found in deGruijter and McBratney (1988) [See Figure 2] and implemented in the program FuzME. Program FuzME uses Brent's algorithm (Press et al., 1992)  for searching optimal a value rather than the regula falsi method as described in deGruijter and McBratney (1988).
Download the paper by De Gruijter & McBratney (1988).

Read more about fuzzy k means with extragrades and noise clustering.

 

Read FuzME theory

Fuzzy linear discriminant analysis

 

FuzME application in soil classification

See THE AUSTRALIAN SOIL IDENTIFICATION SPREADSHEET (ASIS)

Read about FuzME theory

Read more about fuzzy k means with extragrades and noise clustering.

References
 

Bezdek, J.C., 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York.

deGruijter, J.J., McBratney, A.B., 1988. A modified fuzzy k means for predictive classification. In: Bock,H.H.(ed) Classification and Related Methods of Data Analysis. pp. 97-104. Elsevier Science, Amsterdam. Download here

Gustafson, D.E., Kessel, W., 1979. Fuzzy clustering with a fuzzy covariance matrix. Proc. IEEE-CDC 2, 761-766

McBratney, A.B., Moore, A.W., 1985. Application of fuzzy sets to climatic classification. Agricultural and Forest Meteorology 35, 165-185.

McBratney, A.B., deGruijter, J.J., 1992. A continuum approach to soil classification by modified fuzzy k-means with extragrades. Journal of Soil Science 43, 159-175.

Odeh, I.O.A., McBratney, A.B., Chittleborough, D.J., 1992. Soil pattern recognition with fuzzy c-means: Application to classification and soil-landform interrealtionship. Soil Science Society of American Journal 56, 505-516.

Pal, N.R., Bezdek, J.C., 1995. On cluster validity for the fuzzy c-means model. IEEE Transactions on Fuzzy Systems 3, 370-379.

Press,  W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992. Numerical Recipes: The art of scientific computing. Cambridge University Press. http://www.nr.com

Roubens, M., 1982. Fuzzy clustering algorithms and their cluster validity. European Journal of Operational Research 10, 294-301.

Xie,X.L., Beni,G.1991. A validity measure for fuzzy clustering. IEEE Transactions of Pattern Analysis and Machine Intelligence 13, 841-847.
 


Last Updated December 2003.