FUZZY CLASSIFICATION AND PATTERN RECOGNITION

CHAPTER 11, PART I: CLASSIFICATION

From causes which appear similar, we expect similar effects. This is the sum total of all our experimental conclusions.
    David Hume, Scottish philosopher, Enquiry Concerning Human Understanding, 1748

There is structure in nature. Much of this structure is known to us and is quite beautiful. Consider the natural sphericity of raindrops and bubbles; why do balloons take this shape? How about the elegant beauty of crystals, rhombic solids with rectangular, pentagonal, or hexagonal cross sections? Why do these naturally beautiful, geometric shapes exist? What causes the natural repetition of the mounds of sand dunes? Some phenomena we cannot see directly: for example, the elliptical shape of the magnetic field around the earth; or we can see only when certain atmospheric conditions exist, such as the beautiful and circular appearance of a rainbow or the repetitive patterns of the aurora borealis in the night sky near the North Pole. Some patterns, such as the helical appearance of DNA or the cylindrical shape of some bacteria, have only appeared to us since the advent of extremely powerful electron microscopes. Consider the geometry and colorful patterns of a butterfly's wings; why do these patterns exist in our physical world? The answers to some of these questions are still unknown; many others have been discovered through increased understanding of physics, chemistry, and biology.

Just as there is structure in nature, we believe there is an underlying structure in most of the phenomena we wish to understand. Examples abound in image recognition, molecular biology applications such as protein folding and 3D molecular structure, oil exploration, cancer detection, and many others. For fields dealing with diagnosis we often seek to find structure in the data obtained from observation. Our observations can be visual, audio, or any of a variety of sensor-based electronic or optical signals. Finding the structure in data is the essence of classification. As the quotation at the beginning of this chapter suggests, our experimental observations lead us to develop relationships between the inputs and outputs of an experiment. As we are able to conduct more experiments we see the relationships forming some recognizable, or classifiable, structure. By finding structure, we are classifying the data according to similar patterns, attributes, features, and other characteristics. The general area is known as classification.

In classification, also termed clustering, the most important issue is deciding what criteria to classify against. For example, suppose we want to classify people. In describing people we will look at their height, weight, gender, religion, education, appearance, and so on. Many of these features are numerical quantities such as height and weight; other features are simply linguistic descriptors, and these can be quite non-numeric. We can easily classify people according to gender, or one feature. For this classification the criterion is simple: female or male. We might want to classify people into three size categories: small, medium, and large. For this classification we might only need two of the features describing people: height and weight. Here, the classification criterion might be some algebraic combination of height and weight.
Suppose we want to classify people according to whether we would want them as neighbors. Here the number of features to be used in the classification is not at all clear, and we might also have trouble developing a criterion for this classification. Nevertheless, a criterion for classification must be prepared before we can segregate the data into definable classes. As is often the case in classification studies, the number and kind of features and the type of classification criteria are choices that are continually changed as the data are manipulated; this iteration continues until we think we have a grouping of the data that seems plausible from a structural and physical perspective.

This chapter summarizes only two popular methods of classification. The first is classification using equivalence relations [Zadeh, 1971; Bezdek and Harris, 1978]. This approach makes use of certain special properties of equivalence relations and the concept of defuzzification known as lambda-cuts on the relations. The second method of classification is a very popular method known as fuzzy c-means, so named because of its close analog in the crisp world, hard c-means [Bezdek, 1981]. This method uses concepts in n-dimensional Euclidean space to determine the geometric closeness of data points by assigning them to various clusters or classes and then determining the distance between the clusters.

CLASSIFICATION BY EQUIVALENCE RELATIONS

Crisp Relations

Define a set, [x_i] = {x_j | (x_i, x_j) ∈ R}, as the equivalence class of x_i on a universe of data points, X. This class is contained in a special relation, R, known as an equivalence relation (see Chapter 3). This class is the set of all elements related to x_i, and it has the following properties [Bezdek, 1974]:

1. x_i ∈ [x_i], therefore (x_i, x_i) ∈ R
2. [x_i] ≠ [x_j] ⇒ [x_i] ∩ [x_j] = ∅
3. ∪_{x∈X} [x] = X

The first property is that of reflexivity (see Chapter 3), the second property indicates that equivalence classes do not overlap, and the third property simply expresses that the union of all equivalence classes exhausts the universe. Hence, the equivalence relation R can divide the universe X into mutually exclusive equivalence classes, i.e.,

    X/R = {[x] | x ∈ X}                                                  (11.1)

where X/R is called the quotient set. The quotient set of X relative to R, denoted X/R, is the set whose elements are the equivalence classes of X under the equivalence relation R. The cardinality of X/R (i.e., the number of distinct equivalence classes of X under R) is called the rank of the matrix R.

Example 11.1 [Ross, 1995]. Define a universe of integers X = {1, 2, 3, ..., 10} and define R as the crisp relation for "the identical remainder after dividing each element of the universe by 3." We have

             1  2  3  4  5  6  7  8  9  10
         1 [ 1  0  0  1  0  0  1  0  0  1 ]
         2 [ 0  1  0  0  1  0  0  1  0  0 ]
         3 [ 0  0  1  0  0  1  0  0  1  0 ]
         4 [ 1  0  0  1  0  0  1  0  0  1 ]
    R =  5 [ 0  1  0  0  1  0  0  1  0  0 ]
         6 [ 0  0  1  0  0  1  0  0  1  0 ]
         7 [ 1  0  0  1  0  0  1  0  0  1 ]
         8 [ 0  1  0  0  1  0  0  1  0  0 ]
         9 [ 0  0  1  0  0  1  0  0  1  0 ]
        10 [ 1  0  0  1  0  0  1  0  0  1 ]

We note that this relation is reflexive, it is symmetric, and, as can be determined by inspection (see Chapter 3), it is also transitive; hence the matrix is an equivalence relation. We can group the elements of the universe into classes as follows:

    [1] = [4] = [7] = [10] = {1, 4, 7, 10}    with remainder = 1
    [2] = [5] = [8] = {2, 5, 8}               with remainder = 2
    [3] = [6] = [9] = {3, 6, 9}               with remainder = 0

Then we can show that the classes do not overlap, i.e., they are mutually exclusive:

    [1] ∩ [2] = ∅    and    [2] ∩ [3] = ∅

and that the union of all the classes exhausts (comprises) the universe:

    ∪_{x∈X} [x] = X

The quotient set is then determined to have three classes,

    X/R = {(1, 4, 7, 10), (2, 5, 8), (3, 6, 9)}
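The grouping in Example 11.1 is easy to reproduce computationally. The following is a minimal sketch (not from the text) in Python with NumPy: it builds the crisp relation R for "same remainder after division by 3" and reads off the equivalence classes by grouping identical rows. The helper name equivalence_classes is illustrative, not part of any standard library.

    import numpy as np

    X = list(range(1, 11))                      # universe {1, 2, ..., 10}

    # R[i, j] = 1 if x_i and x_j leave the same remainder when divided by 3
    R = np.array([[1 if xi % 3 == xj % 3 else 0 for xj in X] for xi in X])

    def equivalence_classes(R, X):
        """Group elements whose rows in R are identical (one class per distinct row)."""
        classes = {}
        for idx, x in enumerate(X):
            classes.setdefault(tuple(R[idx]), []).append(x)
        return list(classes.values())

    print(equivalence_classes(R, X))            # [[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]

Because R is an equivalence relation, two elements belong to the same class exactly when their rows in R are identical, which is what the grouping above exploits.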
Not all relations are equivalence relations, but if a relation is at least a tolerance relation (i.e., it exhibits the properties of reflexivity and symmetry) then it can be converted to an equivalence relation through max–min compositions with itself.

Example 11.2. Suppose you have a collection (universe) of five data points,

    X = {x1, x2, x3, x4, x5}

and these data points show similarity to one another according to the following tolerance relation, which is reflexive and symmetric:

         [ 1  1  0  0  0 ]
         [ 1  1  0  0  1 ]
    R1 = [ 0  0  1  0  0 ]
         [ 0  0  0  1  0 ]
         [ 0  1  0  0  1 ]

We see that this tolerance relation is not transitive from the expression

    (x1, x2) ∈ R1,  (x2, x5) ∈ R1,  but (x1, x5) ∉ R1

As indicated in Chapter 3, any tolerance relation can be reformed into an equivalence relation through at most n − 1 compositions with itself. In this case one composition of R1 with itself results in an equivalence relation,

              [ 1  1  0  0  1 ]
              [ 1  1  0  0  1 ]
    R1 ∘ R1 = [ 0  0  1  0  0 ] = R
              [ 0  0  0  1  0 ]
              [ 1  1  0  0  1 ]

As one can see in the relation R, there are three classes: the first, second, and fifth columns are identical, and the third and fourth columns are each unique. The data points can then be classified into three groups or classes, as delineated below:

    [x1] = [x2] = [x5] = {x1, x2, x5}      [x3] = {x3}      [x4] = {x4}

Fuzzy Relations

As already illustrated, crisp equivalence relations can be used to divide the universe X into mutually exclusive classes. In the case of fuzzy relations, for all fuzzy equivalence relations, their λ-cuts are equivalent ordinary (crisp) relations. Hence, to classify data points in the universe using fuzzy relations, we need to find the associated fuzzy equivalence relation.

Example 11.3. Example 3.11 had a tolerance relation, say Rt, describing five data points, that was formed into a fuzzy equivalence relation, R, by composition; this process is repeated here for this classification example:

         [ 1    0.8  0    0.1  0.2 ]             [ 1    0.8  0.4  0.5  0.8 ]
         [ 0.8  1    0.4  0    0.9 ]             [ 0.8  1    0.4  0.5  0.9 ]
    Rt = [ 0    0.4  1    0    0   ]   -->   R = [ 0.4  0.4  1    0.4  0.4 ]
         [ 0.1  0    0    1    0.5 ]             [ 0.5  0.5  0.4  1    0.5 ]
         [ 0.2  0.9  0    0.5  1   ]             [ 0.8  0.9  0.4  0.5  1   ]

By taking λ-cuts of the fuzzy equivalence relation R at values of λ = 1, 0.9, 0.8, 0.5, and 0.4, we get the following:

            [ 1  0  0  0  0 ]          [ 1  0  0  0  0 ]          [ 1  1  0  0  1 ]
            [ 0  1  0  0  0 ]          [ 0  1  0  0  1 ]          [ 1  1  0  0  1 ]
    R_1   = [ 0  0  1  0  0 ]  R_0.9 = [ 0  0  1  0  0 ]  R_0.8 = [ 0  0  1  0  0 ]
            [ 0  0  0  1  0 ]          [ 0  0  0  1  0 ]          [ 0  0  0  1  0 ]
            [ 0  0  0  0  1 ]          [ 0  1  0  0  1 ]          [ 1  1  0  0  1 ]

            [ 1  1  0  1  1 ]          [ 1  1  1  1  1 ]
            [ 1  1  0  1  1 ]          [ 1  1  1  1  1 ]
    R_0.5 = [ 0  0  1  0  0 ]  R_0.4 = [ 1  1  1  1  1 ]
            [ 1  1  0  1  1 ]          [ 1  1  1  1  1 ]
            [ 1  1  0  1  1 ]          [ 1  1  1  1  1 ]

where we can see that the clustering of the five data points according to the λ-cut level is as shown in Table 11.1.

TABLE 11.1
Classification of five data points according to λ-cut level

    λ-cut level     Classification
    1.0             {x1}, {x2}, {x3}, {x4}, {x5}
    0.9             {x1}, {x2, x5}, {x3}, {x4}
    0.8             {x1, x2, x5}, {x3}, {x4}
    0.5             {x1, x2, x4, x5}, {x3}
    0.4             {x1, x2, x3, x4, x5}

We can express the classification scenario described in Table 11.1 with a systematic classification diagram, as shown in Fig. 11.1.

FIGURE 11.1 Classification diagram for Example 11.3 (a tree showing the data points merging into coarser classes as λ decreases through 0.9, 0.8, 0.5, and 0.4).

In the figure you can see that the higher the value of λ, the finer is the classification. That is, as λ gets larger the classification approaches the trivial case where each data point is assigned to its own class.
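The composition and λ-cut operations used in Examples 11.2 and 11.3 are straightforward to implement. Below is a small sketch (not from the text) in Python/NumPy: repeated max–min composition converts a tolerance relation into a fuzzy equivalence relation, and a λ-cut of that relation yields the crisp classes. Function names are illustrative.

    import numpy as np

    def max_min_composition(R, S):
        # (R o S)[i, j] = max over k of min(R[i, k], S[k, j])
        return np.max(np.minimum(R[:, :, None], S[None, :, :]), axis=1)

    def transitive_closure(R):
        # at most n - 1 compositions are needed for an n x n tolerance relation
        for _ in range(len(R) - 1):
            R2 = max_min_composition(R, R)
            if np.allclose(R2, R):
                break
            R = R2
        return R

    def lambda_cut_classes(R, lam):
        crisp = (R >= lam).astype(int)
        classes, seen = [], set()
        for i in range(len(crisp)):
            if i not in seen:
                members = [j for j in range(len(crisp)) if crisp[i, j] == 1]
                seen.update(members)
                classes.append(["x%d" % (j + 1) for j in members])
        return classes

    # tolerance relation Rt of Example 11.3
    Rt = np.array([[1.0, 0.8, 0.0, 0.1, 0.2],
                   [0.8, 1.0, 0.4, 0.0, 0.9],
                   [0.0, 0.4, 1.0, 0.0, 0.0],
                   [0.1, 0.0, 0.0, 1.0, 0.5],
                   [0.2, 0.9, 0.0, 0.5, 1.0]])

    R = transitive_closure(Rt)
    for lam in (1.0, 0.9, 0.8, 0.5, 0.4):
        print(lam, lambda_cut_classes(R, lam))   # reproduces Table 11.1

Because the λ-cut of a fuzzy equivalence relation is a crisp equivalence relation, the class-extraction step is the same row-grouping idea used for Example 11.1.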
Another example in fuzzy classification considers grouping photographs of family members together according to visual similarity, in an attempt to determine the genetics of the family tree when only the facial images are considered.

Example 11.4 [Tamura et al., 1971]. Three families exist that have a total of 16 people, all of whom are related by blood. Each person has their photo taken, and the 16 photos are mixed. A person not familiar with the members of the three families is asked to view the photographs and to grade their resemblance to one another. In conducting this study the person assigns the similarity relation matrix, r_ij, shown in Table 11.2.

TABLE 11.2
Similarity relation matrix, r_ij

          1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
     1  1.0
     2  0.0  1.0
     3  0.0  0.0  1.0
     4  0.0  0.0  0.4  1.0
     5  0.0  0.8  0.0  0.0  1.0
     6  0.5  0.0  0.2  0.2  0.0  1.0
     7  0.0  0.8  0.0  0.0  0.4  0.0  1.0
     8  0.4  0.2  0.2  0.5  0.0  0.8  0.0  1.0
     9  0.0  0.4  0.0  0.8  0.4  0.2  0.4  0.0  1.0
    10  0.0  0.0  0.2  0.2  0.0  0.0  0.2  0.0  0.2  1.0
    11  0.0  0.5  0.2  0.2  0.0  0.0  0.8  0.0  0.4  0.2  1.0
    12  0.0  0.0  0.2  0.8  0.0  0.0  0.0  0.0  0.4  0.8  0.0  1.0
    13  0.8  0.0  0.2  0.4  0.0  0.4  0.0  0.4  0.0  0.0  0.0  0.0  1.0
    14  0.0  0.8  0.0  0.2  0.4  0.0  0.8  0.0  0.2  0.2  0.6  0.0  0.0  1.0
    15  0.0  0.0  0.4  0.8  0.0  0.2  0.0  0.0  0.2  0.0  0.0  0.2  0.2  0.0  1.0
    16  0.6  0.0  0.0  0.2  0.2  0.8  0.0  0.4  0.0  0.0  0.0  0.0  0.4  0.2  0.4  1.0

The matrix developed by the person is a tolerance fuzzy relation, but it does not have the properties of equivalence, i.e.,

    r_ij = 1    for i = j
    r_ij = r_ji
    r_ij ≥ λ1 and r_jk ≥ λ2 do not guarantee r_ik ≥ min(λ1, λ2), i.e., transitivity does not hold

For example, r_1,6 = 0.5 and r_6,8 = 0.8, but r_1,8 = 0.4 < 0.5.

By composition the equivalence relation shown in Table 11.3 is obtained. When we take a λ-cut of this fuzzy equivalence relation at λ = 0.6, we get the defuzzified relation shown in Table 11.4. Four distinct classes are identified:

    {1, 6, 8, 13, 16},  {2, 5, 7, 11, 14},  {3},  {4, 9, 10, 12, 15}

From this clustering it seems that only photograph number 3 cannot be identified with any of the three families. Perhaps a lower value of λ might assign photograph 3 to one of the three family classes. The other three clusters are all correct in that the members identified in each class are, in fact, the members of the correct families as described in Tamura et al. [1971].

Classification using equivalence relations can also be employed to segregate data that are originally developed as a similarity relation using some of the similarity methods described at the end of Chapter 3. The following problem, involving earthquake damage assessment, is an example of this; it was first introduced in Chapter 3 as Example 3.12.

Example 11.5. Five regions have suffered damage from a recent earthquake (see Example 3.12). The buildings in each region are characterized according to three damage levels: no damage, medium damage, and serious damage. The percentage of buildings for a given region in each of the damage levels is given in Table 11.5.
TABLE 11.3
Equivalence relation

          1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
     1  1.0
     2  0.4  1.0
     3  0.4  0.4  1.0
     4  0.5  0.4  0.4  1.0
     5  0.4  0.8  0.4  0.4  1.0
     6  0.6  0.4  0.4  0.5  0.4  1.0
     7  0.4  0.8  0.4  0.4  0.8  0.4  1.0
     8  0.6  0.4  0.4  0.5  0.4  0.8  0.4  1.0
     9  0.5  0.4  0.4  0.8  0.4  0.5  0.4  0.5  1.0
    10  0.5  0.4  0.4  0.8  0.4  0.5  0.4  0.5  0.8  1.0
    11  0.4  0.8  0.4  0.4  0.8  0.4  0.8  0.4  0.4  0.4  1.0
    12  0.5  0.4  0.4  0.8  0.4  0.5  0.4  0.5  0.8  0.8  0.4  1.0
    13  0.8  0.4  0.4  0.5  0.4  0.6  0.4  0.6  0.5  0.5  0.4  0.5  1.0
    14  0.4  0.8  0.4  0.4  0.8  0.4  0.8  0.4  0.4  0.4  0.8  0.4  0.4  1.0
    15  0.5  0.4  0.4  0.8  0.4  0.5  0.4  0.5  0.8  0.8  0.4  0.8  0.5  0.4  1.0
    16  0.6  0.4  0.4  0.5  0.4  0.8  0.4  0.8  0.5  0.5  0.4  0.5  0.6  0.4  0.5  1.0

TABLE 11.4
Defuzzified relation (λ = 0.6)

         1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
     1   1
     2   0  1
     3   0  0  1
     4   0  0  0  1
     5   0  1  0  0  1
     6   1  0  0  0  0  1
     7   0  1  0  0  1  0  1
     8   1  0  0  0  0  1  0  1
     9   0  0  0  1  0  0  0  0  1
    10   0  0  0  1  0  0  0  0  1  1
    11   0  1  0  0  1  0  1  0  0  0  1
    12   0  0  0  1  0  0  0  0  1  1  0  1
    13   1  0  0  0  0  1  0  1  0  0  0  0  1
    14   0  1  0  0  1  0  1  0  0  0  1  0  0  1
    15   0  0  0  1  0  0  0  0  1  1  0  1  0  0  1
    16   1  0  0  0  0  1  0  1  0  0  0  0  1  0  0  1

TABLE 11.5
Proportion of buildings damaged at three levels, by region

    Regions                              x1     x2     x3     x4     x5
    x_i1 - Ratio with no damage          0.3    0.2    0.1    0.7    0.4
    x_i2 - Ratio with medium damage      0.6    0.4    0.6    0.2    0.6
    x_i3 - Ratio with serious damage     0.1    0.4    0.3    0.1    0.0

Using the cosine amplitude approach, described in Chapter 3, we obtain the following tolerance relation, R1:

         [ 1                                  ]
         [ 0.836  1                (sym)      ]
    R1 = [ 0.914  0.934  1                    ]
         [ 0.682  0.6    0.441  1             ]
         [ 0.982  0.74   0.818  0.774  1      ]

Three max–min compositions produce a fuzzy equivalence relation,

                [ 1                                  ]
                [ 0.914  1                (sym)      ]
    R = R1^3 =  [ 0.914  0.934  1                    ]
                [ 0.774  0.774  0.774  1             ]
                [ 0.982  0.914  0.914  0.774  1      ]

Now, if we take λ-cuts at two different values of λ, say λ = 0.914 and λ = 0.934, the following defuzzified crisp equivalence relations and their associated classes are derived:

                         [ 1  1  1  0  1 ]
                         [ 1  1  1  0  1 ]
    λ = 0.914 :   R_λ =  [ 1  1  1  0  1 ]      classes: {x1, x2, x3, x5}, {x4}
                         [ 0  0  0  1  0 ]
                         [ 1  1  1  0  1 ]

                         [ 1  0  0  0  1 ]
                         [ 0  1  1  0  0 ]
    λ = 0.934 :   R_λ =  [ 0  1  1  0  0 ]      classes: {x1, x5}, {x2, x3}, {x4}
                         [ 0  0  0  1  0 ]
                         [ 1  0  0  0  1 ]

Hence, if we wanted to classify the earthquake damage for purposes of insurance payout into, say, two intensities on the modified Mercalli scale (the Mercalli scale is a measure of an earthquake's strength in terms of the average damage it causes to structures in a given region), then regions 1, 2, 3, and 5 belong to a larger Mercalli intensity and region 4 belongs to a smaller Mercalli intensity (see λ = 0.914). But if we wanted a finer division into, say, three Mercalli intensities, we could assign the regions as shown in Table 11.6.

TABLE 11.6
Classification of earthquake damage by region for λ = 0.934

    Regions     Mercalli intensity
    x4          VII
    x1, x5      VIII
    x2, x3      IX
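The tolerance relation R1 in Example 11.5 comes from the cosine amplitude similarity method of Chapter 3. As a brief computational sketch (not from the text), the code below applies the standard cosine amplitude formula to the region data of Table 11.5; the resulting matrix can then be passed through the composition and λ-cut steps sketched earlier. Variable and function names are illustrative.

    import numpy as np

    # rows = regions x1..x5; columns = ratios (no damage, medium damage, serious damage)
    data = np.array([[0.3, 0.6, 0.1],
                     [0.2, 0.4, 0.4],
                     [0.1, 0.6, 0.3],
                     [0.7, 0.2, 0.1],
                     [0.4, 0.6, 0.0]])

    def cosine_amplitude(data):
        """r_ij = |x_i . x_j| / sqrt((x_i . x_i)(x_j . x_j)): reflexive and symmetric."""
        n = len(data)
        R = np.eye(n)
        for i in range(n):
            for j in range(i + 1, n):
                num = abs(np.dot(data[i], data[j]))
                den = np.sqrt(np.dot(data[i], data[i]) * np.dot(data[j], data[j]))
                R[i, j] = R[j, i] = num / den
        return R

    print(np.round(cosine_amplitude(data), 3))   # r12 = 0.836, r13 = 0.914, r15 = 0.982, ...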
CLUSTER ANALYSIS

Clustering refers to identifying the number of subclasses of c clusters in a data universe X comprised of n data samples, and partitioning X into c clusters (2 ≤ c < n). Note that c = 1 denotes rejection of the hypothesis that there are clusters in the data, whereas c = n constitutes the trivial case where each sample is in a "cluster" by itself. There are two kinds of c-partitions of data: hard (or crisp) and soft (or fuzzy). For numerical data one assumes that the members of each cluster bear more mathematical similarity to each other than to members of other clusters. Two important issues to consider in this regard are how to measure the similarity between pairs of observations and how to evaluate the partitions once they are formed.

One of the simplest similarity measures is the distance between pairs of feature vectors in the feature space. If one can determine a suitable distance measure and compute the distance between all pairs of observations, then one may expect that the distance between points in the same cluster will be considerably less than the distance between points in different clusters. Several circumstances, however, mitigate the general utility of this approach, such as the combination of values of incompatible features, as would be the case, for example, when different features have significantly different scales. The clustering method described in this chapter defines "optimum" partitions through a global criterion function that measures the extent to which candidate partitions optimize a weighted sum of squared errors between data points and cluster centers in feature space. Many other clustering algorithms have been proposed for distinguishing substructure in high-dimensional data [Bezdek et al., 1986]. It is emphasized here that the method of clustering must be closely matched with the particular data under study for successful interpretation of substructure in the data.

CLUSTER VALIDITY

In many cases, the number c of clusters in the data is known. In other cases, however, it may be reasonable to expect cluster substructure at more than one value of c. In this situation it is necessary to identify the value of c that gives the most plausible number of clusters in the data for the analysis at hand. This problem is known as cluster validity [see Duda and Hart, 1973; Bezdek, 1981]. If the data used are labeled, there is a unique and absolute measure of cluster validity: the c that is given. For unlabeled data, no absolute measure of clustering validity exists. Although the importance of these differences is not known, it is clear that the features nominated should be sensitive to the phenomena of interest and not to other variations that might not matter to the applications at hand.

c-MEANS CLUSTERING

Bezdek [1981] developed an extremely powerful classification method to accommodate fuzzy data. It is an extension of a method known as c-means, or hard c-means, when employed in a crisp classification sense. To introduce this method, we define a sample set of n data samples that we wish to classify:

    X = {x1, x2, x3, ..., xn}                                            (11.2)

Each data sample, x_i, is defined by m features, i.e.,

    x_i = {x_i1, x_i2, x_i3, ..., x_im}                                  (11.3)

where each x_i in the universe X is an m-dimensional vector of m elements or m features. Since the m features all can have different units, in general, we have to normalize each of the features to a unified scale before classification. In a geometric sense, each x_i is a point in m-dimensional feature space, and the universe of the data samples, X, is a point set with n elements in the sample space.

Bezdek [1981] suggested using an objective function approach for clustering the data into hyperspherical clusters. This idea for hard clustering is shown in three-dimensional feature space in Fig. 11.2.

FIGURE 11.2 Cluster idea with hard c-means: two hyperspherical clusters with centers v1 and v2 in a three-dimensional (x, y, z) feature space.

In this figure, each cluster of data is shown as a hyperspherical shape with a hypothetical geometric cluster center. The objective function is developed so as to do two things simultaneously: first, minimize the Euclidean distance between each data point in a cluster and its cluster center (a calculated point), and second, maximize the Euclidean distance between cluster centers.
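As noted above, the m features generally carry different units and scales, so each feature should be brought to a unified scale before clustering. The text does not prescribe a particular normalization; the sketch below shows one common, assumed choice, min-max scaling of each feature to [0, 1].

    import numpy as np

    def min_max_normalize(X):
        """Scale each column (feature) of an n x m data matrix to the interval [0, 1]."""
        X = np.asarray(X, dtype=float)
        mins, maxs = X.min(axis=0), X.max(axis=0)
        span = np.where(maxs > mins, maxs - mins, 1.0)   # guard against constant features
        return (X - mins) / span

    # feature 2 (e.g., a raw temperature) has a much larger scale than feature 1
    X = np.array([[1.0, 300.0],
                  [1.5, 320.0],
                  [1.3, 280.0],
                  [3.0, 100.0]])
    print(min_max_normalize(X))

Standardization (zero mean, unit variance per feature) is an equally reasonable choice; the important point is that no single feature dominates the Euclidean distances used below.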
HARD c-MEANS (HCM)

HCM is used to classify data in a crisp sense. By this we mean that each data point will be assigned to one, and only one, data cluster. In this sense these clusters are also called partitions, that is, partitions of the data. Define a family of sets {A_i, i = 1, 2, ..., c} as a hard c-partition of X, where the following set-theoretic forms apply to these partitions:

    ∪_{i=1}^{c} A_i = X                                                  (11.4)

    A_i ∩ A_j = ∅      for all i ≠ j                                     (11.5)

    ∅ ⊂ A_i ⊂ X        for all i                                         (11.6)

again, where X = {x1, x2, x3, ..., xn} is a finite set space comprised of the universe of data samples, and c is the number of classes, or partitions, or clusters, into which we want to classify the data. We note the obvious,

    2 ≤ c < n                                                            (11.7)

where c = n classes just places each data sample into its own class, and c = 1 places all data samples into the same class; neither case requires any effort in classification, and both are intrinsically uninteresting. Equation (11.4) expresses the fact that the set of all classes exhausts the universe of data samples. Equation (11.5) indicates that none of the classes overlap, in the sense that a data sample cannot belong to more than one class. Equation (11.6) simply expresses that a class cannot be empty and it cannot contain all the data samples.

Suppose we have the case where c = 2. Equations (11.4) and (11.5) are then manifested in the following set expressions:

    A2 = Ā1,    A1 ∪ Ā1 = X,   and   A1 ∩ Ā1 = ∅

These set expressions are equivalent to the excluded middle axioms (Eqs. (2.12)).

The function-theoretic expressions associated with Eqs. (11.4), (11.5), and (11.6) are these:

    Σ_{i=1}^{c} χ_Ai(x_k) = 1              for all k                     (11.8)

    χ_Ai(x_k) ∧ χ_Aj(x_k) = 0              for all k                     (11.9)

    0 < Σ_{k=1}^{n} χ_Ai(x_k) < n          for all i                     (11.10)

where the characteristic function χ_Ai(x_k) is defined once again as

    χ_Ai(x_k) = 1 if x_k ∈ A_i,  and  0 if x_k ∉ A_i                     (11.11)

Equations (11.8) and (11.9) explain that any sample x_k can belong to one, and only one, of the c classes. Equation (11.10) implies that no class is empty and no class is the whole set X (i.e., the universe).

For simplicity in notation, our membership assignment of the jth data point in the ith cluster, or class, is defined to be χ_ij ≡ χ_Ai(x_j). Now define a matrix U comprised of elements χ_ij (i = 1, 2, ..., c; j = 1, 2, ..., n); hence, U is a matrix with c rows and n columns. Then we define a hard c-partition space for X as the following matrix set:

    M_c = { U | χ_ik ∈ {0, 1},  Σ_{i=1}^{c} χ_ik = 1,  0 < Σ_{k=1}^{n} χ_ik < n }        (11.12)

Any matrix U ∈ M_c is a hard c-partition. The cardinality of any hard c-partition space, M_c, is

    η_Mc = (1/c!) [ Σ_{i=1}^{c} (c choose i) (−1)^(c−i) i^n ]            (11.13)

where (c choose i) is the binomial coefficient of c things taken i at a time.

Example 11.6. Suppose we have five data points in a universe, X = {x1, x2, x3, x4, x5}. Also, suppose we want to cluster these five points into two classes. For this case we have n = 5 and c = 2. The cardinality, using Eq. (11.13), of this hard c-partition space is given by

    η_Mc = (1/2!) [ 2(−1)(1)^5 + (2)^5 ] = 15

Some of the 15 possible hard 2-partitions are listed here:

    [ 1 1 1 1 0 ]   [ 1 1 1 0 0 ]   [ 1 1 0 0 0 ]   [ 1 0 0 0 0 ]
    [ 0 0 0 0 1 ]   [ 0 0 0 1 1 ]   [ 0 0 1 1 1 ]   [ 0 1 1 1 1 ]

    [ 1 0 1 0 0 ]   [ 1 0 0 1 0 ]   [ 1 0 0 0 1 ]
    [ 0 1 0 1 1 ]   [ 0 1 1 0 1 ]   [ 0 1 1 1 0 ]     and so on.

Notice that these two matrices,

    [ 1 1 1 1 0 ]         [ 0 0 0 0 1 ]
    [ 0 0 0 0 1 ]   and   [ 1 1 1 1 0 ]

are not different 2-partitions for clustering purposes. In fact, they are the same 2-partition irrespective of an arbitrary row-swap. If we label the first row of the first U matrix class c1 and we label the second row class c2, we would get the same classification for the second U matrix by simply relabeling each row: the first row is c2 and the second row is c1. The cardinality measure given in Eq. (11.13) gives the number of unique c-partitions for n data points.
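Equation (11.13) is easy to evaluate directly. The short sketch below (not from the text) reproduces the count of 15 hard 2-partitions of five points from Example 11.6 and shows how quickly the count grows; the function name is illustrative.

    from math import comb, factorial

    def hard_partition_count(c, n):
        """Cardinality of the hard c-partition space, Eq. (11.13)."""
        total = sum(comb(c, i) * (-1) ** (c - i) * i ** n for i in range(1, c + 1))
        return total // factorial(c)

    print(hard_partition_count(2, 5))     # 15, as in Example 11.6
    print(hard_partition_count(10, 25))   # about 10**18, the n = 25, c = 10 case quoted below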
An interesting question now arises: Of all the possible c-partitions for n data samples, how can we select the most reasonable c-partition from the partition space M_c? For instance, in the example just provided, which of the 15 possible hard 2-partitions for five data points and two classes is the best? The answer to this question is provided by the objective function (or classification criterion) to be used to classify or cluster the data. The one proposed for the hard c-means algorithm is known as a within-class sum of squared errors approach using a Euclidean norm to characterize distance. This algorithm is denoted J(U, v), where U is the partition matrix and the parameter v is a vector of cluster centers. The objective function is given by

    J(U, v) = Σ_{k=1}^{n} Σ_{i=1}^{c} χ_ik (d_ik)^2                      (11.14)

where d_ik is a Euclidean distance measure (in m-dimensional feature space, R^m) between the kth data sample x_k and the ith cluster center v_i, given by

    d_ik = d(x_k − v_i) = ||x_k − v_i|| = [ Σ_{j=1}^{m} (x_kj − v_ij)^2 ]^(1/2)          (11.15)

Since each data sample requires m coordinates to describe its location in R^m-space, each cluster center also requires m coordinates to describe its location in this same space. Therefore, the ith cluster center is a vector of length m,

    v_i = {v_i1, v_i2, ..., v_im}

where the jth coordinate is calculated by

    v_ij = ( Σ_{k=1}^{n} χ_ik · x_kj ) / ( Σ_{k=1}^{n} χ_ik )            (11.16)

We seek the optimum partition, U*, to be the partition that produces the minimum value for the function J. That is,

    J(U*, v*) = min_{U ∈ M_c} J(U, v)                                    (11.17)

Finding the optimum partition matrix, U*, is exceedingly difficult for practical problems because the cardinality of M_c grows explosively for even modest-sized problems. For example, for the case where n = 25 and c = 10, the cardinality approaches an extremely large number, i.e., η_Mc → 10^18. Obviously, a search for optimality by exhaustion is not computationally feasible for problems of reasonable interest. Fortunately, very useful and effective alternative search algorithms have been devised [Bezdek, 1981].

One such search algorithm is known as iterative optimization. Basically, this method is like many other iterative methods in that we start with an initial guess at the U matrix. From this assumed matrix, input values for the number of classes, and an iteration tolerance (the accuracy we demand in the solution), we calculate the centers of the clusters (classes). From these cluster, or class, centers we recalculate the membership values that each data point has in the clusters. We compare these values with the assumed values and continue this process until the changes from cycle to cycle are within our prescribed tolerance level. The step-by-step procedure in this iterative optimization method is provided here [Bezdek, 1981]:

1. Fix c (2 ≤ c < n) and initialize the U matrix: U^(0) ∈ M_c. Then do r = 0, 1, 2, ....
2. Calculate the c center vectors: v_i^(r) with U^(r), using Eq. (11.16).
3. Update U^(r); calculate the updated characteristic functions (for all i, k):

       χ_ik^(r+1) = 1 if d_ik^(r) = min_j { d_jk^(r) } over all j = 1, ..., c;  0 otherwise      (11.18)

4. If

       ||U^(r+1) − U^(r)|| ≤ ε   (tolerance level)                       (11.19)

   stop; otherwise set r = r + 1 and return to step 2.

In step 4, the notation || · || is any matrix norm, such as the Euclidean norm.
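The four steps above map directly onto a few lines of code. The following is a minimal sketch (not from the text, and not Bezdek's original implementation) of hard c-means in Python/NumPy; it assumes no class becomes empty during the iterations, and the function and variable names are illustrative. The demonstration data are the four catalytic converter points of Example 11.8, which appears later in this section.

    import numpy as np

    def hard_c_means(X, U0, max_iter=100, tol=1e-2):
        """X: n x m data matrix; U0: c x n initial hard partition (rows are classes)."""
        X, U = np.asarray(X, dtype=float), np.asarray(U0, dtype=float)
        for _ in range(max_iter):
            # step 2: cluster centers, Eq. (11.16) -- mean of the points assigned to each class
            V = (U @ X) / U.sum(axis=1, keepdims=True)
            # step 3: distances d_ik, Eq. (11.15), and updated memberships, Eq. (11.18)
            d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)    # shape (c, n)
            U_new = np.zeros_like(U)
            U_new[np.argmin(d, axis=0), np.arange(X.shape[0])] = 1.0
            # step 4: stop when the change in the partition is within tolerance
            if np.linalg.norm(U_new - U) <= tol:
                return U_new, V
            U = U_new
        return U, V

    X = [[1.0, 3.0], [1.5, 3.2], [1.3, 2.8], [3.0, 1.0]]
    U0 = [[1, 0, 0, 0],
          [0, 1, 1, 1]]
    U_opt, V_opt = hard_c_means(X, U0)
    print(U_opt)    # converges to [[1, 1, 1, 0], [0, 0, 0, 1]]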
FIGURE 11.3 Butterfly classification problem [Bezdek, 1981].

Example 11.7 [Bezdek, 1981]. A good illustration of the iterative optimization method is provided by the "butterfly problem" shown in Fig. 11.3. In this problem we have 15 data points, and one of them is on a vertical line of symmetry (the point in the middle of the data cluster). Suppose we want to cluster our data into two classes. We can see that the points to the left of the line of symmetry should be in one class and the points to the right of the line of symmetry should be in the other class. The problem lies in assigning the point on the line of symmetry to a class. To which class should this point belong? Whichever class the algorithm assigns this point to, there will be a good argument that it should be a member of the other class. Alternatively, the argument may revolve around the fact that the choice of two classes is a poor one for this problem. Three classes might be the best choice, but the physics underlying the data might be binary and two classes may be the only option.

In conducting the iterative optimization approach we have to assume an initial U matrix. This matrix will have two rows (two classes, c = 2) and 15 columns (15 data points, n = 15). It is important to understand that the classes may be unlabeled in this process. That is, we can look at the structure of the data without the need to assign labels to the classes. This is often the case when one is first looking at a group of data. After several iterations with the data, and as we become more and more knowledgeable about the data, we can then assign labels to the classes. We start the solution with the assumption that the point in the middle (i.e., the eighth column) is assigned to the class represented by the bottom row of the initial U matrix, U^(0):

    U^(0) = [ 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 ]
            [ 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 ]

After four iterations [Bezdek, 1981] this method converges to within a tolerance level of ε = 0.01, as

    U^(4) = [ 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 ]
            [ 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 ]

We note that the point on the line of symmetry (i.e., the eighth column) is still assigned to the class represented by the second row of the U matrix. The elements in the U matrix indicate membership of that data point in the first or second class. For example, the point on the line of symmetry has full membership in the second class and no membership in the first class; yet it is plain to see from Fig. 11.3 that physically it should probably share membership with each class. This is not possible with crisp classification: membership is binary, so a point is either a member of a class or not.

The following example illustrates the crisp classification method again. The process will be instructive because of its similarity to the subsequent algorithm to be developed for the fuzzy classification method.

Example 11.8. In a chemical engineering process involving an automobile's catalytic converter (which converts carbon monoxide to carbon dioxide) we have a relationship between the conversion efficiency of the catalytic converter and the inverse of the temperature of the catalyst. Two classes of data are known from the reaction efficiency. Points of high conversion efficiency and high temperature are indicators of a nonpolluting system (class c1), and points of low conversion efficiency and low temperature are indicative of a polluting system (class c2). Suppose you measure the conversion efficiency and temperature (T) of four different catalytic converters and attempt to characterize them as polluting or nonpolluting. The four data points (n = 4) are shown in Fig. 11.4,
where the y axis is conversion efficiency and the x axis is the inverse of the temperature (in a conversion process like this the exact solution takes the form of ln(1/T)). The data are described by two features (m = 2) and have the following coordinates in 2D space:

    x1 = {1, 3}
    x2 = {1.5, 3.2}
    x3 = {1.3, 2.8}
    x4 = {3, 1}

FIGURE 11.4 Four data points in two-dimensional feature space.

We desire to classify these data points into two classes (c = 2). It is sometimes useful to calculate the cardinality of the possible number of crisp partitions for this system, i.e., to find η_Mc using Eq. (11.13); thus,

    η_Mc = (1/2!) [ (2 choose 1)(−1)^1 (1)^4 + (2 choose 2)(−1)^0 (2)^4 ] = (1/2)(−2 + 16) = 7

which says that there are seven unique ways (irrespective of row-swaps) to classify the four points into two clusters. Let us begin the iterative optimization algorithm with an initial guess of the crisp partition, U^(0), by assuming x1 to be in class 1 and x2, x3, x4 to be in class 2, as shown in Fig. 11.4, i.e.,

    U^(0) = [ 1 0 0 0 ]
            [ 0 1 1 1 ]

Now, from the initial U^(0) (which is one of the seven possible crisp partitions) we seek the optimum partition U*, i.e.,

    U^(0) → U^(1) → U^(2) → ··· → U*

Of course, optimality is defined in terms of the desired tolerance or convergence level, ε. In general, for class 1 we calculate the coordinates of the cluster center as

    v_1j = (χ_11 x_1j + χ_12 x_2j + χ_13 x_3j + χ_14 x_4j) / (χ_11 + χ_12 + χ_13 + χ_14)
         = ((1)x_1j + (0)x_2j + (0)x_3j + (0)x_4j) / (1 + 0 + 0 + 0)

and

    v_i = {v_i1, v_i2, ..., v_im}

In this case m = 2, which means we deal with two coordinates for each data point. Therefore,

    v_i = {v_i1, v_i2}

where for c = 1 (which is class 1), v_1 = {v_11, v_12}, and for c = 2 (which is class 2), v_2 = {v_21, v_22}. Therefore, using the expression for v_ij for c = 1, and j = 1 and 2, respectively,

    v_11 = 1(1)/1 = 1    (x coordinate)
    v_12 = 1(3)/1 = 3    (y coordinate)        ⇒  v_1 = {1, 3}

which just happens to be the coordinates of point x1, since this is the only point in the class for the assumed initial partition, U^(0). For c = 2, or class 2, we get the cluster center coordinates

    v_2j = ((0)x_1j + (1)x_2j + (1)x_3j + (1)x_4j) / (0 + 1 + 1 + 1) = (x_2j + x_3j + x_4j)/3

Hence, for c = 2 and j = 1 and 2, respectively,

    v_21 = (1(1.5) + 1(1.3) + 1(3))/3 = 1.93   (x coordinate)
    v_22 = (1(3.2) + 1(2.8) + 1(1))/3 = 2.33   (y coordinate)  ⇒  v_2 = {1.93, 2.33}

Now, we compute the values for d_ik, the distances from each sample x_k to the center v_i of the ith class. Using Eq. (11.15),

    d_ik = [ Σ_{j=1}^{m} (x_kj − v_ij)^2 ]^(1/2)

we get, for example, for c = 1: d_1k = [(x_k1 − v_11)^2 + (x_k2 − v_12)^2]^(1/2). Therefore, for each data sample k = 1 to 4, we compute the values of d_ik as follows: for cluster 1,

    d_11 = [(1 − 1)^2 + (3 − 3)^2]^(1/2) = 0.0
    d_12 = [(1.5 − 1)^2 + (3.2 − 3)^2]^(1/2) = 0.54
    d_13 = [(1.3 − 1)^2 + (2.8 − 3)^2]^(1/2) = 0.36
    d_14 = [(3 − 1)^2 + (1 − 3)^2]^(1/2) = 2.83

and for cluster 2,

    d_21 = [(1 − 1.93)^2 + (3 − 2.33)^2]^(1/2) = 1.14
    d_22 = [(1.5 − 1.93)^2 + (3.2 − 2.33)^2]^(1/2) = 0.97
    d_23 = [(1.3 − 1.93)^2 + (2.8 − 2.33)^2]^(1/2) = 0.78
    d_24 = [(3 − 1.93)^2 + (1 − 2.33)^2]^(1/2) = 1.70
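These center and distance computations are easy to verify numerically. A quick check (not from the text) using NumPy follows; small differences in the last digit are possible because the text rounds the centers before computing distances.

    import numpy as np

    X = np.array([[1.0, 3.0], [1.5, 3.2], [1.3, 2.8], [3.0, 1.0]])
    U0 = np.array([[1, 0, 0, 0],
                   [0, 1, 1, 1]], dtype=float)

    V = (U0 @ X) / U0.sum(axis=1, keepdims=True)             # v1 = (1, 3), v2 = (1.93, 2.33)
    d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
    print(np.round(V, 2))
    print(np.round(d, 2))   # matches the d_1k and d_2k values above to within rounding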
Now, we update the partition to U^(1) for each data point using Eq. (11.18). Hence, for class 1 we compare d_1k against min(d_1k, d_2k):

    For k = 1:  d_11 = 0.0,   min(d_11, d_21) = min(0.0, 1.14) = 0.0;    thus χ_11 = 1
    For k = 2:  d_12 = 0.54,  min(d_12, d_22) = min(0.54, 0.97) = 0.54;  thus χ_12 = 1
    For k = 3:  d_13 = 0.36,  min(d_13, d_23) = min(0.36, 0.78) = 0.36;  thus χ_13 = 1
    For k = 4:  d_14 = 2.83,  min(d_14, d_24) = min(2.83, 1.70) = 1.70;  thus χ_14 = 0

Therefore, the updated partition is

    U^(1) = [ 1 1 1 0 ]
            [ 0 0 0 1 ]

Since the partitions U^(0) and U^(1) are different, we repeat the same procedure based on the new setup of the two classes. For c = 1, the center coordinates are

    v_1j = (x_1j + x_2j + x_3j)/(1 + 1 + 1 + 0),   since χ_14 = 0

    v_11 = (x_11 + x_21 + x_31)/3 = (1 + 1.5 + 1.3)/3 = 1.26
    v_12 = (x_12 + x_22 + x_32)/3 = (3 + 3.2 + 2.8)/3 = 3.0       ⇒  v_1 = {1.26, 3.0}

and for c = 2, the center coordinates are

    v_2j = x_4j/(0 + 0 + 0 + 1),   since χ_21 = χ_22 = χ_23 = 0

    v_21 = 3/1 = 3
    v_22 = 1/1 = 1                                                ⇒  v_2 = {3, 1}

Now, we calculate the distance measures again:

    d_11 = [(1 − 1.26)^2 + (3 − 3)^2]^(1/2) = 0.26        d_21 = [(1 − 3)^2 + (3 − 1)^2]^(1/2) = 2.83
    d_12 = [(1.5 − 1.26)^2 + (3.2 − 3)^2]^(1/2) = 0.31    d_22 = [(1.5 − 3)^2 + (3.2 − 1)^2]^(1/2) = 2.66
    d_13 = [(1.3 − 1.26)^2 + (2.8 − 3)^2]^(1/2) = 0.20    d_23 = [(1.3 − 3)^2 + (2.8 − 1)^2]^(1/2) = 2.47
    d_14 = [(3 − 1.26)^2 + (1 − 3)^2]^(1/2) = 2.65        d_24 = [(3 − 3)^2 + (1 − 1)^2]^(1/2) = 0.0

and again update the partition U^(1) to U^(2):

    For k = 1:  d_11 = 0.26,  min(d_11, d_21) = min(0.26, 2.83) = 0.26;  thus χ_11 = 1
    For k = 2:  d_12 = 0.31,  min(d_12, d_22) = min(0.31, 2.66) = 0.31;  thus χ_12 = 1
    For k = 3:  d_13 = 0.20,  min(d_13, d_23) = min(0.20, 2.47) = 0.20;  thus χ_13 = 1
    For k = 4:  d_14 = 2.65,  min(d_14, d_24) = min(2.65, 0.0) = 0.0;    thus χ_14 = 0

Because the partitions U^(1) and U^(2) are identical, we can say the iterative process has converged; therefore, the optimum hard (crisp) partition is

    U* = [ 1 1 1 0 ]
         [ 0 0 0 1 ]

This optimum partition tells us that for this catalytic converter example, the data points x1, x2, and x3 are more similar to one another in the 2D feature space, and different from data point x4. We could say that points x1, x2, and x3 are more indicative of a nonpolluting converter than is data point x4.

FUZZY c-MEANS (FCM)

Let us consider whether the butterfly example in Fig. 11.3 could be improved with the use of fuzzy set methods. To develop these methods in classification, we define a family of fuzzy sets {A~_i, i = 1, 2, ..., c} as a fuzzy c-partition on a universe of data points, X. Because fuzzy sets allow for degrees of membership, we can extend the crisp classification idea into a fuzzy classification notion. Then we can assign membership to the various data points in each fuzzy set (fuzzy class, fuzzy cluster). Hence, a single point can have partial membership in more than one class. It will be useful to describe the membership value that the kth data point has in the ith class with the following notation:

    μ_ik = μ_A~i(x_k) ∈ [0, 1]

with the restriction (as with crisp classification) that the sum of all membership values for a single data point in all of the classes has to be unity:

    Σ_{i=1}^{c} μ_ik = 1        for all k = 1, 2, ..., n                 (11.20)

As before in crisp classification, there can be no empty classes and there can be no class that contains all the data points. This qualification is manifested in the following expression:

    0 < Σ_{k=1}^{n} μ_ik < n                                             (11.21)

Because each data point can have partial membership in more than one class, the restriction of Eq. (11.9) is not present in the fuzzy classification case, i.e.,

    μ_ik ∧ μ_jk ≠ 0                                                      (11.22)

The provisions of Eqs.
(11.8) and (11.10) still hold for the fuzzy case, however:

    Σ_{i=1}^{c} μ_A~i(x_k) = 1          for all k                        (11.23)

    0 < Σ_{k=1}^{n} μ_A~i(x_k) < n      for all i                        (11.24)

Before, in the case of c = 2, the classification problem reduced to that of the excluded middle axioms for crisp classification. Since we now allow partial membership, the case of c = 2 does not follow the restrictions of the excluded middle axioms, i.e., for two classes A~_i and A~_j,

    A~_i ∩ A~_j ≠ ∅                                                      (11.25)

    ∅ ⊂ A~_i ⊂ X                                                         (11.26)

We can now define a family of fuzzy partition matrices, M_fc, for the classification involving c classes and n data points,

    M_fc = { U~ | μ_ik ∈ [0, 1];  Σ_{i=1}^{c} μ_ik = 1;  0 < Σ_{k=1}^{n} μ_ik < n }      (11.27)

where i = 1, 2, ..., c and k = 1, 2, ..., n. Any U~ ∈ M_fc is a fuzzy c-partition, and it follows from the overlapping character of the classes and the infinite number of membership values possible for describing class membership that the cardinality of M_fc is infinite, i.e., η_Mfc = ∞.

Example 11.9 [similar to Bezdek, 1981]. Suppose you are a fruit geneticist interested in genetic relationships among fruits. In particular, you know that a tangelo is a cross between a grapefruit and a tangerine. You describe the fruit with such features as color, weight, sphericity, sugar content, skin texture, and so on. Hence, your feature space could be highly dimensional. Suppose you have three fruits (three data points):

    X = {x1 = grapefruit, x2 = tangelo, x3 = tangerine}

These data points are described by m features, as discussed. You want to classify the three fruits into two classes to determine the genetic assignment for the three fruits. In the crisp case, the classification matrix can take one of three forms, i.e., the cardinality for this case where n = 3 and c = 2 is η_Mc = 3 (see Eq. (11.13)). Suppose you arrange your U~ matrix as follows:

               x1  x2  x3
    U~ = c1 [   1   0   0 ]
         c2 [   0   1   1 ]

The three possible partitions of the matrix are

    [ 1 0 0 ]    [ 1 1 0 ]    [ 1 0 1 ]
    [ 0 1 1 ]    [ 0 0 1 ]    [ 0 1 0 ]

Notice in the first partition that we are left with the uncomfortable segregation of the grapefruit in one class and the tangelo and the tangerine in the other; the tangelo shares nothing in common with the grapefruit! In the second partition, the grapefruit and the tangelo are in one class, suggesting that they share nothing in common with the tangerine! Finally, the third partition is the most genetically discomforting of all, because here the tangelo is in a class by itself, sharing nothing in common with its progenitors! One of these three partitions will be the final partition when any crisp algorithm is used. The question is, which one is best? Intuitively the answer is none, but in crisp classification we have to use one of them.

In the fuzzy case this segregation and genetic absurdity are not a problem. We can have the most intuitive situation, in which the tangelo shares membership with both classes, i.e., with both of its parents. For example, the following partition might be a typical outcome for the fruit genetics problem:

               x1     x2     x3
    U~ = 1 [ 0.91   0.58   0.13 ]
         2 [ 0.09   0.42   0.87 ]

In this case, Eq. (11.24) shows that the sum of each row is a number between 0 and n:

    0 < Σ_k μ_1k = 1.62 < 3
    0 < Σ_k μ_2k = 1.38 < 3

and for Eq. (11.22) there is overlap among the classes for each data point:

    μ_11 ∧ μ_21 = min(0.91, 0.09) = 0.09 ≠ 0
    μ_12 ∧ μ_22 = min(0.58, 0.42) = 0.42 ≠ 0
    μ_13 ∧ μ_23 = min(0.13, 0.87) = 0.13 ≠ 0
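The constraints in Eqs. (11.20)-(11.24) and the overlap property of Eq. (11.22) are simple to check mechanically. A short sketch (not from the text) that verifies them for the fruit partition above:

    import numpy as np

    U = np.array([[0.91, 0.58, 0.13],
                  [0.09, 0.42, 0.87]])
    c, n = U.shape

    print(np.allclose(U.sum(axis=0), 1.0))           # Eq. (11.20): each column sums to 1
    row_sums = U.sum(axis=1)
    print(np.all((row_sums > 0) & (row_sums < n)))    # Eq. (11.24): 0 < row sum < n
    print(np.minimum(U[0], U[1]))                     # Eq. (11.22): all minima are nonzero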
