Class FuzzyKMeansClusterer<T extends Clusterable>

java.lang.Object
org.hipparchus.clustering.Clusterer<T>
org.hipparchus.clustering.FuzzyKMeansClusterer<T>
Type Parameters:
T - type of the points to cluster

public class FuzzyKMeansClusterer<T extends Clusterable> extends Clusterer<T>
Fuzzy K-Means clustering algorithm.

The Fuzzy K-Means algorithm is a variation of the classical K-Means algorithm, with the major difference that a data point is not uniquely assigned to a single cluster. Instead, each point i has a set of weights \(u_{i,j}\) that indicate its degree of membership in cluster j.

The algorithm then tries to minimize the objective function: \[ J = \sum_{i=1}^{N}\sum_{j=1}^{C} u_{i,j}^m d_{i,j}^2 \] where N is the number of data points, C is the number of clusters, m is the fuzziness, and \(d_{i,j}\) is the distance between data point i and the center of cluster j.
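
As a rough illustration of the objective function, the sketch below sums the membership raised to the fuzziness, times the squared distance, over every pair of data point and cluster. It is not the library's implementation: the class name ObjectiveSketch, the method objective, and the use of plain arrays for one-dimensional points, centers, and memberships are assumptions made for this example.

```java
final class ObjectiveSketch {

    /**
     * Sums u^m * d^2 over every (point, cluster) pair.
     * u[i][j] is the membership of point i in cluster j; m is the fuzziness.
     */
    static double objective(double[] points, double[] centers, double[][] u, double m) {
        double sum = 0.0;
        for (int i = 0; i < points.length; i++) {
            for (int j = 0; j < centers.length; j++) {
                double d = Math.abs(points[i] - centers[j]); // 1-D Euclidean distance
                sum += Math.pow(u[i][j], m) * d * d;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] points = {0.0, 1.0, 4.0};   // hypothetical data
        double[] centers = {0.5, 4.0};       // hypothetical cluster centers
        double[][] u = {                     // crisp memberships, for simplicity
            {1.0, 0.0},
            {1.0, 0.0},
            {0.0, 1.0}
        };
        System.out.println(objective(points, centers, u, 2.0));
    }
}
```

With these crisp memberships only the first two points contribute, each with a squared distance of 0.25, so the sketch evaluates to 0.5.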

The algorithm requires two parameters:

  • k: the number of clusters
  • fuzziness: determines the level of cluster fuzziness; larger values lead to fuzzier clusters
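
To make the effect of the fuzziness parameter concrete, the sketch below computes the memberships of a single point from its distances to the cluster centers. It uses the standard fuzzy c-means membership formula, which is an assumption here since this page does not state the update rule; the class FuzzinessSketch and the method memberships are hypothetical names.

```java
final class FuzzinessSketch {

    /**
     * Membership of one point in each cluster, given its distances to the centers.
     * Assumes every distance is strictly positive and fuzziness > 1.0.
     */
    static double[] memberships(double[] distances, double fuzziness) {
        double exponent = 2.0 / (fuzziness - 1.0);
        double[] u = new double[distances.length];
        for (int j = 0; j < distances.length; j++) {
            double sum = 0.0;
            for (int k = 0; k < distances.length; k++) {
                sum += Math.pow(distances[j] / distances[k], exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        double[] distances = {1.0, 3.0}; // hypothetical distances to two centers
        for (double m : new double[] {1.5, 2.0, 5.0}) {
            double[] u = memberships(distances, m);
            System.out.printf("m=%.1f -> [%.3f, %.3f]%n", m, u[0], u[1]);
        }
    }
}
```

For these distances a fuzziness of 2.0 yields memberships of 0.9 and 0.1. Smaller values push the memberships toward a crisp assignment, and larger values pull them toward an even split.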

Additional, optional parameters:

  • maxIterations: the maximum number of iterations
  • epsilon: the convergence criterion, default is 1e-3
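
One plausible way these two optional parameters interact is sketched below. It assumes that convergence is measured as the largest change of any membership value between two consecutive iterations, which this page does not confirm; the class ConvergenceSketch and both method names are hypothetical. The treatment of a negative maxIterations as "no limit" follows the constructor documentation.

```java
final class ConvergenceSketch {

    /** Largest absolute change of any membership value between two iterations. */
    static double maxMembershipChange(double[][] previous, double[][] current) {
        double max = 0.0;
        for (int i = 0; i < current.length; i++) {
            for (int j = 0; j < current[i].length; j++) {
                max = Math.max(max, Math.abs(current[i][j] - previous[i][j]));
            }
        }
        return max;
    }

    /** True when iterating should stop; a negative maxIterations means no limit. */
    static boolean shouldStop(double change, double epsilon, int iteration, int maxIterations) {
        boolean converged = change <= epsilon;
        boolean exhausted = maxIterations >= 0 && iteration >= maxIterations;
        return converged || exhausted;
    }

    public static void main(String[] args) {
        double[][] previous = {{0.5000, 0.5000}}; // hypothetical memberships
        double[][] current = {{0.5004, 0.4996}};
        double change = maxMembershipChange(previous, current);
        System.out.println(shouldStop(change, 1e-3, 7, -1));
    }
}
```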

The fuzzy variant of the K-Means algorithm is more robust with regard to the selection of the initial cluster centers.

  • Constructor Details

    • FuzzyKMeansClusterer

      public FuzzyKMeansClusterer(int k, double fuzziness) throws MathIllegalArgumentException
      Creates a new instance of a FuzzyKMeansClusterer.

      The Euclidean distance will be used as the default distance measure.

      Parameters:
      k - the number of clusters to split the data into
      fuzziness - the fuzziness factor, must be > 1.0
      Throws:
      MathIllegalArgumentException - if fuzziness <= 1.0
    • FuzzyKMeansClusterer

      public FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure) throws MathIllegalArgumentException
      Creates a new instance of a FuzzyKMeansClusterer.
      Parameters:
      k - the number of clusters to split the data into
      fuzziness - the fuzziness factor, must be > 1.0
      maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
      measure - the distance measure to use
      Throws:
      MathIllegalArgumentException - if fuzziness <= 1.0
    • FuzzyKMeansClusterer

      public FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure, double epsilon, RandomGenerator random) throws MathIllegalArgumentException
      Creates a new instance of a FuzzyKMeansClusterer.
      Parameters:
      k - the number of clusters to split the data into
      fuzziness - the fuzziness factor, must be > 1.0
      maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
      measure - the distance measure to use
      epsilon - the convergence criterion (default is 1e-3)
      random - random generator to use for choosing initial centers
      Throws:
      MathIllegalArgumentException - if fuzziness <= 1.0
  • Method Details

    • getK

      public int getK()
      Returns the number of clusters this instance will use.
      Returns:
      the number of clusters
    • getFuzziness

      public double getFuzziness()
      Returns the fuzziness factor used by this instance.
      Returns:
      the fuzziness factor
    • getMaxIterations

      public int getMaxIterations()
      Returns the maximum number of iterations this instance will use.
      Returns:
      the maximum number of iterations, or -1 if no maximum is set
    • getEpsilon

      public double getEpsilon()
      Returns the convergence criterion used by this instance.
      Returns:
      the convergence criterion
    • getRandomGenerator

      public RandomGenerator getRandomGenerator()
      Returns the random generator this instance will use.
      Returns:
      the random generator
    • getMembershipMatrix

      public RealMatrix getMembershipMatrix()
      Returns the n × k membership matrix, where n is the number of data points and k the number of clusters.

      The element \(U_{i,j}\) represents the membership value for data point i to cluster j.

      Returns:
      the membership matrix
      Throws:
      MathIllegalStateException - if cluster(Collection) has not been called before
    • getDataPoints

      public List<T> getDataPoints()
      Returns an unmodifiable list of the data points used in the last call to cluster(Collection).
      Returns:
      the list of data points, or null if cluster(Collection) has not been called before.
    • getClusters

      public List<CentroidCluster<T>> getClusters()
      Returns the list of clusters resulting from the last call to cluster(Collection).
      Returns:
      the list of clusters, or null if cluster(Collection) has not been called before.
    • getObjectiveFunctionValue

      public double getObjectiveFunctionValue()
      Get the value of the objective function.
      Returns:
      the objective function evaluation as double value
      Throws:
      MathIllegalStateException - if cluster(Collection) has not been called before
    • cluster

      public List<CentroidCluster<T>> cluster(Collection<T> dataPoints) throws MathIllegalArgumentException
      Performs Fuzzy K-Means cluster analysis.
      Specified by:
      cluster in class Clusterer<T extends Clusterable>
      Parameters:
      dataPoints - the points to cluster
      Returns:
      the list of clusters
      Throws:
      MathIllegalArgumentException - if the data points are null or the number of clusters is larger than the number of data points
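
The overall procedure behind cluster(Collection) and getMembershipMatrix() can be approximated with a small, library-free sketch. It is a textbook fuzzy c-means loop on one-dimensional data, not the Hipparchus implementation: the class FuzzyKMeansSketch and its methods are hypothetical, the initial centers are passed in by the caller rather than chosen by a random generator, and the stopping rule (largest membership change at most epsilon) is an assumption.

```java
import java.util.Arrays;

final class FuzzyKMeansSketch {

    /** Memberships of one point in every cluster (standard fuzzy c-means formula). */
    static double[] memberships(double point, double[] centers, double m) {
        double[] u = new double[centers.length];
        for (int j = 0; j < centers.length; j++) {
            if (point == centers[j]) { // point sits on a center: full membership there
                Arrays.fill(u, 0.0);
                u[j] = 1.0;
                return u;
            }
        }
        double exponent = 2.0 / (m - 1.0);
        for (int j = 0; j < centers.length; j++) {
            double sum = 0.0;
            for (int k = 0; k < centers.length; k++) {
                sum += Math.pow(Math.abs(point - centers[j]) / Math.abs(point - centers[k]), exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }

    /**
     * Alternates membership and center updates and returns the n x k membership matrix.
     * The centers array is updated in place.
     */
    static double[][] cluster(double[] points, double[] centers, double m,
                              int maxIterations, double epsilon) {
        int n = points.length;
        int k = centers.length;
        double[][] u = new double[n][k];
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            double change = 0.0;
            for (int i = 0; i < n; i++) { // membership update
                double[] row = memberships(points[i], centers, m);
                for (int j = 0; j < k; j++) {
                    change = Math.max(change, Math.abs(row[j] - u[i][j]));
                }
                u[i] = row;
            }
            for (int j = 0; j < k; j++) { // center update: mean weighted by u^m
                double numerator = 0.0;
                double denominator = 0.0;
                for (int i = 0; i < n; i++) {
                    double w = Math.pow(u[i][j], m);
                    numerator += w * points[i];
                    denominator += w;
                }
                centers[j] = numerator / denominator;
            }
            if (change <= epsilon) {
                break;
            }
        }
        return u;
    }

    /** Index of the cluster with the largest membership in one row of the matrix. */
    static int hardAssignment(double[] row) {
        int best = 0;
        for (int j = 1; j < row.length; j++) {
            if (row[j] > row[best]) {
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] points = {1.0, 1.2, 0.8, 5.0, 5.3, 4.9}; // hypothetical data
        double[] centers = {0.0, 6.0};                     // hypothetical initial centers
        double[][] u = cluster(points, centers, 2.0, 100, 1e-3);
        for (int i = 0; i < points.length; i++) {
            System.out.println(points[i] + " -> cluster " + hardAssignment(u[i])
                    + " " + Arrays.toString(u[i]));
        }
    }
}
```

Each row of the returned matrix sums to 1, and taking the largest entry per row gives a crisp assignment when one is needed. In this data set the three points near 1 end up in the first cluster and the three points near 5 in the second.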