Class FuzzyKMeansClusterer<T extends Clusterable>
- Type Parameters:
T - type of the points to cluster
The Fuzzy K-Means algorithm is a variation of the classical K-Means algorithm, with the major difference that a single data point is not uniquely assigned to a single cluster. Instead, each point \(i\) has a set of weights \(u_{ij}\) which indicate its degree of membership in cluster \(j\).
The algorithm then tries to minimize the objective function: \[ J = \sum_{i=1}^{N}\sum_{j=1}^{C} u_{ij}^m d_{ij}^2 \] with \(d_{ij}\) being the distance between data point \(i\) and the center of cluster \(j\), \(N\) the number of data points, \(C\) the number of clusters, and \(m\) the fuzziness.
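As a rough illustration of the objective function above, the following self-contained sketch evaluates it for a given set of membership weights and distances. The class `FuzzyObjectiveSketch`, its method, and the plain `double[][]` arrays indexed as `[point][cluster]` are assumptions made for this example and are not part of the library API.

```java
/** Illustrative helper (hypothetical, not part of the library). */
final class FuzzyObjectiveSketch {

    private FuzzyObjectiveSketch() { }

    /**
     * Sums u^m * d^2 over every (point, cluster) pair.
     *
     * @param u membership weights, u[p][c] for point p and cluster c (assumed layout)
     * @param d distances, d[p][c] between point p and the center of cluster c
     * @param m fuzziness exponent, expected to be > 1.0
     * @return the value of the objective function for these inputs
     */
    static double objective(double[][] u, double[][] d, double m) {
        double sum = 0.0;
        for (int p = 0; p < u.length; p++) {
            for (int c = 0; c < u[p].length; c++) {
                sum += Math.pow(u[p][c], m) * d[p][c] * d[p][c];
            }
        }
        return sum;
    }
}
```

A point that sits exactly on a center with full membership contributes nothing to the sum, so lower values correspond to tighter clusters.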
The algorithm requires two parameters:
- k: the number of clusters
- fuzziness: determines the level of cluster fuzziness, larger values lead to fuzzier clusters
Additional, optional parameters:
- maxIterations: the maximum number of iterations
- epsilon: the convergence criterion, default is 1e-3
The fuzzy variant of the K-Means algorithm is more robust with regard to the selection of the initial cluster centers.
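To make the effect of the fuzziness parameter more concrete, the sketch below computes the membership weights of a single point from its distances to each cluster center. It uses the textbook fuzzy c-means update \(u_j = 1 / \sum_c (d_j / d_c)^{2/(m-1)}\), which is assumed here for illustration and is not guaranteed to match this class line for line. The class name `MembershipSketch` and the use of `IllegalArgumentException` are likewise assumptions.

```java
/** Illustrative helper (hypothetical, not part of the library). */
final class MembershipSketch {

    private MembershipSketch() { }

    /**
     * Membership weights of one point, given its distance to every cluster center.
     *
     * @param distances distance from the point to each center
     * @param fuzziness fuzziness exponent m, must be > 1.0
     * @return one weight per cluster; the weights sum to 1
     */
    static double[] memberships(double[] distances, double fuzziness) {
        if (!(fuzziness > 1.0)) {
            throw new IllegalArgumentException("fuzziness must be > 1.0");
        }
        final int k = distances.length;
        final double[] u = new double[k];
        // A point lying exactly on a center belongs fully to that center.
        for (int j = 0; j < k; j++) {
            if (distances[j] == 0.0) {
                u[j] = 1.0;
                return u;
            }
        }
        final double exponent = 2.0 / (fuzziness - 1.0);
        for (int j = 0; j < k; j++) {
            double sum = 0.0;
            for (int c = 0; c < k; c++) {
                sum += Math.pow(distances[j] / distances[c], exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }
}
```

With distances 1 and 3, a fuzziness of 2 gives weights of 0.9 and 0.1, while a larger fuzziness moves both weights towards 0.5.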
-
Constructor Summary
- FuzzyKMeansClusterer(int k, double fuzziness): Creates a new instance of a FuzzyKMeansClusterer.
- FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure): Creates a new instance of a FuzzyKMeansClusterer.
- FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure, double epsilon, RandomGenerator random): Creates a new instance of a FuzzyKMeansClusterer.
Method Summary
- List<CentroidCluster<T>> cluster(Collection<T> dataPoints): Performs Fuzzy K-Means cluster analysis.
- List<CentroidCluster<T>> getClusters(): Returns the list of clusters resulting from the last call to cluster(Collection).
- List<T> getDataPoints(): Returns an unmodifiable list of the data points used in the last call to cluster(Collection).
- double getEpsilon(): Returns the convergence criterion used by this instance.
- double getFuzziness(): Returns the fuzziness factor used by this instance.
- int getK(): Returns the number of clusters this instance will use.
- int getMaxIterations(): Returns the maximum number of iterations this instance will use.
- RealMatrix getMembershipMatrix(): Returns the n x k membership matrix, where n is the number of data points and k the number of clusters.
- double getObjectiveFunctionValue(): Gets the value of the objective function.
- RandomGenerator getRandomGenerator(): Returns the random generator this instance will use.
Methods inherited from class org.hipparchus.clustering.Clusterer
distance, getDistanceMeasure
-
Constructor Details
-
FuzzyKMeansClusterer
public FuzzyKMeansClusterer(int k, double fuzziness) throws MathIllegalArgumentException
Creates a new instance of a FuzzyKMeansClusterer. The Euclidean distance will be used as the default distance measure.
- Parameters:
k - the number of clusters to split the data into
fuzziness - the fuzziness factor, must be > 1.0
- Throws:
MathIllegalArgumentException - if fuzziness <= 1.0
-
FuzzyKMeansClusterer
public FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure) throws MathIllegalArgumentException
Creates a new instance of a FuzzyKMeansClusterer.
- Parameters:
k - the number of clusters to split the data into
fuzziness - the fuzziness factor, must be > 1.0
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
- Throws:
MathIllegalArgumentException - if fuzziness <= 1.0
-
FuzzyKMeansClusterer
public FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure, double epsilon, RandomGenerator random) throws MathIllegalArgumentException
Creates a new instance of a FuzzyKMeansClusterer.
- Parameters:
k - the number of clusters to split the data into
fuzziness - the fuzziness factor, must be > 1.0
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
epsilon - the convergence criterion (default is 1e-3)
random - random generator to use for choosing initial centers
- Throws:
MathIllegalArgumentException - if fuzziness <= 1.0
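The argument rules stated for the constructors can be sketched as two small checks. This is only an illustration under stated assumptions: `FuzzyParamsSketch` and both method names are hypothetical, `IllegalArgumentException` stands in for `MathIllegalArgumentException`, and mapping a negative iteration count to `Integer.MAX_VALUE` is one plausible reading of "no maximum", not a description of the library's internals.

```java
/** Illustrative helper (hypothetical, not part of the library). */
final class FuzzyParamsSketch {

    private FuzzyParamsSketch() { }

    /**
     * Rejects fuzziness values that are not strictly greater than 1.0.
     * Written as !(x > 1.0) so that NaN is rejected too, which is a
     * choice of this sketch rather than documented behavior.
     */
    static void checkFuzziness(double fuzziness) {
        if (!(fuzziness > 1.0)) {
            throw new IllegalArgumentException("fuzziness must be > 1.0, got " + fuzziness);
        }
    }

    /** Interprets a negative maxIterations as "no maximum". */
    static int effectiveMaxIterations(int maxIterations) {
        return maxIterations < 0 ? Integer.MAX_VALUE : maxIterations;
    }
}
```

Keeping the checks separate from the clustering loop makes it easy to fail fast before any data is processed.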
-
-
Method Details
-
getK
public int getK()
Returns the number of clusters this instance will use.
- Returns:
the number of clusters
-
getFuzziness
public double getFuzziness()
Returns the fuzziness factor used by this instance.
- Returns:
the fuzziness factor
-
getMaxIterations
public int getMaxIterations()
Returns the maximum number of iterations this instance will use.
- Returns:
the maximum number of iterations, or -1 if no maximum is set
-
getEpsilon
public double getEpsilon()
Returns the convergence criterion used by this instance.
- Returns:
the convergence criterion
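One common way to apply such a threshold, assumed here for illustration, is to stop iterating once no membership weight changes by more than epsilon between two successive iterations. The class `ConvergenceSketch`, its method names, and the `double[][]` representation are hypothetical.

```java
/** Illustrative helper (hypothetical, not part of the library). */
final class ConvergenceSketch {

    private ConvergenceSketch() { }

    /** Largest absolute difference between two membership matrices of equal shape. */
    static double maxChange(double[][] previous, double[][] current) {
        double max = 0.0;
        for (int i = 0; i < current.length; i++) {
            for (int j = 0; j < current[i].length; j++) {
                max = Math.max(max, Math.abs(current[i][j] - previous[i][j]));
            }
        }
        return max;
    }

    /** True when every membership weight moved by at most epsilon. */
    static boolean hasConverged(double[][] previous, double[][] current, double epsilon) {
        return maxChange(previous, current) <= epsilon;
    }
}
```

A smaller epsilon demands more stable memberships before stopping and typically costs more iterations.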
-
getRandomGenerator
public RandomGenerator getRandomGenerator()
Returns the random generator this instance will use.
- Returns:
the random generator
-
getMembershipMatrix
public RealMatrix getMembershipMatrix()
Returns the n x k membership matrix, where n is the number of data points and k the number of clusters. The element \(U_{i,j}\) represents the membership value for data point i to cluster j.
- Returns:
the membership matrix
- Throws:
MathIllegalStateException - if cluster(Collection) has not been called before
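A membership matrix of this shape can be reduced to a conventional hard assignment by picking, for each data point, the cluster with the largest weight. The sketch below assumes the matrix has already been copied into a plain `double[][]` with one row per data point; `HardAssignmentSketch` and its method are hypothetical names.

```java
/** Illustrative helper (hypothetical, not part of the library). */
final class HardAssignmentSketch {

    private HardAssignmentSketch() { }

    /**
     * For each row (data point), returns the index of the column (cluster)
     * holding the largest membership value. Ties go to the lowest index.
     */
    static int[] hardAssignments(double[][] membership) {
        final int[] labels = new int[membership.length];
        for (int i = 0; i < membership.length; i++) {
            int best = 0;
            for (int j = 1; j < membership[i].length; j++) {
                if (membership[i][j] > membership[i][best]) {
                    best = j;
                }
            }
            labels[i] = best;
        }
        return labels;
    }
}
```

Rows with nearly equal weights mark points that sit between clusters, which is information a hard assignment discards.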
-
getDataPoints
public List<T> getDataPoints()
Returns an unmodifiable list of the data points used in the last call to cluster(Collection).
- Returns:
the list of data points, or null if cluster(Collection) has not been called before.
-
getClusters
public List<CentroidCluster<T>> getClusters()
Returns the list of clusters resulting from the last call to cluster(Collection).
- Returns:
the list of clusters, or null if cluster(Collection) has not been called before.
-
getObjectiveFunctionValue
public double getObjectiveFunctionValue()
Gets the value of the objective function.
- Returns:
the objective function evaluation as double value
- Throws:
MathIllegalStateException - if cluster(Collection) has not been called before
-
cluster
public List<CentroidCluster<T>> cluster(Collection<T> dataPoints) throws MathIllegalArgumentException
Performs Fuzzy K-Means cluster analysis.
- Specified by:
cluster in class Clusterer<T extends Clusterable>
- Parameters:
dataPoints - the points to cluster
- Returns:
the list of clusters
- Throws:
MathIllegalArgumentException - if the data points are null or the number of clusters is larger than the number of data points
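For readers who want to see the overall shape of such an analysis, here is a deliberately simplified, self-contained sketch on one-dimensional data. It is not this class's implementation: it starts from caller-supplied centers instead of a random initialization, uses absolute difference as the distance, applies the textbook membership and weighted-mean updates, and throws `IllegalArgumentException` where the library documents `MathIllegalArgumentException`. All names in it are hypothetical.

```java
/** Simplified 1-D fuzzy k-means loop (hypothetical, not part of the library). */
final class FuzzyKMeans1DSketch {

    private FuzzyKMeans1DSketch() { }

    /** Returns the final centers; initialCenters is left unmodified. */
    static double[] cluster(double[] points, double[] initialCenters,
                            double fuzziness, int maxIterations, double epsilon) {
        if (points == null || initialCenters.length > points.length) {
            throw new IllegalArgumentException("need at least as many points as clusters");
        }
        if (!(fuzziness > 1.0)) {
            throw new IllegalArgumentException("fuzziness must be > 1.0");
        }
        final int n = points.length;
        final int k = initialCenters.length;
        final double[] centers = initialCenters.clone();
        final double[][] u = new double[n][k]; // memberships, initially all zero
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Step 1: recompute memberships from the current centers.
            double maxChange = 0.0;
            for (int i = 0; i < n; i++) {
                final double[] updated = memberships(points[i], centers, fuzziness);
                for (int j = 0; j < k; j++) {
                    maxChange = Math.max(maxChange, Math.abs(updated[j] - u[i][j]));
                }
                u[i] = updated;
            }
            // Step 2: move each center to the mean of the points, weighted by u^m.
            for (int j = 0; j < k; j++) {
                double weighted = 0.0;
                double total = 0.0;
                for (int i = 0; i < n; i++) {
                    final double w = Math.pow(u[i][j], fuzziness);
                    weighted += w * points[i];
                    total += w;
                }
                if (total > 0.0) {
                    centers[j] = weighted / total;
                }
            }
            // Step 3: stop once the memberships have stabilized.
            if (maxChange <= epsilon) {
                break;
            }
        }
        return centers;
    }

    /** Textbook membership update for one point; a point on a center belongs fully to it. */
    private static double[] memberships(double x, double[] centers, double fuzziness) {
        final int k = centers.length;
        final double[] u = new double[k];
        for (int j = 0; j < k; j++) {
            if (x == centers[j]) {
                u[j] = 1.0;
                return u;
            }
        }
        final double exponent = 2.0 / (fuzziness - 1.0);
        for (int j = 0; j < k; j++) {
            double sum = 0.0;
            for (int c = 0; c < k; c++) {
                sum += Math.pow(Math.abs(x - centers[j]) / Math.abs(x - centers[c]), exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }
}
```

On two well-separated groups of values, the returned centers settle close to the mean of each group.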
-