Class ZipfDistribution

java.lang.Object
org.hipparchus.distribution.discrete.AbstractIntegerDistribution
org.hipparchus.distribution.discrete.ZipfDistribution
All Implemented Interfaces:
Serializable, IntegerDistribution

public class ZipfDistribution extends AbstractIntegerDistribution
Implementation of the Zipf distribution.

For a random variable X whose values are distributed according to this distribution, the probability mass function is given by

   P(X = k) = 1 / (H(N,s) * k^s)    for k = 1,2,...,N.

H(N,s) is the normalizing constant: the generalized harmonic number of order s of N, H(N,s) = 1/1^s + 1/2^s + ... + 1/N^s.

Parameters:
  • N is the number of elements
  • s is the exponent
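The PMF can be checked numerically with the minimal, dependency-free sketch below. It assumes only the definition of H(N,s) given above; the class and method names (ZipfPmfSketch, generalizedHarmonic, pmf) are hypothetical, and this is not the Hipparchus implementation. It evaluates P(X = k) and confirms that the probabilities sum to 1.

```java
/**
 * Hypothetical sketch: evaluates the Zipf PMF directly from
 * P(X = k) = 1 / (H(N,s) * k^s). Not the Hipparchus implementation.
 */
final class ZipfPmfSketch {

    /** Generalized harmonic number H(n, s) = sum_{k=1}^{n} 1 / k^s. */
    static double generalizedHarmonic(int n, double s) {
        double sum = 0.0;
        // Add the smallest terms first to reduce rounding error.
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** P(X = k) for a Zipf distribution with n elements and exponent s. */
    static double pmf(int k, int n, double s) {
        if (k < 1 || k > n) {
            return 0.0; // outside the support {1, ..., N}
        }
        return 1.0 / (generalizedHarmonic(n, s) * Math.pow(k, s));
    }

    public static void main(String[] args) {
        int n = 10;
        double s = 1.0;
        double total = 0.0;
        for (int k = 1; k <= n; k++) {
            total += pmf(k, n, s);
        }
        System.out.println("P(X = 1) = " + pmf(1, n, s));
        System.out.println("sum of PMF = " + total); // should be very close to 1
    }
}
```

Summing the terms from k = N down to 1 is a common precaution, since adding small values before large ones loses less precision.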
  • Constructor Details

    • ZipfDistribution

      public ZipfDistribution(int numberOfElements, double exponent) throws MathIllegalArgumentException
      Create a new Zipf distribution with the given number of elements and exponent.
      Parameters:
      numberOfElements - Number of elements.
      exponent - Exponent.
      Throws:
      MathIllegalArgumentException - if numberOfElements <= 0 or exponent <= 0.
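      A minimal sketch of the documented parameter checks, using a hypothetical value class ZipfParams; it is not the Hipparchus source. To stay dependency-free, it throws java.lang.IllegalArgumentException where the real constructor documents MathIllegalArgumentException.

```java
/**
 * Hypothetical sketch of the constructor's argument checks.
 * IllegalArgumentException stands in for Hipparchus'
 * MathIllegalArgumentException so that no dependency is needed.
 */
final class ZipfParams {
    private final int numberOfElements;
    private final double exponent;

    ZipfParams(int numberOfElements, double exponent) {
        // Mirrors the documented rule: reject numberOfElements <= 0.
        if (numberOfElements <= 0) {
            throw new IllegalArgumentException(
                "numberOfElements must be > 0, got " + numberOfElements);
        }
        // Mirrors the documented rule: reject exponent <= 0.
        if (exponent <= 0) {
            throw new IllegalArgumentException(
                "exponent must be > 0, got " + exponent);
        }
        this.numberOfElements = numberOfElements;
        this.exponent = exponent;
    }

    int getNumberOfElements() { return numberOfElements; }

    double getExponent() { return exponent; }

    public static void main(String[] args) {
        ZipfParams ok = new ZipfParams(100, 1.2);
        System.out.println(ok.getNumberOfElements() + ", " + ok.getExponent());
        try {
            new ZipfParams(0, 1.0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```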
  • Method Details

    • getNumberOfElements

      public int getNumberOfElements()
      Get the number of elements (e.g. corpus size) for the distribution.
      Returns:
      the number of elements
    • getExponent

      public double getExponent()
      Get the exponent characterizing the distribution.
      Returns:
      the exponent
    • probability

      public double probability(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
      Parameters:
      x - the point at which the PMF is evaluated
      Returns:
      the value of the probability mass function at x
    • logProbability

      public double logProbability(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm. In other words, this method represents the logarithm of the probability mass function (PMF) for the distribution. Note that, because of floating-point precision and underflow/overflow issues, for some distributions this method is more precise and faster than computing the logarithm of IntegerDistribution.probability(int).

      The default implementation in AbstractIntegerDistribution simply computes the logarithm of probability(x); this class overrides it.

      Specified by:
      logProbability in interface IntegerDistribution
      Overrides:
      logProbability in class AbstractIntegerDistribution
      Parameters:
      x - the point at which the PMF is evaluated
      Returns:
      the logarithm of the value of the probability mass function at x
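      The precision remark can be illustrated with a small, dependency-free sketch (hypothetical names, not the Hipparchus code). For the Zipf PMF the logarithm has a closed form, log P(X = k) = -s ln(k) - ln(H(N,s)), which never forms k^s. With a large exponent the naive route underflows to 0, so its logarithm is -Infinity, while the log-space formula stays finite.

```java
/**
 * Hypothetical sketch: log P(X = k) = -s * ln(k) - ln(H(N, s)),
 * evaluated without forming k^s. Not the Hipparchus implementation.
 */
final class ZipfLogPmfSketch {

    /** H(n, s) = sum_{k=1}^{n} 1 / k^s, summed from the smallest term up. */
    static double generalizedHarmonic(int n, double s) {
        double sum = 0.0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** Natural log of P(X = k) for n elements and exponent s. */
    static double logPmf(int k, int n, double s) {
        if (k < 1 || k > n) {
            return Double.NEGATIVE_INFINITY; // log(0) outside the support
        }
        return -s * Math.log(k) - Math.log(generalizedHarmonic(n, s));
    }

    public static void main(String[] args) {
        int n = 1000;
        double s = 200.0;
        // 1000^200 overflows to Infinity, so the naive PMF becomes 0
        // and its logarithm becomes -Infinity ...
        double naive = Math.log(1.0 / (generalizedHarmonic(n, s) * Math.pow(n, s)));
        // ... while the log-space formula stays finite.
        double direct = logPmf(n, n, s);
        System.out.println("naive  = " + naive);
        System.out.println("direct = " + direct);
    }
}
```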
    • cumulativeProbability

      public double cumulativeProbability(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
      Parameters:
      x - the point at which the CDF is evaluated
      Returns:
      the probability that a random variable with this distribution takes a value less than or equal to x
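      Because every PMF term shares the normalizer H(N,s), the CDF reduces to a ratio of generalized harmonic numbers: P(X <= x) = H(x,s) / H(N,s) for 1 <= x <= N. The sketch below assumes that identity and uses hypothetical names; it is not the library's implementation.

```java
/**
 * Hypothetical sketch: P(X <= x) = H(x, s) / H(N, s) for 1 <= x <= N.
 * Not the Hipparchus implementation.
 */
final class ZipfCdfSketch {

    /** H(n, s) = sum_{k=1}^{n} 1 / k^s, summed from the smallest term up. */
    static double generalizedHarmonic(int n, double s) {
        double sum = 0.0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** P(X <= x) for a Zipf distribution with n elements and exponent s. */
    static double cdf(int x, int n, double s) {
        if (x < 1) {
            return 0.0; // below the support
        }
        if (x >= n) {
            return 1.0; // at or above the upper bound N
        }
        return generalizedHarmonic(x, s) / generalizedHarmonic(n, s);
    }

    public static void main(String[] args) {
        for (int x = 0; x <= 5; x++) {
            System.out.println("P(X <= " + x + ") = " + cdf(x, 5, 1.0));
        }
    }
}
```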
    • getNumericalMean

      public double getNumericalMean()
      Use this method to get the numerical value of the mean of this distribution. For number of elements N and exponent s, the mean is Hs1 / Hs, where
      • Hs1 = generalizedHarmonic(N, s - 1),
      • Hs = generalizedHarmonic(N, s).
      Returns:
      the mean or Double.NaN if it is not defined
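      A dependency-free sketch (hypothetical names, not the Hipparchus code) that evaluates the closed form Hs1 / Hs and cross-checks it against the direct sum of k * P(X = k). Note that s - 1 may be zero or negative; the harmonic sum still works because 1 / k^(s-1) = k^(1-s).

```java
/**
 * Hypothetical sketch: mean = H(N, s - 1) / H(N, s), cross-checked
 * against sum_k k * P(X = k). Not the Hipparchus implementation.
 */
final class ZipfMeanSketch {

    /** H(n, s) = sum_{k=1}^{n} 1 / k^s, summed from the smallest term up. */
    static double generalizedHarmonic(int n, double s) {
        double sum = 0.0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** Closed-form mean Hs1 / Hs. */
    static double mean(int n, double s) {
        return generalizedHarmonic(n, s - 1) / generalizedHarmonic(n, s);
    }

    /** Brute-force E[X] = sum_k k * P(X = k), used only as a cross-check. */
    static double meanBySummation(int n, double s) {
        double h = generalizedHarmonic(n, s);
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += k / (h * Math.pow(k, s));
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100;
        double s = 1.5;
        System.out.println("closed form: " + mean(n, s));
        System.out.println("direct sum:  " + meanBySummation(n, s));
    }
}
```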
    • calculateNumericalMean

      protected double calculateNumericalMean()
      Returns:
      the mean of this distribution
    • getNumericalVariance

      public double getNumericalVariance()
      Use this method to get the numerical value of the variance of this distribution. For number of elements N and exponent s, the variance is (Hs2 / Hs) - (Hs1^2 / Hs^2), where
      • Hs2 = generalizedHarmonic(N, s - 2),
      • Hs1 = generalizedHarmonic(N, s - 1),
      • Hs = generalizedHarmonic(N, s).
      Returns:
      the variance (possibly Double.POSITIVE_INFINITY or Double.NaN if it is not defined)
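      The variance formula can be checked the same way. This dependency-free sketch (hypothetical names, not the Hipparchus code) computes (Hs2 / Hs) - (Hs1^2 / Hs^2) and compares it with the direct value E[X^2] - E[X]^2.

```java
/**
 * Hypothetical sketch: variance = Hs2 / Hs - Hs1^2 / Hs^2, cross-checked
 * against E[X^2] - E[X]^2. Not the Hipparchus implementation.
 */
final class ZipfVarianceSketch {

    /** H(n, s) = sum_{k=1}^{n} 1 / k^s, summed from the smallest term up. */
    static double generalizedHarmonic(int n, double s) {
        double sum = 0.0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** Closed-form variance from three generalized harmonic numbers. */
    static double variance(int n, double s) {
        double hs = generalizedHarmonic(n, s);
        double hs1 = generalizedHarmonic(n, s - 1);
        double hs2 = generalizedHarmonic(n, s - 2);
        return hs2 / hs - (hs1 * hs1) / (hs * hs);
    }

    /** Brute-force E[X^2] - E[X]^2, used only as a cross-check. */
    static double varianceBySummation(int n, double s) {
        double h = generalizedHarmonic(n, s);
        double m1 = 0.0; // running E[X]
        double m2 = 0.0; // running E[X^2]
        for (int k = 1; k <= n; k++) {
            double p = 1.0 / (h * Math.pow(k, s));
            m1 += k * p;
            m2 += (double) k * k * p;
        }
        return m2 - m1 * m1;
    }

    public static void main(String[] args) {
        int n = 50;
        double s = 2.5;
        System.out.println("closed form: " + variance(n, s));
        System.out.println("direct sum:  " + varianceBySummation(n, s));
    }
}
```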
    • calculateNumericalVariance

      protected double calculateNumericalVariance()
      Returns:
      the variance of this distribution
    • getSupportLowerBound

      public int getSupportLowerBound()
      Access the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0). In other words, this method must return

      inf {x in Z | P(X <= x) > 0}.

      The lower bound of the support is always 1, regardless of the parameters.
      Returns:
      lower bound of the support (always 1)
    • getSupportUpperBound

      public int getSupportUpperBound()
      Access the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1). In other words, this method must return

      inf {x in R | P(X <= x) = 1}.

      The upper bound of the support is the number of elements.
      Returns:
      upper bound of the support
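      To make the two support-bound contracts concrete, here is a hypothetical linear-search inverse CDF (not the Hipparchus algorithm). It is defined as the smallest x with P(X <= x) >= p for 0 < p <= 1, and as the support lower bound for p = 0, because every integer satisfies P(X <= x) >= 0. It returns 1 for p = 0 and N for p = 1.

```java
/**
 * Hypothetical sketch of an inverse CDF used to illustrate
 * getSupportLowerBound() == inverse(0) and getSupportUpperBound() == inverse(1).
 * Not the Hipparchus implementation.
 */
final class ZipfInverseCdfSketch {

    /** H(n, s) = sum_{k=1}^{n} 1 / k^s, summed from the smallest term up. */
    static double generalizedHarmonic(int n, double s) {
        double sum = 0.0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, s);
        }
        return sum;
    }

    /** Smallest x with P(X <= x) >= p for 0 < p <= 1; returns 1 for p = 0. */
    static int inverseCdf(double p, int n, double s) {
        if (!(p >= 0.0 && p <= 1.0)) {
            throw new IllegalArgumentException("p must be in [0, 1], got " + p);
        }
        if (p == 0.0) {
            // Use inf {x | P(X <= x) > 0}, the support lower bound.
            return 1;
        }
        double h = generalizedHarmonic(n, s);
        double partial = 0.0;
        // Linear search over the support, accumulating H(x, s) as we go.
        for (int x = 1; x < n; x++) {
            partial += 1.0 / Math.pow(x, s);
            if (partial / h >= p) {
                return x;
            }
        }
        return n; // P(X <= N) = 1 >= p
    }

    public static void main(String[] args) {
        int n = 10;
        double s = 1.0;
        System.out.println("inverse(0.0) = " + inverseCdf(0.0, n, s));
        System.out.println("inverse(1.0) = " + inverseCdf(1.0, n, s));
        System.out.println("inverse(0.5) = " + inverseCdf(0.5, n, s));
    }
}
```

For very large exponents, rounding can push the running ratio to 1.0 before x reaches N, so this sketch only honors the p = 1 contract for moderate parameters.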
    • isSupportConnected

      public boolean isSupportConnected()
      Use this method to get information about whether the support is connected, i.e. whether all integers between the lower and upper bound of the support are included in the support. The support of this distribution is connected.
      Returns:
      true