Class Variance

All Implemented Interfaces:
Serializable, DoubleConsumer, AggregatableStatistic<Variance>, StorelessUnivariateStatistic, UnivariateStatistic, WeightedEvaluation, MathArrays.Function

Computes the variance of the available values. By default, the unbiased "sample variance" definitional formula is used:

variance = sum((x_i - mean)^2) / (n - 1)

where mean is the Mean and n is the number of sample observations.

The definitional formula does not have good numerical properties, so this implementation does not compute the statistic using the definitional formula.

  • The getResult method computes the variance using updating formulas based on West's algorithm, as described in Chan, T. F. and J. G. Lewis 1979, Communications of the ACM, vol. 22 no. 9, pp. 526-531.
  • The evaluate methods leverage the fact that they have the full array of values in memory to execute a two-pass algorithm. Specifically, these methods use the "corrected two-pass algorithm" from Chan, Golub, Levesque, Algorithms for Computing the Sample Variance, American Statistician, vol. 37, no. 3 (1983) pp. 242-247.
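The corrected two-pass idea used by the evaluate methods can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the library's source: the class name TwoPassVarianceSketch and the method sampleVariance are hypothetical, and the bias-corrected divisor n - 1 is hard-coded. The first pass computes the mean; the second pass accumulates the squared deviations together with the plain sum of deviations, which would be zero in exact arithmetic and is used as a rounding-error correction.

```java
/** Illustrative sketch only; names are hypothetical and this is not the Hipparchus source. */
final class TwoPassVarianceSketch {
    private TwoPassVarianceSketch() {}

    /** Sample variance (divisor n - 1): NaN for an empty array, 0 for a single value. */
    static double sampleVariance(double[] values) {
        int n = values.length;
        if (n == 0) {
            return Double.NaN;
        }
        if (n == 1) {
            return 0.0;
        }
        // Pass 1: arithmetic mean.
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        // Pass 2: squared deviations, plus the sum of deviations as a correction term.
        double sumSq = 0.0;
        double sumDev = 0.0;
        for (double v : values) {
            double dev = v - mean;
            sumSq += dev * dev;
            sumDev += dev;
        }
        // Subtracting sumDev^2 / n compensates for rounding error in the computed mean.
        return (sumSq - sumDev * sumDev / n) / (n - 1);
    }
}
```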

Note that adding values using increment or incrementAll and then executing getResult will sometimes give a different, less accurate, result than executing evaluate with the full array of values. The former approach should only be used when the full array of values is not available.
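For comparison, a storeless update in the spirit of increment followed by getResult might look like the sketch below. It uses a Welford-style running mean and running sum of squared deviations, which belongs to the same family as the updating formulas cited above but is not claimed to match the library's SecondMoment code. The class name StreamingVarianceSketch is hypothetical.

```java
/** Illustrative sketch only; a Welford-style update, not the Hipparchus source. */
final class StreamingVarianceSketch {
    private long n;       // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    /** Folds one value into the running state without storing it. */
    void increment(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // uses the mean before and after the update
    }

    long getN() {
        return n;
    }

    /** Sample variance (divisor n - 1): NaN when empty, 0 for a single value. */
    double getResult() {
        if (n == 0) {
            return Double.NaN;
        }
        return n == 1 ? 0.0 : m2 / (n - 1);
    }
}
```

Because each value is discarded after it is folded in, rounding error cannot be corrected by a second pass over the data, which is why the array-based approach is preferred when the full array is available.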

The "population variance" ( sum((x_i - mean)^2) / n ) can also be computed using this statistic. The isBiasCorrected property determines whether the "population" or "sample" value is returned by the evaluate and getResult methods. To compute population variances, set this property to false, either through the Variance(boolean) constructor or by calling withBiasCorrection(false).
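The effect of the bias-correction flag is only a change of divisor, as the following sketch shows. It is a simplified definitional computation for illustration, with a hypothetical class name BiasCorrectionSketch and a boolean parameter standing in for the isBiasCorrected property; it does not reproduce the library's numerically corrected algorithm.

```java
/** Illustrative sketch only; the flag mirrors the documented isBiasCorrected property. */
final class BiasCorrectionSketch {
    private BiasCorrectionSketch() {}

    /** Divides by n - 1 when isBiasCorrected is true ("sample"), by n otherwise ("population"). */
    static double variance(double[] values, boolean isBiasCorrected) {
        int n = values.length;
        if (n == 0) {
            return Double.NaN;
        }
        if (n == 1) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - mean) * (v - mean);
        }
        return isBiasCorrected ? sumSq / (n - 1) : sumSq / n;
    }
}
```

For the values 1, 2, 3, 4 the sum of squared deviations is 5, so the sample variance is 5/3 and the population variance is 5/4.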

Note that this implementation is not synchronized. If multiple threads access an instance of this class concurrently, and at least one of the threads invokes the increment() or clear() method, it must be synchronized externally.
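One way to provide the external synchronization described above is to route every access through a single lock. The sketch below assumes a hypothetical non-thread-safe accumulator (RunningVariance) in place of the real class, and a hypothetical wrapper (ExternalLockSketch); the same pattern would apply to any shared statistic instance.

```java
/** Illustrative sketch only; all names here are hypothetical. */
final class ExternalLockSketch {
    /** Stand-in for a statistic that is not thread-safe on its own. */
    static final class RunningVariance {
        long n;
        double mean;
        double m2;

        void increment(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
    }

    private final RunningVariance stat = new RunningVariance();
    private final Object lock = new Object();

    /** Every read and write of the shared statistic holds the same lock. */
    void add(double x) {
        synchronized (lock) {
            stat.increment(x);
        }
    }

    long count() {
        synchronized (lock) {
            return stat.n;
        }
    }

    double result() {
        synchronized (lock) {
            if (stat.n == 0) {
                return Double.NaN;
            }
            return stat.n == 1 ? 0.0 : stat.m2 / (stat.n - 1);
        }
    }

    /** Starts several writer threads against one shared instance and returns the final count. */
    static long countAfterConcurrentAdds(int threads, int perThread) {
        ExternalLockSketch shared = new ExternalLockSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return shared.count();
    }
}
```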

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected final boolean
    incMoment
    Whether or not increment(double) should increment the internal second moment.
    protected final SecondMoment
    moment
    SecondMoment is used in incremental calculation of Variance.
  • Constructor Summary

    Constructors
    Constructor
    Description
    Variance()
    Constructs a Variance with default (true) isBiasCorrected property.
    Variance(boolean isBiasCorrected)
    Constructs a Variance with the specified isBiasCorrected property.
    Variance(boolean isBiasCorrected, SecondMoment m2)
    Constructs a Variance with the specified isBiasCorrected property and the supplied external second moment.
    Variance(SecondMoment m2)
    Constructs a Variance based on an external second moment.
    Variance(Variance original)
    Copy constructor, creates a new Variance identical to the original.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    aggregate(Variance other)
    Aggregates the provided instance into this instance.
    void
    clear()
    Clears the internal state of the Statistic.
    Variance
    copy()
    Returns a copy of the statistic with the same internal state.
    double
    evaluate(double[] values, double mean)
    Returns the variance of the entries in the input array, using the precomputed mean value.
    double
    evaluate(double[] values, double[] weights, double mean)
    Returns the weighted variance of the values in the input array, using the precomputed weighted mean value.
    double
    evaluate(double[] values, double[] weights, double mean, int begin, int length)
    Returns the weighted variance of the entries in the specified portion of the input array, using the precomputed weighted mean value.
    double
    evaluate(double[] values, double[] weights, int begin, int length)
    Returns the weighted variance of the entries in the specified portion of the input array, or Double.NaN if the designated subarray is empty.
    double
    evaluate(double[] values, double mean, int begin, int length)
    Returns the variance of the entries in the specified portion of the input array, using the precomputed mean value.
    double
    evaluate(double[] values, int begin, int length)
    Returns the variance of the entries in the specified portion of the input array, or Double.NaN if the designated subarray is empty.
    long
    getN()
    Returns the number of values that have been added.
    double
    getResult()
    Returns the current value of the Statistic.
    void
    increment(double d)
    Updates the internal state of the statistic to reflect the addition of the new value.
    boolean
    isBiasCorrected()
    Check if bias is corrected.
    Variance
    withBiasCorrection(boolean biasCorrection)
    Returns a new copy of this variance with the given bias correction setting.

    Methods inherited from class org.hipparchus.stat.descriptive.AbstractStorelessUnivariateStatistic

    equals, hashCode, toString

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait

    Methods inherited from interface org.hipparchus.stat.descriptive.AggregatableStatistic

    aggregate, aggregate

    Methods inherited from interface java.util.function.DoubleConsumer

    andThen

    Methods inherited from interface org.hipparchus.stat.descriptive.StorelessUnivariateStatistic

    accept, incrementAll, incrementAll

    Methods inherited from interface org.hipparchus.stat.descriptive.UnivariateStatistic

    evaluate

    Methods inherited from interface org.hipparchus.stat.descriptive.WeightedEvaluation

    evaluate
  • Field Details

    • moment

      protected final SecondMoment moment
      SecondMoment is used in incremental calculation of Variance.
    • incMoment

      protected final boolean incMoment
      Whether or not increment(double) should increment the internal second moment. When a Variance is constructed with an external SecondMoment as a constructor parameter, this property is set to false and increments must be applied to the second moment directly.
  • Constructor Details

    • Variance

      public Variance()
      Constructs a Variance with default (true) isBiasCorrected property.
    • Variance

      public Variance(SecondMoment m2)
      Constructs a Variance based on an external second moment.

      When this constructor is used, the statistic may only be incremented via the moment, i.e., increment(double) does nothing; whereas m2.increment(value) increments both m2 and the Variance instance constructed from it.
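The sharing behavior described here can be pictured with the following sketch. It assumes hypothetical stand-ins (Moment for the external second moment, VarianceView for the statistic built on it) and a flag named after the documented incMoment field; it illustrates the documented contract rather than the library's code.

```java
/** Illustrative sketch only; Moment and VarianceView are hypothetical stand-ins. */
final class SharedMomentSketch {
    private SharedMomentSketch() {}

    /** Running second moment that can be shared by several statistics. */
    static final class Moment {
        long n;
        double mean;
        double m2;

        void increment(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
    }

    /** Variance that either owns its moment or reads an external one. */
    static final class VarianceView {
        private final Moment moment;
        private final boolean incMoment;

        /** Owns its moment, so increment updates it. */
        VarianceView() {
            this.moment = new Moment();
            this.incMoment = true;
        }

        /** Reads an external moment, so increment on this view does nothing. */
        VarianceView(Moment external) {
            this.moment = external;
            this.incMoment = false;
        }

        void increment(double x) {
            if (incMoment) {
                moment.increment(x);
            }
        }

        double getResult() {
            if (moment.n == 0) {
                return Double.NaN;
            }
            return moment.n == 1 ? 0.0 : moment.m2 / (moment.n - 1);
        }
    }
}
```

In this sketch, calling increment on a view built from an external moment leaves the result unchanged, while incrementing the moment itself is immediately visible through the view.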

      Parameters:
      m2 - the SecondMoment (Third or Fourth moments work here as well.)
    • Variance

      public Variance(boolean isBiasCorrected)
      Constructs a Variance with the specified isBiasCorrected property.
      Parameters:
      isBiasCorrected - setting for bias correction - true means bias will be corrected and is equivalent to using the argumentless constructor
    • Variance

      public Variance(boolean isBiasCorrected, SecondMoment m2)
      Constructs a Variance with the specified isBiasCorrected property and the supplied external second moment.
      Parameters:
      isBiasCorrected - setting for bias correction - true means bias will be corrected
      m2 - the SecondMoment (Third or Fourth moments work here as well.)
    • Variance

      public Variance(Variance original) throws NullArgumentException
      Copy constructor, creates a new Variance identical to the original.
      Parameters:
      original - the Variance instance to copy
      Throws:
      NullArgumentException - if original is null
  • Method Details

    • increment

      public void increment(double d)
      Updates the internal state of the statistic to reflect the addition of the new value.

      If all values are available, it is more accurate to use UnivariateStatistic.evaluate(double[]) rather than adding values one at a time using this method and then executing getResult(), since evaluate leverages the fact that it has the full list of values together to execute a two-pass algorithm. See Variance.

      Note also that when Variance(SecondMoment) is used to create a Variance, this method does nothing. In that case, the SecondMoment should be incremented directly.

      Specified by:
      increment in interface StorelessUnivariateStatistic
      Specified by:
      increment in class AbstractStorelessUnivariateStatistic
      Parameters:
      d - the new value.
    • getResult

      public double getResult()
      Returns the current value of the Statistic.
      Specified by:
      getResult in interface StorelessUnivariateStatistic
      Specified by:
      getResult in class AbstractStorelessUnivariateStatistic
      Returns:
      value of the statistic, Double.NaN if it has been cleared or just instantiated.
    • getN

      public long getN()
      Returns the number of values that have been added.
      Specified by:
      getN in interface StorelessUnivariateStatistic
      Returns:
      the number of values.
    • clear

      public void clear()
      Clears the internal state of the Statistic.
      Specified by:
      clear in interface StorelessUnivariateStatistic
      Specified by:
      clear in class AbstractStorelessUnivariateStatistic
    • aggregate

      public void aggregate(Variance other)
      Aggregates the provided instance into this instance.

      This method can be used to combine statistics computed over partitions or subsamples - i.e., the value of this instance after this operation should be the same as if a single statistic would have been applied over the combined dataset.
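A common way to combine such partition statistics is the pairwise update of counts, means, and sums of squared deviations, sketched below. The class name AggregateSketch is hypothetical, and the merge formula shown is the standard pairwise combination rule; it is offered as an assumption about how aggregation can work, not as the library's implementation.

```java
/** Illustrative sketch only; a standard pairwise merge, not the Hipparchus source. */
final class AggregateSketch {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the mean

    void increment(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Merges another partition's state into this one. */
    void aggregate(AggregateSketch other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        // Within-partition sums plus a between-partition term.
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
    }

    /** Sample variance (divisor n - 1): NaN when empty, 0 for a single value. */
    double getResult() {
        if (n == 0) {
            return Double.NaN;
        }
        return n == 1 ? 0.0 : m2 / (n - 1);
    }
}
```

For example, merging a partition holding 1, 2 with one holding 3, 4 gives a combined sum of squared deviations of 0.5 + 0.5 + 4 = 5 and a sample variance of 5/3, the same as a single pass over 1, 2, 3, 4.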

      Specified by:
      aggregate in interface AggregatableStatistic<Variance>
      Parameters:
      other - the instance to aggregate into this instance
    • evaluate

      public double evaluate(double[] values, int begin, int length) throws MathIllegalArgumentException
      Returns the variance of the entries in the specified portion of the input array, or Double.NaN if the designated subarray is empty. Note that Double.NaN may also be returned if the input includes NaN and / or infinite values.

      See Variance for details on the computing algorithm.

      Returns 0 for a single-value (i.e. length = 1) sample.

      Does not change the internal state of the statistic.

      Throws MathIllegalArgumentException if the array is null.

      Specified by:
      evaluate in interface MathArrays.Function
      Specified by:
      evaluate in interface StorelessUnivariateStatistic
      Specified by:
      evaluate in interface UnivariateStatistic
      Parameters:
      values - the input array
      begin - index of the first array element to include
      length - the number of elements to include
      Returns:
      the variance of the values or Double.NaN if length = 0
      Throws:
      MathIllegalArgumentException - if the array is null or the array index parameters are not valid
    • evaluate

      public double evaluate(double[] values, double[] weights, int begin, int length) throws MathIllegalArgumentException
      Returns the weighted variance of the entries in the specified portion of the input array, or Double.NaN if the designated subarray is empty.

      Uses the formula

         Σ(weights[i]*(values[i] - weightedMean)²)/(Σ(weights[i]) - 1)
       
      where weightedMean is the weighted mean.

      This formula will not return the same result as the unweighted variance when all weights are equal, unless all weights are equal to 1. The formula assumes that weights are to be treated as "expansion values," as will be the case if for example the weights represent frequency counts. To normalize weights so that the denominator in the variance computation equals the length of the input vector minus one, use

         evaluate(values, MathArrays.normalizeArray(weights, values.length));
       

      Returns 0 for a single-value (i.e. length = 1) sample.
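The weighted formula and the effect of normalizing the weights can be illustrated with the sketch below. It is a direct transcription of the formula above with hypothetical names (WeightedVarianceSketch, weightedVariance); the normalize helper is an assumed stand-in for rescaling weights to a target sum, written out here so the example does not depend on the library, and input validation is reduced to a null and length check.

```java
/** Illustrative sketch only; a direct transcription of the documented formula. */
final class WeightedVarianceSketch {
    private WeightedVarianceSketch() {}

    /** Computes sum(w_i * (x_i - weightedMean)^2) / (sum(w_i) - 1). */
    static double weightedVariance(double[] values, double[] weights) {
        if (values == null || weights == null || values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must be non-null and equally long");
        }
        if (values.length == 0) {
            return Double.NaN;
        }
        if (values.length == 1) {
            return 0.0;
        }
        double sumW = 0.0;
        double sumWX = 0.0;
        for (int i = 0; i < values.length; i++) {
            sumW += weights[i];
            sumWX += weights[i] * values[i];
        }
        double weightedMean = sumWX / sumW;
        double sumSq = 0.0;
        for (int i = 0; i < values.length; i++) {
            double dev = values[i] - weightedMean;
            sumSq += weights[i] * dev * dev;
        }
        return sumSq / (sumW - 1.0);
    }

    /** Hypothetical helper: rescales weights so that they sum to targetSum. */
    static double[] normalize(double[] weights, double targetSum) {
        double sum = 0.0;
        for (double w : weights) {
            sum += w;
        }
        double[] out = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            out[i] = weights[i] * targetSum / sum;
        }
        return out;
    }
}
```

With values 1, 2, 3 and equal weights of 2, the sketch gives 4 / (6 - 1) = 0.8, whereas the unweighted sample variance is 1. After rescaling the weights to sum to 3, the denominator becomes 3 - 1 and the result is 1.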

      Throws MathIllegalArgumentException if any of the following are true:

      • the values array is null
      • the weights array is null
      • the weights array does not have the same length as the values array
      • the weights array contains one or more infinite values
      • the weights array contains one or more NaN values
      • the weights array contains negative values
      • the start and length arguments do not determine a valid array

      Does not change the internal state of the statistic.

      Specified by:
      evaluate in interface WeightedEvaluation
      Parameters:
      values - the input array
      weights - the weights array
      begin - index of the first array element to include
      length - the number of elements to include
      Returns:
      the weighted variance of the values or Double.NaN if length = 0
      Throws:
      MathIllegalArgumentException - if the parameters are not valid
    • evaluate

      public double evaluate(double[] values, double mean, int begin, int length) throws MathIllegalArgumentException
      Returns the variance of the entries in the specified portion of the input array, using the precomputed mean value. Returns Double.NaN if the designated subarray is empty.

      See Variance for details on the computing algorithm.

      The formula used assumes that the supplied mean value is the arithmetic mean of the sample data, not a known population parameter. This method is supplied only to save computation when the mean has already been computed.

      Returns 0 for a single-value (i.e. length = 1) sample.

      Does not change the internal state of the statistic.

      Parameters:
      values - the input array
      mean - the precomputed mean value
      begin - index of the first array element to include
      length - the number of elements to include
      Returns:
      the variance of the values or Double.NaN if length = 0
      Throws:
      MathIllegalArgumentException - if the array is null or the array index parameters are not valid
    • evaluate

      public double evaluate(double[] values, double mean) throws MathIllegalArgumentException
      Returns the variance of the entries in the input array, using the precomputed mean value. Returns Double.NaN if the array is empty.

      See Variance for details on the computing algorithm.

      If isBiasCorrected is true, the formula used assumes that the supplied mean value is the arithmetic mean of the sample data, not a known population parameter. If the mean is a known population parameter, or if the "population" version of the variance is desired, invoke this method on an instance whose isBiasCorrected property is false, for example one obtained from withBiasCorrection(false).

      Returns 0 for a single-value (i.e. length = 1) sample.

      Does not change the internal state of the statistic.

      Parameters:
      values - the input array
      mean - the precomputed mean value
      Returns:
      the variance of the values or Double.NaN if the array is empty
      Throws:
      MathIllegalArgumentException - if the array is null
    • evaluate

      public double evaluate(double[] values, double[] weights, double mean, int begin, int length) throws MathIllegalArgumentException
      Returns the weighted variance of the entries in the specified portion of the input array, using the precomputed weighted mean value. Returns Double.NaN if the designated subarray is empty.

      Uses the formula

         Σ(weights[i]*(values[i] - mean)²)/(Σ(weights[i]) - 1)
       

      The formula used assumes that the supplied mean value is the weighted arithmetic mean of the sample data, not a known population parameter. This method is supplied only to save computation when the mean has already been computed.

      This formula will not return the same result as the unweighted variance when all weights are equal, unless all weights are equal to 1. The formula assumes that weights are to be treated as "expansion values," as will be the case if for example the weights represent frequency counts. To normalize weights so that the denominator in the variance computation equals the length of the input vector minus one, use

         evaluate(values, MathArrays.normalizeArray(weights, values.length), mean);
       

      Returns 0 for a single-value (i.e. length = 1) sample.

      Throws MathIllegalArgumentException if any of the following are true:

      • the values array is null
      • the weights array is null
      • the weights array does not have the same length as the values array
      • the weights array contains one or more infinite values
      • the weights array contains one or more NaN values
      • the weights array contains negative values
      • the start and length arguments do not determine a valid array

      Does not change the internal state of the statistic.

      Parameters:
      values - the input array
      weights - the weights array
      mean - the precomputed weighted mean value
      begin - index of the first array element to include
      length - the number of elements to include
      Returns:
      the weighted variance of the values or Double.NaN if length = 0
      Throws:
      MathIllegalArgumentException - if the parameters are not valid
    • evaluate

      public double evaluate(double[] values, double[] weights, double mean) throws MathIllegalArgumentException
      Returns the weighted variance of the values in the input array, using the precomputed weighted mean value.

      Uses the formula

         Σ(weights[i]*(values[i] - mean)²)/(Σ(weights[i]) - 1)
       

      The formula used assumes that the supplied mean value is the weighted arithmetic mean of the sample data, not a known population parameter. This method is supplied only to save computation when the mean has already been computed.

      This formula will not return the same result as the unweighted variance when all weights are equal, unless all weights are equal to 1. The formula assumes that weights are to be treated as "expansion values," as will be the case if for example the weights represent frequency counts. To normalize weights so that the denominator in the variance computation equals the length of the input vector minus one, use

         evaluate(values, MathArrays.normalizeArray(weights, values.length), mean);
       

      Returns 0 for a single-value (i.e. length = 1) sample.

      Throws MathIllegalArgumentException if any of the following are true:

      • the values array is null
      • the weights array is null
      • the weights array does not have the same length as the values array
      • the weights array contains one or more infinite values
      • the weights array contains one or more NaN values
      • the weights array contains negative values

      Does not change the internal state of the statistic.

      Parameters:
      values - the input array
      weights - the weights array
      mean - the precomputed weighted mean value
      Returns:
      the weighted variance of the values or Double.NaN if the array is empty
      Throws:
      MathIllegalArgumentException - if the parameters are not valid
    • isBiasCorrected

      public boolean isBiasCorrected()
      Check if bias is corrected.
      Returns:
      true if bias correction is applied, false otherwise.
    • withBiasCorrection

      public Variance withBiasCorrection(boolean biasCorrection)
      Returns a new copy of this variance with the given bias correction setting.
      Parameters:
      biasCorrection - The bias correction flag to set.
      Returns:
      a copy of this instance with the given bias correction setting
    • copy

      public Variance copy()
      Returns a copy of the statistic with the same internal state.
      Specified by:
      copy in interface StorelessUnivariateStatistic
      Specified by:
      copy in interface UnivariateStatistic
      Specified by:
      copy in class AbstractStorelessUnivariateStatistic
      Returns:
      a copy of the statistic