Class RandomPercentile

java.lang.Object
org.hipparchus.stat.descriptive.AbstractStorelessUnivariateStatistic
org.hipparchus.stat.descriptive.rank.RandomPercentile
All Implemented Interfaces:
Serializable, DoubleConsumer, AggregatableStatistic<RandomPercentile>, StorelessUnivariateStatistic, UnivariateStatistic, MathArrays.Function

A StorelessUnivariateStatistic estimating percentiles using the RANDOM Algorithm.

Storage requirements for the RANDOM algorithm depend on the desired accuracy of quantile estimates. Quantile estimate accuracy is defined as follows.

Let \(X\) be the set of all data values consumed from the stream and let \(q\) be a quantile (measured between 0 and 1) to be estimated. If

  • \(\epsilon\) is the configured accuracy
  • \(\hat{q}\) is a RandomPercentile estimate for \(q\) (what is returned by getResult() or getResult(double)) with \(100q\) as actual parameter)
  • \(rank(\hat{q}) = |\{x \in X : x < \hat{q}\}|\) is the actual rank of \(\hat{q}\) in the full data stream
  • \(n = |X|\) is the number of observations
then we can expect \((q - \epsilon)n < rank(\hat{q}) < (q + \epsilon)n\).

The algorithm maintains \(\left\lceil {log_{2}(1/\epsilon)}\right\rceil + 1\) buffers of size \(\left\lceil {1/\epsilon \sqrt{log_2(1/\epsilon)}}\right\rceil\). When epsilon is set to the default value of \(10^{-4}\), this makes 15 buffers of size 36,453.

The algorithm uses the buffers to maintain samples of data from the stream. Until all buffers are full, the entire sample is stored in the buffers. If one of the getResult methods is called when all data are available in memory and there is room to make a copy of the data (meaning the combined set of buffers is less than half full), the getResult method delegates to a Percentile instance to compute and return the exact value for the desired quantile. For default epsilon, this means exact values will be returned whenever fewer than \(\left\lceil {15 \times 36453 / 2} \right\rceil = 273,398\) values have been consumed from the data stream.

When buffers become full, the algorithm merges buffers so that they effectively represent a larger set of values than they can hold. Subsequently, data values are sampled from the stream to fill buffers freed by merge operations. Both the merging and the sampling require random selection, which is done using a RandomGenerator. To get repeatable results for large data streams, users should provide RandomGenerator instances with fixed seeds. RandomPercentile itself does not reseed or otherwise initialize the RandomGenerator provided to it. By default, it uses a Well19937c generator with the default seed.

Note: This implementation is not thread-safe.

See Also:
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final double
    Default quantile estimation error setting
  • Constructor Summary

    Constructors
    Constructor
    Description
    Constructs a RandomPercentile with quantile estimation error set to the default (DEFAULT_EPSILON), using the default PRNG as source of random data.
    RandomPercentile(double epsilon)
    Constructs a RandomPercentile with quantile estimation error epsilon using the default PRNG as source of random data.
    RandomPercentile(double epsilon, RandomGenerator randomGenerator)
    Constructs a RandomPercentile with quantile estimation error epsilon using randomGenerator as its source of random data.
    Constructs a RandomPercentile with default estimation error using randomGenerator as its source of random data.
    Copy constructor, creates a new RandomPercentile identical to the original.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    Aggregates the provided instance into this instance.
    void
    Clears the internal state of the Statistic
    Returns a copy of the statistic with the same internal state.
    double
    evaluate(double[] values, int begin, int length)
    Returns an estimate of the median, computed using the designated array segment as input data.
    double
    evaluate(double percentile, double[] values)
    Returns an estimate of percentile over the given array.
    double
    evaluate(double percentile, double[] values, int begin, int length)
    Returns an estimate of the given percentile, computed using the designated array segment as input data.
    double
    Returns the total number of values that have been consumed by the aggregates.
    double
    Returns the estimated quantile position of value in the combined dataset of the aggregates.
    double
    getAggregateRank(double value, Collection<RandomPercentile> aggregates)
    Computes the estimated rank of value in the combined dataset of the aggregates.
    long
    Returns the number of values that have been added.
    double
    getQuantileRank(double value)
    Returns the estimated quantile position of value in the dataset.
    double
    getRank(double value)
    Gets the estimated rank of value, i.e.
    double
    Returns an estimate of the median.
    double
    getResult(double percentile)
    Returns an estimate of the given percentile.
    void
    increment(double d)
    Updates the internal state of the statistic to reflect the addition of the new value.
    static long
    maxValuesRetained(double epsilon)
    Returns the maximum number of double values that a RandomPercentile instance created with the given epsilon value will retain in memory.
    double
    reduce(double percentile, Collection<RandomPercentile> aggregates)
    Computes the given percentile by combining the data from the collection of aggregates.

    Methods inherited from class org.hipparchus.stat.descriptive.AbstractStorelessUnivariateStatistic

    equals, hashCode, toString

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait

    Methods inherited from interface org.hipparchus.stat.descriptive.AggregatableStatistic

    aggregate, aggregate

    Methods inherited from interface java.util.function.DoubleConsumer

    andThen

    Methods inherited from interface org.hipparchus.stat.descriptive.StorelessUnivariateStatistic

    accept, incrementAll, incrementAll

    Methods inherited from interface org.hipparchus.stat.descriptive.UnivariateStatistic

    evaluate
  • Field Details

    • DEFAULT_EPSILON

      public static final double DEFAULT_EPSILON
      Default quantile estimation error setting
      See Also:
  • Constructor Details

    • RandomPercentile

      public RandomPercentile(double epsilon, RandomGenerator randomGenerator)
      Constructs a RandomPercentile with quantile estimation error epsilon using randomGenerator as its source of random data.
      Parameters:
      epsilon - bound on quantile estimation error (see class javadoc)
      randomGenerator - PRNG used in sampling and merge operations
      Throws:
      MathIllegalArgumentException - if percentile is not in the range [0, 100]
    • RandomPercentile

      public RandomPercentile(RandomGenerator randomGenerator)
      Constructs a RandomPercentile with default estimation error using randomGenerator as its source of random data.
      Parameters:
      randomGenerator - PRNG used in sampling and merge operations
      Throws:
      MathIllegalArgumentException - if percentile is not in the range [0, 100]
    • RandomPercentile

      public RandomPercentile(double epsilon)
      Constructs a RandomPercentile with quantile estimation error epsilon using the default PRNG as source of random data.
      Parameters:
      epsilon - bound on quantile estimation error (see class javadoc)
      Throws:
      MathIllegalArgumentException - if percentile is not in the range [0, 100]
    • RandomPercentile

      public RandomPercentile()
      Constructs a RandomPercentile with quantile estimation error set to the default (DEFAULT_EPSILON), using the default PRNG as source of random data.
    • RandomPercentile

      public RandomPercentile(RandomPercentile original)
      Copy constructor, creates a new RandomPercentile identical to the original. Note: the RandomGenerator used by the new instance is referenced, not copied - i.e., the new instance shares a generator with the original.
      Parameters:
      original - the PSquarePercentile instance to copy
  • Method Details

    • getN

      public long getN()
      Description copied from interface: StorelessUnivariateStatistic
      Returns the number of values that have been added.
      Specified by:
      getN in interface StorelessUnivariateStatistic
      Returns:
      the number of values.
    • evaluate

      public double evaluate(double percentile, double[] values, int begin, int length) throws MathIllegalArgumentException
      Returns an estimate of the given percentile, computed using the designated array segment as input data.
      Parameters:
      values - source of input data
      begin - position of the first element of the values array to include
      length - number of array elements to include
      percentile - desired percentile (scaled 0 - 100)
      Returns:
      estimated percentile
      Throws:
      MathIllegalArgumentException - if percentile is out of the range [0, 100]
    • evaluate

      public double evaluate(double[] values, int begin, int length)
      Returns an estimate of the median, computed using the designated array segment as input data.
      Specified by:
      evaluate in interface MathArrays.Function
      Specified by:
      evaluate in interface StorelessUnivariateStatistic
      Specified by:
      evaluate in interface UnivariateStatistic
      Parameters:
      values - source of input data
      begin - position of the first element of the values array to include
      length - number of array elements to include
      Returns:
      estimated percentile
      Throws:
      MathIllegalArgumentException - if percentile is out of the range [0, 100]
      See Also:
    • evaluate

      public double evaluate(double percentile, double[] values)
      Returns an estimate of percentile over the given array.
      Parameters:
      values - source of input data
      percentile - desired percentile (scaled 0 - 100)
      Returns:
      estimated percentile
      Throws:
      MathIllegalArgumentException - if percentile is out of the range [0, 100]
    • copy

      public RandomPercentile copy()
      Description copied from class: AbstractStorelessUnivariateStatistic
      Returns a copy of the statistic with the same internal state.
      Specified by:
      copy in interface StorelessUnivariateStatistic
      Specified by:
      copy in interface UnivariateStatistic
      Specified by:
      copy in class AbstractStorelessUnivariateStatistic
      Returns:
      a copy of the statistic
    • clear

      public void clear()
      Description copied from class: AbstractStorelessUnivariateStatistic
      Clears the internal state of the Statistic
      Specified by:
      clear in interface StorelessUnivariateStatistic
      Specified by:
      clear in class AbstractStorelessUnivariateStatistic
    • getResult

      public double getResult()
      Returns an estimate of the median.
      Specified by:
      getResult in interface StorelessUnivariateStatistic
      Specified by:
      getResult in class AbstractStorelessUnivariateStatistic
      Returns:
      value of the statistic, Double.NaN if it has been cleared or just instantiated.
    • getResult

      public double getResult(double percentile)
      Returns an estimate of the given percentile.
      Parameters:
      percentile - desired percentile (scaled 0 - 100)
      Returns:
      estimated percentile
      Throws:
      MathIllegalArgumentException - if percentile is out of the range [0, 100]
    • getRank

      public double getRank(double value)
      Gets the estimated rank of value, i.e. \(|\{x \in X : x < value\}|\) where \(X\) is the set of values that have been consumed from the stream.
      Parameters:
      value - value whose overall rank is sought
      Returns:
      estimated number of sample values that are strictly less than value
    • getQuantileRank

      public double getQuantileRank(double value)
      Returns the estimated quantile position of value in the dataset. Specifically, what is returned is an estimate of \(|\{x \in X : x < value\}| / |X|\) where \(X\) is the set of values that have been consumed from the stream.
      Parameters:
      value - value whose quantile rank is sought.
      Returns:
      estimated proportion of sample values that are strictly less than value
    • increment

      public void increment(double d)
      Description copied from class: AbstractStorelessUnivariateStatistic
      Updates the internal state of the statistic to reflect the addition of the new value.
      Specified by:
      increment in interface StorelessUnivariateStatistic
      Specified by:
      increment in class AbstractStorelessUnivariateStatistic
      Parameters:
      d - the new value.
    • reduce

      public double reduce(double percentile, Collection<RandomPercentile> aggregates)
      Computes the given percentile by combining the data from the collection of aggregates. The result describes the combined sample of all data added to any of the aggregates.
      Parameters:
      percentile - desired percentile (scaled 0-100)
      aggregates - RandomPercentile instances to combine data from
      Returns:
      estimate of the given percentile using combined data from the aggregates
      Throws:
      MathIllegalArgumentException - if percentile is out of the range [0, 100]
    • getAggregateRank

      public double getAggregateRank(double value, Collection<RandomPercentile> aggregates)
      Computes the estimated rank of value in the combined dataset of the aggregates. Sums the values from getRank(double).
      Parameters:
      value - value whose rank is sought
      aggregates - collection to aggregate rank over
      Returns:
      estimated number of elements in the combined dataset that are less than value
    • getAggregateQuantileRank

      public double getAggregateQuantileRank(double value, Collection<RandomPercentile> aggregates)
      Returns the estimated quantile position of value in the combined dataset of the aggregates. Specifically, what is returned is an estimate of \(|\{x \in X : x < value\}| / |X|\) where \(X\) is the set of values that have been consumed from all of the datastreams feeding the aggregates.
      Parameters:
      value - value whose quantile rank is sought.
      aggregates - collection of RandomPercentile instances being combined
      Returns:
      estimated proportion of combined sample values that are strictly less than value
    • getAggregateN

      public double getAggregateN(Collection<RandomPercentile> aggregates)
      Returns the total number of values that have been consumed by the aggregates.
      Parameters:
      aggregates - collection of RandomPercentile instances whose combined sample size is sought
      Returns:
      total number of values that have been consumed by the aggregates
    • aggregate

      public void aggregate(RandomPercentile other) throws NullArgumentException
      Aggregates the provided instance into this instance.

      Other must have the same buffer size as this. If the combined data size exceeds the maximum storage configured for this instance, buffers are merged to create capacity. If all that is needed is computation of aggregate results, reduce(double, Collection) is faster, may be more accurate and does not require the buffer sizes to be the same.

      Specified by:
      aggregate in interface AggregatableStatistic<RandomPercentile>
      Parameters:
      other - the instance to aggregate into this instance
      Throws:
      NullArgumentException - if the input is null
      IllegalArgumentException - if other has different buffer size than this
    • maxValuesRetained

      public static long maxValuesRetained(double epsilon)
      Returns the maximum number of double values that a RandomPercentile instance created with the given epsilon value will retain in memory.

      If the number of values that have been consumed from the stream is less than 1/2 of this value, reported statistics are exact.

      Parameters:
      epsilon - bound on the relative quantile error (see class javadoc)
      Returns:
      upper bound on the total number of primitive double values retained in memory
      Throws:
      MathIllegalArgumentException - if epsilon is not in the interval (0,1)