Class RandomPercentile
- All Implemented Interfaces:
Serializable
,DoubleConsumer
,AggregatableStatistic<RandomPercentile>
,StorelessUnivariateStatistic
,UnivariateStatistic
,MathArrays.Function
StorelessUnivariateStatistic
estimating percentiles using the
RANDOM
Algorithm.
Storage requirements for the RANDOM algorithm depend on the desired accuracy of quantile estimates. Quantile estimate accuracy is defined as follows.
Let \(X\) be the set of all data values consumed from the stream and let \(q\) be a quantile (measured between 0 and 1) to be estimated. If
- \(\epsilon\) is the configured accuracy
- \(\hat{q}\) is a RandomPercentile estimate for \(q\) (what is returned
by
getResult()
orgetResult(double)
) with \(100q\) as actual parameter) - \(rank(\hat{q}) = |\{x \in X : x < \hat{q}\}|\) is the actual rank of \(\hat{q}\) in the full data stream
- \(n = |X|\) is the number of observations
The algorithm maintains \(\left\lceil {log_{2}(1/\epsilon)}\right\rceil + 1\) buffers
of size \(\left\lceil {1/\epsilon \sqrt{log_2(1/\epsilon)}}\right\rceil\). When
epsilon
is set to the default value of \(10^{-4}\), this makes 15 buffers
of size 36,453.
The algorithm uses the buffers to maintain samples of data from the stream. Until
all buffers are full, the entire sample is stored in the buffers.
If one of the getResult
methods is called when all data are available in memory
and there is room to make a copy of the data (meaning the combined set of buffers is
less than half full), the getResult
method delegates to a Percentile
instance to compute and return the exact value for the desired quantile.
For default epsilon
, this means exact values will be returned whenever fewer than
\(\left\lceil {15 \times 36453 / 2} \right\rceil = 273,398\) values have been consumed
from the data stream.
When buffers become full, the algorithm merges buffers so that they effectively represent
a larger set of values than they can hold. Subsequently, data values are sampled from the
stream to fill buffers freed by merge operations. Both the merging and the sampling
require random selection, which is done using a RandomGenerator
. To get
repeatable results for large data streams, users should provide RandomGenerator
instances with fixed seeds. RandomPercentile
itself does not reseed or otherwise
initialize the RandomGenerator
provided to it. By default, it uses a
Well19937c
generator with the default seed.
Note: This implementation is not thread-safe.
- See Also:
-
Field Summary
Modifier and TypeFieldDescriptionstatic final double
Default quantile estimation error setting -
Constructor Summary
ConstructorDescriptionConstructs aRandomPercentile
with quantile estimation error set to the default (DEFAULT_EPSILON
), using the default PRNG as source of random data.RandomPercentile
(double epsilon) Constructs aRandomPercentile
with quantile estimation errorepsilon
using the default PRNG as source of random data.RandomPercentile
(double epsilon, RandomGenerator randomGenerator) Constructs aRandomPercentile
with quantile estimation errorepsilon
usingrandomGenerator
as its source of random data.RandomPercentile
(RandomGenerator randomGenerator) Constructs aRandomPercentile
with default estimation error usingrandomGenerator
as its source of random data.RandomPercentile
(RandomPercentile original) Copy constructor, creates a newRandomPercentile
identical to theoriginal
. -
Method Summary
Modifier and TypeMethodDescriptionvoid
aggregate
(RandomPercentile other) Aggregates the provided instance into this instance.void
clear()
Clears the internal state of the Statisticcopy()
Returns a copy of the statistic with the same internal state.double
evaluate
(double[] values, int begin, int length) Returns an estimate of the median, computed using the designated array segment as input data.double
evaluate
(double percentile, double[] values) Returns an estimate of percentile over the given array.double
evaluate
(double percentile, double[] values, int begin, int length) Returns an estimate of the given percentile, computed using the designated array segment as input data.double
getAggregateN
(Collection<RandomPercentile> aggregates) Returns the total number of values that have been consumed by the aggregates.double
getAggregateQuantileRank
(double value, Collection<RandomPercentile> aggregates) Returns the estimated quantile position of value in the combined dataset of the aggregates.double
getAggregateRank
(double value, Collection<RandomPercentile> aggregates) Computes the estimated rank of value in the combined dataset of the aggregates.long
getN()
Returns the number of values that have been added.double
getQuantileRank
(double value) Returns the estimated quantile position of value in the dataset.double
getRank
(double value) Gets the estimated rank ofvalue
, i.e.double
Returns an estimate of the median.double
getResult
(double percentile) Returns an estimate of the given percentile.void
increment
(double d) Updates the internal state of the statistic to reflect the addition of the new value.static long
maxValuesRetained
(double epsilon) Returns the maximum number ofdouble
values that aRandomPercentile
instance created with the givenepsilon
value will retain in memory.double
reduce
(double percentile, Collection<RandomPercentile> aggregates) Computes the given percentile by combining the data from the collection of aggregates.Methods inherited from class org.hipparchus.stat.descriptive.AbstractStorelessUnivariateStatistic
equals, hashCode, toString
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.hipparchus.stat.descriptive.AggregatableStatistic
aggregate, aggregate
Methods inherited from interface java.util.function.DoubleConsumer
andThen
Methods inherited from interface org.hipparchus.stat.descriptive.StorelessUnivariateStatistic
accept, incrementAll, incrementAll
Methods inherited from interface org.hipparchus.stat.descriptive.UnivariateStatistic
evaluate
-
Field Details
-
DEFAULT_EPSILON
public static final double DEFAULT_EPSILONDefault quantile estimation error setting- See Also:
-
-
Constructor Details
-
RandomPercentile
Constructs aRandomPercentile
with quantile estimation errorepsilon
usingrandomGenerator
as its source of random data.- Parameters:
epsilon
- bound on quantile estimation error (see class javadoc)randomGenerator
- PRNG used in sampling and merge operations- Throws:
MathIllegalArgumentException
- if percentile is not in the range [0, 100]
-
RandomPercentile
Constructs aRandomPercentile
with default estimation error usingrandomGenerator
as its source of random data.- Parameters:
randomGenerator
- PRNG used in sampling and merge operations- Throws:
MathIllegalArgumentException
- if percentile is not in the range [0, 100]
-
RandomPercentile
public RandomPercentile(double epsilon) Constructs aRandomPercentile
with quantile estimation errorepsilon
using the default PRNG as source of random data.- Parameters:
epsilon
- bound on quantile estimation error (see class javadoc)- Throws:
MathIllegalArgumentException
- if percentile is not in the range [0, 100]
-
RandomPercentile
public RandomPercentile()Constructs aRandomPercentile
with quantile estimation error set to the default (DEFAULT_EPSILON
), using the default PRNG as source of random data. -
RandomPercentile
Copy constructor, creates a newRandomPercentile
identical to theoriginal
. Note: the RandomGenerator used by the new instance is referenced, not copied - i.e., the new instance shares a generator with the original.- Parameters:
original
- thePSquarePercentile
instance to copy
-
-
Method Details
-
getN
public long getN()Description copied from interface:StorelessUnivariateStatistic
Returns the number of values that have been added.- Specified by:
getN
in interfaceStorelessUnivariateStatistic
- Returns:
- the number of values.
-
evaluate
public double evaluate(double percentile, double[] values, int begin, int length) throws MathIllegalArgumentException Returns an estimate of the given percentile, computed using the designated array segment as input data.- Parameters:
values
- source of input databegin
- position of the first element of the values array to includelength
- number of array elements to includepercentile
- desired percentile (scaled 0 - 100)- Returns:
- estimated percentile
- Throws:
MathIllegalArgumentException
- if percentile is out of the range [0, 100]
-
evaluate
public double evaluate(double[] values, int begin, int length) Returns an estimate of the median, computed using the designated array segment as input data.- Specified by:
evaluate
in interfaceMathArrays.Function
- Specified by:
evaluate
in interfaceStorelessUnivariateStatistic
- Specified by:
evaluate
in interfaceUnivariateStatistic
- Parameters:
values
- source of input databegin
- position of the first element of the values array to includelength
- number of array elements to include- Returns:
- estimated percentile
- Throws:
MathIllegalArgumentException
- if percentile is out of the range [0, 100]- See Also:
-
evaluate
public double evaluate(double percentile, double[] values) Returns an estimate of percentile over the given array.- Parameters:
values
- source of input datapercentile
- desired percentile (scaled 0 - 100)- Returns:
- estimated percentile
- Throws:
MathIllegalArgumentException
- if percentile is out of the range [0, 100]
-
copy
Description copied from class:AbstractStorelessUnivariateStatistic
Returns a copy of the statistic with the same internal state.- Specified by:
copy
in interfaceStorelessUnivariateStatistic
- Specified by:
copy
in interfaceUnivariateStatistic
- Specified by:
copy
in classAbstractStorelessUnivariateStatistic
- Returns:
- a copy of the statistic
-
clear
public void clear()Description copied from class:AbstractStorelessUnivariateStatistic
Clears the internal state of the Statistic- Specified by:
clear
in interfaceStorelessUnivariateStatistic
- Specified by:
clear
in classAbstractStorelessUnivariateStatistic
-
getResult
public double getResult()Returns an estimate of the median.- Specified by:
getResult
in interfaceStorelessUnivariateStatistic
- Specified by:
getResult
in classAbstractStorelessUnivariateStatistic
- Returns:
- value of the statistic,
Double.NaN
if it has been cleared or just instantiated.
-
getResult
public double getResult(double percentile) Returns an estimate of the given percentile.- Parameters:
percentile
- desired percentile (scaled 0 - 100)- Returns:
- estimated percentile
- Throws:
MathIllegalArgumentException
- if percentile is out of the range [0, 100]
-
getRank
public double getRank(double value) Gets the estimated rank ofvalue
, i.e. \(|\{x \in X : x < value\}|\) where \(X\) is the set of values that have been consumed from the stream.- Parameters:
value
- value whose overall rank is sought- Returns:
- estimated number of sample values that are strictly less than
value
-
getQuantileRank
public double getQuantileRank(double value) Returns the estimated quantile position of value in the dataset. Specifically, what is returned is an estimate of \(|\{x \in X : x < value\}| / |X|\) where \(X\) is the set of values that have been consumed from the stream.- Parameters:
value
- value whose quantile rank is sought.- Returns:
- estimated proportion of sample values that are strictly less than
value
-
increment
public void increment(double d) Description copied from class:AbstractStorelessUnivariateStatistic
Updates the internal state of the statistic to reflect the addition of the new value.- Specified by:
increment
in interfaceStorelessUnivariateStatistic
- Specified by:
increment
in classAbstractStorelessUnivariateStatistic
- Parameters:
d
- the new value.
-
reduce
Computes the given percentile by combining the data from the collection of aggregates. The result describes the combined sample of all data added to any of the aggregates.- Parameters:
percentile
- desired percentile (scaled 0-100)aggregates
- RandomPercentile instances to combine data from- Returns:
- estimate of the given percentile using combined data from the aggregates
- Throws:
MathIllegalArgumentException
- if percentile is out of the range [0, 100]
-
getAggregateRank
Computes the estimated rank of value in the combined dataset of the aggregates. Sums the values fromgetRank(double)
.- Parameters:
value
- value whose rank is soughtaggregates
- collection to aggregate rank over- Returns:
- estimated number of elements in the combined dataset that are less than value
-
getAggregateQuantileRank
Returns the estimated quantile position of value in the combined dataset of the aggregates. Specifically, what is returned is an estimate of \(|\{x \in X : x < value\}| / |X|\) where \(X\) is the set of values that have been consumed from all of the datastreams feeding the aggregates.- Parameters:
value
- value whose quantile rank is sought.aggregates
- collection of RandomPercentile instances being combined- Returns:
- estimated proportion of combined sample values that are strictly less than
value
-
getAggregateN
Returns the total number of values that have been consumed by the aggregates.- Parameters:
aggregates
- collection of RandomPercentile instances whose combined sample size is sought- Returns:
- total number of values that have been consumed by the aggregates
-
aggregate
Aggregates the provided instance into this instance.Other must have the same buffer size as this. If the combined data size exceeds the maximum storage configured for this instance, buffers are merged to create capacity. If all that is needed is computation of aggregate results,
reduce(double, Collection)
is faster, may be more accurate and does not require the buffer sizes to be the same.- Specified by:
aggregate
in interfaceAggregatableStatistic<RandomPercentile>
- Parameters:
other
- the instance to aggregate into this instance- Throws:
NullArgumentException
- if the input is nullIllegalArgumentException
- if other has different buffer size than this
-
maxValuesRetained
public static long maxValuesRetained(double epsilon) Returns the maximum number ofdouble
values that aRandomPercentile
instance created with the givenepsilon
value will retain in memory.If the number of values that have been consumed from the stream is less than 1/2 of this value, reported statistics are exact.
- Parameters:
epsilon
- bound on the relative quantile error (see class javadoc)- Returns:
- upper bound on the total number of primitive double values retained in memory
- Throws:
MathIllegalArgumentException
- if epsilon is not in the interval (0,1)
-