All Implemented Interfaces: Serializable, DoubleConsumer, AggregatableStatistic<RandomPercentile>, StorelessUnivariateStatistic, UnivariateStatistic, MathArrays.Function

public class RandomPercentile extends AbstractStorelessUnivariateStatistic implements StorelessUnivariateStatistic, AggregatableStatistic<RandomPercentile>, Serializable

A StorelessUnivariateStatistic estimating percentiles using the RANDOM algorithm.
Storage requirements for the RANDOM algorithm depend on the desired accuracy of quantile estimates. Quantile estimate accuracy is defined as follows.
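Let \(X\) be the set of all data values consumed from the stream and let \(q\) be a quantile (measured between 0 and 1) to be estimated. If
- \(\epsilon\) is the configured accuracy,
- \(\hat{q}\) is a RandomPercentile estimate for \(q\) (what is returned by getResult() or getResult(double) with \(100q\) as actual parameter),
- \(rank(\hat{q}) = |\{x \in X : x < \hat{q}\}|\) is the actual rank of \(\hat{q}\) in the full data stream, and
- \(n = |X|\) is the number of observations,

then we can expect \((q - \epsilon)n < rank(\hat{q}) < (q + \epsilon)n\).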
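The accuracy criterion can be checked numerically on a small data set. The following is a minimal sketch, not library code: it assumes rank is the count of stream values strictly less than the estimate (the definition given for getRank(double)) and that the expected bound is \((q - \epsilon)n < rank(\hat{q}) < (q + \epsilon)n\). The class and method names (`QuantileAccuracySketch`, `rank`, `withinBound`) are hypothetical.

```java
import java.util.Arrays;

class QuantileAccuracySketch {

    /** Number of stream values strictly less than v, i.e. |{x in X : x < v}|. */
    static long rank(double[] data, double v) {
        return Arrays.stream(data).filter(x -> x < v).count();
    }

    /** True if (q - eps) * n < rank(estimate) < (q + eps) * n. */
    static boolean withinBound(double[] data, double q, double estimate, double eps) {
        double n = data.length;
        long r = rank(data, estimate);
        return (q - eps) * n < r && r < (q + eps) * n;
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1; // stream values 1, 2, ..., 1000
        }
        // Hypothetical median estimates (q = 0.5) checked against eps = 0.01.
        System.out.println(rank(data, 499.0));                   // prints 498
        System.out.println(withinBound(data, 0.5, 499.0, 0.01)); // prints true  (490 < 498 < 510)
        System.out.println(withinBound(data, 0.5, 520.0, 0.01)); // prints false (rank 519 exceeds 510)
    }
}
```

The check needs the full data set in memory, so it is only practical as a test aid on samples small enough to sort or scan.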
The algorithm maintains \(\left\lceil {log_{2}(1/\epsilon)}\right\rceil + 1\) buffers
of size \(\left\lceil {1/\epsilon \sqrt{log_2(1/\epsilon)}}\right\rceil\). When
epsilon is set to the default value of \(10^{-4}\), this makes 15 buffers
of size 36,453.
The algorithm uses the buffers to maintain samples of data from the stream. Until
all buffers are full, the entire sample is stored in the buffers.
If one of the getResult methods is called when all data are available in memory
and there is room to make a copy of the data (meaning the combined set of buffers is
less than half full), the getResult method delegates to a Percentile
instance to compute and return the exact value for the desired quantile.
For default epsilon, this means exact values will be returned whenever fewer than
\(\left\lceil {15 \times 36453 / 2} \right\rceil = 273,398\) values have been consumed
from the data stream.
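The sizing arithmetic above can be reproduced directly from the two formulas. This is a sketch of the arithmetic only, with hypothetical names (`RandomBufferSizing`, `bufferCount`, `bufferSize`, `exactThreshold`); it does not call the library and is not a claim about how maxValuesRetained(double) is implemented.

```java
class RandomBufferSizing {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** ceil(log2(1/epsilon)) + 1 buffers. */
    static int bufferCount(double epsilon) {
        return (int) Math.ceil(log2(1.0 / epsilon)) + 1;
    }

    /** ceil((1/epsilon) * sqrt(log2(1/epsilon))) values per buffer. */
    static int bufferSize(double epsilon) {
        return (int) Math.ceil((1.0 / epsilon) * Math.sqrt(log2(1.0 / epsilon)));
    }

    /** ceil(count * size / 2): below this many values, results are exact per the text above. */
    static long exactThreshold(double epsilon) {
        long capacity = (long) bufferCount(epsilon) * bufferSize(epsilon);
        return (capacity + 1) / 2; // integer ceiling of capacity / 2
    }

    public static void main(String[] args) {
        double epsilon = 1e-4; // the documented default
        System.out.println(bufferCount(epsilon));    // prints 15
        System.out.println(bufferSize(epsilon));     // prints 36453
        System.out.println(exactThreshold(epsilon)); // prints 273398
    }
}
```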
When buffers become full, the algorithm merges buffers so that they effectively represent
a larger set of values than they can hold. Subsequently, data values are sampled from the
stream to fill buffers freed by merge operations. Both the merging and the sampling
require random selection, which is done using a RandomGenerator. To get
repeatable results for large data streams, users should provide RandomGenerator
instances with fixed seeds. RandomPercentile itself does not reseed or otherwise
initialize the RandomGenerator provided to it. By default, it uses a
Well19937c generator with the default seed.
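To illustrate why a fixed seed gives repeatable results, here is a simplified sketch of a random merge of two equally sized buffers. It is an assumption-laden illustration, not the library's merge code: it uses java.util.Random in place of a Hipparchus RandomGenerator, keeps one randomly chosen element per slot, and ignores buffer levels and re-sorting. The names `RandomMergeSketch` and `merge` are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

class RandomMergeSketch {

    /**
     * Merges two equal-length buffers into one of the same length by keeping,
     * at each index, an element chosen at random from a or b. Each retained
     * value then stands for two values of the combined input.
     */
    static double[] merge(double[] a, double[] b, Random rng) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("buffers must have equal length");
        }
        double[] merged = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            merged[i] = rng.nextBoolean() ? a[i] : b[i];
        }
        return merged;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4};
        double[] b = {5, 6, 7, 8};
        // Two generators created with the same seed make the same choices.
        double[] first = merge(a, b, new Random(42L));
        double[] second = merge(a, b, new Random(42L));
        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```

Because every random choice comes from the supplied generator, two runs over the same data with identically seeded generators take the same path, which is the property the fixed-seed recommendation relies on.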
Note: This implementation is not thread-safe.
| Modifier and Type | Field | Description |
|---|---|---|
| static double | DEFAULT_EPSILON | Default quantile estimation error setting |
| Constructor | Description |
|---|---|
| RandomPercentile() | Constructs a RandomPercentile with quantile estimation error set to the default (DEFAULT_EPSILON), using the default PRNG as source of random data. |
| RandomPercentile(double epsilon) | Constructs a RandomPercentile with quantile estimation error epsilon using the default PRNG as source of random data. |
| RandomPercentile(double epsilon, RandomGenerator randomGenerator) | Constructs a RandomPercentile with quantile estimation error epsilon using randomGenerator as its source of random data. |
| RandomPercentile(RandomGenerator randomGenerator) | Constructs a RandomPercentile with default estimation error using randomGenerator as its source of random data. |
| RandomPercentile(RandomPercentile original) | Copy constructor, creates a new RandomPercentile identical to the original. |
| Modifier and Type | Method | Description |
|---|---|---|
| void | aggregate(RandomPercentile other) | Aggregates the provided instance into this instance. |
| void | clear() | Clears the internal state of the Statistic |
| RandomPercentile | copy() | Returns a copy of the statistic with the same internal state. |
| double | evaluate(double[] values, int begin, int length) | Returns an estimate of the median, computed using the designated array segment as input data. |
| double | evaluate(double percentile, double[] values) | Returns an estimate of percentile over the given array. |
| double | evaluate(double percentile, double[] values, int begin, int length) | Returns an estimate of the given percentile, computed using the designated array segment as input data. |
| double | getAggregateN(Collection<RandomPercentile> aggregates) | Returns the total number of values that have been consumed by the aggregates. |
| double | getAggregateQuantileRank(double value, Collection<RandomPercentile> aggregates) | Returns the estimated quantile position of value in the combined dataset of the aggregates. |
| double | getAggregateRank(double value, Collection<RandomPercentile> aggregates) | Computes the estimated rank of value in the combined dataset of the aggregates. |
| long | getN() | Returns the number of values that have been added. |
| double | getQuantileRank(double value) | Returns the estimated quantile position of value in the dataset. |
| double | getRank(double value) | Gets the estimated rank of value. |
| double | getResult() | Returns an estimate of the median. |
| double | getResult(double percentile) | Returns an estimate of the given percentile. |
| void | increment(double d) | Updates the internal state of the statistic to reflect the addition of the new value. |
| static long | maxValuesRetained(double epsilon) | Returns the maximum number of double values that a RandomPercentile instance created with the given epsilon value will retain in memory. |
| double | reduce(double percentile, Collection<RandomPercentile> aggregates) | Computes the given percentile by combining the data from the collection of aggregates. |
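Methods inherited from class AbstractStorelessUnivariateStatistic: equals, hashCode, toString
Methods inherited from interface AggregatableStatistic: aggregate, aggregate
Methods inherited from interface DoubleConsumer: andThen
Methods inherited from class Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface StorelessUnivariateStatistic: accept, incrementAll, incrementAll
Methods inherited from interface UnivariateStatistic: evaluate

public static final double DEFAULT_EPSILON
Default quantile estimation error setting
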
public RandomPercentile(double epsilon,
                        RandomGenerator randomGenerator)
Constructs a RandomPercentile with quantile estimation error epsilon using randomGenerator as its source of random data.
Parameters:
epsilon - bound on quantile estimation error (see class javadoc)
randomGenerator - PRNG used in sampling and merge operations
Throws:
MathIllegalArgumentException - if percentile is not in the range [0, 100]

public RandomPercentile(RandomGenerator randomGenerator)
Constructs a RandomPercentile with default estimation error using randomGenerator as its source of random data.
Parameters:
randomGenerator - PRNG used in sampling and merge operations
Throws:
MathIllegalArgumentException - if percentile is not in the range [0, 100]

public RandomPercentile(double epsilon)
Constructs a RandomPercentile with quantile estimation error epsilon using the default PRNG as source of random data.
Parameters:
epsilon - bound on quantile estimation error (see class javadoc)
Throws:
MathIllegalArgumentException - if percentile is not in the range [0, 100]

public RandomPercentile()
Constructs a RandomPercentile with quantile estimation error set to the default (DEFAULT_EPSILON), using the default PRNG as source of random data.

public RandomPercentile(RandomPercentile original)
Copy constructor, creates a new RandomPercentile identical to the original. Note: the RandomGenerator used by the new instance is referenced, not copied; that is, the new instance shares a generator with the original.
Parameters:
original - the RandomPercentile instance to copy

public long getN()
Description copied from interface: StorelessUnivariateStatistic
Returns the number of values that have been added.
Specified by:
getN in interface StorelessUnivariateStatistic

public double evaluate(double percentile,
                       double[] values,
                       int begin,
                       int length)
                throws MathIllegalArgumentException
Returns an estimate of the given percentile, computed using the designated array segment as input data.
Parameters:
values - source of input data
begin - position of the first element of the values array to include
length - number of array elements to include
percentile - desired percentile (scaled 0 - 100)
Throws:
MathIllegalArgumentException - if percentile is out of the range [0, 100]

public double evaluate(double[] values,
                       int begin,
                       int length)
Returns an estimate of the median, computed using the designated array segment as input data.
Specified by:
evaluate in interface MathArrays.Function
evaluate in interface StorelessUnivariateStatistic
evaluate in interface UnivariateStatistic
Parameters:
values - source of input data
begin - position of the first element of the values array to include
length - number of array elements to include
Throws:
MathIllegalArgumentException - if percentile is out of the range [0, 100]
See Also:
UnivariateStatistic.evaluate(double[], int, int)

public double evaluate(double percentile,
                       double[] values)
Returns an estimate of percentile over the given array.
Parameters:
values - source of input data
percentile - desired percentile (scaled 0 - 100)
Throws:
MathIllegalArgumentException - if percentile is out of the range [0, 100]

public RandomPercentile copy()
Description copied from class: AbstractStorelessUnivariateStatistic
Returns a copy of the statistic with the same internal state.
Specified by:
copy in interface StorelessUnivariateStatistic
copy in interface UnivariateStatistic
copy in class AbstractStorelessUnivariateStatistic

public void clear()
Description copied from class: AbstractStorelessUnivariateStatistic
Clears the internal state of the Statistic
Specified by:
clear in interface StorelessUnivariateStatistic
clear in class AbstractStorelessUnivariateStatistic

public double getResult()
Returns an estimate of the median.
Specified by:
getResult in interface StorelessUnivariateStatistic
getResult in class AbstractStorelessUnivariateStatistic
Returns:
the value of the statistic, or Double.NaN if it has been cleared or just instantiated.

public double getResult(double percentile)
Returns an estimate of the given percentile.
Parameters:
percentile - desired percentile (scaled 0 - 100)
Throws:
MathIllegalArgumentException - if percentile is out of the range [0, 100]

public double getRank(double value)
Gets the estimated rank of value, i.e. \(|\{x \in X : x < value\}|\) where \(X\) is the set of values that have been consumed from the stream.
Parameters:
value - value whose overall rank is sought

public double getQuantileRank(double value)
Returns the estimated quantile position of value in the dataset.
Parameters:
value - value whose quantile rank is sought.

public void increment(double d)
Description copied from class: AbstractStorelessUnivariateStatistic
Updates the internal state of the statistic to reflect the addition of the new value.
Specified by:
increment in interface StorelessUnivariateStatistic
increment in class AbstractStorelessUnivariateStatistic
Parameters:
d - the new value.

public double reduce(double percentile,
                     Collection<RandomPercentile> aggregates)
Computes the given percentile by combining the data from the collection of aggregates.
Parameters:
percentile - desired percentile (scaled 0-100)
aggregates - RandomPercentile instances to combine data from
Throws:
MathIllegalArgumentException - if percentile is out of the range [0, 100]

public double getAggregateRank(double value,
                               Collection<RandomPercentile> aggregates)
Computes the estimated rank of value in the combined dataset of the aggregates. Rank is defined as in getRank(double).
Parameters:
value - value whose rank is sought
aggregates - collection to aggregate rank over

public double getAggregateQuantileRank(double value,
                                       Collection<RandomPercentile> aggregates)
Returns the estimated quantile position of value in the combined dataset of the aggregates.
Parameters:
value - value whose quantile rank is sought.
aggregates - collection of RandomPercentile instances being combined

public double getAggregateN(Collection<RandomPercentile> aggregates)
Returns the total number of values that have been consumed by the aggregates.
Parameters:
aggregates - collection of RandomPercentile instances whose combined sample size is sought

public void aggregate(RandomPercentile other) throws NullArgumentException
Aggregates the provided instance into this instance. The other instance must have the same buffer size as this one. If the combined data size exceeds the maximum storage configured for this instance, buffers are merged to create capacity. If all that is needed is computation of aggregate results, reduce(double, Collection) is faster, may be more accurate, and does not require the buffer sizes to be the same.
Specified by:
aggregate in interface AggregatableStatistic<RandomPercentile>
Parameters:
other - the instance to aggregate into this instance
Throws:
NullArgumentException - if the input is null
IllegalArgumentException - if other has different buffer size than this

public static long maxValuesRetained(double epsilon)
Returns the maximum number of double values that a RandomPercentile instance created with the given epsilon value will retain in memory.
If the number of values that have been consumed from the stream is less than 1/2 of this value, reported statistics are exact.
Parameters:
epsilon - bound on the relative quantile error (see class javadoc)
Throws:
MathIllegalArgumentException - if epsilon is not in the interval (0,1)

Copyright © 2016–2018 Hipparchus.org. All rights reserved.