public class SimpleRegression extends Object implements Serializable, UpdatingMultipleLinearRegression
Estimates an ordinary least squares regression model with one independent variable:
  y = intercept + slope * x
Standard errors for the intercept and slope are available, as well as ANOVA, r-square and Pearson's r statistics.
Observations (x,y pairs) can be added to the model one at a time or they can be provided in a 2-dimensional array. The observations are not stored in memory, so there is no limit to the number of observations that can be added to the model.
Usage Notes:
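When there are fewer than two observations in the model, or when there is no variation in the x values, all statistics return NaN. At least two observations with different x coordinates are required to estimate a bivariate regression model.
By default, regression models are estimated with an intercept term. To estimate a model without an intercept, pass false to the SimpleRegression(boolean) constructor. When the hasIntercept property is false, the model is estimated without a constant term and getIntercept() returns 0.

| Constructor and Description | 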
|---|
| SimpleRegression() Creates an empty SimpleRegression instance. |
| SimpleRegression(boolean includeIntercept) Creates a SimpleRegression instance, specifying whether or not to estimate an intercept. |

| Modifier and Type | Method and Description | 
|---|---|
| void | addData(double[][] data) Adds the observations represented by the elements in data. |
| void | addData(double x, double y) Adds the observation (x,y) to the regression data set. |
| void | addObservation(double[] x, double y) Adds one observation to the regression model. |
| void | addObservations(double[][] x, double[] y) Adds a series of observations to the regression model. |
| void | append(SimpleRegression reg) Appends data from another regression calculation to this one. |
| void | clear() Clears all data from the model. |
| double | getIntercept() Returns the intercept of the estimated regression line if hasIntercept() is true; otherwise 0. |
| double | getInterceptStdErr() Returns the standard error of the intercept estimate, usually denoted s(b0). |
| double | getMeanSquareError() Returns the sum of squared errors divided by the degrees of freedom, usually abbreviated MSE. |
| long | getN() Returns the number of observations that have been added to the model. |
| double | getR() Returns Pearson's product moment correlation coefficient, usually denoted r. |
| double | getRegressionSumSquares() Returns the sum of squared deviations of the predicted y values about their mean (which equals the mean of y). |
| double | getRSquare() Returns the coefficient of determination, usually denoted r-square. |
| double | getSignificance() Returns the significance level of the slope (equivalently, of the correlation). |
| double | getSlope() Returns the slope of the estimated regression line. |
| double | getSlopeConfidenceInterval() Returns the half-width of a 95% confidence interval for the slope estimate. |
| double | getSlopeConfidenceInterval(double alpha) Returns the half-width of a (100-100*alpha)% confidence interval for the slope estimate. |
| double | getSlopeStdErr() Returns the standard error of the slope estimate, usually denoted s(b1). |
| double | getSumOfCrossProducts() Returns the sum of crossproducts, xi*yi. |
| double | getSumSquaredErrors() Returns the sum of squared errors (SSE) associated with the regression model. |
| double | getTotalSumSquares() Returns the sum of squared deviations of the y values about their mean. |
| double | getXSumSquares() Returns the sum of squared deviations of the x values about their mean. |
| boolean | hasIntercept() Returns true if the model includes an intercept term. |
| double | predict(double x) Returns the "predicted" y value associated with the supplied x value, based on the data that has been added to the model when this method is invoked. |
| RegressionResults | regress() Performs a regression on data present in buffers and outputs a RegressionResults object. |
| RegressionResults | regress(int[] variablesToInclude) Performs a regression on data present in buffers, including only regressors indexed in variablesToInclude, and outputs a RegressionResults object. |
| void | removeData(double[][] data) Removes observations represented by the elements in data. |
| void | removeData(double x, double y) Removes the observation (x,y) from the regression data set. |

public SimpleRegression()
public SimpleRegression(boolean includeIntercept)
Use false to estimate a model with no intercept.  When the
 hasIntercept property is false, the model is estimated without a
 constant term and getIntercept() returns 0.
includeIntercept - whether or not to include an intercept term in the regression model
public void addData(double x, double y)
Uses updating formulas for means and sums of squares defined in "Algorithms for Computing the Sample Variance: Analysis and Recommendations", Chan, T.F., Golub, G.H., and LeVeque, R.J. 1983, American Statistician, vol. 37, pp. 242-247, referenced in Weisberg, S. "Applied Linear Regression". 2nd Ed. 1985.
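The updating idea can be illustrated with a small standalone sketch. It assumes a model with an intercept and tracks running means plus centered sums of squares and cross products; the class name RunningSums and its fields are hypothetical and are not part of this API, and the code is not taken from the library source.

```java
/** Hypothetical accumulator for illustration; not part of the documented API. */
class RunningSums {
    long n;                 // number of observations seen so far
    double xbar, ybar;      // running means of x and y
    double sxx, syy, sxy;   // running centered sums of squares and cross products

    /** Folds one (x, y) pair into the running statistics without storing it. */
    void add(double x, double y) {
        if (n == 0) {
            xbar = x;
            ybar = y;
        } else {
            double dx = x - xbar;       // deviation from the previous mean of x
            double dy = y - ybar;       // deviation from the previous mean of y
            double w = n / (n + 1.0);   // weight for the contribution of the new point
            sxx += dx * dx * w;
            syy += dy * dy * w;
            sxy += dx * dy * w;
            xbar += dx / (n + 1.0);
            ybar += dy / (n + 1.0);
        }
        n++;
    }

    public static void main(String[] args) {
        RunningSums s = new RunningSums();
        double[][] data = {{1, 3}, {2, 5}, {3, 7}};
        for (double[] p : data) {
            s.add(p[0], p[1]);
        }
        System.out.println("slope = " + (s.sxy / s.sxx));
    }
}
```

Because only six numbers are kept, memory use does not grow with the number of observations, which matches the statement that observations are not stored.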
x - independent variable value
y - dependent variable value
public void append(SimpleRegression reg)
The mean update formulae are based on a paper written by Philippe Pébay: Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments, 2008, Technical Report SAND2008-6212, Sandia National Laboratories.
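As a rough illustration of how two sets of summary statistics can be pooled, the sketch below merges two summaries using the pairwise combination formulas for means and centered sums. The MomentSums class, its of and merge methods, and its field names are hypothetical; this is a sketch under the assumption of an intercept model, not the library's implementation.

```java
/** Hypothetical value class for illustration; not part of the documented API. */
class MomentSums {
    final long n;
    final double xbar, ybar, sxx, syy, sxy;

    MomentSums(long n, double xbar, double ybar, double sxx, double syy, double sxy) {
        this.n = n;
        this.xbar = xbar;
        this.ybar = ybar;
        this.sxx = sxx;
        this.syy = syy;
        this.sxy = sxy;
    }

    /** Builds the summary of a small data set with a plain two-pass computation. */
    static MomentSums of(double[][] data) {
        int n = data.length;
        double xbar = 0, ybar = 0;
        for (double[] p : data) {
            xbar += p[0] / n;
            ybar += p[1] / n;
        }
        double sxx = 0, syy = 0, sxy = 0;
        for (double[] p : data) {
            double dx = p[0] - xbar;
            double dy = p[1] - ybar;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        return new MomentSums(n, xbar, ybar, sxx, syy, sxy);
    }

    /** Combines two summaries as if their observations had been pooled. */
    static MomentSums merge(MomentSums a, MomentSums b) {
        if (a.n == 0) {
            return b;
        }
        if (b.n == 0) {
            return a;
        }
        long n = a.n + b.n;
        double dx = b.xbar - a.xbar;        // gap between the two x means
        double dy = b.ybar - a.ybar;        // gap between the two y means
        double w = (double) a.n * b.n / n;  // weight of the between-group term
        return new MomentSums(n,
                a.xbar + dx * b.n / n,
                a.ybar + dy * b.n / n,
                a.sxx + b.sxx + dx * dx * w,
                a.syy + b.syy + dy * dy * w,
                a.sxy + b.sxy + dx * dy * w);
    }

    public static void main(String[] args) {
        MomentSums a = of(new double[][] {{1, 3}, {2, 5}});
        MomentSums b = of(new double[][] {{3, 7}, {4, 9}});
        MomentSums m = merge(a, b);
        System.out.println("pooled slope = " + (m.sxy / m.sxx));
    }
}
```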
reg - model to append data from
public void removeData(double x, double y)
Mirrors the addData method. This method permits the use of SimpleRegression instances in streaming mode, where the regression is applied to a sliding "window" of observations; however, the caller is responsible for maintaining the set of observations in the window.
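The sliding-window pattern can be sketched without the library as follows. The caller-side queue holds the window contents, and the running sums are updated by adding the new point and removing the oldest one. SlidingWindowSlope and all of its members are hypothetical names; the removal formulas are simply the inverse of a standard one-pass update and are an assumption about how such a mirror can work, not a copy of the library code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window helper for illustration; not part of the documented API. */
class SlidingWindowSlope {
    private final int capacity;                                // maximum window size, assumed >= 1
    private final Deque<double[]> window = new ArrayDeque<>(); // caller-maintained observations
    private long n;
    private double xbar, ybar, sxx, sxy;

    SlidingWindowSlope(int capacity) {
        this.capacity = capacity;
    }

    /** Adds a point, evicting the oldest one first when the window is full. */
    void push(double x, double y) {
        if (window.size() == capacity) {
            double[] old = window.removeFirst();
            remove(old[0], old[1]);
        }
        window.addLast(new double[] {x, y});
        add(x, y);
    }

    private void add(double x, double y) {
        if (n == 0) {
            xbar = x;
            ybar = y;
        } else {
            double dx = x - xbar;
            double dy = y - ybar;
            double w = n / (n + 1.0);
            sxx += dx * dx * w;
            sxy += dx * dy * w;
            xbar += dx / (n + 1.0);
            ybar += dy / (n + 1.0);
        }
        n++;
    }

    private void remove(double x, double y) {
        if (n == 0) {
            return;                      // nothing to remove
        }
        if (n == 1) {                    // removing the last point resets the state
            n = 0;
            xbar = ybar = sxx = sxy = 0;
            return;
        }
        double dx = x - xbar;
        double dy = y - ybar;
        double w = n / (n - 1.0);
        sxx -= dx * dx * w;
        sxy -= dx * dy * w;
        xbar -= dx / (n - 1.0);
        ybar -= dy / (n - 1.0);
        n--;
    }

    /** Least squares slope over the current window, or NaN without variation in x. */
    double slope() {
        return sxx > 0 ? sxy / sxx : Double.NaN;
    }

    int size() {
        return window.size();
    }

    public static void main(String[] args) {
        SlidingWindowSlope w = new SlidingWindowSlope(3);
        double[][] pts = {{1, 1}, {2, 2}, {3, 3}, {4, 8}, {5, 10}};
        for (double[] p : pts) {
            w.push(p[0], p[1]);
            System.out.println("window size " + w.size() + ", slope " + w.slope());
        }
    }
}
```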
The method has no effect if there are no points of data (i.e. n=0).
x - independent variable value
y - dependent variable value
public void addData(double[][] data) throws MathIllegalArgumentException
Adds the observations represented by the elements in data.
(data[0][0],data[0][1]) will be the first observation, then (data[1][0],data[1][1]), etc.
This method does not replace data that has already been added. The observations represented by data are added to the existing dataset.
To replace all data, use clear() before adding the new data.
data - array of observations to be added
MathIllegalArgumentException - if the length of data[i] is less than 2
public void addObservation(double[] x, double y) throws MathIllegalArgumentException
Specified by: addObservation in interface UpdatingMultipleLinearRegression
x - the independent variables which form the design matrix
y - the dependent or response variable
MathIllegalArgumentException - if the length of x does not equal the number of independent variables in the model
public void addObservations(double[][] x, double[] y) throws MathIllegalArgumentException
Specified by: addObservations in interface UpdatingMultipleLinearRegression
x - a series of observations on the independent variables
y - a series of observations on the dependent variable. The length of x and y must be the same.
MathIllegalArgumentException - if x is not rectangular, does not match the length of y or does not contain sufficient data to estimate the model
public void removeData(double[][] data)
Removes observations represented by the elements in data.
If the array is larger than the current n, only the first n elements are processed. This method permits the use of SimpleRegression instances in streaming mode, where the regression is applied to a sliding "window" of observations; however, the caller is responsible for maintaining the set of observations in the window.
To remove all data, use clear().
data - array of observations to be removed
public void clear()
Specified by: clear in interface UpdatingMultipleLinearRegression
public long getN()
Specified by: getN in interface UpdatingMultipleLinearRegression
public double predict(double x)
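Returns the "predicted" y value associated with the supplied x value, based on the data that has been added to the model when this method is invoked.
  predict(x) = intercept + slope * x
Preconditions:
At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
x - input x value
Returns: predicted y value
public double getIntercept()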
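Returns the intercept of the estimated regression line if hasIntercept() is true; otherwise 0.
The least squares estimate of the intercept is computed using the normal equations. The intercept is sometimes denoted b0.
Preconditions:
At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.
See Also: SimpleRegression(boolean)
public boolean hasIntercept()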
Specified by: hasIntercept in interface UpdatingMultipleLinearRegression
See Also: SimpleRegression(boolean)
public double getSlope()
The least squares estimate of the slope is computed using the normal equations. The slope is sometimes denoted b1.
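Preconditions:
At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.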
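For a model with an intercept, solving the normal equations gives the familiar closed form slope = SXY / SXX and intercept = ybar - slope * xbar. The standalone sketch below shows that calculation together with the NaN convention described above; the class NormalEquations and its methods are hypothetical names used only for illustration.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
class NormalEquations {

    /** Returns {intercept, slope}, or {NaN, NaN} when the model cannot be estimated. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        if (n < 2 || y.length != n) {
            return new double[] {Double.NaN, Double.NaN};
        }
        double xbar = 0, ybar = 0;
        for (int i = 0; i < n; i++) {
            xbar += x[i];
            ybar += y[i];
        }
        xbar /= n;
        ybar /= n;
        double sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - xbar) * (x[i] - xbar);
            sxy += (x[i] - xbar) * (y[i] - ybar);
        }
        if (sxx == 0) {                       // no variation in x
            return new double[] {Double.NaN, Double.NaN};
        }
        double slope = sxy / sxx;             // b1
        double intercept = ybar - slope * xbar; // b0
        return new double[] {intercept, slope};
    }

    /** Evaluates intercept + slope * x for coefficients returned by fit. */
    static double predict(double[] coef, double x) {
        return coef[0] + coef[1] * x;
    }

    public static void main(String[] args) {
        double[] coef = fit(new double[] {1, 2, 3}, new double[] {3, 5, 7});
        System.out.println("intercept = " + coef[0] + ", slope = " + coef[1]);
        System.out.println("predict(4) = " + predict(coef, 4));
    }
}
```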
 public double getSumSquaredErrors()
The sum is computed using the computational formula
 SSE = SYY - (SXY * SXY / SXX)
 where SYY is the sum of the squared deviations of the y
 values about their mean, SXX is similarly defined and
 SXY is the sum of the products of x and y mean deviations.
 
 The sums are accumulated using the updating algorithm referenced in
 addData(double, double).
The return value is constrained to be non-negative - i.e., if due to rounding errors the computational formula returns a negative result, 0 is returned.
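Preconditions:
At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.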
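The computational formula and the non-negativity constraint can be illustrated in a few lines. This is a minimal sketch that takes SYY, SXX and SXY as plain arguments; the class SumSquaredErrors and its method sse are hypothetical names, not part of this API.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
class SumSquaredErrors {

    /** SSE = SYY - SXY * SXY / SXX, clamped at zero to absorb rounding error. */
    static double sse(double syy, double sxx, double sxy) {
        return Math.max(0.0, syy - sxy * sxy / sxx);
    }

    public static void main(String[] args) {
        // Centered sums for the points (1,1), (2,3), (3,2): SYY = 2, SXX = 2, SXY = 1.
        System.out.println("SSE = " + sse(2.0, 2.0, 1.0));
        // A perfect fit whose rounded inputs would otherwise give a tiny negative value.
        System.out.println("clamped SSE = " + sse(1.0, 1.0, 1.0000001));
    }
}
```

For the three points used in main, the fitted line is y = 1 + 0.5 x, and summing the squared residuals directly gives the same value as the computational formula.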
 public double getTotalSumSquares()
This is defined as SSTO here.
 If n < 2, this returns Double.NaN.
public double getXSumSquares()
If n < 2, this returns Double.NaN.
public double getSumOfCrossProducts()
public double getRegressionSumSquares()
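This is usually abbreviated SSR or SSM. It is defined as SSM here.
Preconditions:
At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.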
 public double getMeanSquareError()
 If there are fewer than three data pairs in the model,
 or if there is no variation in x, this returns
 Double.NaN.
public double getR()
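Preconditions:
At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.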
 public double getRSquare()
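Preconditions:
At least two observations (with at least two different x values) must have been added before invoking this method. If this method is invoked before a model can be estimated, Double.NaN is returned.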
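One common way to relate these two statistics to the sums of squares is r-square = (SSTO - SSE) / SSTO, with r taking the sign of the slope. The sketch below assumes that textbook relationship for a model with an intercept; the class Correlation and its methods are hypothetical names and the code is not taken from the library.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
class Correlation {

    /** Coefficient of determination from total and error sums of squares. */
    static double rSquare(double ssto, double sse) {
        return (ssto - sse) / ssto;
    }

    /** Pearson's r: the square root of r-square, carrying the sign of the slope. */
    static double r(double slope, double ssto, double sse) {
        double root = Math.sqrt(rSquare(ssto, sse));
        return slope < 0 ? -root : root;
    }

    public static void main(String[] args) {
        // Points (1,1), (2,3), (3,2): SSTO = 2, SSE = 1.5, slope = 0.5.
        System.out.println("r-square = " + rSquare(2.0, 1.5));
        System.out.println("r = " + r(0.5, 2.0, 1.5));
    }
}
```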
 public double getInterceptStdErr()
If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN.
Double.NaN is also returned when the intercept is constrained to be zero.
public double getSlopeStdErr()
If there are fewer than three data pairs in the model, or if there is no variation in x, this returns Double.NaN.
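The textbook formulas behind these statistics, for a model with an intercept, are MSE = SSE / (n - 2), s(b1) = sqrt(MSE / SXX) and s(b0) = sqrt(MSE * (1/n + xbar^2 / SXX)). The sketch below applies them with the NaN conventions described above; the class StdErrors and its methods are hypothetical names, and the choice of n - 2 degrees of freedom is an assumption that holds only when an intercept is estimated.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
class StdErrors {

    /** Mean square error with n - 2 degrees of freedom (intercept model assumed). */
    static double mse(double sse, long n) {
        return n < 3 ? Double.NaN : sse / (n - 2);
    }

    /** Standard error of the slope estimate, s(b1). */
    static double slopeStdErr(double sse, long n, double sxx) {
        if (sxx == 0) {
            return Double.NaN;          // no variation in x
        }
        return Math.sqrt(mse(sse, n) / sxx);
    }

    /** Standard error of the intercept estimate, s(b0). */
    static double interceptStdErr(double sse, long n, double sxx, double xbar) {
        if (sxx == 0) {
            return Double.NaN;          // no variation in x
        }
        return Math.sqrt(mse(sse, n) * (1.0 / n + xbar * xbar / sxx));
    }

    public static void main(String[] args) {
        // Points (1,1), (2,3), (3,2): SSE = 1.5, n = 3, SXX = 2, xbar = 2.
        System.out.println("MSE   = " + mse(1.5, 3));
        System.out.println("s(b1) = " + slopeStdErr(1.5, 3, 2.0));
        System.out.println("s(b0) = " + interceptStdErr(1.5, 3, 2.0, 2.0));
    }
}
```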
 
public double getSlopeConfidenceInterval()
                                  throws MathIllegalArgumentException
The 95% confidence interval is
 (getSlope() - getSlopeConfidenceInterval(), getSlope() + getSlopeConfidenceInterval())
If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN.
Usage Note:
The validity of this statistic depends on the assumption that the observations included in the model are drawn from a Bivariate Normal Distribution.
MathIllegalArgumentException - if the confidence interval can not be computed.
public double getSlopeConfidenceInterval(double alpha) throws MathIllegalArgumentException
The (100-100*alpha)% confidence interval is
 (getSlope() - getSlopeConfidenceInterval(alpha), getSlope() + getSlopeConfidenceInterval(alpha))
To request, for example, a 99% confidence interval, use alpha = .01
Usage Note:
The validity of this statistic depends on the assumption that the observations included in the model are drawn from a Bivariate Normal Distribution.
Preconditions:
If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN.
alpha must satisfy 0 < alpha < 1; otherwise a MathIllegalArgumentException is thrown.
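For orientation, the half-width of such an interval is conventionally obtained from the slope's standard error and a Student's t quantile. The expression below is the standard textbook form, assuming a model with an intercept and therefore n - 2 degrees of freedom; the symbol h and the t-quantile notation are introduced here for illustration and are not taken from this page.

```latex
\[
  h(\alpha) = t_{1-\alpha/2,\; n-2}\; s(b_1),
  \qquad
  \mathrm{CI}_{1-\alpha} = \bigl(b_1 - h(\alpha),\; b_1 + h(\alpha)\bigr)
\]
```

Here b_1 is the slope estimate, s(b_1) its standard error, and t_{1-\alpha/2, n-2} the upper quantile of the t distribution with n - 2 degrees of freedom.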
alpha - the desired significance level
MathIllegalArgumentException - if the confidence interval can not be computed.
public double getSignificance()
Specifically, the returned value is the smallest alpha such that the slope confidence interval with significance level equal to alpha does not include 0. On regression output, this is often denoted Prob(|t| > 0).
Usage Note:
The validity of this statistic depends on the assumption that the observations included in the model are drawn from a Bivariate Normal Distribution.
If there are fewer than three observations in the model, or if there is no variation in x, this returns Double.NaN.
MathIllegalStateException - if the significance level can not be computed.
public RegressionResults regress() throws MathIllegalArgumentException
If hasIntercept is true and there are fewer than 3 observations in the model, a MathIllegalArgumentException is thrown. If there is no intercept term, the model must contain at least 2 observations.
Specified by: regress in interface UpdatingMultipleLinearRegression
MathIllegalArgumentException - if the model is not correctly specified
MathIllegalArgumentException - if there is not sufficient data in the model to estimate the regression parameters
public RegressionResults regress(int[] variablesToInclude) throws MathIllegalArgumentException
Specified by: regress in interface UpdatingMultipleLinearRegression
variablesToInclude - an array of indices of regressors to include
MathIllegalArgumentException - if the variablesToInclude array is null or zero length
MathIllegalArgumentException - if a requested variable is not present in model
Copyright © 2016–2020 Hipparchus.org. All rights reserved.