regression#
- class datacheese.regression.LinearRegression#
Bases: object
Ordinary least squares linear regression model.
Examples
>>> import numpy as np
>>> from datacheese.regression import LinearRegression
Generate input data:
>>> X = np.array(
...     [
...         [1, 1, 0],
...         [1, 2, -1],
...         [2, -2, 0],
...         [0, -3, 7],
...         [8, -6, 0],
...     ],
...     dtype=np.float64,
... )
>>> X
array([[ 1.,  1.,  0.],
       [ 1.,  2., -1.],
       [ 2., -2.,  0.],
       [ 0., -3.,  7.],
       [ 8., -6.,  0.]])
Generate target values using equations \(y_0 = x_0 + 2 x_1 + x_2 + 3\) and \(y_1 = 2 x_0 - x_1 - 3 x_2 - 2\):
>>> Y = np.matmul(
...     X,
...     np.array([[1, 2], [2, -1], [1, -3]]),
... ) + np.array([3, -2])
>>> Y
array([[  6.,  -1.],
       [  7.,   1.],
       [  1.,   4.],
       [  4., -20.],
       [ -1.,  20.]])
Fit model using data:
>>> model = LinearRegression()
>>> model.fit(X, Y)
Use model to make predictions:
>>> X_test = np.array([[3, 5, -2], [-2, 4, 3]], dtype=np.float64)
>>> X_test
array([[ 3.,  5., -2.],
       [-2.,  4.,  3.]])
>>> Y_test = np.matmul(
...     X_test,
...     np.array([[1, 2], [2, -1], [1, -3]]),
... ) + np.array([3, -2])
>>> Y_test
array([[ 14.,   5.],
       [ 12., -19.]])
>>> model.predict(X_test)
array([[ 14.,   5.],
       [ 12., -19.]])
Compute \(R^2\) accuracies:
>>> model.score(X_test, Y_test)
array([1., 1.])
Setting Lambda to a non-zero value performs ridge regression:

>>> model.fit(X, Y, Lambda=0.5)
>>> model.predict(X_test)
array([[ 13.6669421 ,   4.50426919],
       [ 11.51303651, -18.88126742]])
>>> model.score(X_test, Y_test)
array([0.99827693, 0.99946753])
- fit(X, Y, Lambda=0.0)#
Fit linear regression model to training data.
- Parameters:
X (numpy.ndarray) – 2D training features array, of shape n x d, where n is the number of training examples and d is the number of dimensions.
Y (numpy.ndarray) – 2D training target values array, of shape n x t, where n is the number of training examples and t is the number of targets.
Lambda (float, default 0.0) – Regularization constant to be used as the L2 penalty term weight in ridge regression. Default is 0.0, which is the special case of ordinary least squares regression. (A closed-form sketch is given below.)
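For intuition, the fit can be sketched via the regularized normal equations. This is an illustrative sketch only, assuming a bias column is appended to X; it is not the library's actual internals, and ridge_fit_sketch is a hypothetical helper name:

import numpy as np

def ridge_fit_sketch(X, Y, Lambda=0.0):
    # Append a column of ones so the bias is learned together with the weights.
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    d = X_aug.shape[1]
    # Regularized normal equations: W = (X^T X + Lambda * I)^(-1) X^T Y.
    # Lambda = 0.0 reduces this to ordinary least squares. For simplicity the
    # penalty here also covers the bias row, which the library may handle differently.
    return np.linalg.solve(X_aug.T @ X_aug + Lambda * np.eye(d), X_aug.T @ Y)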
- predict(X)#
Use fitted weights to predict target values for test data.
- Parameters:
X (numpy.ndarray) – 2D testing features array, of shape m x d, where m is the number of testing examples and d is the number of dimensions.
- Returns:
Y_pred – 2D array of predicted target values.
- Return type:
numpy.ndarray
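Prediction then amounts to applying the fitted weights to bias-augmented features. A minimal sketch, assuming weights W laid out as in the hypothetical ridge_fit_sketch above (first row is the bias):

def linear_predict_sketch(W, X):
    # Augment with a bias column and apply the fitted weights; result has shape (m, t).
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    return X_aug @ W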
- score(X, Y)#
Use fitted weights to predict target values for test data and compute \(R^2\) accuracies using actual target values.
- Parameters:
X (numpy.ndarray) – 2D testing features array, of shape m x d, where m is the number of testing examples and d is the number of dimensions.
Y (numpy.ndarray) – 2D testing target values array, of shape m x t, where m is the number of testing examples and t is the number of targets.
- Returns:
r_squared – Array of \(R^2\) accuracy scores, i.e. values between 0 and 1.
- Return type:
numpy.ndarray
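The R² score compares the residual error against the variance of the targets, per target column. A minimal sketch of that computation (illustrative only; the library may differ in edge cases such as zero-variance targets):

def r_squared_sketch(Y_true, Y_pred):
    # R^2 = 1 - SS_res / SS_tot, computed independently for each target column.
    ss_res = np.sum((Y_true - Y_pred) ** 2, axis=0)
    ss_tot = np.sum((Y_true - np.mean(Y_true, axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot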
- class datacheese.regression.LogisticRegression#
Bases: object
Binary logistic regression model.
Examples
>>> import numpy as np
>>> from datacheese.regression import LogisticRegression
Generate input data:
>>> X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
>>> X
array([[0., 0.],
       [0., 1.],
       [1., 0.],
       [1., 1.]])
Generate target values based on OR and AND logic gates:
>>> Y = np.column_stack((np.any(X, axis=1), np.all(X, axis=1))).astype(int)
>>> Y
array([[0, 0],
       [1, 0],
       [1, 0],
       [1, 1]])
Fit model using data:
>>> model = LogisticRegression()
>>> model.fit(X, Y)
Use model to make predictions:
>>> model.predict(X)
array([[0, 0],
       [1, 0],
       [1, 0],
       [1, 1]])
Use model to obtain prediction probabilities:
>>> model.predict_prob(X)
array([[5.15437859e-02, 1.96513677e-04],
       [9.79474434e-01, 4.90497532e-02],
       [9.79474434e-01, 4.90497532e-02],
       [9.99976135e-01, 9.31203746e-01]])
Compute accuracy:
>>> model.score(X, Y)
array([1., 1.])
Compute negative log probability by changing the metric parameter:

>>> model.score(X, Y, metric='log_loss')
array([0.0944218 , 0.17206078])
- fit(X, Y, lr=0.1, Lambda=0.0, tolerance=0.0001, max_iters=1000, method='gradient')#
Fit logistic regression model to training data.
- Parameters:
X (numpy.ndarray) – 2D training features array, of shape n x d, where n is the number of training examples and d is the number of dimensions.
Y (numpy.ndarray) – 2D training target values array, of shape n x t, where n is the number of training examples and t is the number of targets.
lr (float, default 0.1) – Learning rate for the weight update; only used with gradient descent.
Lambda (float, default 0.0) – Regularization constant, lambda, to be used as the penalty term weight.
tolerance (float, default 0.0001) – Tolerance on the maximum element of the gradient vector, used as the termination criterion.
max_iters (int, default 1000) – Maximum number of iterations.
method (str, default 'gradient') – Method to use for computation. Must be either 'gradient', representing gradient descent, or 'newton', representing Newton's method. (A rough sketch of the gradient-descent update is given below.)
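For intuition, the gradient-descent variant can be sketched as follows. This is an illustrative sketch under the assumptions that a bias column is appended to X and that the L2 penalty is applied to all weights; it is not necessarily the library's exact update, and Newton's method would replace the step with a Hessian solve:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd_sketch(X, Y, lr=0.1, Lambda=0.0, tolerance=0.0001, max_iters=1000):
    # One weight column per target; the bias is handled via an appended ones column.
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    W = np.zeros((X_aug.shape[1], Y.shape[1]))
    for _ in range(max_iters):
        P = sigmoid(X_aug @ W)                 # predicted probabilities, shape (n, t)
        grad = X_aug.T @ (P - Y) + Lambda * W  # L2-penalized gradient of the log loss
        if np.max(np.abs(grad)) < tolerance:   # stop once the gradient is small enough
            break
        W -= lr * grad                         # gradient-descent step
    return W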
- predict(X)#
Use fitted weights to predict target values for test data.
- Parameters:
X (numpy.ndarray) – 2D testing features array, of shape m x d, where m is the number of testing examples and d is the number of dimensions.
- Returns:
Y_pred – Array of predicted target values.
- Return type:
numpy.ndarray
- predict_prob(X)#
Use fitted weights to compute target probabilities for test data.
- Parameters:
X (numpy.ndarray) – 2D testing features array, of shape m x d, where m is the number of testing examples and d is the number of dimensions.
- Returns:
Y_prob – Array of target probabilities.
- Return type:
numpy.ndarray
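The probabilities are the sigmoid of the linear scores, and predict presumably thresholds them at 0.5. A minimal sketch, reusing the hypothetical weight layout from the fit sketch above:

def predict_prob_sketch(W, X):
    # Sigmoid of the linear scores: values in (0, 1), shape (m, t).
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-(X_aug @ W)))

def predict_labels_sketch(W, X):
    # Hard labels obtained by thresholding the probabilities at 0.5.
    return (predict_prob_sketch(W, X) >= 0.5).astype(int)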
- score(X, Y, metric='accuracy')#
Use fitted weights to predict target values for test data and compute prediction score. This can be classification accuracy or log loss, depending on the chosen metric.
- Parameters:
X (numpy.ndarray) – 2D testing features array, of shape m x d, where m is the number of testing examples and d is the number of dimensions.
Y (numpy.ndarray) – 2D testing target values array, of shape m x t, where m is the number of testing examples and t is the number of targets.
metric (str) – Chosen metric. Must be one of 'accuracy' or 'log_loss', corresponding to classification accuracy or log loss respectively. (A sketch of both metrics is given below.)
- Returns:
score – Array of prediction scores.
- Return type:
numpy.ndarray
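Both metrics can be sketched from the predicted probabilities, per target column. Note that the log_loss values in the example above (0.0944..., 0.172...) are consistent with negative log probabilities summed over examples rather than averaged; this sketch follows that convention but is illustrative only:

def score_sketch(Y_true, Y_prob, metric='accuracy'):
    if metric == 'accuracy':
        # Fraction of correctly classified examples, per target column.
        return np.mean((Y_prob >= 0.5).astype(int) == Y_true, axis=0)
    if metric == 'log_loss':
        # Negative log likelihood of the true labels, summed over examples.
        return -np.sum(
            Y_true * np.log(Y_prob) + (1 - Y_true) * np.log(1 - Y_prob),
            axis=0,
        )
    raise ValueError("metric must be 'accuracy' or 'log_loss'")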