regression#

class datacheese.regression.LinearRegression#

Bases: object

Ordinary least squares linear regression model.
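The model assumes a linear relationship \(Y = X W + b\) between features and targets, and ordinary least squares chooses the weights that minimize the sum of squared residuals. For reference, the conventional closed-form solution is given by the normal equations (whether the implementation solves these directly or uses an equivalent solver is an assumption):

\[W = (X^\top X)^{-1} X^\top Y\]

where \(X\) is understood to be augmented with a column of ones so that the intercept \(b\) is absorbed into \(W\).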

Examples

>>> import numpy as np
>>> from datacheese.regression import LinearRegression

Generate input data:

>>> X = np.array(
...     [
...         [1, 1, 0],
...         [1, 2, -1],
...         [2, -2, 0],
...         [0, -3, 7],
...         [8, -6, 0],
...     ],
...     dtype=np.float64,
... )
>>> X
array([[ 1.,  1.,  0.],
       [ 1.,  2., -1.],
       [ 2., -2.,  0.],
       [ 0., -3.,  7.],
       [ 8., -6.,  0.]])

Generate target values using equations \(y_0 = x_0 + 2 x_1 + x_2 + 3\) and \(y_1 = 2 x_0 - x_1 - 3 x_2 - 2\):

>>> Y = np.matmul(
...     X,
...     np.array([[1, 2], [2, -1], [1, -3]]),
... ) + np.array([3, -2])
>>> Y
array([[  6.,  -1.],
       [  7.,   1.],
       [  1.,   4.],
       [  4., -20.],
       [ -1.,  20.]])

Fit model using data:

>>> model = LinearRegression()
>>> model.fit(X, Y)

Use model to make predictions:

>>> X_test = np.array([[3, 5, -2], [-2, 4, 3]], dtype=np.float64)
>>> X_test
array([[ 3.,  5., -2.],
       [-2.,  4.,  3.]])
>>> Y_test = np.matmul(
...     X_test,
...     np.array([[1, 2], [2, -1], [1, -3]]),
... ) + np.array([3, -2])
>>> Y_test
array([[ 14.,   5.],
       [ 12., -19.]])
>>> model.predict(X_test)
array([[ 14.,   5.],
       [ 12., -19.]])

Compute \(R^2\) accuracies:

>>> model.score(X_test, Y_test)
array([1., 1.])
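
A score of 1 indicates a perfect fit. Under the conventional definition (assumed here to match the library's computation), the score for each target column is

\[R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}\]

where \(\hat{y}_i\) are the predicted values and \(\bar{y}\) is the mean of the actual target values.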

Setting Lambda to a non-zero value performs ridge regression:

>>> model.fit(X, Y, Lambda=0.5)
>>> model.predict(X_test)
array([[ 13.6669421 ,   4.50426919],
       [ 11.51303651, -18.88126742]])
>>> model.score(X_test, Y_test)
array([0.99827693, 0.99946753])
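
Ridge regression penalizes large weights: in the conventional formulation (assumed here), the fitted coefficients minimize

\[\lVert Y - X W \rVert^2 + \lambda \lVert W \rVert^2\]

so the weights are shrunk slightly towards zero, which is why the predictions above deviate from the exact target values.
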
fit(X, Y, Lambda=0.0)#

Fit linear regression model to training data.

Parameters:
  • X (numpy.ndarray) – 2D training features array, of shape n x d, where n is the number of training examples and d is the number of dimensions.

  • Y (numpy.ndarray) – 2D training target values array, of shape n x t, where n is the number of training examples and t is the number of targets.

  • Lambda (float, default 0.0) – Regularization constant to be used as the L2 penalty term weight in ridge regression. The default of 0.0 corresponds to the special case of ordinary least squares regression.
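
With a non-zero penalty, the conventional ridge solution replaces the normal equations with

\[W = (X^\top X + \lambda I)^{-1} X^\top Y\]

which reduces to ordinary least squares at \(\lambda = 0\). Whether the implementation uses this closed form or an equivalent solver is an assumption.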

predict(X)#

Use fitted weights to predict target values for test data.

Parameters:

X (numpy.ndarray) – 2D testing features array, of shape m x d, where m is the number of testing examples and d is the number of dimensions.

Returns:

Y_pred – 2D array of predicted target values.

Return type:

numpy.ndarray

score(X, Y)#

Use fitted weights to predict target values for test data and compute \(R^2\) accuracies using actual target values.

Parameters:
  • X (numpy.ndarray) – 2D testing features array, of shape m x d, where m is the number of testing examples and d is the number of dimensions.

  • Y (numpy.ndarray) – 2D testing target values array, of shape m x t, where m is the number of testing examples and t is the number of targets.

Returns:

r_squared – Array of \(R^2\) accuracy scores, where a score of 1 indicates a perfect fit.

Return type:

numpy.ndarray

class datacheese.regression.LogisticRegression#

Bases: object

Binary logistic regression model.
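
Each target column is modelled independently through the logistic (sigmoid) function, so the predicted probability of the positive class has the usual form

\[P(y = 1 \mid x) = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}\]

Class predictions are presumably obtained by thresholding these probabilities; the exact threshold used internally is an assumption, with 0.5 being the usual choice.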

Examples

>>> import numpy as np
>>> from datacheese.regression import LogisticRegression

Generate input data:

>>> X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
>>> X
array([[0., 0.],
       [0., 1.],
       [1., 0.],
       [1., 1.]])

Generate target values based on OR and AND logic gates:

>>> Y = np.column_stack((np.any(X, axis=1), np.all(X, axis=1))).astype(int)
>>> Y
array([[0, 0],
       [1, 0],
       [1, 0],
       [1, 1]])

Fit model using data:

>>> model = LogisticRegression()
>>> model.fit(X, Y)

Use model to make predictions:

>>> model.predict(X)
array([[0, 0],
       [1, 0],
       [1, 0],
       [1, 1]])

Use model to obtain prediction probabilities:

>>> model.predict_prob(X)
array([[5.15437859e-02, 1.96513677e-04],
       [9.79474434e-01, 4.90497532e-02],
       [9.79474434e-01, 4.90497532e-02],
       [9.99976135e-01, 9.31203746e-01]])

Compute accuracy:

>>> model.score(X, Y)
array([1., 1.])

Compute the negative log probability (log loss) by changing the metric parameter:

>>> model.score(X, Y, metric='log_loss')
array([0.0944218 , 0.17206078])
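
These values are consistent with the total negative log probability of the true labels, computed per target column as

\[-\sum_{i=1}^{m} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]\]

where \(p_i\) are the predicted probabilities; lower values indicate better-calibrated predictions.
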
fit(X, Y, lr=0.1, Lambda=0.0, tolerance=0.0001, max_iters=1000, method='gradient')#

Fit logistic regression model to training data.

Parameters:
  • X (numpy.ndarray) – 2D training features array, of shape n x d, where n is the number of training examples and d is the number of dimensions.

  • Y (numpy.ndarray) – 2D training target values array, of shape n x t, where n is the number of training examples and t is the number of targets.

  • lr (float, default 0.1) – Learning rate for weight update, only used with gradient descent.

  • Lambda (float, default 0.0) – Regularization constant, lambda, to be used as penalty term weight.

  • tolerance (float, default 0.0001) – Tolerance for the maximum element of the gradient vector, used as the termination criterion.

  • max_iters (int, default 1000) – Maximum number of iterations.

  • method (str, default gradient) – Method to use for computation. Must be either gradient, representing gradient descent, or newton, representing Newton’s method.
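
As a rough illustration of the gradient option, batch gradient descent on the L2-penalized log loss for a single binary target column can be sketched as below. This is not the library's actual implementation; the helper name fit_gradient, the bias handling, and the exact penalty scaling are assumptions.

>>> import numpy as np
>>> def sigmoid(z):
...     return 1.0 / (1.0 + np.exp(-z))
>>> def fit_gradient(X, y, lr=0.1, Lambda=0.0, tolerance=0.0001, max_iters=1000):
...     # append a column of ones so the last weight acts as the bias term
...     X_b = np.hstack([X, np.ones((X.shape[0], 1))])
...     w = np.zeros(X_b.shape[1])
...     for _ in range(max_iters):
...         p = sigmoid(X_b @ w)  # predicted probabilities
...         grad = X_b.T @ (p - y) + Lambda * w  # gradient of penalized log loss
...         if np.max(np.abs(grad)) < tolerance:  # termination criterion
...             break
...         w -= lr * grad
...     return w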

predict(X)#

Use fitted weights to predict target values for test data.

Parameters:

X (numpy.ndarray) – 2D testing features array, of shape m x d, where m is the number of testing examples and d is the number of dimensions.

Returns:

Y_pred – Array of predicted target values.

Return type:

numpy.ndarray

predict_prob(X)#

Use fitted weights to compute target probabilities for test data.

Parameters:

X (numpy.ndarray) – 2D testing features array, of shape m x d, where m is the number of testing examples and d is the number of dimensions.

Returns:

Y_prob – Array of target probabilities.

Return type:

numpy.ndarray

score(X, Y, metric='accuracy')#

Use fitted weights to predict target values for test data and compute prediction score. This can be classification accuracy or log loss, depending on the chosen metric.

Parameters:
  • X (numpy.ndarray) – 2D testing features array, of shape m x d, where m is the number of testing examples and d is the number of dimensions.

  • Y (numpy.ndarray) – 2D testing target values array, of shape m x t, where m is the number of testing examples and t is the number of targets.

  • metric (str, default accuracy) – Chosen metric. Must be either accuracy or log_loss, corresponding to classification accuracy or log loss respectively.

Returns:

score – Array of prediction scores.

Return type:

numpy.ndarray