utils
- datacheese.utils.assert_ndarray_shape(A, shape, array_name='ndarray')
Assert array is of given shape.
- Parameters:
A (numpy.ndarray) – Array to check shape of.
shape (tuple) – Shape to expect, represented by a tuple of dimensions. Dimensions with None are ignored.
array_name (str, default 'ndarray') – Display name of given array, used to construct error message.
Examples
>>> import numpy as np
>>> from datacheese.utils import assert_ndarray_shape
>>> A = np.zeros((3, 4))
>>> assert_ndarray_shape(A, shape=(3, 4), array_name='A')

None may be used to ignore a dimension.

>>> assert_ndarray_shape(A, shape=(3, None), array_name='A')

ArrayShapeError is raised when shapes don’t match.

>>> assert_ndarray_shape(A, shape=(None, 7), array_name='A')
Traceback (most recent call last):
    raise ArrayShapeError(
ArrayShapeError: Invalid shape for A, expected shape (None, 7), got shape (3, 4)

ArrayShapeError is raised when the number of dimensions doesn’t match.

>>> assert_ndarray_shape(A, shape=(3, 4, 9), array_name='A')
Traceback (most recent call last):
    raise ArrayShapeError(
ArrayShapeError: Invalid number of dimensions for A, expected 3 dimensions, got 2 dimensions
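The matching rule described above (dimension counts must agree, and None acts as a wildcard) can be sketched in plain Python. This is only an illustration of the rule, not the library's implementation; the helper name `shape_matches` is hypothetical.

```python
import numpy as np


def shape_matches(A, shape):
    """Hypothetical helper: True if A.shape matches `shape`, with None as a wildcard."""
    # A different number of dimensions can never match.
    if A.ndim != len(shape):
        return False
    # Compare axis by axis, skipping any expected dimension given as None.
    return all(e is None or a == e for a, e in zip(A.shape, shape))


A = np.zeros((3, 4))
results = [
    shape_matches(A, (3, 4)),      # exact match
    shape_matches(A, (3, None)),   # second axis ignored
    shape_matches(A, (None, 7)),   # second axis differs
    shape_matches(A, (3, 4, 9)),   # wrong number of dimensions
]
```

The four cases mirror the four doctest calls above: the first two pass, and the last two correspond to the calls that raise ArrayShapeError.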
- datacheese.utils.assert_fitted(fitted, class_name='class')
Assert that an estimator has been fitted.
- Parameters:
fitted (bool) – Whether or not the given estimator has been fitted.
class_name (str, default 'class') – Display name of class instance, used to construct error message.
Examples
>>> from datacheese.utils import assert_fitted
>>> assert_fitted(True, class_name='myclass')
>>> assert_fitted(False, class_name='myclass')
Traceback (most recent call last):
    raise NotFittedError(
datacheese.exceptions.NotFittedError: This myclass instance has not been fitted yet. Call 'fit' method before using this estimator.
- datacheese.utils.assert_str_choice(str_val, choices, str_name='string', case_insensitive=False)
Assert that a string value belongs to given list of allowed choices.
- Parameters:
str_val (str) – String value.
choices (list) – List of allowed choices.
str_name (str, default 'string') – Display name of string variable, used to construct error message.
case_insensitive (bool, default False) – Whether to ignore case when checking if the string value is in the given list.
Examples
>>> from datacheese.utils import assert_str_choice
>>> eu_country = 'Germany'
>>> choices = ['Germany', 'Italy', 'Spain']
>>> assert_str_choice(eu_country, choices, str_name='EU country')
>>> eu_country = 'Britain'
>>> assert_str_choice(eu_country, choices, str_name='EU country')
Traceback (most recent call last):
    raise ValueError(
ValueError: Invalid value 'Britain' for 'EU country', must be one of 'Germany', 'Italy', 'Spain'.

Set case_insensitive to True to ignore case:

>>> eu_country = 'germany'
>>> assert_str_choice(
...     eu_country,
...     choices,
...     str_name='EU country',
...     case_insensitive=True,
... )
- datacheese.utils.pad_array(A, edge, c)
Add constant padding to 2D array on one side.
- Parameters:
A (numpy.ndarray) – 2D array to be padded.
edge (str) – Edge on which padding is to be added. Must be one of 'top', 'bottom', 'left', or 'right'.
c (float) – Constant value to pad with.
- Returns:
Ap – Padded array.
- Return type:
numpy.ndarray
Examples
>>> import numpy as np
>>> from datacheese.utils import pad_array
>>> A = np.zeros((3, 4), dtype=np.float64)
>>> A
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>> pad_array(A, 'right', 2)
array([[0., 0., 0., 0., 2.],
       [0., 0., 0., 0., 2.],
       [0., 0., 0., 0., 2.]])
>>> pad_array(A, 'bottom', -1)
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [-1., -1., -1., -1.]])
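For comparison, the same one-sided padding can be reproduced with `numpy.pad`. The sketch below is not how `pad_array` is implemented. The names `PAD_WIDTHS` and `pad_one_edge` are hypothetical, and it assumes exactly one row or column is added, as in the examples above.

```python
import numpy as np

# Hypothetical mapping from edge name to np.pad widths:
# ((rows before, rows after), (columns before, columns after)).
PAD_WIDTHS = {
    'top': ((1, 0), (0, 0)),
    'bottom': ((0, 1), (0, 0)),
    'left': ((0, 0), (1, 0)),
    'right': ((0, 0), (0, 1)),
}


def pad_one_edge(A, edge, c):
    """Hypothetical helper: add one row or column filled with constant c."""
    return np.pad(A, PAD_WIDTHS[edge], mode='constant', constant_values=c)


A = np.zeros((3, 4), dtype=np.float64)
right = pad_one_edge(A, 'right', 2)     # shape (3, 5), last column is 2
bottom = pad_one_edge(A, 'bottom', -1)  # shape (4, 4), last row is -1
```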
- datacheese.utils.pairwise_distances(A1, A2, p=2)
Compute pairwise Minkowski distances between two sets of 1D arrays. Minkowski distance between vectors \(\textbf{x}\) and \(\textbf{y}\) is defined as follows:
\[D(\textbf{x}, \textbf{y}) = \bigg(\sum\limits^n_{i = 1}{|x_i - y_i|^p}\bigg)^{\frac{1}{p}}\]
- Parameters:
A1 (numpy.ndarray) – 2D array. Its second axis must have the same length as that of A2.
A2 (numpy.ndarray) – 2D array. Its second axis must have the same length as that of A1.
p (int, default 2) – Exponent of Minkowski distance.
- Returns:
distances – Array of pairwise distances.
- Return type:
numpy.ndarray
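The Minkowski formula above can be evaluated for every pair of rows with NumPy broadcasting. This is a minimal reference sketch, not the library's implementation. The helper name `minkowski_pairwise` is hypothetical, and the output layout (entry `[i, j]` holds the distance between row `i` of A1 and row `j` of A2) is an assumption, since the reference does not state the shape of the returned array.

```python
import numpy as np


def minkowski_pairwise(A1, A2, p=2):
    """Hypothetical reference helper: Minkowski distance between all row pairs."""
    A1 = np.asarray(A1, dtype=float)
    A2 = np.asarray(A2, dtype=float)
    # Broadcast to shape (n1, n2, d): every row of A1 against every row of A2.
    diff = np.abs(A1[:, None, :] - A2[None, :, :])
    # Sum |x_i - y_i|^p over the feature axis, then take the p-th root.
    return np.sum(diff ** p, axis=-1) ** (1.0 / p)


A1 = np.array([[0.0, 0.0], [3.0, 4.0]])
A2 = np.array([[0.0, 0.0], [6.0, 8.0]])
D2 = minkowski_pairwise(A1, A2, p=2)  # Euclidean distances
D1 = minkowski_pairwise(A1, A2, p=1)  # Manhattan distances
```

With p=2 this reduces to the Euclidean distance and with p=1 to the Manhattan distance. The intermediate array has n1 × n2 × d elements, so for large inputs a chunked computation may be preferable.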
- datacheese.utils.array_mode_value(A, seed=None)
Extract the mode of a 1D array. Ties are broken randomly.
- Parameters:
A (numpy.ndarray) – 1D array.
seed (int or None, default None) – Random seed for reproducible results.
- Returns:
mode – Computed mode value.
- Return type:
any
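One way to obtain a mode with random tie-breaking is to count unique values and draw uniformly among those with the highest count. The sketch below illustrates that idea under stated assumptions and is not the library's implementation. The helper name `mode_with_random_ties` is hypothetical, and using `numpy.random.default_rng` for the seed is an assumption.

```python
import numpy as np


def mode_with_random_ties(A, seed=None):
    """Hypothetical sketch: most frequent value of a 1D array, ties broken at random."""
    values, counts = np.unique(np.asarray(A), return_counts=True)
    # Candidates are all values that reach the maximum count.
    candidates = values[counts == counts.max()]
    # A seeded generator makes the tie-break reproducible.
    rng = np.random.default_rng(seed)
    return rng.choice(candidates)


# 2 and 3 both appear twice, so either may be returned.
m = mode_with_random_ties(np.array([1, 2, 2, 3, 3]), seed=0)
```

Calling the helper twice with the same seed returns the same value, while seed=None may pick a different tied value on each call.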