utils

datacheese.utils.assert_ndarray_shape(A, shape, array_name='ndarray')

Assert array is of given shape.

Parameters:
  • A (numpy.ndarray) – Array to check shape of.

  • shape (tuple) – Shape to expect, represented by a tuple of dimensions. Dimensions with None are ignored.

  • array_name (str, default ndarray) – Display name of given array, used to construct error message.

Examples

>>> import numpy as np
>>> from datacheese.utils import assert_ndarray_shape
>>> A = np.zeros((3, 4))
>>> assert_ndarray_shape(A, shape=(3, 4), array_name='A')

None may be used to ignore a dimension.

>>> assert_ndarray_shape(A, shape=(3, None), array_name='A')

ArrayShapeError is raised when shapes don’t match.

>>> assert_ndarray_shape(A, shape=(None, 7), array_name='A')
Traceback (most recent call last):
    raise ArrayShapeError(
ArrayShapeError: Invalid shape for A, expected shape (None, 7), got shape
(3, 4)

ArrayShapeError is raised when the number of dimensions doesn’t match.

>>> assert_ndarray_shape(A, shape=(3, 4, 9), array_name='A')
Traceback (most recent call last):
    raise ArrayShapeError(
ArrayShapeError: Invalid number of dimensions for A, expected 3 dimensions,
got 2 dimensions

datacheese.utils.assert_fitted(fitted, class_name='class')

Assert that an estimator has been fitted.

Parameters:
  • fitted (bool) – Whether or not the given estimator has been fitted.

  • class_name (str, default class) – Display name of class instance, used to construct error message.

Examples

>>> from datacheese.utils import assert_fitted
>>> assert_fitted(True, class_name='myclass')
>>> assert_fitted(False, class_name='myclass')
Traceback (most recent call last):
    raise NotFittedError(
datacheese.exceptions.NotFittedError: This myclass instance has not been
fitted yet. Call 'fit' method before using this estimator.

datacheese.utils.assert_str_choice(str_val, choices, str_name='string', case_insensitive=False)

Assert that a string value belongs to given list of allowed choices.

Parameters:
  • str_val (str) – String value.

  • choices (list) – List of allowed choices.

  • str_name (str, default string) – Display name of string variable, used to construct error message.

  • case_insensitive (bool, default False) – If True, ignore case when checking whether the string value is in the given list.

Examples

>>> from datacheese.utils import assert_str_choice
>>> eu_country = 'Germany'
>>> choices = ['Germany', 'Italy', 'Spain']
>>> assert_str_choice(eu_country, choices, str_name='EU country')
>>> eu_country = 'Britain'
>>> assert_str_choice(eu_country, choices, str_name='EU country')
Traceback (most recent call last):
    raise ValueError(
ValueError: Invalid value 'Britain' for 'EU country', must be one of
'Germany', 'Italy', 'Spain'.

Set case_insensitive to True to ignore case:

>>> eu_country = 'germany'
>>> assert_str_choice(
...     eu_country,
...     choices,
...     str_name='EU country',
...     case_insensitive=True,
... )

datacheese.utils.pad_array(A, edge, c)

Add constant padding to one edge of a 2D array.

Parameters:
  • A (numpy.ndarray) – 2D array to be padded.

  • edge (str) – Edge on which padding is to be added. Must be one of top, bottom, left, or right.

  • c (float) – Constant to be padded.

Returns:

Ap – Padded array.

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> from datacheese.utils import pad_array
>>> A = np.zeros((3, 4), dtype=np.float64)
>>> A
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>> pad_array(A, 'right', 2)
array([[0., 0., 0., 0., 2.],
       [0., 0., 0., 0., 2.],
       [0., 0., 0., 0., 2.]])
>>> pad_array(A, 'bottom', -1)
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [-1., -1., -1., -1.]])

datacheese.utils.pairwise_distances(A1, A2, p=2)

Compute pairwise Minkowski distances between two sets of 1D arrays. Minkowski distance between vectors \(\mathbf{x}\) and \(\mathbf{y}\) is defined as follows:

\[D(\mathbf{x}, \mathbf{y}) = \bigg(\sum\limits_{i = 1}^{n}{|x_i - y_i|^p}\bigg)^{\frac{1}{p}}\]

Parameters:
  • A1 (numpy.ndarray) – 2D array. Its second-axis length must match that of A2.

  • A2 (numpy.ndarray) – 2D array. Its second-axis length must match that of A1.

  • p (int, default 2) – Exponent of Minkowski distance.

Returns:

distances – Array of pairwise distances.

Return type:

numpy.ndarray
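
The definition above can be sketched directly with NumPy broadcasting. This is only an illustration of the formula, not the datacheese implementation: the helper name minkowski_pairwise is hypothetical, and the (n1, n2) layout of the result (one row per row of A1, one column per row of A2) is an assumption, since the shape of the returned array is not stated here.

```python
import numpy as np


def minkowski_pairwise(A1, A2, p=2):
    """Hypothetical helper: Minkowski distances between rows of A1 and A2."""
    A1 = np.asarray(A1, dtype=float)
    A2 = np.asarray(A2, dtype=float)
    if A1.ndim != 2 or A2.ndim != 2 or A1.shape[1] != A2.shape[1]:
        raise ValueError("A1 and A2 must be 2D with equal second-axis length")
    # Broadcast to shape (n1, n2, n): every row of A1 minus every row of A2.
    diff = np.abs(A1[:, None, :] - A2[None, :, :])
    # Sum |x_i - y_i|^p over the last axis, then take the p-th root.
    return np.sum(diff ** p, axis=-1) ** (1.0 / p)


A1 = np.array([[0.0, 0.0], [3.0, 4.0]])
A2 = np.array([[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]])
D2 = minkowski_pairwise(A1, A2, p=2)  # p=2 gives Euclidean distances
D1 = minkowski_pairwise(A1, A2, p=1)  # p=1 gives Manhattan distances
```

With p=2 the formula reduces to the Euclidean distance, and with p=1 to the Manhattan distance, which makes small inputs like these easy to check by hand.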

datacheese.utils.array_mode_value(A, seed=None)

Extract the mode of a 1D array. Ties are broken randomly.

Parameters:
  • A (numpy.ndarray) – 1D array.

  • seed (int or None, default None) – Random seed for reproducible results.

Returns:

mode – Computed mode value.

Return type:

any
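
The behaviour described above (most frequent value, with ties broken randomly and reproducibly under a seed) might look roughly like the following. This is a sketch under assumptions, not the library's code: the helper name mode_with_random_ties is hypothetical, and using numpy.unique with a seeded numpy.random.Generator is one possible way to get the documented behaviour.

```python
import numpy as np


def mode_with_random_ties(A, seed=None):
    """Hypothetical helper: most frequent value of a 1D array."""
    A = np.asarray(A)
    if A.ndim != 1 or A.size == 0:
        raise ValueError("A must be a non-empty 1D array")
    values, counts = np.unique(A, return_counts=True)
    # Candidates are all values that reach the maximum count.
    candidates = values[counts == counts.max()]
    # A seeded generator makes the tie-break reproducible.
    rng = np.random.default_rng(seed)
    return rng.choice(candidates)


clear = mode_with_random_ties(np.array([1, 2, 2, 3]))
tied_a = mode_with_random_ties(np.array([1, 1, 2, 2, 3]), seed=0)
tied_b = mode_with_random_ties(np.array([1, 1, 2, 2, 3]), seed=0)
```

Here clear has a single mode, while tied_a and tied_b are drawn from the two tied values; passing the same seed yields the same choice both times.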