Preprocessing

niaarm.preprocessing.cosine_similarity(reference, targets, *, transactions)

Compute the cosine similarity between a reference transaction and multiple target transactions.

Parameters:
  • reference (int) – Index of the reference transaction

  • targets (np.ndarray) – Indices of the target transactions to compare against the reference

  • transactions (np.ndarray) – One-hot encoded transaction data (keyword-only)

Returns:

Cosine similarity between the reference transaction and each target

Return type:

np.ndarray
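A minimal NumPy sketch of this computation follows. It is not NiaARM's source: the name `cosine_similarity_sketch` is hypothetical, it assumes `transactions` is a 0/1 matrix with one row per transaction, and returning 0 for an all-zero row (instead of NaN) is an assumption.

```python
import numpy as np


def cosine_similarity_sketch(reference: int, targets: np.ndarray, *,
                             transactions: np.ndarray) -> np.ndarray:
    """Cosine similarity between one row of a one-hot matrix and several other rows."""
    ref = transactions[reference].astype(float)
    tgt = transactions[targets].astype(float)  # shape (len(targets), n_features)
    dots = tgt @ ref
    norms = np.linalg.norm(tgt, axis=1) * np.linalg.norm(ref)
    # Assumption: a pair involving an all-zero row gets similarity 0, not NaN
    out = np.zeros(len(targets))
    np.divide(dots, norms, out=out, where=norms > 0)
    return out


transactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
])
print(cosine_similarity_sketch(0, np.array([1, 2, 3]), transactions=transactions))
# approximately [1.0, 0.5, 0.0]
```

Because one-hot rows contain only 0s and 1s, the dot product counts the items two transactions share, and the norms normalize for how many items each transaction contains.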

niaarm.preprocessing.euclidean(reference, targets, *, cat_data, num_data, cat_weights, num_weights)

Compute the weighted Euclidean distance from a reference transaction to multiple target transactions.

Parameters:
  • reference (int) – Index of reference transaction

  • targets (np.ndarray) – Indices of the target transactions to compare against the reference

  • cat_data (np.ndarray | None) – Categorical feature data (keyword-only)

  • num_data (np.ndarray | None) – Numerical feature data (keyword-only)

  • cat_weights (np.ndarray | None) – Squared weights for categorical features (keyword-only)

  • num_weights (np.ndarray | None) – Squared weights for numerical features (keyword-only)

Returns:

Distance from the reference transaction to each target

Return type:

np.ndarray
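One way such a weighted distance over mixed data could be computed is sketched below. The function name is hypothetical and the formula is an assumption, not NiaARM's documented behavior: each numerical feature contributes its squared difference multiplied by the matching entry of `num_weights` (read as an already-squared weight), each categorical feature contributes its `cat_weights` entry whenever the two values differ, and missing weights default to 1.

```python
import numpy as np


def euclidean_sketch(reference, targets, *, cat_data=None, num_data=None,
                     cat_weights=None, num_weights=None):
    """Weighted Euclidean distance from row `reference` to each row in `targets`.

    Hypothetical sketch: weights are treated as already squared, so they
    multiply the squared per-feature differences directly.
    """
    total = np.zeros(len(targets))
    if num_data is not None:
        w = np.ones(num_data.shape[1]) if num_weights is None else num_weights
        diff = num_data[targets] - num_data[reference]  # shape (k, n_num)
        total += (diff ** 2) @ w
    if cat_data is not None:
        w = np.ones(cat_data.shape[1]) if cat_weights is None else cat_weights
        # Assumption: a categorical feature adds its weight when the values differ
        mismatch = (cat_data[targets] != cat_data[reference]).astype(float)
        total += mismatch @ w
    return np.sqrt(total)


num = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
print(euclidean_sketch(0, np.array([1, 2]), num_data=num))  # distances 5.0 and 1.0
```

Treating a categorical mismatch as a difference of 1 keeps both feature types on one scale, which only makes sense if the numerical features are normalized beforehand.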

niaarm.preprocessing.mean_or_mode(column)

Aggregation function that returns the mode of a categorical feature and the mean of a numerical feature.
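A minimal pandas sketch of this rule, assuming `column` is a pandas Series. The name `mean_or_mode_sketch` is hypothetical, and keeping the first value when several modes tie is an assumption.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def mean_or_mode_sketch(column: pd.Series):
    """Return the mean of a numeric column, otherwise its most frequent value."""
    if is_numeric_dtype(column):
        return column.mean()
    # Series.mode() returns every tied value; keep the first one
    return column.mode().iloc[0]


df = pd.DataFrame({
    "age": [20, 30, 40],
    "city": pd.Categorical(["Maribor", "Ljubljana", "Maribor"]),
})
summary = {col: mean_or_mode_sketch(df[col]) for col in df.columns}
# age -> 30.0 (mean), city -> 'Maribor' (mode)
```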

niaarm.preprocessing.squash(dataset, threshold, similarity='euclidean')

Squash the dataset by merging similar transactions.

Parameters:
  • dataset (Dataset) – Dataset to squash.

  • threshold (float) – Similarity threshold. Should be between 0 and 1.

  • similarity (Literal["euclidean", "cosine"]) – Similarity measure used to compare transactions, either ‘euclidean’ or ‘cosine’. Default: ‘euclidean’.

Returns:

Squashed dataset.

Return type:

Dataset
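The sketch below illustrates one possible squashing procedure. It is hypothetical and not NiaARM's implementation: it works on a plain pandas DataFrame instead of a `Dataset`, uses only cosine similarity on one-hot encoded columns, and assumes `threshold` is the minimum similarity a transaction needs to join a group. Each transaction not yet merged seeds a group, every remaining transaction at least that similar joins it, and the group collapses into a single row through a mean-or-mode aggregate.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def _cosine(reference: int, targets: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Cosine similarity between row `reference` and each row in `targets`."""
    dots = X[targets] @ X[reference]
    norms = np.linalg.norm(X[targets], axis=1) * np.linalg.norm(X[reference])
    out = np.zeros(len(targets))
    np.divide(dots, norms, out=out, where=norms > 0)
    return out


def _mean_or_mode(column: pd.Series):
    """Mean of a numeric column, most frequent value otherwise."""
    if is_numeric_dtype(column):
        return column.mean()
    return column.mode().iloc[0]


def squash_sketch(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Hypothetical greedy squashing: merge each unmerged row with all remaining
    rows whose cosine similarity to it is at least `threshold`."""
    # One-hot encode non-numeric columns so cosine similarity is defined
    X = pd.get_dummies(df).to_numpy(dtype=float)
    merged = np.zeros(len(df), dtype=bool)
    rows = []
    for seed in range(len(df)):
        if merged[seed]:
            continue
        merged[seed] = True
        others = np.flatnonzero(~merged)
        sims = _cosine(seed, others, X)
        # The seed always belongs to its own group
        group = np.concatenate(([seed], others[sims >= threshold]))
        merged[group] = True
        block = df.iloc[group]
        rows.append({col: _mean_or_mode(block[col]) for col in df.columns})
    return pd.DataFrame(rows, columns=df.columns)


df = pd.DataFrame({
    "weather": ["sunny", "sunny", "rainy", "rainy"],
    "activity": ["run", "run", "read", "read"],
})
print(squash_sketch(df, 0.9))  # two rows: (sunny, run) and (rainy, read)
```

Numerical columns are used unscaled in this sketch, so large values would dominate the cosine similarity; scaling them to a common range first is advisable if you adapt it.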