Preprocessing

niaarm.preprocessing.cosine_similarity(reference, targets, *, transactions)

Cosine similarity between a reference transaction and multiple target transactions.

Parameters:

reference (int) – Index of reference transaction
targets (np.ndarray) – Indices of target transactions to compare
transactions (np.ndarray) – One-hot encoded transaction data (keyword-only)

Returns:

Cosine similarities from the reference transaction to each target

Return type:

np.ndarray
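To make the computation concrete, here is a minimal NumPy sketch of cosine similarity between one row of a one-hot matrix and several other rows. It is an illustration only, not NiaARM's implementation: the name cosine_similarity_sketch and the zero-row guard are assumptions introduced here, and only the argument layout mirrors the signature above.

```python
import numpy as np


def cosine_similarity_sketch(reference, targets, *, transactions):
    """Cosine similarity between one row and several other rows (illustrative)."""
    ref = transactions[reference].astype(float)
    tgt = transactions[targets].astype(float)
    dots = tgt @ ref
    norms = np.linalg.norm(tgt, axis=1) * np.linalg.norm(ref)
    # Assumption: all-zero rows get similarity 0 instead of dividing by zero.
    return np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)


# Four one-hot encoded transactions; row 0 is the reference.
transactions = np.array(
    [
        [1, 0, 1, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 0, 1],
    ]
)
sims = cosine_similarity_sketch(0, np.array([1, 2, 3]), transactions=transactions)
```

In this toy data, row 1 is identical to the reference, row 2 shares no items with it, and row 3 shares one of two items, so the similarities come out near 1, 0, and 0.5 respectively.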

niaarm.preprocessing.euclidean(reference, targets, *, cat_data, num_data, cat_weights, num_weights)

Euclidean distance between a reference transaction and multiple target transactions.

Parameters:

reference (int) – Index of reference transaction
targets (np.ndarray) – Indices of target transactions to compare
cat_data (np.ndarray | None) – Categorical feature data (keyword-only)
num_data (np.ndarray | None) – Numerical feature data (keyword-only)
cat_weights (np.ndarray | None) – Squared weights for categorical features (keyword-only)
num_weights (np.ndarray | None) – Squared weights for numerical features (keyword-only)

Returns:

Distances from the reference transaction to each target

Return type:

np.ndarray
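The sketch below shows one way a weighted Euclidean distance over mixed data could be computed, and why pre-squared weights are convenient: they multiply the squared differences directly. It is not NiaARM's implementation. The function name weighted_euclidean_sketch, the None defaults, and the rule that a categorical mismatch counts as a difference of 1 are all assumptions made for illustration.

```python
import numpy as np


def weighted_euclidean_sketch(
    reference, targets, *, cat_data=None, num_data=None, cat_weights=None, num_weights=None
):
    """Weighted Euclidean distance from one row to several other rows (illustrative)."""
    dist_sq = np.zeros(len(targets), dtype=float)
    if num_data is not None:
        diff = num_data[targets] - num_data[reference]
        # Weights are already squared, so they scale the squared differences.
        dist_sq += (diff**2) @ num_weights
    if cat_data is not None:
        # Assumption: a categorical mismatch contributes a difference of 1.
        mismatch = (cat_data[targets] != cat_data[reference]).astype(float)
        dist_sq += mismatch @ cat_weights
    return np.sqrt(dist_sq)


num_data = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
cat_data = np.array([["a"], ["a"], ["b"]])
distances = weighted_euclidean_sketch(
    0,
    np.array([1, 2]),
    cat_data=cat_data,
    num_data=num_data,
    cat_weights=np.array([1.0]),
    num_weights=np.array([1.0, 1.0]),
)
```

With unit weights, the first target differs from the reference only numerically (a 3-4-5 triangle, distance 5), while the second combines a numerical difference of 1 with one categorical mismatch (distance √2).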

niaarm.preprocessing.mean_or_mode(column)

Aggregate function that returns the mode for categorical features, and the mean for numerical features.
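As an illustration of the mean-or-mode idea, the following sketch aggregates a pandas column by dtype. It assumes the column is a pandas Series, which the entry above does not state. The name mean_or_mode_sketch and the tie-breaking rule (first of the sorted modes) are likewise assumptions, not NiaARM's documented behavior.

```python
import pandas as pd


def mean_or_mode_sketch(column):
    """Return the mean of a numeric column, otherwise its most frequent value."""
    if pd.api.types.is_numeric_dtype(column):
        return column.mean()
    # Series.mode() returns every tied mode in sorted order; take the first.
    return column.mode().iloc[0]


# A small group of similar transactions collapsed into one summary row.
group = pd.DataFrame({"height": [1.0, 2.0, 6.0], "color": ["red", "blue", "red"]})
summary = group.agg(mean_or_mode_sketch)
```

Here the numerical column collapses to its mean (3.0) and the categorical column to its most frequent value ("red").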

niaarm.preprocessing.squash(dataset, threshold, similarity='euclidean')

Squash a dataset by merging similar transactions.

Parameters:

dataset (Dataset) – Dataset to squash.
threshold (float) – Similarity threshold. Should be between 0 and 1.
similarity (Literal["euclidean", "cosine"]) – Similarity measure for comparing transactions. Default: 'euclidean'.

Returns:

Squashed dataset.

Return type:

Dataset
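To show how a similarity threshold can shrink a dataset, here is a hypothetical greedy grouping written in plain NumPy. It is not the algorithm NiaARM uses and it does not operate on a Dataset object: squash_sketch, the seed-based grouping, the cosine-only similarity, and the plain mean as the aggregate are all assumptions chosen to keep the example short.

```python
import numpy as np


def squash_sketch(transactions, threshold):
    """Greedily merge rows whose cosine similarity to a seed row is >= threshold."""
    # Assumption: no row is all zeros, so every row can be normalized.
    unit = transactions / np.linalg.norm(transactions, axis=1, keepdims=True)
    remaining = np.arange(len(transactions))
    squashed = []
    while remaining.size:
        # The first unassigned row acts as the seed of a new group.
        sims = unit[remaining] @ unit[remaining[0]]
        close = sims >= threshold
        close[0] = True  # the seed always joins its own group
        # Replace the whole group with its column-wise mean.
        squashed.append(transactions[remaining[close]].mean(axis=0))
        remaining = remaining[~close]
    return np.array(squashed)


data = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
result = squash_sketch(data, threshold=0.9)
```

In this toy example the first two rows are nearly parallel, so they merge into one averaged row, while the third row stays on its own. A higher threshold merges fewer rows and keeps the squashed data closer to the original.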