Preprocessing
- niaarm.preprocessing.cosine_similarity(reference, targets, *, transactions)
Cosine similarity between a reference transaction and multiple target transactions.
- Parameters:
reference (int) – Index of the reference transaction
targets (np.ndarray) – Indices of target transactions to compare
transactions (np.ndarray) – One-hot encoded transaction data (keyword-only)
- Returns:
Cosine similarities between the reference transaction and each target
- Return type:
np.ndarray
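As a rough illustration of the computation described above, the following NumPy sketch mirrors the documented signature. The function name cosine_similarity_sketch and the zero-norm handling are assumptions made for this example; it is not the library's implementation.

```python
import numpy as np


def cosine_similarity_sketch(reference, targets, *, transactions):
    """Hypothetical stand-in that mirrors the documented signature."""
    ref = transactions[reference].astype(float)   # shape: (n_items,)
    tgt = transactions[targets].astype(float)     # shape: (n_targets, n_items)
    dots = tgt @ ref                              # dot product with each target
    norms = np.linalg.norm(tgt, axis=1) * np.linalg.norm(ref)
    # Assumption: rows with zero norm get similarity 0 instead of NaN.
    return np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)


# One-hot encoded toy transactions (rows are transactions, columns are items).
transactions = np.array([
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
])
sims = cosine_similarity_sketch(0, np.array([1, 2, 3]), transactions=transactions)
print(sims)  # identical row, disjoint row, row sharing one item
```

In this toy data, transaction 1 is identical to the reference, transaction 2 shares no items with it, and transaction 3 shares one of two items.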
- niaarm.preprocessing.euclidean(reference, targets, *, cat_data, num_data, cat_weights, num_weights)
Euclidean distance from a reference transaction to multiple target transactions.
- Parameters:
reference (int) – Index of reference transaction
targets (np.ndarray) – Indices of target transactions to compare
cat_data (np.ndarray | None) – Categorical feature data (keyword-only)
num_data (np.ndarray | None) – Numerical feature data (keyword-only)
cat_weights (np.ndarray | None) – Squared weights for categorical features (keyword-only)
num_weights (np.ndarray | None) – Squared weights for numerical features (keyword-only)
- Returns:
Distances from the reference transaction to each target
- Return type:
np.ndarray
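One way a weighted distance over mixed categorical and numerical features could be computed is sketched below. The function name euclidean_sketch, the 0/1 mismatch treatment of categorical features, and the way the squared weights are applied are all assumptions made for this example; the library may combine the terms differently.

```python
import numpy as np


def euclidean_sketch(reference, targets, *, cat_data, num_data,
                     cat_weights, num_weights):
    """Hypothetical stand-in that mirrors the documented signature."""
    total = np.zeros(len(targets), dtype=float)
    if cat_data is not None:
        # Assumption: a categorical mismatch contributes 1, a match 0.
        mismatch = (cat_data[targets] != cat_data[reference]).astype(float)
        total += mismatch @ cat_weights
    if num_data is not None:
        # Squared differences, scaled by the (already squared) weights.
        diff = num_data[targets] - num_data[reference]
        total += (diff ** 2) @ num_weights
    return np.sqrt(total)


cat_data = np.array([["a"], ["a"], ["b"]])
num_data = np.array([[0.0], [3.0], [4.0]])
distances = euclidean_sketch(
    0,
    np.array([1, 2]),
    cat_data=cat_data,
    num_data=num_data,
    cat_weights=np.array([9.0]),  # squared weight, i.e. a weight of 3
    num_weights=np.array([1.0]),
)
print(distances)
```

Passing the weights already squared means the loop over features needs only one multiplication per term, which is presumably why the parameters are documented that way.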
- niaarm.preprocessing.mean_or_mode(column)
Aggregate function that returns the mode for categorical features, and the mean for numerical features.
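A minimal sketch of such an aggregate is shown below, assuming the column is a pandas Series (the documentation above does not state the type). The name mean_or_mode_sketch and the tie-breaking rule are assumptions made for this example.

```python
import pandas as pd


def mean_or_mode_sketch(column):
    """Hypothetical stand-in: mean for numeric columns, mode otherwise."""
    if pd.api.types.is_numeric_dtype(column):
        return column.mean()
    # Series.mode() can return several values; this sketch takes the first.
    return column.mode().iloc[0]


df = pd.DataFrame({
    "price": [1.0, 2.0, 6.0],
    "color": ["red", "blue", "red"],
})
summary = {name: mean_or_mode_sketch(df[name]) for name in df.columns}
print(summary)
```

Applied per column, this collapses a group of rows into a single representative row.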
- niaarm.preprocessing.squash(dataset, threshold, similarity='euclidean')
Squash the dataset by combining similar transactions.
- Parameters:
dataset (Dataset) – Dataset to squash.
threshold (float) – Similarity threshold. Should be between 0 and 1.
similarity (Literal["euclidean", "cosine"]) – Similarity measure for comparing transactions (euclidean or cosine). Default: ‘euclidean’.
- Returns:
Squashed dataset.
- Return type: