API Reference

ml_wrappers

Module for wrapping datasets and models in one uniform format.

class ml_wrappers.DatasetWrapper(dataset, clear_references=False)[source]

Bases: object

A wrapper around a dataset to make dataset operations more uniform across explainers.
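For example, a minimal sketch of wrapping a plain numpy dataset and reading its documented properties (the data values are illustrative):

    import numpy as np

    from ml_wrappers import DatasetWrapper

    # Wrap a plain numpy dataset so explainers can operate on it uniformly
    X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    wrapper = DatasetWrapper(X)

    print(wrapper.num_features)   # 2
    print(wrapper.dataset.shape)  # (3, 2)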

apply_indexer(column_indexer, bucket_unknown=False)[source]

Indexes categorical string features on the dataset.

Parameters
  • column_indexer (sklearn.compose.ColumnTransformer) – The transformation steps to index the given dataset.

  • bucket_unknown (bool) – If true, buckets unknown values into a separate categorical level.

apply_one_hot_encoder(one_hot_encoder)[source]

One-hot-encode categorical string features on the dataset.

Parameters

one_hot_encoder (sklearn.preprocessing.OneHotEncoder) – The transformation steps to one-hot-encode the given dataset.

apply_timestamp_featurizer(timestamp_featurizer)[source]

Apply timestamp featurization on the dataset.

Parameters

timestamp_featurizer (CustomTimestampFeaturizer) – The transformation steps to featurize timestamps in the given dataset.

augment_data(max_num_of_augmentations=inf)[source]

Augment the current dataset.

Parameters

max_num_of_augmentations (int) – The maximum number of times the permuted data is stacked to augment the dataset.

compute_summary(nclusters=10, use_gpu=False, **kwargs)[source]

Summarizes the dataset if it hasn’t been summarized yet.

property dataset

Get the dataset.

Returns

The underlying dataset.

Return type

numpy.ndarray or scipy.sparse.csr_matrix

get_column_indexes(features, categorical_features)[source]

Get the column indexes for the given column names.

Parameters
  • features (list[str]) – The full list of existing column names.

  • categorical_features (list[str]) – The list of categorical feature names to get indexes for.

Returns

The list of column indexes.

Return type

list[int]

get_features(features=None, explain_subset=None, **kwargs)[source]

Get the features of the dataset if not specified in the current kwargs.

Returns

The features of the dataset if not specified in kwargs; otherwise the features from kwargs.

Return type

list

property num_features

Get the number of features (columns) on the dataset.

Returns

The number of features (columns) in the dataset.

Return type

int

one_hot_encode(columns)[source]

One-hot-encodes categorical string features on the dataset.

Parameters

columns (list[int]) – Parameter specifying the subset of column indexes that may need to be one-hot-encoded.

Returns

The transformation steps to one-hot-encode the given dataset.

Return type

sklearn.preprocessing.OneHotEncoder

property original_dataset

Get the original dataset prior to performing any operations.

Note: if the original dataset was a pandas DataFrame, this will return the numpy version.

Returns

The original dataset.

Return type

numpy.ndarray or scipy.sparse matrix

property original_dataset_with_type

Get the original typed dataset which could be a numpy array or pandas DataFrame or pandas Series.

Returns

The original dataset.

Return type

numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix

reset_index()[source]

Reset the index so that it becomes part of the features of the dataset.

sample(max_dim_clustering=50, sampling_method='hdbscan')[source]

Sample the examples.

First does random downsampling to upper_bound rows, then tries to find the optimal downsample size based on how many clusters can be constructed from the data. If sampling_method is 'hdbscan', uses hdbscan to cluster the data and then downsamples to that number of clusters. If sampling_method is 'kmeans', tries different values of k, cutting k in half each time, and chooses the k with the highest silhouette score to determine how much to downsample the data. The danger of using only random downsampling is that we might downsample too much or too little, so the clustering approach serves as a heuristic for choosing a reasonable downsampling size; a simplified sketch of the kmeans variant follows the parameter list.

Parameters
  • max_dim_clustering (int) – Dimensionality threshold for performing reduction.

  • sampling_method (str) – Method to use for sampling, can be ‘hdbscan’ or ‘kmeans’.
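A simplified sketch of the kmeans variant of this heuristic, assuming scikit-learn; the helper name is illustrative, not the library's internal implementation:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    def best_k_by_silhouette(data):
        # Start from half the number of rows and halve k each round,
        # keeping the k whose clustering has the highest silhouette score.
        k = data.shape[0] // 2
        best_k, best_score = k, -1.0
        while k >= 2:
            labels = KMeans(n_clusters=k, n_init=10).fit_predict(data)
            score = silhouette_score(data, labels)
            if score > best_score:
                best_k, best_score = k, score
            k //= 2
        return best_k

    best_k_by_silhouette(np.random.rand(40, 3))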

set_index()[source]

Undo reset_index. Sets the feature that was created from the index back to being the index on the internal dataset.

string_index(columns=None)[source]

Indexes categorical string features on the dataset.

Parameters

columns (list) – Optional parameter specifying the subset of columns that may need to be string indexed.

Returns

The transformation steps to index the given dataset.

Return type

sklearn.compose.ColumnTransformer
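A hedged sketch of how string_index and apply_indexer pair together, fitting an indexer on one wrapped dataset and reusing it on another with the same columns (the frames here are illustrative):

    import pandas as pd

    from ml_wrappers import DatasetWrapper

    train = pd.DataFrame({'color': ['red', 'blue', 'red'], 'size': [1, 2, 3]})
    test = pd.DataFrame({'color': ['blue', 'green', 'red'], 'size': [4, 5, 6]})

    train_wrapper = DatasetWrapper(train)
    test_wrapper = DatasetWrapper(test)

    # Fit the indexer on the training data, then reuse it on the test data,
    # bucketing the unseen 'green' level into a separate category
    column_indexer = train_wrapper.string_index()
    test_wrapper.apply_indexer(column_indexer, bucket_unknown=True)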

property summary_dataset

Get the summary dataset without any subsetting.

Returns

The original dataset or None if summary was not computed.

Return type

numpy.ndarray or scipy.sparse.csr_matrix

take_subset(explain_subset)[source]

Take a subset of the dataset if not done before.

Parameters

explain_subset (list) – A list of column indexes to take from the original dataset.

timestamp_featurizer()[source]

Featurizes the timestamp columns.

Returns

The transformation steps to featurize the timestamp columns.

Return type

CustomTimestampFeaturizer

property typed_dataset

Get the dataset in the original type, pandas DataFrame or Series.

Returns

The underlying dataset.

Return type

numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix

typed_wrapper_func(dataset, keep_index_as_feature=False)[source]

Get a wrapper function to convert the dataset to the original type, pandas DataFrame or Series.

Parameters
  • dataset (numpy.ndarray or scipy.sparse.csr_matrix) – The dataset to convert to original type.

  • keep_index_as_feature (bool) – Whether to keep the index as a feature when converting back. Off by default, in which case the index feature is converted back to being the index.

Returns

A wrapper function for a given dataset to convert to original type.

Return type

numpy.ndarray or scipy.sparse.csr_matrix or pandas.DataFrame or pandas.Series

ml_wrappers.wrap_model(model, examples, model_task: str = ModelTask.UNKNOWN, num_classes: Optional[int] = None, classes: Optional[Union[list, numpy.array]] = None, device='auto')[source]
If needed, wraps the model in a common API based on the model task and prediction function contract.

Parameters
  • model (model with a predict or predict_proba function) – The model to evaluate on the examples.

  • examples (ml_wrappers.DatasetWrapper or numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse.csr_matrix or shap.DenseData or torch.Tensor) – The model evaluation examples. Note the examples will be wrapped in a DatasetWrapper if not already wrapped on input.

  • model_task (str) – Optional parameter to specify whether the model is a classification or regression model. In most cases the type of the model can be inferred from the shape of the output, where a classifier has a predict_proba method and outputs a 2-dimensional array, while a regressor has a predict method and outputs a 1-dimensional array.

  • classes (list or np.ndarray) – Optional parameter specifying a list of class names in the dataset.

  • num_classes (int) – Optional parameter specifying the number of classes in the dataset.

  • device (str, for instance: 'cpu', 'cuda') – Optional parameter specifying the device to move the model to. If not specified, 'cpu' is the default.

Returns

The wrapper model.

Return type

model
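A minimal usage sketch, assuming scikit-learn for the example model:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    from ml_wrappers import wrap_model

    X = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
    y = np.array([0, 1, 0, 1])
    model = LogisticRegression().fit(X, y)

    # State the task explicitly instead of letting it be inferred
    wrapped = wrap_model(model, X, model_task='classification')
    print(wrapped.predict(X))
    print(wrapped.predict_proba(X))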

ml_wrappers.common

Defines a common directory shared across ML model and dataset wrappers.

ml_wrappers.common.constants

Defines constants for ml-wrappers.

class ml_wrappers.common.constants.Attributes[source]

Bases: object

Provide constants for attributes.

EXPECTED_VALUE = 'expected_value'
class ml_wrappers.common.constants.DNNFramework[source]

Bases: object

Provide DNN framework constants.

PYTORCH = 'pytorch'
TENSORFLOW = 'tensorflow'
class ml_wrappers.common.constants.Defaults[source]

Bases: object

Provide constants for default values to explain methods.

AUTO = 'auto'
DEFAULT_BATCH_SIZE = 100
HDBSCAN = 'hdbscan'
MAX_DIM = 50
class ml_wrappers.common.constants.Device(value)[source]

Bases: enum.Enum

Specifies all possible device types.

AUTO = 'auto'
CPU = 'cpu'
CUDA = 'cuda'
class ml_wrappers.common.constants.Dynamic[source]

Bases: object

Provide constants for dynamically generated classes.

GLOBAL_EXPLANATION = 'DynamicGlobalExplanation'
LOCAL_EXPLANATION = 'DynamicLocalExplanation'
class ml_wrappers.common.constants.ExplainParams[source]

Bases: object

Provide constants for interpret-community (init, explain_local and explain_global) parameters.

BATCH_SIZE = 'batch_size'
CLASSES = 'classes'
CLASSIFICATION = 'classification'
EVAL_DATA = 'eval_data'
EVAL_Y_PRED = 'eval_y_predicted'
EVAL_Y_PRED_PROBA = 'eval_y_predicted_proba'
EXPECTED_VALUES = 'expected_values'
EXPLAIN_SUBSET = 'explain_subset'
EXPLANATION_ID = 'explanation_id'
FEATURES = 'features'
GLOBAL_IMPORTANCE_NAMES = 'global_importance_names'
GLOBAL_IMPORTANCE_RANK = 'global_importance_rank'
GLOBAL_IMPORTANCE_VALUES = 'global_importance_values'
GLOBAL_NAMES = 'global_names'
GLOBAL_RANK = 'global_rank'
GLOBAL_VALUES = 'global_values'
ID = 'id'
INCLUDE_LOCAL = 'include_local'
INIT_DATA = 'init_data'
IS_ENG = 'is_engineered'
IS_LOCAL_SPARSE = 'is_local_sparse'
IS_RAW = 'is_raw'
LOCAL_EXPLANATION = 'local_explanation'
LOCAL_IMPORTANCE_VALUES = 'local_importance_values'
METHOD = 'method'
MODEL_ID = 'model_id'
MODEL_TASK = 'model_task'
MODEL_TYPE = 'model_type'
NUM_CLASSES = 'num_classes'
NUM_EXAMPLES = 'num_examples'
NUM_FEATURES = 'num_features'
PER_CLASS_NAMES = 'per_class_names'
PER_CLASS_RANK = 'per_class_rank'
PER_CLASS_VALUES = 'per_class_values'
PROBABILITIES = 'probabilities'
SAMPLING_POLICY = 'sampling_policy'
SHAP_VALUES_OUTPUT = 'shap_values_output'
classmethod get_private(explain_param)[source]

Return the private version of the ExplainParams property.

Parameters
  • cls (ExplainParams) – ExplainParams input class.

  • explain_param (str) – The ExplainParams property to get private version of.

Returns

The private version of the property.

Return type

str

classmethod get_serializable()[source]

Return only the ExplainParams properties that have meaningful data values for serialization.

Parameters

cls (ExplainParams) – ExplainParams input class.

Returns

A set of property names, e.g., ‘GLOBAL_IMPORTANCE_VALUES’, ‘MODEL_TYPE’, etc.

Return type

set[str]

class ml_wrappers.common.constants.ExplainType[source]

Bases: object

Provide constants for model and explainer type information, useful for visualization.

CLASSIFICATION = 'classification'
DATA = 'data_type'
EXPLAIN = 'explain_type'
EXPLAINER = 'explainer'
FUNCTION = 'function'
GLOBAL = 'global'
HAN = 'han'
IS_ENG = 'is_engineered'
IS_RAW = 'is_raw'
LIME = 'lime'
LOCAL = 'local'
METHOD = 'method'
MIMIC = 'mimic'
MODEL = 'model_type'
MODEL_CLASS = 'model_class'
MODEL_TASK = 'model_task'
PFI = 'pfi'
REGRESSION = 'regression'
SHAP = 'shap'
SHAP_DEEP = 'shap_deep'
SHAP_GPU_KERNEL = 'shap_gpu_kernel'
SHAP_KERNEL = 'shap_kernel'
SHAP_LINEAR = 'shap_linear'
SHAP_TREE = 'shap_tree'
TABULAR = 'tabular'
class ml_wrappers.common.constants.ExplainableModelType(value)[source]

Bases: str, enum.Enum

Provide constants for the explainable model type.

LINEAR_EXPLAINABLE_MODEL_TYPE = 'linear_explainable_model_type'
TREE_EXPLAINABLE_MODEL_TYPE = 'tree_explainable_model_type'
class ml_wrappers.common.constants.ExplanationParams[source]

Bases: object

Provide constants for explanation parameters.

CLASSES = 'classes'
EXPECTED_VALUES = 'expected_values'
class ml_wrappers.common.constants.Extension[source]

Bases: object

Provide constants for extensions to interpret package.

BLACKBOX = 'blackbox'
GLASSBOX = 'model'
GLOBAL = 'global'
GREYBOX = 'specific'
LOCAL = 'local'
class ml_wrappers.common.constants.InterpretData[source]

Bases: object

Provide Data and Visualize constants for interpret core.

BASE_VALUE = 'Base Value'
EXPLANATION_CLASS_DIMENSION = 'explanation_class_dimension'
EXPLANATION_TYPE = 'explanation_type'
EXTRA = 'extra'
FEATURE_LIST = 'feature_list'
GLOBAL_FEATURE_IMPORTANCE = 'global_feature_importance'
INTERCEPT = 'intercept'
LOCAL_FEATURE_IMPORTANCE = 'local_feature_importance'
MLI = 'mli'
MULTICLASS = 'multiclass'
NAMES = 'names'
OVERALL = 'overall'
PERF = 'perf'
SCORES = 'scores'
SINGLE = 'single'
SPECIFIC = 'specific'
TYPE = 'type'
UNIVARIATE = 'univariate'
VALUE = 'value'
VALUES = 'values'
class ml_wrappers.common.constants.LightGBMParams[source]

Bases: object

Provide constants for LightGBM.

CATEGORICAL_FEATURE = 'categorical_feature'
class ml_wrappers.common.constants.LightGBMSerializationConstants[source]

Bases: object

Provide internal class that defines fields used for MimicExplainer serialization.

IDENTITY = '_identity'
LOGGER = '_logger'
MODEL_STR = 'model_str'
MULTICLASS = 'multiclass'
OBJECTIVE = 'objective'
REGRESSION = 'regression'
TREE_EXPLAINER = '_tree_explainer'
enum_properties = ['_shap_values_output']
nonify_properties = ['_logger', '_tree_explainer']
save_properties = ['_lgbm']
class ml_wrappers.common.constants.MimicSerializationConstants[source]

Bases: object

Provide internal class that defines fields used for MimicExplainer serialization.

ALLOW_ALL_TRANSFORMATIONS = '_allow_all_transformations'
FUNCTION = 'function'
IDENTITY = '_identity'
INITIALIZATION_EXAMPLES = 'initialization_examples'
LOGGER = '_logger'
MODEL = 'model'
ORIGINAL_EVAL_EXAMPLES = '_original_eval_examples'
PREDICT_PROBA_FLAG = 'predict_proba_flag'
RESET_INDEX = 'reset_index'
TIMESTAMP_FEATURIZER = '_timestamp_featurizer'
enum_properties = ['_shap_values_output']
nonify_properties = ['_logger', 'model', 'function', 'initialization_examples', '_original_eval_examples', '_timestamp_featurizer']
save_properties = ['surrogate_model']
class ml_wrappers.common.constants.ModelTask(value)[source]

Bases: str, enum.Enum

Provide model task constants.

For tabular data, can be ‘classification’, ‘regression’, or ‘unknown’. For text data, can be ‘text_classification’, ‘multilabel_text_classification’, ‘sentiment_analysis’, ‘question_answering’, ‘entailment’, ‘summarizations’ or ‘unknown’. For image data, can be ‘image_classification’, ‘multilabel_image_classification’, ‘object_detection’ or ‘unknown’.

By default the model domain is inferred if ‘unknown’, but this can be overridden if you specify ‘classification’ or ‘regression’.

CLASSIFICATION = 'classification'
ENTAILMENT = 'entailment'
IMAGE_CLASSIFICATION = 'image_classification'
MULTILABEL_IMAGE_CLASSIFICATION = 'multilabel_image_classification'
MULTILABEL_TEXT_CLASSIFICATION = 'multilabel_text_classification'
OBJECT_DETECTION = 'object_detection'
QUESTION_ANSWERING = 'question_answering'
REGRESSION = 'regression'
SENTIMENT_ANALYSIS = 'sentiment_analysis'
SUMMARIZATIONS = 'summarizations'
TEXT_CLASSIFICATION = 'text_classification'
UNKNOWN = 'unknown'
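Since ModelTask derives from str, its members compare equal to their raw string values, so either form can be passed wherever a task string is expected:

    from ml_wrappers.common.constants import ModelTask

    assert ModelTask.CLASSIFICATION == 'classification'
    assert ModelTask('regression') is ModelTask.REGRESSION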
class ml_wrappers.common.constants.ResetIndex(value)[source]

Bases: str, enum.Enum

Provide index column handling constants. Can be ‘ignore’, ‘reset’ or ‘reset_teacher’.

By default the index column is ignored, but you can override to reset it and make it a feature column that is then featurized to numeric, or reset it and ignore it during featurization but set it as the index when calling predict on the original model.

Ignore = 'ignore'
Reset = 'reset'
ResetTeacher = 'reset_teacher'
class ml_wrappers.common.constants.SHAPDefaults[source]

Bases: object

Provide constants for default values to SHAP.

INDEPENDENT = 'independent'
class ml_wrappers.common.constants.SKLearn[source]

Bases: object

Provide scikit-learn related constants.

EXAMPLES = 'examples'
LABELS = 'labels'
PREDICT = 'predict'
PREDICTIONS = 'predictions'
PREDICT_PROBA = 'predict_proba'
class ml_wrappers.common.constants.Scipy[source]

Bases: object

Provide scipy related constants.

CSR_FORMAT = 'csr'
class ml_wrappers.common.constants.ShapValuesOutput(value)[source]

Bases: str, enum.Enum

Provide constants for the SHAP values output from the explainer.

Can be ‘default’, ‘probability’ or ‘teacher_probability’. If ‘teacher_probability’ is specified, we use the probabilities from the teacher model.

DEFAULT = 'default'
PROBABILITY = 'probability'
TEACHER_PROBABILITY = 'teacher_probability'
class ml_wrappers.common.constants.Spacy[source]

Bases: object

Provide spaCy related constants.

EN = 'en'
NER = 'ner'
TAGGER = 'tagger'
class ml_wrappers.common.constants.Tensorflow[source]

Bases: object

Provide TensorFlow and TensorBoard related constants.

CPU0 = '/CPU:0'
TFLOG = 'tflog'

ml_wrappers.dataset

Defines a common dataset wrapper and common functions for data manipulation.

class ml_wrappers.dataset.CustomTimestampFeaturizer(features=None, return_pandas=False, modify_in_place=False)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

An estimator for featurizing timestamp columns to numeric data.

Parameters
  • features (list[str]) – Optional feature column names.

  • return_pandas (bool) – Whether to return the transformed dataset as a pandas DataFrame.

  • modify_in_place (bool) – Whether to modify the original dataset in place.

fit(X, y=None)[source]

Fits the CustomTimestampFeaturizer.

Parameters
  • X (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset containing timestamp columns to featurize.

  • y – Optional target values (None for unsupervised transformations).

transform(X)[source]

Transforms the timestamp columns to numeric type in the given dataset.

Specifically, extracts the year, month, day, hour, minute, second and the time since the minimum timestamp in the training dataset.

Parameters

X (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset containing timestamp columns to featurize.

Returns

The transformed dataset.

Return type

numpy.ndarray or scipy.sparse.csr_matrix
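A minimal fit/transform sketch, assuming pandas for the illustrative data:

    import pandas as pd

    from ml_wrappers.dataset import CustomTimestampFeaturizer

    df = pd.DataFrame({
        'event_time': pd.to_datetime(['2021-01-01 10:00', '2021-06-15 08:30']),
        'value': [1.0, 2.0],
    })

    # fit learns the minimum timestamp in the data; transform expands each
    # timestamp column into numeric components (year, month, day, ...)
    featurizer = CustomTimestampFeaturizer(return_pandas=True).fit(df)
    transformed = featurizer.transform(df)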

class ml_wrappers.dataset.DatasetWrapper(dataset, clear_references=False)[source]

Bases: object

A wrapper around a dataset to make dataset operations more uniform across explainers.

apply_indexer(column_indexer, bucket_unknown=False)[source]

Indexes categorical string features on the dataset.

Parameters
  • column_indexer (sklearn.compose.ColumnTransformer) – The transformation steps to index the given dataset.

  • bucket_unknown (bool) – If true, buckets unknown values into a separate categorical level.

apply_one_hot_encoder(one_hot_encoder)[source]

One-hot-encode categorical string features on the dataset.

Parameters

one_hot_encoder (sklearn.preprocessing.OneHotEncoder) – The transformation steps to one-hot-encode the given dataset.

apply_timestamp_featurizer(timestamp_featurizer)[source]

Apply timestamp featurization on the dataset.

Parameters

timestamp_featurizer (CustomTimestampFeaturizer) – The transformation steps to featurize timestamps in the given dataset.

augment_data(max_num_of_augmentations=inf)[source]

Augment the current dataset.

Parameters

max_num_of_augmentations (int) – The maximum number of times the permuted data is stacked to augment the dataset.

compute_summary(nclusters=10, use_gpu=False, **kwargs)[source]

Summarizes the dataset if it hasn’t been summarized yet.

property dataset

Get the dataset.

Returns

The underlying dataset.

Return type

numpy.ndarray or scipy.sparse.csr_matrix

get_column_indexes(features, categorical_features)[source]

Get the column indexes for the given column names.

Parameters
  • features (list[str]) – The full list of existing column names.

  • categorical_features (list[str]) – The list of categorical feature names to get indexes for.

Returns

The list of column indexes.

Return type

list[int]

get_features(features=None, explain_subset=None, **kwargs)[source]

Get the features of the dataset if not specified in the current kwargs.

Returns

The features of the dataset if not specified in kwargs; otherwise the features from kwargs.

Return type

list

property num_features

Get the number of features (columns) on the dataset.

Returns

The number of features (columns) in the dataset.

Return type

int

one_hot_encode(columns)[source]

One-hot-encodes categorical string features on the dataset.

Parameters

columns (list[int]) – Parameter specifying the subset of column indexes that may need to be one-hot-encoded.

Returns

The transformation steps to one-hot-encode the given dataset.

Return type

sklearn.preprocessing.OneHotEncoder

property original_dataset

Get the original dataset prior to performing any operations.

Note: if the original dataset was a pandas DataFrame, this will return the numpy version.

Returns

The original dataset.

Return type

numpy.ndarray or scipy.sparse matrix

property original_dataset_with_type

Get the original typed dataset which could be a numpy array or pandas DataFrame or pandas Series.

Returns

The original dataset.

Return type

numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix

reset_index()[source]

Reset the index so that it becomes part of the features of the dataset.

sample(max_dim_clustering=50, sampling_method='hdbscan')[source]

Sample the examples.

First does random downsampling to upper_bound rows, then tries to find the optimal downsample size based on how many clusters can be constructed from the data. If sampling_method is 'hdbscan', uses hdbscan to cluster the data and then downsamples to that number of clusters. If sampling_method is 'kmeans', tries different values of k, cutting k in half each time, and chooses the k with the highest silhouette score to determine how much to downsample the data. The danger of using only random downsampling is that we might downsample too much or too little, so the clustering approach serves as a heuristic for choosing a reasonable downsampling size.

Parameters
  • max_dim_clustering (int) – Dimensionality threshold for performing reduction.

  • sampling_method (str) – Method to use for sampling, can be ‘hdbscan’ or ‘kmeans’.

set_index()[source]

Undo reset_index. Sets the feature that was created from the index back to being the index on the internal dataset.

string_index(columns=None)[source]

Indexes categorical string features on the dataset.

Parameters

columns (list) – Optional parameter specifying the subset of columns that may need to be string indexed.

Returns

The transformation steps to index the given dataset.

Return type

sklearn.compose.ColumnTransformer

property summary_dataset

Get the summary dataset without any subsetting.

Returns

The original dataset or None if summary was not computed.

Return type

numpy.ndarray or scipy.sparse.csr_matrix

take_subset(explain_subset)[source]

Take a subset of the dataset if not done before.

Parameters

explain_subset (list) – A list of column indexes to take from the original dataset.

timestamp_featurizer()[source]

Featurizes the timestamp columns.

Returns

The transformation steps to featurize the timestamp columns.

Return type

CustomTimestampFeaturizer

property typed_dataset

Get the dataset in the original type, pandas DataFrame or Series.

Returns

The underlying dataset.

Return type

numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix

typed_wrapper_func(dataset, keep_index_as_feature=False)[source]

Get a wrapper function to convert the dataset to the original type, pandas DataFrame or Series.

Parameters
  • dataset (numpy.ndarray or scipy.sparse.csr_matrix) – The dataset to convert to original type.

  • keep_index_as_feature (bool) – Whether to keep the index as a feature when converting back. Off by default, in which case the index feature is converted back to being the index.

Returns

A wrapper function for a given dataset to convert to original type.

Return type

numpy.ndarray or scipy.sparse.csr_matrix or pandas.DataFrame or pandas.Series

ml_wrappers.dataset.dataset_utils

Defines helpful utilities for the DatasetWrapper.

ml_wrappers.dataset.dataset_wrapper

Defines a helpful dataset wrapper to allow operations such as summarizing data, taking the subset or sampling.

class ml_wrappers.dataset.dataset_wrapper.DatasetWrapper(dataset, clear_references=False)[source]

Bases: object

A wrapper around a dataset to make dataset operations more uniform across explainers.

apply_indexer(column_indexer, bucket_unknown=False)[source]

Indexes categorical string features on the dataset.

Parameters
  • column_indexer (sklearn.compose.ColumnTransformer) – The transformation steps to index the given dataset.

  • bucket_unknown (bool) – If true, buckets unknown values into a separate categorical level.

apply_one_hot_encoder(one_hot_encoder)[source]

One-hot-encode categorical string features on the dataset.

Parameters

one_hot_encoder (sklearn.preprocessing.OneHotEncoder) – The transformation steps to one-hot-encode the given dataset.

apply_timestamp_featurizer(timestamp_featurizer)[source]

Apply timestamp featurization on the dataset.

Parameters

timestamp_featurizer (CustomTimestampFeaturizer) – The transformation steps to featurize timestamps in the given dataset.

augment_data(max_num_of_augmentations=inf)[source]

Augment the current dataset.

Parameters

max_num_of_augmentations (int) – The maximum number of times the permuted data is stacked to augment the dataset.

compute_summary(nclusters=10, use_gpu=False, **kwargs)[source]

Summarizes the dataset if it hasn’t been summarized yet.

property dataset

Get the dataset.

Returns

The underlying dataset.

Return type

numpy.ndarray or scipy.sparse.csr_matrix

get_column_indexes(features, categorical_features)[source]

Get the column indexes for the given column names.

Parameters
  • features (list[str]) – The full list of existing column names.

  • categorical_features (list[str]) – The list of categorical feature names to get indexes for.

Returns

The list of column indexes.

Return type

list[int]

get_features(features=None, explain_subset=None, **kwargs)[source]

Get the features of the dataset if not specified in the current kwargs.

Returns

The features of the dataset if not specified in kwargs; otherwise the features from kwargs.

Return type

list

property num_features

Get the number of features (columns) on the dataset.

Returns

The number of features (columns) in the dataset.

Return type

int

one_hot_encode(columns)[source]

One-hot-encodes categorical string features on the dataset.

Parameters

columns (list[int]) – Parameter specifying the subset of column indexes that may need to be one-hot-encoded.

Returns

The transformation steps to one-hot-encode the given dataset.

Return type

sklearn.preprocessing.OneHotEncoder

property original_dataset

Get the original dataset prior to performing any operations.

Note: if the original dataset was a pandas DataFrame, this will return the numpy version.

Returns

The original dataset.

Return type

numpy.ndarray or scipy.sparse matrix

property original_dataset_with_type

Get the original typed dataset which could be a numpy array or pandas DataFrame or pandas Series.

Returns

The original dataset.

Return type

numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix

reset_index()[source]

Reset the index so that it becomes part of the features of the dataset.

sample(max_dim_clustering=50, sampling_method='hdbscan')[source]

Sample the examples.

First does random downsampling to upper_bound rows, then tries to find the optimal downsample size based on how many clusters can be constructed from the data. If sampling_method is 'hdbscan', uses hdbscan to cluster the data and then downsamples to that number of clusters. If sampling_method is 'kmeans', tries different values of k, cutting k in half each time, and chooses the k with the highest silhouette score to determine how much to downsample the data. The danger of using only random downsampling is that we might downsample too much or too little, so the clustering approach serves as a heuristic for choosing a reasonable downsampling size.

Parameters
  • max_dim_clustering (int) – Dimensionality threshold for performing reduction.

  • sampling_method (str) – Method to use for sampling, can be ‘hdbscan’ or ‘kmeans’.

set_index()[source]

Undo reset_index. Sets the feature that was created from the index back to being the index on the internal dataset.

string_index(columns=None)[source]

Indexes categorical string features on the dataset.

Parameters

columns (list) – Optional parameter specifying the subset of columns that may need to be string indexed.

Returns

The transformation steps to index the given dataset.

Return type

sklearn.compose.ColumnTransformer

property summary_dataset

Get the summary dataset without any subsetting.

Returns

The original dataset or None if summary was not computed.

Return type

numpy.ndarray or scipy.sparse.csr_matrix

take_subset(explain_subset)[source]

Take a subset of the dataset if not done before.

Parameters

explain_subset (list) – A list of column indexes to take from the original dataset.

timestamp_featurizer()[source]

Featurizes the timestamp columns.

Returns

The transformation steps to featurize the timestamp columns.

Return type

CustomTimestampFeaturizer

property typed_dataset

Get the dataset in the original type, pandas DataFrame or Series.

Returns

The underlying dataset.

Return type

numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix

typed_wrapper_func(dataset, keep_index_as_feature=False)[source]

Get a wrapper function to convert the dataset to the original type, pandas DataFrame or Series.

Parameters
  • dataset (numpy.ndarray or scipy.sparse.csr_matrix) – The dataset to convert to original type.

  • keep_index_as_feature (bool) – Whether to keep the index as a feature when converting back. Off by default, in which case the index feature is converted back to being the index.

Returns

A wrapper function for a given dataset to convert to original type.

Return type

numpy.ndarray or scipy.sparse.csr_matrix or pandas.DataFrame or pandas.Series

ml_wrappers.dataset.timestamp_featurizer

Defines a custom timestamp featurizer for converting timestamp columns to numeric.

class ml_wrappers.dataset.timestamp_featurizer.CustomTimestampFeaturizer(features=None, return_pandas=False, modify_in_place=False)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

An estimator for featurizing timestamp columns to numeric data.

Parameters
  • features (list[str]) – Optional feature column names.

  • return_pandas (bool) – Whether to return the transformed dataset as a pandas DataFrame.

  • modify_in_place (bool) – Whether to modify the original dataset in place.

fit(X, y=None)[source]

Fits the CustomTimestampFeaturizer.

Parameters
  • X (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset containing timestamp columns to featurize.

  • y – Optional target values (None for unsupervised transformations).

transform(X)[source]

Transforms the timestamp columns to numeric type in the given dataset.

Specifically, extracts the year, month, day, hour, minute, second and the time since the minimum timestamp in the training dataset.

Parameters

X (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset containing timestamp columns to featurize.

Returns

The transformed dataset.

Return type

numpy.ndarray or scipy.sparse.csr_matrix

ml_wrappers.model

Common infrastructure, class hierarchy and utilities for model explanations.

class ml_wrappers.model.EndpointWrapperModel(api_key, url, allow_self_signed_https=False, extra_headers=None, transform_output_dict=False, class_names=None, wrap_input_data_dict=False, batch_size=10, api_key_auto_refresh_callable=None)[source]

Bases: object

Defines an MLflow model wrapper for an endpoint.

allow_self_signed_https(allowed)[source]

Allow self-signed HTTPS certificates.

Parameters

allowed (bool) – Whether to allow self-signed HTTPS certificates.

static from_auto_refresh_callable(api_key_auto_refresh_callable, url, **kwargs)[source]

Create an EndpointWrapperModel from an auto refresh callable.

The callable method should return the latest API key.

Parameters
  • api_key_auto_refresh_callable (callable) – The method to call to refresh the API key.

  • url (str) – The URL of the endpoint to call.

  • kwargs (dict) – The keyword arguments.

Returns

The EndpointWrapperModel.

Return type

ml_wrappers.model.EndpointWrapperModel
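A hedged construction sketch; the refresh helper and URL below are hypothetical:

    from ml_wrappers.model import EndpointWrapperModel

    def fetch_latest_api_key():
        # Hypothetical helper: read the current key from a secret store
        return 'my-rotating-api-key'

    wrapped = EndpointWrapperModel.from_auto_refresh_callable(
        fetch_latest_api_key, 'https://example.com/score')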

load_context(context)[source]

Load the context.

Parameters

context (mlflow.pyfunc.model.PythonModelContext) – The context.

predict(context, model_input=None)[source]

Predict using the model.

Parameters
  • context (mlflow.pyfunc.model.PythonModelContext or pandas.DataFrame) – The context for the MLflow model, or the input data.

  • model_input (pandas.DataFrame) – The input to the model.

Returns

The predictions.

Return type

numpy.ndarray

predict_proba(data)[source]

Predict using the model.

Parameters

data (pandas.DataFrame) – The input to the model.

Returns

The predictions.

Return type

numpy.ndarray

class ml_wrappers.model.OpenaiWrapperModel(api_type, api_base, api_version, api_key, engine='gpt-4-32k', temperature=0.7, max_tokens=800, top_p=0.95, frequency_penalty=0, presence_penalty=0, stop=None)[source]

Bases: object

A model wrapper for an OpenAI model endpoint.

predict(context, model_input=None)[source]

Predict using the model.

Parameters
  • context (mlflow.pyfunc.model.PythonModelContext or pandas.DataFrame) – The context for the MLflow model, or the input data.

  • model_input (pandas.DataFrame) – The input to the model.

Returns

The predictions.

Return type

numpy.ndarray

class ml_wrappers.model.WrappedClassificationModel(model, eval_function, examples=None)[source]

Bases: ml_wrappers.model.base_wrapped_model.BaseWrappedModel

A class for wrapping a classification model.

predict(dataset)[source]

Predict the output using the wrapped classification model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

predict_proba(dataset)[source]

Predict the output probability using the wrapped model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

class ml_wrappers.model.WrappedPytorchModel(model, image_to_tensor=False)[source]

Bases: object

A class for wrapping a PyTorch model.

Note: at initialization time, since we don’t have access to the dataset, we can’t infer whether this is a classification or regression case. Hence, we add the predict_classes method for classification, and keep predict for either outputting values in regression or probabilities in classification.
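A hedged sketch of that workflow with a toy PyTorch classifier (the model and data are illustrative, and numpy input is assumed to be converted to tensors internally):

    import numpy as np
    import torch

    from ml_wrappers.model import WrappedPytorchModel

    # A toy two-class softmax model
    model = torch.nn.Sequential(torch.nn.Linear(2, 2), torch.nn.Softmax(dim=1))
    wrapped = WrappedPytorchModel(model)

    X = np.array([[0.1, 0.9], [0.8, 0.2]], dtype=np.float32)
    probs = wrapped.predict(X)            # probabilities in the classification case
    classes = wrapped.predict_classes(X)  # hard class predictions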

predict(dataset)[source]

Predict the output using the wrapped PyTorch model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

Returns

The prediction results.

Return type

numpy.ndarray

predict_classes(dataset)[source]

Predict the class using the wrapped PyTorch model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

Returns

The predicted classes.

Return type

numpy.ndarray

predict_proba(dataset)[source]

Predict the output probability using the wrapped PyTorch model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

Returns

The predicted probabilities.

Return type

numpy.ndarray

class ml_wrappers.model.WrappedRegressionModel(model, eval_function, examples=None)[source]

Bases: ml_wrappers.model.base_wrapped_model.BaseWrappedModel

A class for wrapping a regression model.

predict(dataset)[source]

Predict the output using the wrapped regression model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

class ml_wrappers.model.WrappedTensorflowModel(model)[source]

Bases: object

A class for wrapping a TensorFlow model.

Note: at initialization time, since we don’t have access to the dataset, we can’t infer whether this is a classification or regression case. Hence, we add the predict_classes method for classification, and keep predict for either outputting values in regression or probabilities in classification.

predict(dataset)[source]

Predict the output using the wrapped TensorFlow model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

Returns

The prediction results.

Return type

numpy.ndarray

predict_classes(dataset)[source]

Predict the class using the wrapped TensorFlow model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

Returns

The predicted classes.

Return type

numpy.ndarray

predict_proba(dataset)[source]

Predict the output probability using the wrapped TensorFlow model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

Returns

The predicted probabilities.

Return type

numpy.ndarray

ml_wrappers.model.is_sequential(model)[source]

Returns True if the model is a sequential model.

Note the model class name can be keras.src.engine.sequential.Sequential, keras.engine.sequential.Sequential or tensorflow.python.keras.engine.sequential.Sequential, depending on the TensorFlow version. In TensorFlow 2.13, the namespace changed from keras.engine to keras.src.engine. The check covers all of these cases.

Parameters

model (tf.keras.Model) – The model to check.

Returns

True if the model is a sequential model.

Return type

bool
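A minimal sketch of a class-name based check like the one described (illustrative, not the library's exact code):

    def is_sequential_by_name(model):
        # Match the Sequential class across the keras.src.engine,
        # keras.engine and tensorflow.python.keras.engine namespaces
        module = model.__class__.__module__
        name = model.__class__.__name__
        return name == 'Sequential' and 'keras' in module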

ml_wrappers.model.wrap_model(model, examples, model_task: str = ModelTask.UNKNOWN, num_classes: Optional[int] = None, classes: Optional[Union[list, numpy.array]] = None, device='auto')[source]
If needed, wraps the model in a common API based on the model task and prediction function contract.

Parameters
  • model (model with a predict or predict_proba function) – The model to evaluate on the examples.

  • examples (ml_wrappers.DatasetWrapper or numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse.csr_matrix or shap.DenseData or torch.Tensor) – The model evaluation examples. Note the examples will be wrapped in a DatasetWrapper if not already wrapped on input.

  • model_task (str) – Optional parameter to specify whether the model is a classification or regression model. In most cases the type of the model can be inferred from the shape of the output, where a classifier has a predict_proba method and outputs a 2-dimensional array, while a regressor has a predict method and outputs a 1-dimensional array.

  • classes (list or np.ndarray) – Optional parameter specifying a list of class names in the dataset.

  • num_classes (int) – Optional parameter specifying the number of classes in the dataset.

  • device (str, for instance: 'cpu', 'cuda') – Optional parameter specifying the device to move the model to. If not specified, 'cpu' is the default.

Returns

The wrapper model.

Return type

model

ml_wrappers.model.base_wrapped_model

Defines a base class for wrapping models.

class ml_wrappers.model.base_wrapped_model.BaseWrappedModel(model, eval_function, examples, model_task)[source]

Bases: object

A base class for WrappedClassificationModel and WrappedRegressionModel.

ml_wrappers.model.evaluator

ml_wrappers.model.fastai_wrapper

Defines model wrappers and utilities for fastai tabular models.

class ml_wrappers.model.fastai_wrapper.WrappedFastAITabularModel(model)[source]

Bases: object

A class for wrapping a FastAI tabular model in the scikit-learn style.

predict(dataset)[source]

Predict the output value using the wrapped FastAI model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

Returns

The predicted values.

Return type

numpy.ndarray

predict_proba(dataset)[source]

Predict the output probability using the FastAI model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

Returns

The predicted probabilities.

Return type

numpy.ndarray

ml_wrappers.model.function_wrapper

Defines helper utilities for resolving prediction function shape inconsistencies.

ml_wrappers.model.image_model_wrapper

Defines wrappers for vision-based models.

class ml_wrappers.model.image_model_wrapper.MLflowDRiseWrapper(model: Any, classes: Union[list, numpy.ndarray])[source]

Bases: object

Wraps an MLflow model with a predict API function.

To be compatible with the D-RISE explainability method, all models must be wrapped to have the same output and input class and a predict function for object detection. This wrapper is customized for the FasterRCNN model from AutoML. Unlike the PyTorch wrapper, this wrapper does not inherit from GeneralObjectDetectionModelWrapper, as that superclass requires predict to take a tensor input.

predict(dataset: pandas.core.frame.DataFrame, iou_threshold: float = 0.25, score_threshold: float = 0.5)[source]

Predict the output value using the wrapped MLflow model.

Parameters
  • dataset (pandas.DataFrame) – The dataset to predict on.

  • iou_threshold (float) – Intersection-over-Union (IoU) threshold for NMS (or the amount of acceptable error). Objects with error scores higher than the threshold will be removed.

  • score_threshold (float) – Threshold to filter detections based on predicted confidence scores.

Returns

The predicted values.

Return type

numpy.ndarray

class ml_wrappers.model.image_model_wrapper.PytorchDRiseWrapper(model, number_of_classes: int, device='auto', transforms=None, iou_threshold=None, score_threshold=None)[source]

Bases: object

Wraps a PytorchFasterRCNN model with a predict API function.

To be compatible with the D-RISE explainability method, all models must be wrapped to have the same output and input class and a predict function for object detection. This wrapper is customized for the FasterRCNN model from PyTorch, and can also be used with RetinaNet or any other model with the same output class.

predict(x: Any, iou_threshold: float = 0.5, score_threshold: float = 0.5)[source]

Create a list of detection records from the image predictions.

Parameters
  • x (torch.Tensor) – Tensor of the image

  • iou_threshold (float) – Intersection-over-Union (IoU) threshold for NMS (or the amount of acceptable error). Objects with error scores higher than the threshold will be removed.

  • score_threshold (float) – Threshold to filter detections based on predicted confidence scores.

Returns

Baseline detections to get saliency maps for

Return type

List of Detection Records

class ml_wrappers.model.image_model_wrapper.WrappedFastAIImageClassificationModel(model, multilabel=False)[source]

Bases: object

A class for wrapping a FastAI model in the scikit-learn style.

predict(dataset)[source]

Predict the output value using the wrapped FastAI model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

Returns

The predicted values.

Return type

numpy.ndarray

predict_proba(dataset)[source]

Predict the output probability using the FastAI model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

Returns

The predicted probabilities.

Return type

numpy.ndarray

class ml_wrappers.model.image_model_wrapper.WrappedMlflowAutomlImagesClassificationModel(model: Any)[source]

Bases: object

A class for wrapping an AutoML for images MLflow classification model in the scikit-learn style.

predict(dataset: pandas.core.frame.DataFrame) → numpy.ndarray[source]

Predict the output value using the wrapped MLflow model.

Parameters

dataset (pandas.DataFrame) – The dataset to predict on.

Returns

The predicted values.

Return type

numpy.ndarray

predict_proba(dataset: pandas.core.frame.DataFrame) → numpy.ndarray[source]

Predict the output probability using the MLflow model.

Parameters

dataset (pandas.DataFrame) – The dataset to predict_proba on.

Returns

The predicted probabilities.

Return type

numpy.ndarray

class ml_wrappers.model.image_model_wrapper.WrappedMlflowAutomlObjectDetectionModel(model: Any, classes: Union[list, numpy.array])[source]

Bases: object

A class for wrapping an AutoML for images MLflow object detection model in the scikit-learn style.

predict(dataset: pandas.core.frame.DataFrame, iou_threshold: float = 0.5, score_threshold: float = 0.5)[source]

Create a list of detection records from the image predictions.

Below is an example Label (y) representation for a cohort of 2 images, with 3 objects detected for the first image and 1 for the second image.

[
    [
        [class, topX, topY, bottomX, bottomY, (optional) confidence_score],
        [class, topX, topY, bottomX, bottomY, (optional) confidence_score],
        [class, topX, topY, bottomX, bottomY, (optional) confidence_score],
    ],
    [
        [class, topX, topY, bottomX, bottomY, (optional) confidence_score],
    ],
]

Parameters
  • dataset (pandas.DataFrame) – The dataset to predict on.

  • iou_threshold (float) – Intersection-over-Union (IoU) threshold for NMS (or the amount of acceptable error). Objects with error scores higher than the threshold will be removed.

  • score_threshold (float) – Threshold to filter detections based on predicted confidence scores.

Returns

Final detections from the object detector

Return type

numpy array of Detection Records

predict_proba(dataset: pandas.core.frame.DataFrame, iou_threshold=0.1) → numpy.ndarray[source]

Predict the output probability using the MLflow model.

Parameters
  • dataset (pandas.DataFrame) – The dataset to predict_proba on.

  • iou_threshold (float) – Amount of acceptable error. Objects with error scores higher than the threshold will be removed.

Returns

The predicted probabilities.

Return type

numpy.ndarray

class ml_wrappers.model.image_model_wrapper.WrappedObjectDetectionModel(model: Any, number_of_classes: int, device='auto')[source]

Bases: object

A class for wrapping an object detection model in the scikit-learn style.

predict(x, iou_threshold: float = 0.5, score_threshold: float = 0.5)[source]

Create a list of detection records from the image predictions.

Parameters
  • x (torch.Tensor) – Tensor of the image.

  • iou_threshold (float) – Intersection-over-Union (IoU) threshold for NMS. Objects with error scores higher than the threshold will be removed.

  • score_threshold (float) – Threshold to filter detections based on predicted confidence scores.

Returns

Baseline detections to get saliency maps for

Return type

numpy array of Detection Records

Example Label (y) representation for a cohort of 2 images:

[
    [
        [object_1, x1, y1, b1, h1, (optional) confidence_score],
        [object_2, x2, y2, b2, h2, (optional) confidence_score],
        [object_1, x3, y3, b3, h3, (optional) confidence_score],
    ],
    [
        [object_1, x4, y4, b4, h4, (optional) confidence_score],
        [object_2, x5, y5, b5, h5, (optional) confidence_score],
    ],
]

predict_proba(dataset, iou_threshold=0.1)[source]

Predict the output probability using the wrapped model.

Parameters
  • dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

  • iou_threshold (float) – Amount of acceptable error. Objects with error scores higher than the threshold will be removed.

class ml_wrappers.model.image_model_wrapper.WrappedTransformerImageClassificationModel(model)[source]

Bases: object

A class for wrapping a Transformers model in the scikit-learn style.

predict(dataset)[source]

Predict the output using the wrapped Transformers model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

Returns

The predicted values.

Return type

numpy.ndarray

predict_proba(dataset)[source]

Predict the output probability using the Transformers model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

Returns

The predicted probabilities.

Return type

numpy.ndarray

ml_wrappers.model.image_model_wrapper.expand_class_scores(scores: Any, labels: Any, number_of_classes: int) → Any[source]

Extrapolate a full set of class scores.

Many object detection models don’t return a full set of class scores, but rather just a score for the predicted class. This is a helper function that approximates a full set of class scores by dividing the difference between 1.0 and the predicted class score among the remaining classes.

Parameters
  • scores (torch.Tensor) – Set of class specific scores. Shape [D] where D is number of detections

  • labels (torch.Tensor (ints)) – Set of label indices corresponding to predicted class. Shape [D] where D is number of detections

  • number_of_classes (int) – Number of classes model predicts

Returns

A set of expanded scores, of shape [D, C], where C is number of classes

Return type

torch.Tensor
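A hedged torch sketch of the expansion described above, splitting the leftover probability mass evenly across the non-predicted classes:

    import torch

    def expand_scores_sketch(scores, labels, number_of_classes):
        # scores: shape [D] predicted-class scores; labels: shape [D] class indices
        detections = scores.shape[0]
        expanded = torch.zeros(detections, number_of_classes)
        for i in range(detections):
            # Divide the difference between 1.0 and the predicted score
            # among the remaining classes
            leftover = (1.0 - scores[i]) / (number_of_classes - 1)
            expanded[i, :] = leftover
            expanded[i, labels[i]] = scores[i]
        return expanded

    expand_scores_sketch(torch.tensor([0.9, 0.7]), torch.tensor([2, 0]), 3)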

ml_wrappers.model.model_utils

Defines common model utilities.

ml_wrappers.model.model_wrapper

Defines a helpful model wrapper and utilities for implicitly rewrapping the model to conform to explainer contracts.

ml_wrappers.model.model_wrapper.wrap_model(model, examples, model_task: str = ModelTask.UNKNOWN, num_classes: Optional[int] = None, classes: Optional[Union[list, numpy.array]] = None, device='auto')[source]
If needed, wraps the model in a common API based on the model task and prediction function contract.

Parameters
  • model (model with a predict or predict_proba function) – The model to evaluate on the examples.

  • examples (ml_wrappers.DatasetWrapper or numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse.csr_matrix or shap.DenseData or torch.Tensor) – The model evaluation examples. Note the examples will be wrapped in a DatasetWrapper if not already wrapped on input.

  • model_task (str) – Optional parameter to specify whether the model is a classification or regression model. In most cases the type of the model can be inferred from the shape of the output, where a classifier has a predict_proba method and outputs a 2-dimensional array, while a regressor has a predict method and outputs a 1-dimensional array.

  • classes (list or np.ndarray) – Optional parameter specifying a list of class names in the dataset.

  • num_classes (int) – Optional parameter specifying the number of classes in the dataset.

  • device (str, for instance: 'cpu', 'cuda') – Optional parameter specifying the device to move the model to. If not specified, 'cpu' is the default.

Returns

The wrapper model.

Return type

model

ml_wrappers.model.predictions_wrapper

Defines classes to wrap the training/test data and the corresponding predictions from the model.

exception ml_wrappers.model.predictions_wrapper.DataValidationException[source]

Bases: Exception

An exception indicating that some user supplied data is not valid.

Parameters

exception_message (str) – A message describing the error.

exception ml_wrappers.model.predictions_wrapper.EmptyDataException[source]

Bases: Exception

An exception indicating that some operation produced empty data.

Parameters

exception_message (str) – A message describing the error.

class ml_wrappers.model.predictions_wrapper.PredictionsModelWrapper(test_data: pandas.core.frame.DataFrame, y_pred: numpy.ndarray, should_construct_pandas_query: Optional[bool] = True)[source]

Bases: object

Model wrapper to wrap the samples used to train the model and the corresponding predictions from the model. This wrapper is useful when it is not possible to load the model itself.

predict(query_test_data: pandas.core.frame.DataFrame) → numpy.ndarray[source]

Return the predictions based on the query data.

Parameters

query_test_data (pd.DataFrame) – The data for which the predictions need to be returned.

Returns

Predictions of the model.

Return type

np.ndarray

class ml_wrappers.model.predictions_wrapper.PredictionsModelWrapperClassification(test_data: pandas.core.frame.DataFrame, y_pred: numpy.ndarray, y_pred_proba: Optional[numpy.ndarray] = None, should_construct_pandas_query: Optional[bool] = True)[source]

Bases: ml_wrappers.model.predictions_wrapper.PredictionsModelWrapper

Model wrapper to wrap the samples used to train the model and the corresponding predictions from the model, for classification tasks.

predict_proba(query_test_data: pandas.core.frame.DataFrame) → numpy.ndarray[source]

Return the prediction probabilities based on the query data.

Parameters

query_test_data (pd.DataFrame) – The data for which the prediction probabilities need to be returned.

Returns

Prediction probabilities of the model.

Return type

np.ndarray

class ml_wrappers.model.predictions_wrapper.PredictionsModelWrapperRegression(test_data: pandas.core.frame.DataFrame, y_pred: numpy.ndarray, should_construct_pandas_query: Optional[bool] = True)[source]

Bases: ml_wrappers.model.predictions_wrapper.PredictionsModelWrapper

Model wrapper to wrap the samples used to train the model and the corresponding predictions from the model, for regression tasks.
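A minimal usage sketch for the classification variant, with illustrative precomputed predictions:

    import numpy as np
    import pandas as pd

    from ml_wrappers.model.predictions_wrapper import \
        PredictionsModelWrapperClassification

    test_data = pd.DataFrame({'f0': [0.1, 0.9], 'f1': [1.0, 0.0]})
    y_pred = np.array([0, 1])
    y_pred_proba = np.array([[0.8, 0.2], [0.3, 0.7]])

    # The wrapper answers predict/predict_proba by matching query rows
    # against the stored test data and returning the stored predictions
    wrapped = PredictionsModelWrapperClassification(
        test_data, y_pred, y_pred_proba)
    print(wrapped.predict(test_data))
    print(wrapped.predict_proba(test_data))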

ml_wrappers.model.pytorch_wrapper

Defines model wrappers and utilities for pytorch models.

class ml_wrappers.model.pytorch_wrapper.WrappedPytorchModel(model, image_to_tensor=False)[source]

Bases: object

A class for wrapping a PyTorch model.

Note: at initialization time, since we don’t have access to the dataset, we can’t infer whether this is a classification or regression case. Hence, we add the predict_classes method for classification, and keep predict for either outputting values in regression or probabilities in classification.

predict(dataset)[source]

Predict the output using the wrapped PyTorch model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

Returns

The prediction results.

Return type

numpy.ndarray

predict_classes(dataset)[source]

Predict the class using the wrapped PyTorch model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

Returns

The predicted classes.

Return type

numpy.ndarray

predict_proba(dataset)[source]

Predict the output probability using the wrapped PyTorch model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

Returns

The predicted probabilities.

Return type

numpy.ndarray

ml_wrappers.model.tensorflow_wrapper

Defines model wrappers and utilities for tensorflow models.

class ml_wrappers.model.tensorflow_wrapper.WrappedTensorflowModel(model)[source]

Bases: object

A class for wrapping a TensorFlow model.

Note: at initialization time, since we don’t have access to the dataset, we can’t infer whether this is a classification or regression case. Hence, we add the predict_classes method for classification, and keep predict for either outputting values in regression or probabilities in classification.

predict(dataset)[source]

Predict the output using the wrapped TensorFlow model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

Returns

The prediction results.

Return type

numpy.ndarray

predict_classes(dataset)[source]

Predict the class using the wrapped TensorFlow model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

Returns

The predicted classes.

Return type

numpy.ndarray

predict_proba(dataset)[source]

Predict the output probability using the wrapped TensorFlow model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

Returns

The predicted probabilities.

Return type

numpy.ndarray

ml_wrappers.model.tensorflow_wrapper.is_sequential(model)[source]

Returns True if the model is a sequential model.

Note the model class name can be keras.src.engine.sequential.Sequential, keras.engine.sequential.Sequential or tensorflow.python.keras.engine.sequential.Sequential, depending on the TensorFlow version. In TensorFlow 2.13, the namespace changed from keras.engine to keras.src.engine. The check covers all of these cases.

Parameters

model (tf.keras.Model) – The model to check.

Returns

True if the model is a sequential model.

Return type

bool

ml_wrappers.model.text_model_wrapper

Defines wrappers for text-based models.

class ml_wrappers.model.text_model_wrapper.WrappedQuestionAnsweringModel(model)[source]

Bases: object

A class for wrapping a Transformers model in the scikit-learn style.

predict(dataset)[source]

Predict the output using the wrapped Transformers model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

class ml_wrappers.model.text_model_wrapper.WrappedTextClassificationModel(model, multilabel=False)[source]

Bases: object

A class for wrapping a Transformers model in the scikit-learn style.

predict(dataset)[source]

Predict the output using the wrapped Transformers model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

predict_proba(dataset)[source]

Predict the output probability using the Transformers model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

ml_wrappers.model.wrapped_classification_model

Defines a class for wrapping classification models.

class ml_wrappers.model.wrapped_classification_model.WrappedClassificationModel(model, eval_function, examples=None)[source]

Bases: ml_wrappers.model.base_wrapped_model.BaseWrappedModel

A class for wrapping a classification model.

predict(dataset)[source]

Predict the output using the wrapped classification model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

predict_proba(dataset)[source]

Predict the output probability using the wrapped model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

ml_wrappers.model.wrapped_classification_without_proba_model

Defines a class for wrapping classifiers without predict_proba.

class ml_wrappers.model.wrapped_classification_without_proba_model.WrappedClassificationWithoutProbaModel(model)[source]

Bases: object

A class for wrapping a classifier without a predict_proba method.

Note: the classifier may not output numeric values for its predictions. We generate a trivial boolean version of predict_proba; a sketch follows below.
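A hedged sketch of what such a trivial predict_proba can look like, one-hot over the predicted class (illustrative, not the library's exact construction):

    import numpy as np

    def trivial_predict_proba(predictions, classes):
        # One column per class; 1.0 for the predicted class, 0.0 elsewhere
        proba = np.zeros((len(predictions), len(classes)))
        for i, pred in enumerate(predictions):
            proba[i, list(classes).index(pred)] = 1.0
        return proba

    trivial_predict_proba(['cat', 'dog', 'cat'], ['cat', 'dog'])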

predict(dataset)[source]

Predict the output using the wrapped classification model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

predict_proba(dataset)[source]

Predict the output probability using the wrapped model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.

ml_wrappers.model.wrapped_regression_model

Defines a class for wrapping regression models.

class ml_wrappers.model.wrapped_regression_model.WrappedRegressionModel(model, eval_function, examples=None)[source]

Bases: ml_wrappers.model.base_wrapped_model.BaseWrappedModel

A class for wrapping a regression model.

predict(dataset)[source]

Predict the output using the wrapped regression model.

Parameters

dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.

ml_wrappers.version