API Reference
ml_wrappers
Module for wrapping datasets and models in one uniform format.
- class ml_wrappers.DatasetWrapper(dataset, clear_references=False)[source]
Bases:
object
A wrapper around a dataset to make dataset operations more uniform across explainers.
- apply_indexer(column_indexer, bucket_unknown=False)[source]
Indexes categorical string features on the dataset.
- Parameters
column_indexer (sklearn.compose.ColumnTransformer) – The transformation steps to index the given dataset.
bucket_unknown (bool) – If true, buckets unknown values to separate categorical level.
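As a rough illustration of what indexing categorical strings with an unknown bucket can look like, here is a minimal stdlib sketch. The helpers `fit_index` and `apply_index` are hypothetical and are not part of ml_wrappers, which works with a sklearn.compose.ColumnTransformer instead. This reference does not say what happens to unseen values when bucket_unknown is false; raising an error here is an assumption.

```python
def fit_index(values):
    """Map each distinct string to an integer code, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def apply_index(values, mapping, bucket_unknown=False):
    """Replace strings with their integer codes.

    With bucket_unknown=True, every unseen value shares one extra level.
    With bucket_unknown=False, this sketch raises (an assumption, see above).
    """
    unknown_code = len(mapping)  # one level past the known codes
    indexed = []
    for value in values:
        if value in mapping:
            indexed.append(mapping[value])
        elif bucket_unknown:
            indexed.append(unknown_code)
        else:
            raise ValueError(f"unknown category: {value!r}")
    return indexed


mapping = fit_index(["red", "green", "red", "blue"])
codes = apply_index(["green", "purple"], mapping, bucket_unknown=True)
print(mapping)
print(codes)
```

Keeping unseen values in one shared level means data with new categories can still be indexed, at the cost of not telling those categories apart.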
- apply_one_hot_encoder(one_hot_encoder)[source]
One-hot-encode categorical string features on the dataset.
- Parameters
one_hot_encoder (sklearn.preprocessing.OneHotEncoder) – The transformation steps to one-hot-encode the given dataset.
- apply_timestamp_featurizer(timestamp_featurizer)[source]
Apply timestamp featurization on the dataset.
- Parameters
timestamp_featurizer (CustomTimestampFeaturizer) – The transformation steps to featurize timestamps in the given dataset.
- augment_data(max_num_of_augmentations=inf)[source]
Augment the current dataset.
- Parameters
max_num_of_augmentations (int) – Maximum number of times to stack permuted x to augment the dataset.
- compute_summary(nclusters=10, use_gpu=False, **kwargs)[source]
Summarizes the dataset if it hasn’t been summarized yet.
- property dataset
Get the dataset.
- Returns
The underlying dataset.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix
- get_column_indexes(features, categorical_features)[source]
Get the column indexes for the given column names.
- get_features(features=None, explain_subset=None, **kwargs)[source]
Get the features of the dataset if features is None in the current kwargs.
- Returns
The features of the dataset if features is currently None in kwargs.
- Return type
- property num_features
Get the number of features (columns) on the dataset.
- Returns
The number of features (columns) in the dataset.
- Return type
- property original_dataset
Get the original dataset prior to performing any operations.
Note: if the original dataset was a pandas dataframe, this will return the numpy version.
- Returns
The original dataset.
- Return type
numpy.ndarray or scipy.sparse matrix
- property original_dataset_with_type
Get the original typed dataset which could be a numpy array or pandas DataFrame or pandas Series.
- Returns
The original dataset.
- Return type
numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix
- sample(max_dim_clustering=50, sampling_method='hdbscan')[source]
Sample the examples.
First does random downsampling to upper_bound rows, then tries to find the optimal downsample based on how many clusters can be constructed from the data. If sampling_method is hdbscan, uses hdbscan to cluster the data and then downsamples to that number of clusters. If sampling_method is k-means, uses different values of k, cutting in half each time, and chooses the k with highest silhouette score to determine how much to downsample the data. The danger of using only random downsampling is that we might downsample too much or too little, so the clustering approach is a heuristic to give us some idea of how much we should downsample to.
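The k-means branch described above (try values of k, halving each time, keep the best-scoring one) can be sketched as follows. This is only an outline under stated assumptions: `halving_candidates` and `pick_k` are hypothetical names, the starting k and the lower bound of 2 are not given in this reference, and the scores are made-up stand-ins for real silhouette scores.

```python
def halving_candidates(start_k, min_k=2):
    """Return start_k, start_k // 2, ... for as long as the value is >= min_k."""
    candidates = []
    k = start_k
    while k >= min_k:
        candidates.append(k)
        k //= 2
    return candidates


def pick_k(candidates, score_fn):
    """Return the candidate with the highest score (the first one wins ties)."""
    return max(candidates, key=score_fn)


# Stand-in scores; a real run would compute a silhouette score for each k.
toy_scores = {40: 0.21, 20: 0.35, 10: 0.52, 5: 0.48, 2: 0.30}
candidates = halving_candidates(40)
best_k = pick_k(candidates, toy_scores.get)
print(candidates, best_k)
```

Halving keeps the number of clusterings to evaluate logarithmic in the starting k, which matters because each candidate requires a full clustering pass.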
- set_index()[source]
Undo reset_index. Set index as feature on internal dataset to be an index again.
- string_index(columns=None)[source]
Indexes categorical string features on the dataset.
- Parameters
columns (list) – Optional parameter specifying the subset of columns that may need to be string indexed.
- Returns
The transformation steps to index the given dataset.
- Return type
sklearn.compose.ColumnTransformer
- property summary_dataset
Get the summary dataset without any subsetting.
- Returns
The summary dataset, or None if the summary was not computed.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix
- take_subset(explain_subset)[source]
Take a subset of the dataset if not done before.
- Parameters
explain_subset (list) – A list of column indexes to take from the original dataset.
- timestamp_featurizer()[source]
Featurizes the timestamp columns.
- Returns
The transformation steps to featurize the timestamp columns.
- Return type
- property typed_dataset
Get the dataset in the original type, pandas DataFrame or Series.
- Returns
The underlying dataset.
- Return type
numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix
- typed_wrapper_func(dataset, keep_index_as_feature=False)[source]
Get a wrapper function to convert the dataset to the original type, pandas DataFrame or Series.
- Parameters
dataset (numpy.ndarray or scipy.sparse.csr_matrix) – The dataset to convert to original type.
keep_index_as_feature (bool) – Whether to keep the index as a feature when converting back. Off by default to convert it back to index.
- Returns
A wrapper function for a given dataset to convert to original type.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix or pandas.DataFrame or pandas.Series
- ml_wrappers.wrap_model(model, examples, model_task: str = ModelTask.UNKNOWN, num_classes: Optional[int] = None, classes: Optional[Union[list, numpy.array]] = None, device='auto')[source]
If needed, wraps the model in a common API based on model task and prediction function contract.
- Parameters
model (model with a predict or predict_proba function.) – The model to evaluate on the examples.
examples (ml_wrappers.DatasetWrapper or numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse.csr_matrix or shap.DenseData or torch.Tensor) – The model evaluation examples. Note the examples will be wrapped in a DatasetWrapper, if not wrapped when input.
model_task (str) – Optional parameter to specify whether the model is a classification or regression model. In most cases, the type of the model can be inferred based on the shape of the output, where a classifier has a predict_proba method and outputs a 2 dimensional array, while a regressor has a predict method and outputs a 1 dimensional array.
classes (list or np.ndarray) – optional parameter specifying a list of class names in the dataset
num_classes (int) – optional parameter specifying the number of classes in the dataset
device (str, for instance: 'cpu', 'cuda') – optional parameter specifying the device to move the model to. If not specified, the signature default of 'auto' applies.
- Returns
The wrapper model.
- Return type
model
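The inference rule described for model_task (a classifier has predict_proba and returns a 2-dimensional array, a regressor has predict and returns a 1-dimensional array) can be sketched with plain Python lists. `infer_task` and the two toy models are hypothetical and only illustrate the contract; they are not how wrap_model is implemented.

```python
def infer_task(model, examples):
    """Guess 'classification' or 'regression' from the prediction contract."""
    if hasattr(model, "predict_proba"):
        output = model.predict_proba(examples)
        # 2-dimensional output: one row of class probabilities per example.
        if output and isinstance(output[0], (list, tuple)):
            return "classification"
    if hasattr(model, "predict"):
        output = model.predict(examples)
        # 1-dimensional output: one scalar per example.
        if output and not isinstance(output[0], (list, tuple)):
            return "regression"
    return "unknown"


class ToyClassifier:
    def predict_proba(self, rows):
        return [[0.5, 0.5] for _ in rows]


class ToyRegressor:
    def predict(self, rows):
        return [0.0 for _ in rows]


print(infer_task(ToyClassifier(), [[1.0], [2.0]]))
print(infer_task(ToyRegressor(), [[1.0], [2.0]]))
```

When the output shape is ambiguous, passing model_task explicitly avoids relying on a heuristic like this one.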
ml_wrappers.common
Defines a common directory shared across ML model and dataset wrappers.
ml_wrappers.common.constants
Defines constants for ml-wrappers.
- class ml_wrappers.common.constants.Attributes[source]
Bases:
object
Provide constants for attributes.
- EXPECTED_VALUE = 'expected_value'
- class ml_wrappers.common.constants.DNNFramework[source]
Bases:
object
Provide DNN framework constants.
- PYTORCH = 'pytorch'
- TENSORFLOW = 'tensorflow'
- class ml_wrappers.common.constants.Defaults[source]
Bases:
object
Provide constants for default values to explain methods.
- AUTO = 'auto'
- DEFAULT_BATCH_SIZE = 100
- HDBSCAN = 'hdbscan'
- MAX_DIM = 50
- class ml_wrappers.common.constants.Device(value)[source]
Bases:
enum.Enum
Specifies all possible device types.
- AUTO = 'auto'
- CPU = 'cpu'
- CUDA = 'cuda'
- class ml_wrappers.common.constants.Dynamic[source]
Bases:
object
Provide constants for dynamically generated classes.
- GLOBAL_EXPLANATION = 'DynamicGlobalExplanation'
- LOCAL_EXPLANATION = 'DynamicLocalExplanation'
- class ml_wrappers.common.constants.ExplainParams[source]
Bases:
object
Provide constants for interpret community (init, explain_local and explain_global) parameters.
- BATCH_SIZE = 'batch_size'
- CLASSES = 'classes'
- CLASSIFICATION = 'classification'
- EVAL_DATA = 'eval_data'
- EVAL_Y_PRED = 'eval_y_predicted'
- EVAL_Y_PRED_PROBA = 'eval_y_predicted_proba'
- EXPECTED_VALUES = 'expected_values'
- EXPLAIN_SUBSET = 'explain_subset'
- EXPLANATION_ID = 'explanation_id'
- FEATURES = 'features'
- GLOBAL_IMPORTANCE_NAMES = 'global_importance_names'
- GLOBAL_IMPORTANCE_RANK = 'global_importance_rank'
- GLOBAL_IMPORTANCE_VALUES = 'global_importance_values'
- GLOBAL_NAMES = 'global_names'
- GLOBAL_RANK = 'global_rank'
- GLOBAL_VALUES = 'global_values'
- ID = 'id'
- INCLUDE_LOCAL = 'include_local'
- INIT_DATA = 'init_data'
- IS_ENG = 'is_engineered'
- IS_LOCAL_SPARSE = 'is_local_sparse'
- IS_RAW = 'is_raw'
- LOCAL_EXPLANATION = 'local_explanation'
- LOCAL_IMPORTANCE_VALUES = 'local_importance_values'
- METHOD = 'method'
- MODEL_ID = 'model_id'
- MODEL_TASK = 'model_task'
- MODEL_TYPE = 'model_type'
- NUM_CLASSES = 'num_classes'
- NUM_EXAMPLES = 'num_examples'
- NUM_FEATURES = 'num_features'
- PER_CLASS_NAMES = 'per_class_names'
- PER_CLASS_RANK = 'per_class_rank'
- PER_CLASS_VALUES = 'per_class_values'
- PROBABILITIES = 'probabilities'
- SAMPLING_POLICY = 'sampling_policy'
- SHAP_VALUES_OUTPUT = 'shap_values_output'
- classmethod get_private(explain_param)[source]
Return the private version of the ExplainParams property.
- Parameters
cls (ExplainParams) – ExplainParams input class.
explain_param (str) – The ExplainParams property to get private version of.
- Returns
The private version of the property.
- Return type
- classmethod get_serializable()[source]
Return only the ExplainParams properties that have meaningful data values for serialization.
- Parameters
cls (ExplainParams) – ExplainParams input class.
- Returns
A set of property names, e.g., ‘GLOBAL_IMPORTANCE_VALUES’, ‘MODEL_TYPE’, etc.
- Return type
- class ml_wrappers.common.constants.ExplainType[source]
Bases:
object
Provide constants for model and explainer type information, useful for visualization.
- CLASSIFICATION = 'classification'
- DATA = 'data_type'
- EXPLAIN = 'explain_type'
- EXPLAINER = 'explainer'
- FUNCTION = 'function'
- GLOBAL = 'global'
- HAN = 'han'
- IS_ENG = 'is_engineered'
- IS_RAW = 'is_raw'
- LIME = 'lime'
- LOCAL = 'local'
- METHOD = 'method'
- MIMIC = 'mimic'
- MODEL = 'model_type'
- MODEL_CLASS = 'model_class'
- MODEL_TASK = 'model_task'
- PFI = 'pfi'
- REGRESSION = 'regression'
- SHAP = 'shap'
- SHAP_DEEP = 'shap_deep'
- SHAP_GPU_KERNEL = 'shap_gpu_kernel'
- SHAP_KERNEL = 'shap_kernel'
- SHAP_LINEAR = 'shap_linear'
- SHAP_TREE = 'shap_tree'
- TABULAR = 'tabular'
- class ml_wrappers.common.constants.ExplainableModelType(value)[source]
-
Provide constants for the explainable model type.
- LINEAR_EXPLAINABLE_MODEL_TYPE = 'linear_explainable_model_type'
- TREE_EXPLAINABLE_MODEL_TYPE = 'tree_explainable_model_type'
- class ml_wrappers.common.constants.ExplanationParams[source]
Bases:
object
Provide constants for explanation parameters.
- CLASSES = 'classes'
- EXPECTED_VALUES = 'expected_values'
- class ml_wrappers.common.constants.Extension[source]
Bases:
object
Provide constants for extensions to interpret package.
- BLACKBOX = 'blackbox'
- GLASSBOX = 'model'
- GLOBAL = 'global'
- GREYBOX = 'specific'
- LOCAL = 'local'
- class ml_wrappers.common.constants.InterpretData[source]
Bases:
object
Provide Data and Visualize constants for interpret core.
- BASE_VALUE = 'Base Value'
- EXPLANATION_CLASS_DIMENSION = 'explanation_class_dimension'
- EXPLANATION_TYPE = 'explanation_type'
- EXTRA = 'extra'
- FEATURE_LIST = 'feature_list'
- GLOBAL_FEATURE_IMPORTANCE = 'global_feature_importance'
- INTERCEPT = 'intercept'
- LOCAL_FEATURE_IMPORTANCE = 'local_feature_importance'
- MLI = 'mli'
- MULTICLASS = 'multiclass'
- NAMES = 'names'
- OVERALL = 'overall'
- PERF = 'perf'
- SCORES = 'scores'
- SINGLE = 'single'
- SPECIFIC = 'specific'
- TYPE = 'type'
- UNIVARIATE = 'univariate'
- VALUE = 'value'
- VALUES = 'values'
- class ml_wrappers.common.constants.LightGBMParams[source]
Bases:
object
Provide constants for LightGBM.
- CATEGORICAL_FEATURE = 'categorical_feature'
- class ml_wrappers.common.constants.LightGBMSerializationConstants[source]
Bases:
object
Provide internal class that defines fields used for MimicExplainer serialization.
- IDENTITY = '_identity'
- LOGGER = '_logger'
- MODEL_STR = 'model_str'
- MULTICLASS = 'multiclass'
- OBJECTIVE = 'objective'
- REGRESSION = 'regression'
- TREE_EXPLAINER = '_tree_explainer'
- enum_properties = ['_shap_values_output']
- nonify_properties = ['_logger', '_tree_explainer']
- save_properties = ['_lgbm']
- class ml_wrappers.common.constants.MimicSerializationConstants[source]
Bases:
object
Provide internal class that defines fields used for MimicExplainer serialization.
- ALLOW_ALL_TRANSFORMATIONS = '_allow_all_transformations'
- FUNCTION = 'function'
- IDENTITY = '_identity'
- INITIALIZATION_EXAMPLES = 'initialization_examples'
- LOGGER = '_logger'
- MODEL = 'model'
- ORIGINAL_EVAL_EXAMPLES = '_original_eval_examples'
- PREDICT_PROBA_FLAG = 'predict_proba_flag'
- RESET_INDEX = 'reset_index'
- TIMESTAMP_FEATURIZER = '_timestamp_featurizer'
- enum_properties = ['_shap_values_output']
- nonify_properties = ['_logger', 'model', 'function', 'initialization_examples', '_original_eval_examples', '_timestamp_featurizer']
- save_properties = ['surrogate_model']
- class ml_wrappers.common.constants.ModelTask(value)[source]
-
Provide model task constants.
For tabular data, can be ‘classification’, ‘regression’, or ‘unknown’. For text data, can be ‘text_classification’, ‘sentiment_analysis’, ‘question_answering’, ‘entailment’, ‘summarizations’ or ‘unknown’.
By default the model domain is inferred if ‘unknown’, but this can be overridden if you specify ‘classification’ or ‘regression’.
- CLASSIFICATION = 'classification'
- ENTAILMENT = 'entailment'
- IMAGE_CLASSIFICATION = 'image_classification'
- MULTILABEL_IMAGE_CLASSIFICATION = 'multilabel_image_classification'
- MULTILABEL_TEXT_CLASSIFICATION = 'multilabel_text_classification'
- OBJECT_DETECTION = 'object_detection'
- QUESTION_ANSWERING = 'question_answering'
- REGRESSION = 'regression'
- SENTIMENT_ANALYSIS = 'sentiment_analysis'
- SUMMARIZATIONS = 'summarizations'
- TEXT_CLASSIFICATION = 'text_classification'
- UNKNOWN = 'unknown'
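The `(value)` in the class signature suggests that members can be looked up by their string value, as with a standard Python enum. The sketch below uses a local stand-in, `ToyModelTask`, that copies three of the values listed above; it is not the library class, and the fallback to UNKNOWN in the hypothetical `parse_task` helper is an assumption made for illustration.

```python
from enum import Enum


class ToyModelTask(Enum):
    """Local stand-in that copies three of the values listed above."""
    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    UNKNOWN = "unknown"


def parse_task(value):
    """Turn a string into a member, falling back to UNKNOWN for unlisted text."""
    try:
        return ToyModelTask(value)
    except ValueError:
        return ToyModelTask.UNKNOWN


print(parse_task("regression"))
print(parse_task("clustering"))
```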
- class ml_wrappers.common.constants.ResetIndex(value)[source]
-
Provide index column handling constants. Can be ‘ignore’, ‘reset’ or ‘reset_teacher’.
By default the index column is ignored, but you can override to reset it and make it a feature column that is then featurized to numeric, or reset it and ignore it during featurization but set it as the index when calling predict on the original model.
- Ignore = 'ignore'
- Reset = 'reset'
- ResetTeacher = 'reset_teacher'
- class ml_wrappers.common.constants.SHAPDefaults[source]
Bases:
object
Provide constants for default values to SHAP.
- INDEPENDENT = 'independent'
- class ml_wrappers.common.constants.SKLearn[source]
Bases:
object
Provide scikit-learn related constants.
- EXAMPLES = 'examples'
- LABELS = 'labels'
- PREDICT = 'predict'
- PREDICTIONS = 'predictions'
- PREDICT_PROBA = 'predict_proba'
- class ml_wrappers.common.constants.Scipy[source]
Bases:
object
Provide scipy related constants.
- CSR_FORMAT = 'csr'
- class ml_wrappers.common.constants.ShapValuesOutput(value)[source]
-
Provide constants for the SHAP values output from the explainer.
Can be ‘default’, ‘probability’ or ‘teacher_probability’. If ‘teacher_probability’ is specified, we use the probabilities from the teacher model.
- DEFAULT = 'default'
- PROBABILITY = 'probability'
- TEACHER_PROBABILITY = 'teacher_probability'
ml_wrappers.dataset
Defines a common dataset wrapper and common functions for data manipulation.
- class ml_wrappers.dataset.CustomTimestampFeaturizer(features=None, return_pandas=False, modify_in_place=False)[source]
Bases:
sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
An estimator for featurizing timestamp columns to numeric data.
- Parameters
- fit(X, y=None)[source]
Fits the CustomTimestampFeaturizer.
- Parameters
X (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset containing timestamp columns to featurize.
y (Optional target values (None for unsupervised transformations)) – The target values.
- transform(X)[source]
Transforms the timestamp columns to numeric type in the given dataset.
Specifically, extracts the year, month, day, hour, minute, second and time since min timestamp in the training dataset.
- Parameters
X (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset containing timestamp columns to featurize.
- Returns
The transformed dataset.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix
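To make the extracted fields concrete, here is a stdlib-only sketch of the fit and transform steps described above. `ToyTimestampFeaturizer` is a hypothetical stand-in that works on a plain list of datetime objects, not the library class. Expressing the time since the minimum training timestamp in seconds is an assumption; this reference does not state the unit.

```python
from datetime import datetime


class ToyTimestampFeaturizer:
    """Stdlib stand-in; not the library's CustomTimestampFeaturizer."""

    def fit(self, timestamps):
        # Remember the earliest training timestamp as the reference point.
        self.min_timestamp_ = min(timestamps)
        return self

    def transform(self, timestamps):
        rows = []
        for ts in timestamps:
            # Assumed unit: seconds elapsed since the training minimum.
            since_min = (ts - self.min_timestamp_).total_seconds()
            rows.append([ts.year, ts.month, ts.day,
                         ts.hour, ts.minute, ts.second, since_min])
        return rows


train = [datetime(2021, 1, 1, 0, 0, 0), datetime(2021, 1, 2, 12, 30, 15)]
featurizer = ToyTimestampFeaturizer().fit(train)
features = featurizer.transform(train)
print(features)
```

Because the reference timestamp is fixed during fit, transforming evaluation data later yields offsets on the same scale as the training data.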
- class ml_wrappers.dataset.DatasetWrapper(dataset, clear_references=False)[source]
Bases:
object
A wrapper around a dataset to make dataset operations more uniform across explainers.
- apply_indexer(column_indexer, bucket_unknown=False)[source]
Indexes categorical string features on the dataset.
- Parameters
column_indexer (sklearn.compose.ColumnTransformer) – The transformation steps to index the given dataset.
bucket_unknown (bool) – If true, buckets unknown values to separate categorical level.
- apply_one_hot_encoder(one_hot_encoder)[source]
One-hot-encode categorical string features on the dataset.
- Parameters
one_hot_encoder (sklearn.preprocessing.OneHotEncoder) – The transformation steps to one-hot-encode the given dataset.
- apply_timestamp_featurizer(timestamp_featurizer)[source]
Apply timestamp featurization on the dataset.
- Parameters
timestamp_featurizer (CustomTimestampFeaturizer) – The transformation steps to featurize timestamps in the given dataset.
- augment_data(max_num_of_augmentations=inf)[source]
Augment the current dataset.
- Parameters
max_num_of_augmentations (int) – Maximum number of times to stack permuted x to augment the dataset.
- compute_summary(nclusters=10, use_gpu=False, **kwargs)[source]
Summarizes the dataset if it hasn’t been summarized yet.
- property dataset
Get the dataset.
- Returns
The underlying dataset.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix
- get_column_indexes(features, categorical_features)[source]
Get the column indexes for the given column names.
- get_features(features=None, explain_subset=None, **kwargs)[source]
Get the features of the dataset if features is None in the current kwargs.
- Returns
The features of the dataset if features is currently None in kwargs.
- Return type
- property num_features
Get the number of features (columns) on the dataset.
- Returns
The number of features (columns) in the dataset.
- Return type
- property original_dataset
Get the original dataset prior to performing any operations.
Note: if the original dataset was a pandas dataframe, this will return the numpy version.
- Returns
The original dataset.
- Return type
numpy.ndarray or scipy.sparse matrix
- property original_dataset_with_type
Get the original typed dataset which could be a numpy array or pandas DataFrame or pandas Series.
- Returns
The original dataset.
- Return type
numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix
- sample(max_dim_clustering=50, sampling_method='hdbscan')[source]
Sample the examples.
First does random downsampling to upper_bound rows, then tries to find the optimal downsample based on how many clusters can be constructed from the data. If sampling_method is hdbscan, uses hdbscan to cluster the data and then downsamples to that number of clusters. If sampling_method is k-means, uses different values of k, cutting in half each time, and chooses the k with highest silhouette score to determine how much to downsample the data. The danger of using only random downsampling is that we might downsample too much or too little, so the clustering approach is a heuristic to give us some idea of how much we should downsample to.
- set_index()[source]
Undo reset_index. Set index as feature on internal dataset to be an index again.
- string_index(columns=None)[source]
Indexes categorical string features on the dataset.
- Parameters
columns (list) – Optional parameter specifying the subset of columns that may need to be string indexed.
- Returns
The transformation steps to index the given dataset.
- Return type
sklearn.compose.ColumnTransformer
- property summary_dataset
Get the summary dataset without any subsetting.
- Returns
The summary dataset, or None if the summary was not computed.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix
- take_subset(explain_subset)[source]
Take a subset of the dataset if not done before.
- Parameters
explain_subset (list) – A list of column indexes to take from the original dataset.
- timestamp_featurizer()[source]
Featurizes the timestamp columns.
- Returns
The transformation steps to featurize the timestamp columns.
- Return type
- property typed_dataset
Get the dataset in the original type, pandas DataFrame or Series.
- Returns
The underlying dataset.
- Return type
numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix
- typed_wrapper_func(dataset, keep_index_as_feature=False)[source]
Get a wrapper function to convert the dataset to the original type, pandas DataFrame or Series.
- Parameters
dataset (numpy.ndarray or scipy.sparse.csr_matrix) – The dataset to convert to original type.
keep_index_as_feature (bool) – Whether to keep the index as a feature when converting back. Off by default to convert it back to index.
- Returns
A wrapper function for a given dataset to convert to original type.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix or pandas.DataFrame or pandas.Series
ml_wrappers.dataset.dataset_utils
Defines helpful utilities for the DatasetWrapper.
ml_wrappers.dataset.dataset_wrapper
Defines a helpful dataset wrapper to allow operations such as summarizing data, taking the subset or sampling.
- class ml_wrappers.dataset.dataset_wrapper.DatasetWrapper(dataset, clear_references=False)[source]
Bases:
object
A wrapper around a dataset to make dataset operations more uniform across explainers.
- apply_indexer(column_indexer, bucket_unknown=False)[source]
Indexes categorical string features on the dataset.
- Parameters
column_indexer (sklearn.compose.ColumnTransformer) – The transformation steps to index the given dataset.
bucket_unknown (bool) – If true, buckets unknown values to separate categorical level.
- apply_one_hot_encoder(one_hot_encoder)[source]
One-hot-encode categorical string features on the dataset.
- Parameters
one_hot_encoder (sklearn.preprocessing.OneHotEncoder) – The transformation steps to one-hot-encode the given dataset.
- apply_timestamp_featurizer(timestamp_featurizer)[source]
Apply timestamp featurization on the dataset.
- Parameters
timestamp_featurizer (CustomTimestampFeaturizer) – The transformation steps to featurize timestamps in the given dataset.
- augment_data(max_num_of_augmentations=inf)[source]
Augment the current dataset.
- Parameters
max_num_of_augmentations (int) – Maximum number of times to stack permuted x to augment the dataset.
- compute_summary(nclusters=10, use_gpu=False, **kwargs)[source]
Summarizes the dataset if it hasn’t been summarized yet.
- property dataset
Get the dataset.
- Returns
The underlying dataset.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix
- get_column_indexes(features, categorical_features)[source]
Get the column indexes for the given column names.
- get_features(features=None, explain_subset=None, **kwargs)[source]
Get the features of the dataset if features is None in the current kwargs.
- Returns
The features of the dataset if features is currently None in kwargs.
- Return type
- property num_features
Get the number of features (columns) on the dataset.
- Returns
The number of features (columns) in the dataset.
- Return type
- property original_dataset
Get the original dataset prior to performing any operations.
Note: if the original dataset was a pandas dataframe, this will return the numpy version.
- Returns
The original dataset.
- Return type
numpy.ndarray or scipy.sparse matrix
- property original_dataset_with_type
Get the original typed dataset which could be a numpy array or pandas DataFrame or pandas Series.
- Returns
The original dataset.
- Return type
numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix
- sample(max_dim_clustering=50, sampling_method='hdbscan')[source]
Sample the examples.
First does random downsampling to upper_bound rows, then tries to find the optimal downsample based on how many clusters can be constructed from the data. If sampling_method is hdbscan, uses hdbscan to cluster the data and then downsamples to that number of clusters. If sampling_method is k-means, uses different values of k, cutting in half each time, and chooses the k with highest silhouette score to determine how much to downsample the data. The danger of using only random downsampling is that we might downsample too much or too little, so the clustering approach is a heuristic to give us some idea of how much we should downsample to.
- set_index()[source]
Undo reset_index. Set index as feature on internal dataset to be an index again.
- string_index(columns=None)[source]
Indexes categorical string features on the dataset.
- Parameters
columns (list) – Optional parameter specifying the subset of columns that may need to be string indexed.
- Returns
The transformation steps to index the given dataset.
- Return type
sklearn.compose.ColumnTransformer
- property summary_dataset
Get the summary dataset without any subsetting.
- Returns
The summary dataset, or None if the summary was not computed.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix
- take_subset(explain_subset)[source]
Take a subset of the dataset if not done before.
- Parameters
explain_subset (list) – A list of column indexes to take from the original dataset.
- timestamp_featurizer()[source]
Featurizes the timestamp columns.
- Returns
The transformation steps to featurize the timestamp columns.
- Return type
- property typed_dataset
Get the dataset in the original type, pandas DataFrame or Series.
- Returns
The underlying dataset.
- Return type
numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse matrix
- typed_wrapper_func(dataset, keep_index_as_feature=False)[source]
Get a wrapper function to convert the dataset to the original type, pandas DataFrame or Series.
- Parameters
dataset (numpy.ndarray or scipy.sparse.csr_matrix) – The dataset to convert to original type.
keep_index_as_feature (bool) – Whether to keep the index as a feature when converting back. Off by default to convert it back to index.
- Returns
A wrapper function for a given dataset to convert to original type.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix or pandas.DataFrame or pandas.Series
ml_wrappers.dataset.timestamp_featurizer
Defines a custom timestamp featurizer for converting timestamp columns to numeric.
- class ml_wrappers.dataset.timestamp_featurizer.CustomTimestampFeaturizer(features=None, return_pandas=False, modify_in_place=False)[source]
Bases:
sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
An estimator for featurizing timestamp columns to numeric data.
- Parameters
- fit(X, y=None)[source]
Fits the CustomTimestampFeaturizer.
- Parameters
X (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset containing timestamp columns to featurize.
y (Optional target values (None for unsupervised transformations)) – The target values.
- transform(X)[source]
Transforms the timestamp columns to numeric type in the given dataset.
Specifically, extracts the year, month, day, hour, minute, second and time since min timestamp in the training dataset.
- Parameters
X (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset containing timestamp columns to featurize.
- Returns
The transformed dataset.
- Return type
numpy.ndarray or scipy.sparse.csr_matrix
ml_wrappers.model
Common infrastructure, class hierarchy and utilities for model explanations.
- class ml_wrappers.model.EndpointWrapperModel(api_key, url, allow_self_signed_https=False, extra_headers=None, transform_output_dict=False, class_names=None, wrap_input_data_dict=False, batch_size=10, api_key_auto_refresh_callable=None)[source]
Bases:
object
Defines an MLFlow model wrapper for an endpoint.
- allow_self_signed_https(allowed)[source]
Allow self signed HTTPS.
- Parameters
allowed (bool) – Whether to allow self signed HTTPS.
- static from_auto_refresh_callable(api_key_auto_refresh_callable, url, **kwargs)[source]
Create an EndpointWrapperModel from an auto refresh callable.
The callable method should return the latest API key.
- Parameters
api_key_auto_refresh_callable (callable) – The method to call to refresh the API key.
kwargs (dict) – The keyword arguments.
- Returns
The EndpointWrapperModel.
- Return type
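One way to write a callable that returns the latest API key is to cache the key and fetch a new one once it is older than a chosen lifetime. The sketch below is an assumption-laden illustration: `make_key_provider`, `fetch_key`, and `ttl_seconds` are hypothetical names, and this reference does not state which arguments, if any, the library passes to the callable (a zero-argument callable is assumed here).

```python
import time


def make_key_provider(fetch_key, ttl_seconds, clock=time.monotonic):
    """Return a zero-argument callable that yields the latest API key.

    fetch_key is a hypothetical function that obtains a fresh key.
    """
    state = {"key": None, "expires_at": None}

    def provider():
        now = clock()
        # Fetch on first use and again whenever the cached key has expired.
        if state["expires_at"] is None or now >= state["expires_at"]:
            state["key"] = fetch_key()
            state["expires_at"] = now + ttl_seconds
        return state["key"]

    return provider


# Demo with a fake clock and a fake fetcher, so nothing touches the network.
fake_now = [0.0]
issued = []


def fake_fetch():
    issued.append(len(issued) + 1)
    return f"key-{issued[-1]}"


provider = make_key_provider(fake_fetch, ttl_seconds=10, clock=lambda: fake_now[0])
first = provider()
second = provider()   # still cached
fake_now[0] = 10.0
third = provider()    # cache expired, fetches again
print(first, second, third)
```

Injecting the clock keeps the refresh logic testable without sleeping.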
- load_context(context)[source]
Load the context.
- Parameters
context (mlflow.pyfunc.model.PythonModelContext) – The context.
- predict(context, model_input=None)[source]
Predict using the model.
- Parameters
context (mlflow.pyfunc.model.PythonModelContext or pandas.DataFrame) – The context for MLFlow model or the input data.
model_input (pandas.DataFrame) – The input to the model.
- Returns
The predictions.
- Return type
numpy.ndarray
- class ml_wrappers.model.OpenaiWrapperModel(api_type, api_base, api_version, api_key, engine='gpt-4-32k', temperature=0.7, max_tokens=800, top_p=0.95, frequency_penalty=0, presence_penalty=0, stop=None)[source]
Bases:
object
A model wrapper for an openai model endpoint.
- predict(context, model_input=None)[source]
Predict using the model.
- Parameters
context (mlflow.pyfunc.model.PythonModelContext or pandas.DataFrame) – The context for MLFlow model or the input data.
model_input (pandas.DataFrame) – The input to the model.
- Returns
The predictions.
- Return type
numpy.ndarray
- class ml_wrappers.model.WrappedClassificationModel(model, eval_function, examples=None)[source]
Bases:
ml_wrappers.model.base_wrapped_model.BaseWrappedModel
A class for wrapping a classification model.
- predict(dataset)[source]
Predict the output using the wrapped classification model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- predict_proba(dataset)[source]
Predict the output probability using the wrapped model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
- class ml_wrappers.model.WrappedPytorchModel(model, image_to_tensor=False)[source]
Bases:
object
A class for wrapping a PyTorch model.
Note at time of initialization, since we don’t have access to the dataset, we can’t infer if this is for classification or regression case. Hence, we add the predict_classes method for classification, and keep predict for either outputting values in regression or probabilities in classification.
- predict(dataset)[source]
Predict the output using the wrapped PyTorch model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- Returns
The prediction results.
- Return type
numpy.ndarray
- predict_classes(dataset)[source]
Predict the class using the wrapped PyTorch model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- Returns
The predicted classes.
- Return type
numpy.ndarray
- predict_proba(dataset)[source]
Predict the output probability using the wrapped PyTorch model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
- Returns
The predicted probabilities.
- Return type
numpy.ndarray
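The split between predict, predict_proba, and predict_classes described in the class note can be sketched without any deep learning framework. `ToyWrappedModel` is a hypothetical stand-in around a plain callable, and taking the index of the largest probability for predict_classes is an assumption about what "predict the class" means here.

```python
class ToyWrappedModel:
    """Stand-in wrapper around any callable that maps one row to its output."""

    def __init__(self, forward):
        self._forward = forward

    def predict(self, rows):
        # Raw outputs: values for regression, probabilities for classification.
        return [self._forward(row) for row in rows]

    def predict_proba(self, rows):
        # In this sketch the raw outputs are already probabilities.
        return self.predict(rows)

    def predict_classes(self, rows):
        # Index of the largest probability in each row.
        return [max(range(len(p)), key=p.__getitem__) for p in self.predict(rows)]


def toy_forward(row):
    # Two-class toy model: positive inputs favour class 1.
    return [0.2, 0.8] if row[0] > 0 else [0.9, 0.1]


wrapped = ToyWrappedModel(toy_forward)
classes = wrapped.predict_classes([[1.0], [-1.0]])
print(classes)
```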
- class ml_wrappers.model.WrappedRegressionModel(model, eval_function, examples=None)[source]
Bases:
ml_wrappers.model.base_wrapped_model.BaseWrappedModel
A class for wrapping a regression model.
- predict(dataset)[source]
Predict the output using the wrapped regression model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- class ml_wrappers.model.WrappedTensorflowModel(model)[source]
Bases:
object
A class for wrapping a TensorFlow model.
Note at time of initialization, since we don’t have access to the dataset, we can’t infer if this is for classification or regression case. Hence, we add the predict_classes method for classification, and keep predict for either outputting values in regression or probabilities in classification.
- predict(dataset)[source]
Predict the output using the wrapped TensorFlow model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- Returns
The prediction results.
- Return type
numpy.ndarray
- predict_classes(dataset)[source]
Predict the class using the wrapped TensorFlow model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- Returns
The predicted classes.
- Return type
numpy.ndarray
- predict_proba(dataset)[source]
Predict the output probability using the wrapped TensorFlow model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
- Returns
The predicted probabilities.
- Return type
numpy.ndarray
- ml_wrappers.model.is_sequential(model)[source]
Returns True if the model is a sequential model.
Note the model class name can be keras.src.engine.sequential.Sequential, keras.engine.sequential.Sequential or tensorflow.python.keras.engine.sequential.Sequential, depending on the TensorFlow version. In version 2.13 the namespace changed from keras.engine to keras.src.engine. The check should include all of these cases.
- Parameters
model (tf.keras.Model) – The model to check.
- Returns
True if the model is a sequential model.
- Return type
bool
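A check of this kind could be sketched as a comparison of the model's fully qualified class name against the three locations listed in the note. This is only an illustration under that assumption, not the library's code; looks_sequential and FakeSequential are hypothetical names, and the stand-in class only imitates the module path of a Keras model so that the example runs without TensorFlow.

```python
# Fully qualified class names taken from the note above.
SEQUENTIAL_CLASS_NAMES = {
    "keras.src.engine.sequential.Sequential",
    "keras.engine.sequential.Sequential",
    "tensorflow.python.keras.engine.sequential.Sequential",
}


def looks_sequential(model: object) -> bool:
    """Hypothetical check: match the model's qualified class name
    against the known Sequential locations."""
    cls = type(model)
    qualified_name = f"{cls.__module__}.{cls.__qualname__}"
    return qualified_name in SEQUENTIAL_CLASS_NAMES


# Stand-in class that only imitates the module path of a Keras model.
FakeSequential = type(
    "Sequential", (), {"__module__": "keras.src.engine.sequential"}
)

is_fake_sequential = looks_sequential(FakeSequential())
is_plain_object_sequential = looks_sequential(object())
```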
- ml_wrappers.model.wrap_model(model, examples, model_task: str = ModelTask.UNKNOWN, num_classes: Optional[int] = None, classes: Optional[Union[list, numpy.array]] = None, device='auto')[source]
If needed, wraps the model in a common API based on model task and prediction function contract.
- Parameters
model (model with a predict or predict_proba function.) – The model to evaluate on the examples.
examples (ml_wrappers.DatasetWrapper or numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse.csr_matrix or shap.DenseData or torch.Tensor) – The model evaluation examples. Note the examples will be wrapped in a DatasetWrapper, if not wrapped when input.
model_task (str) – Optional parameter to specify whether the model is a classification or regression model. In most cases, the type of the model can be inferred based on the shape of the output, where a classifier has a predict_proba method and outputs a 2 dimensional array, while a regressor has a predict method and outputs a 1 dimensional array.
classes (list or np.ndarray) – optional parameter specifying a list of class names in the dataset
num_classes (int) – optional parameter specifying the number of classes in the dataset
device (str, for instance: 'cpu', 'cuda') – optional parameter specifying the device to move the model to. If not specified, then cpu is the default
- Returns
The wrapper model.
- Return type
model
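The inference rule described for model_task can be sketched as follows. This is a simplified illustration, not the library's logic: infer_task and output_ndim are hypothetical helpers, nested lists stand in for arrays, and the returned strings are illustrative labels rather than ModelTask values.

```python
from typing import Any, List


def output_ndim(output: Any) -> int:
    """Count the nesting depth of a list-based output (1 for a flat list)."""
    depth = 0
    while isinstance(output, (list, tuple)):
        depth += 1
        if not output:
            break
        output = output[0]
    return depth


def infer_task(model: Any, examples: List[List[float]]) -> str:
    """Hypothetical heuristic following the model_task description."""
    # A classifier exposes predict_proba and returns a 2-dimensional output.
    if hasattr(model, "predict_proba"):
        if output_ndim(model.predict_proba(examples)) == 2:
            return "classification"
    # A regressor exposes predict and returns a 1-dimensional output.
    if hasattr(model, "predict"):
        if output_ndim(model.predict(examples)) == 1:
            return "regression"
    return "unknown"


class ToyClassifier:
    def predict_proba(self, rows):
        return [[0.5, 0.5] for _ in rows]


class ToyRegressor:
    def predict(self, rows):
        return [sum(row) for row in rows]


classifier_task = infer_task(ToyClassifier(), [[1.0, 2.0]])
regressor_task = infer_task(ToyRegressor(), [[1.0, 2.0]])
```

Passing model_task explicitly avoids relying on this kind of inference when a model's output shape is ambiguous.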
ml_wrappers.model.base_wrapped_model
Defines a base class for wrapping models.
ml_wrappers.model.evaluator
ml_wrappers.model.fastai_wrapper
Defines model wrappers and utilities for fastai tabular models.
- class ml_wrappers.model.fastai_wrapper.WrappedFastAITabularModel(model)[source]
Bases:
object
A class for wrapping a FastAI tabular model in the scikit-learn style.
- predict(dataset)[source]
Predict the output value using the wrapped FastAI model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- Returns
The predicted values.
- Return type
numpy.ndarray
- predict_proba(dataset)[source]
Predict the output probability using the FastAI model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
- Returns
The predicted probabilities.
- Return type
numpy.ndarray
ml_wrappers.model.function_wrapper
Defines helper utilities for resolving prediction function shape inconsistencies.
ml_wrappers.model.image_model_wrapper
Defines wrappers for vision-based models.
- class ml_wrappers.model.image_model_wrapper.MLflowDRiseWrapper(model: Any, classes: Union[list, numpy.ndarray])[source]
Bases:
object
Wraps an MLflow model with a predict API function.
To be compatible with the D-RISE explainability method, all models must be wrapped to have the same output and input class and a predict function for object detection. This wrapper is customized for the FasterRCNN model from AutoML. Unlike the PyTorch wrapper, this wrapper does not inherit from GeneralObjectDetectionModelWrapper, because that superclass requires predict to take a tensor input.
- predict(dataset: pandas.core.frame.DataFrame, iou_threshold: float = 0.25, score_threshold: float = 0.5)[source]
Predict the output value using the wrapped MLflow model.
- Parameters
dataset (pandas.DataFrame) – The dataset to predict on.
iou_threshold (float) – Intersection-over-Union (IoU) threshold for NMS (or the amount of acceptable error). Objects with error scores higher than the threshold will be removed.
score_threshold (float) – Threshold to filter detections based on predicted confidence scores.
- Returns
The predicted values.
- Return type
numpy.ndarray
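The roles of iou_threshold and score_threshold can be illustrated with a generic sketch of score filtering followed by greedy non-maximum suppression (NMS). This is an assumption about the general technique, not the wrapper's actual code: the function names are hypothetical, boxes are plain (topX, topY, bottomX, bottomY) tuples, and whether the real comparisons are strict or inclusive is not stated in this reference.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (topX, topY, bottomX, bottomY)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float,
    score_threshold: float,
) -> List[int]:
    """Return indices of kept boxes: drop low scores, then apply greedy NMS."""
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Keep a box only if it does not overlap a higher-scoring kept box
        # by more than iou_threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = filter_detections(boxes, scores, iou_threshold=0.5, score_threshold=0.5)
```

In this made-up data the second box overlaps the first with an IoU of 0.81 and is suppressed, and the last box falls below the score threshold.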
- class ml_wrappers.model.image_model_wrapper.PytorchDRiseWrapper(model, number_of_classes: int, device='auto', transforms=None, iou_threshold=None, score_threshold=None)[source]
Bases:
object
Wraps a PytorchFasterRCNN model with a predict API function.
To be compatible with the D-RISE explainability method, all models must be wrapped to have the same output and input class and a predict function for object detection. This wrapper is customized for the FasterRCNN model from Pytorch, and can also be used with the RetinaNet or any other models with the same output class.
- predict(x: Any, iou_threshold: float = 0.5, score_threshold: float = 0.5)[source]
Create a list of detection records from the image predictions.
- Parameters
x (torch.Tensor) – Tensor of the image
iou_threshold (float) – Intersection-over-Union (IoU) threshold for NMS (or the amount of acceptable error). Objects with error scores higher than the threshold will be removed.
score_threshold (float) – Threshold to filter detections based on predicted confidence scores.
- Returns
Baseline detections to get saliency maps for
- Return type
List of Detection Records
- class ml_wrappers.model.image_model_wrapper.WrappedFastAIImageClassificationModel(model, multilabel=False)[source]
Bases:
object
A class for wrapping a FastAI model in the scikit-learn style.
- predict(dataset)[source]
Predict the output value using the wrapped FastAI model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- Returns
The predicted values.
- Return type
numpy.ndarray
- predict_proba(dataset)[source]
Predict the output probability using the FastAI model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
- Returns
The predicted probabilities.
- Return type
numpy.ndarray
- class ml_wrappers.model.image_model_wrapper.WrappedMlflowAutomlImagesClassificationModel(model: Any)[source]
Bases:
object
A class for wrapping an AutoML for images MLflow classification model in the scikit-learn style.
- class ml_wrappers.model.image_model_wrapper.WrappedMlflowAutomlObjectDetectionModel(model: Any, classes: Union[list, numpy.array])[source]
Bases:
object
A class for wrapping an AutoML for images MLflow object detection model in the scikit-learn style.
- predict(dataset: pandas.core.frame.DataFrame, iou_threshold: float = 0.5, score_threshold: float = 0.5)[source]
Create a list of detection records from the image predictions.
Below is an example label (y) representation for a cohort of 2 images, with 3 objects detected for the first image and 1 for the second image.
[
    [
        [class, topX, topY, bottomX, bottomY, (optional) confidence_score],
        [class, topX, topY, bottomX, bottomY, (optional) confidence_score],
        [class, topX, topY, bottomX, bottomY, (optional) confidence_score],
    ],
    [
        [class, topX, topY, bottomX, bottomY, (optional) confidence_score],
    ],
]
- Parameters
dataset (pandas.DataFrame) – The dataset to predict on.
iou_threshold (float) – Intersection-over-Union (IoU) threshold for NMS (or the amount of acceptable error). Objects with error scores higher than the threshold will be removed.
score_threshold (float) – Threshold to filter detections based on predicted confidence scores.
- Returns
Final detections from the object detector
- Return type
numpy array of Detection Records
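The label layout described for predict can be made concrete with a small sketch. All values below are made up, nested lists stand in for the returned array, and parse_record is a hypothetical helper that is not part of ml_wrappers; it only shows how the optional confidence score might be handled when reading a record.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (topX, topY, bottomX, bottomY)


def parse_record(record: Sequence[float]) -> Tuple[int, Box, Optional[float]]:
    """Split one record into (class, box, confidence).

    The confidence score is optional, so None is returned when it is absent.
    """
    label = int(record[0])
    box = (record[1], record[2], record[3], record[4])
    confidence = record[5] if len(record) > 5 else None
    return label, box, confidence


# Made-up values: the first image has three detections, the second has one.
labels = [
    [
        [0, 10.0, 12.0, 50.0, 60.0, 0.92],
        [1, 30.0, 35.0, 80.0, 90.0, 0.81],
        [0, 5.0, 5.0, 20.0, 25.0],  # confidence score omitted
    ],
    [
        [2, 15.0, 20.0, 40.0, 45.0, 0.77],
    ],
]

objects_per_image = [len(image) for image in labels]
parsed_first = parse_record(labels[0][0])
```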
- predict_proba(dataset: pandas.core.frame.DataFrame, iou_threshold=0.1) numpy.ndarray [source]
Predict the output probability using the MLflow model.
- Parameters
dataset (pandas.DataFrame) – The dataset to predict_proba on.
iou_threshold (float) – Amount of acceptable error. Objects with error scores higher than the threshold will be removed.
- Returns
The predicted probabilities.
- Return type
numpy.ndarray
- class ml_wrappers.model.image_model_wrapper.WrappedObjectDetectionModel(model: Any, number_of_classes: int, device='auto')[source]
Bases:
object
A class for wrapping an object detection model in the scikit-learn style.
- predict(x, iou_threshold: float = 0.5, score_threshold: float = 0.5)[source]
Create a list of detection records from the image predictions.
- Parameters
x (torch.Tensor) – Tensor of the image
- Returns
Baseline detections to get saliency maps for
- Return type
numpy array of Detection Records
Example Label (y) representation for a cohort of 2 images:
[
    [
        [object_1, x1, y1, b1, h1, (optional) confidence_score],
        [object_2, x2, y2, b2, h2, (optional) confidence_score],
        [object_1, x3, y3, b3, h3, (optional) confidence_score],
    ],
    [
        [object_1, x4, y4, b4, h4, (optional) confidence_score],
        [object_2, x5, y5, b5, h5, (optional) confidence_score],
    ],
]
- predict_proba(dataset, iou_threshold=0.1)[source]
Predict the output probability using the wrapped model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
iou_threshold (float) – Amount of acceptable error. Objects with error scores higher than the threshold will be removed.
- class ml_wrappers.model.image_model_wrapper.WrappedTransformerImageClassificationModel(model)[source]
Bases:
object
A class for wrapping a Transformers model in the scikit-learn style.
- predict(dataset)[source]
Predict the output using the wrapped Transformers model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- Returns
The predicted values.
- Return type
numpy.ndarray
- predict_proba(dataset)[source]
Predict the output probability using the Transformers model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
- Returns
The predicted probabilities.
- Return type
numpy.ndarray
- ml_wrappers.model.image_model_wrapper.expand_class_scores(scores: Any, labels: Any, number_of_classes: int) Any [source]
Extrapolate a full set of class scores.
Many object detection models don’t return a full set of class scores, but rather just a score for the predicted class. This is a helper function that approximates a full set of class scores by dividing the difference between 1.0 and the predicted class score among the remaining classes.
- Parameters
scores (torch.Tensor) – Set of class specific scores. Shape [D] where D is number of detections
labels (torch.Tensor (ints)) – Set of label indices corresponding to predicted class. Shape [D] where D is number of detections
number_of_classes (int) – Number of classes model predicts
- Returns
A set of expanded scores, of shape [D, C], where C is number of classes
- Return type
torch.Tensor
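The approximation described above can be sketched with plain Python lists in place of torch tensors. This is an illustration of the stated rule only, assuming at least two classes; the function name expand_scores is hypothetical and the code is not the library's implementation.

```python
from typing import List, Sequence


def expand_scores(
    scores: Sequence[float], labels: Sequence[int], number_of_classes: int
) -> List[List[float]]:
    """Build a [D, C] table of scores from D predicted-class scores.

    Assumes number_of_classes >= 2, since the leftover mass is split
    among the other classes.
    """
    expanded = []
    for score, label in zip(scores, labels):
        # Share the difference between 1.0 and the predicted score
        # equally among the remaining classes.
        rest = (1.0 - score) / (number_of_classes - 1)
        row = [rest] * number_of_classes
        row[label] = score
        expanded.append(row)
    return expanded


# Two detections (D = 2) over three classes (C = 3).
expanded = expand_scores([0.8, 0.5], [1, 0], number_of_classes=3)
```

Each row sums to 1.0 up to floating-point rounding, with the predicted class keeping its original score.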
ml_wrappers.model.model_utils
Defines common model utilities.
ml_wrappers.model.model_wrapper
Defines helpful model wrapper and utils for implicitly rewrapping the model to conform to explainer contracts.
- ml_wrappers.model.model_wrapper.wrap_model(model, examples, model_task: str = ModelTask.UNKNOWN, num_classes: Optional[int] = None, classes: Optional[Union[list, numpy.array]] = None, device='auto')[source]
If needed, wraps the model in a common API based on model task and prediction function contract.
- Parameters
model (model with a predict or predict_proba function.) – The model to evaluate on the examples.
examples (ml_wrappers.DatasetWrapper or numpy.ndarray or pandas.DataFrame or pandas.Series or scipy.sparse.csr_matrix or shap.DenseData or torch.Tensor) – The model evaluation examples. Note the examples will be wrapped in a DatasetWrapper, if not wrapped when input.
model_task (str) – Optional parameter to specify whether the model is a classification or regression model. In most cases, the type of the model can be inferred based on the shape of the output, where a classifier has a predict_proba method and outputs a 2 dimensional array, while a regressor has a predict method and outputs a 1 dimensional array.
classes (list or np.ndarray) – optional parameter specifying a list of class names in the dataset
num_classes (int) – optional parameter specifying the number of classes in the dataset
device (str, for instance: 'cpu', 'cuda') – optional parameter specifying the device to move the model to. If not specified, then cpu is the default
- Returns
The wrapper model.
- Return type
model
ml_wrappers.model.predictions_wrapper
Defines classes to wrap the training/test data and the corresponding predictions from the model.
- exception ml_wrappers.model.predictions_wrapper.DataValidationException[source]
Bases:
Exception
An exception indicating that some user supplied data is not valid.
- Parameters
exception_message (str) – A message describing the error.
- exception ml_wrappers.model.predictions_wrapper.EmptyDataException[source]
Bases:
Exception
An exception indicating that some operation produced empty data.
- Parameters
exception_message (str) – A message describing the error.
- class ml_wrappers.model.predictions_wrapper.PredictionsModelWrapper(test_data: pandas.core.frame.DataFrame, y_pred: numpy.ndarray, should_construct_pandas_query: Optional[bool] = True)[source]
Bases:
object
Model wrapper to wrap the samples used to train the models and the predictions of the model. This wrapper is useful when it is not possible to load the model.
- class ml_wrappers.model.predictions_wrapper.PredictionsModelWrapperClassification(test_data: pandas.core.frame.DataFrame, y_pred: numpy.ndarray, y_pred_proba: Optional[numpy.ndarray] = None, should_construct_pandas_query: Optional[bool] = True)[source]
Bases:
ml_wrappers.model.predictions_wrapper.PredictionsModelWrapper
Model wrapper to wrap the samples used to train the models and the predictions of the model for classification tasks.
- predict_proba(query_test_data: pandas.core.frame.DataFrame) numpy.ndarray [source]
Return the prediction probabilities based on the query data.
- Parameters
query_test_data (pd.DataFrame) – The data for which the prediction probabilities need to be returned.
- Returns
Prediction probabilities of the model.
- Return type
np.ndarray
- class ml_wrappers.model.predictions_wrapper.PredictionsModelWrapperRegression(test_data: pandas.core.frame.DataFrame, y_pred: numpy.ndarray, should_construct_pandas_query: Optional[bool] = True)[source]
Bases:
ml_wrappers.model.predictions_wrapper.PredictionsModelWrapper
Model wrapper to wrap the samples used to train the models and the predictions of the model for regression tasks.
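The idea behind these prediction wrappers, answering prediction calls from stored results when the model itself cannot be loaded, can be sketched as a lookup table. This is a conceptual illustration only: PredictionLookup is a hypothetical class, rows are plain lists rather than a pandas DataFrame, and the real wrappers' query mechanism is not described in this reference.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[float, ...]


class PredictionLookup:
    """Hypothetical stand-in that answers predict() from stored results."""

    def __init__(
        self, test_data: Sequence[Sequence[float]], y_pred: Sequence[float]
    ) -> None:
        if len(test_data) != len(y_pred):
            raise ValueError("test_data and y_pred must have the same length")
        # Map each stored row to the prediction recorded for it.
        self._predictions: Dict[Row, float] = {
            tuple(row): pred for row, pred in zip(test_data, y_pred)
        }

    def predict(self, query_rows: Sequence[Sequence[float]]) -> List[float]:
        # Raises KeyError for rows that were never stored.
        return [self._predictions[tuple(row)] for row in query_rows]


lookup = PredictionLookup([[1.0, 2.0], [3.0, 4.0]], [0, 1])
result = lookup.predict([[3.0, 4.0], [1.0, 2.0]])
```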
ml_wrappers.model.pytorch_wrapper
Defines model wrappers and utilities for pytorch models.
- class ml_wrappers.model.pytorch_wrapper.WrappedPytorchModel(model, image_to_tensor=False)[source]
Bases:
object
A class for wrapping a PyTorch model.
Note: at initialization time the wrapper has no access to the dataset, so it cannot infer whether the model is used for classification or regression. The predict_classes method is therefore provided for classification, while predict returns either values (regression) or probabilities (classification).
- predict(dataset)[source]
Predict the output using the wrapped PyTorch model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- Returns
The prediction results.
- Return type
numpy.ndarray
- predict_classes(dataset)[source]
Predict the class using the wrapped PyTorch model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- Returns
The predicted classes.
- Return type
numpy.ndarray
- predict_proba(dataset)[source]
Predict the output probability using the wrapped PyTorch model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
- Returns
The predicted probabilities.
- Return type
numpy.ndarray
ml_wrappers.model.tensorflow_wrapper
Defines model wrappers and utilities for tensorflow models.
- class ml_wrappers.model.tensorflow_wrapper.WrappedTensorflowModel(model)[source]
Bases:
object
A class for wrapping a TensorFlow model.
Note: at initialization time the wrapper has no access to the dataset, so it cannot infer whether the model is used for classification or regression. The predict_classes method is therefore provided for classification, while predict returns either values (regression) or probabilities (classification).
- predict(dataset)[source]
Predict the output using the wrapped TensorFlow model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- Returns
The prediction results.
- Return type
numpy.ndarray
- predict_classes(dataset)[source]
Predict the class using the wrapped TensorFlow model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- Returns
The predicted classes.
- Return type
numpy.ndarray
- predict_proba(dataset)[source]
Predict the output probability using the wrapped TensorFlow model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
- Returns
The predicted probabilities.
- Return type
numpy.ndarray
- ml_wrappers.model.tensorflow_wrapper.is_sequential(model)[source]
Returns True if the model is a sequential model.
Note the model class name can be keras.src.engine.sequential.Sequential, keras.engine.sequential.Sequential or tensorflow.python.keras.engine.sequential.Sequential, depending on the TensorFlow version. In version 2.13 the namespace changed from keras.engine to keras.src.engine. The check should include all of these cases.
- Parameters
model (tf.keras.Model) – The model to check.
- Returns
True if the model is a sequential model.
- Return type
bool
ml_wrappers.model.text_model_wrapper
Defines wrappers for text-based models.
- class ml_wrappers.model.text_model_wrapper.WrappedQuestionAnsweringModel(model)[source]
Bases:
object
A class for wrapping a Transformers model in the scikit-learn style.
- predict(dataset)[source]
Predict the output using the wrapped Transformers model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- class ml_wrappers.model.text_model_wrapper.WrappedTextClassificationModel(model, multilabel=False)[source]
Bases:
object
A class for wrapping a Transformers model in the scikit-learn style.
- predict(dataset)[source]
Predict the output using the wrapped Transformers model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- predict_proba(dataset)[source]
Predict the output probability using the Transformers model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
ml_wrappers.model.wrapped_classification_model
Defines a class for wrapping classification models.
- class ml_wrappers.model.wrapped_classification_model.WrappedClassificationModel(model, eval_function, examples=None)[source]
Bases:
ml_wrappers.model.base_wrapped_model.BaseWrappedModel
A class for wrapping a classification model.
- predict(dataset)[source]
Predict the output using the wrapped classification model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- predict_proba(dataset)[source]
Predict the output probability using the wrapped model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
ml_wrappers.model.wrapped_classification_without_proba_model
Defines a class for wrapping classifiers without predict_proba.
- class ml_wrappers.model.wrapped_classification_without_proba_model.WrappedClassificationWithoutProbaModel(model)[source]
Bases:
object
A class for wrapping a classifier without a predict_proba method.
Note: the classifier may not output numeric values for its predictions. We generate a trivial boolean version of predict_proba.
- predict(dataset)[source]
Predict the output using the wrapped classification model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.
- predict_proba(dataset)[source]
Predict the output probability using the wrapped model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict_proba on.
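A trivial boolean predict_proba of the kind mentioned in the class note could look like the sketch below: each predicted label becomes a row with 1.0 in the column of the predicted class and 0.0 elsewhere. This is an assumption about what "trivial boolean" means here; boolean_predict_proba is a hypothetical helper, and the class list and labels are made up.

```python
from typing import Hashable, List, Sequence


def boolean_predict_proba(
    predictions: Sequence[Hashable], classes: Sequence[Hashable]
) -> List[List[float]]:
    """Turn predicted labels into 0/1 rows, one column per class."""
    index = {label: i for i, label in enumerate(classes)}
    proba = []
    for label in predictions:
        row = [0.0] * len(classes)
        row[index[label]] = 1.0  # full weight on the predicted class
        proba.append(row)
    return proba


# Labels do not need to be numeric.
proba = boolean_predict_proba(["cat", "dog", "cat"], classes=["cat", "dog"])
```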
ml_wrappers.model.wrapped_regression_model
Defines a class for wrapping regression models.
- class ml_wrappers.model.wrapped_regression_model.WrappedRegressionModel(model, eval_function, examples=None)[source]
Bases:
ml_wrappers.model.base_wrapped_model.BaseWrappedModel
A class for wrapping a regression model.
- predict(dataset)[source]
Predict the output using the wrapped regression model.
- Parameters
dataset (ml_wrappers.DatasetWrapper) – The dataset to predict on.