Function Reference

PGBM is a lightweight package. For the Torch backend, we expose a set of functions and methods via three classes: PGBM, DistPGBM and PGBMRegressor.

For the Scikit-learn backend, we provide a modified version of Scikit-learn's HistGradientBoostingRegressor.

Torch backend

The PyTorch backend is exposed by importing pgbm.torch.
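
The classes documented below can be imported directly from this module, for example:

from pgbm.torch import PGBM, DistPGBM, PGBMRegressor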

class pgbm.torch.DistPGBM(size=1, rank=0)[source]

Bases: object

Distributed Probabilistic Gradient Boosting Machines (PGBM) (Python class)

DistPGBM fits a Probabilistic Gradient Boosting Machine regression model and returns point and probabilistic predictions.

This class uses Torch as backend and can be used for distributed training.

Parameters:
  • size (int) – world size of distributed training, defaults to 1

  • rank (int) – rank of the process in the distributed process pool, defaults to 0
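
A minimal construction sketch is given below. It assumes that the distributed process group is initialized by the user via torch.distributed before the model is created; the backend choice and the world_size and rank values are placeholders, so consult PGBM's distributed training examples for the exact setup.

import torch.distributed as dist
from pgbm.torch import DistPGBM

# Assumption: MASTER_ADDR/MASTER_PORT are set and this runs once in each of the
# world_size processes, with `rank` the index of the current process.
dist.init_process_group(backend='gloo', world_size=world_size, rank=rank)
model = DistPGBM(size=world_size, rank=rank)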

crps_ensemble(yhat_dist, y)[source]

Calculate the empirical Continuous Ranked Probability Score (CRPS) for a set of forecasts for a number of samples (lower is better).

Based on crps_ensemble from properscoring https://pypi.org/project/properscoring/

Parameters:
  • yhat_dist (torch.Tensor) – forecasts for each sample of size [n_forecasts x n_samples].

  • y (torch.Tensor) – ground truth value of each sample of size [n_samples].

Returns:

CRPS score for each sample

Return type:

torch.Tensor

Example:

train_set = (X_train, y_train)
test_set = (X_test, y_test)
model = DistPGBM()
model.train(train_set, objective, metric)
yhat_test_dist = model.predict_dist(X_test)
crps = model.crps_ensemble(yhat_test_dist, y_test)
load(filename, device=None)[source]

Load a DistPGBM model from a file.

Parameters:
  • filename (str) – location of model file

  • device (torch.device, optional) – torch device, defaults to torch.device('cpu')

Returns:

self

Return type:

DistPGBM object

Example:

model = DistPGBM()
model.load('model.pt')
optimize_distribution(X, y, distributions=None, tree_correlations=None)[source]

Find the distribution and tree correlation that best fits the data according to lowest CRPS score.

The parameters 'distribution' and 'tree_correlation' of a DistPGBM model will be adjusted to the best values after running this method.

This function returns the best found distribution and tree correlation.

Parameters:
  • X (torch.Tensor) – sample set of size [n_samples x n_features] for which to optimize the distribution.

  • y (torch.Tensor) – ground truth of size [n_samples] for sample set X

  • distributions (list, optional) – list containing distributions to choose from. Options are: normal, studentt, laplace, logistic, lognormal, gamma, gumbel, weibull, negativebinomial, poisson. Defaults to None (corresponds to iterating over all distributions)

  • tree_correlations (torch.Tensor, optional) – vector containing tree correlations to use in optimization procedure, defaults to None (corresponds to iterating over a default range).

Returns:

distribution and tree correlation that yields lowest CRPS

Return type:

tuple

Example:

train_set = (X_train, y_train)
validation_set = (X_validation, y_validation)
model = DistPGBM()
model.train(train_set, objective, metric)
(best_dist, best_tree_corr) = model.optimize_distribution(X_validation, y_validation)
permutation_importance(X, y=None, n_permutations=10, levels=None)[source]

Calculate feature importance of a DistPGBM model for a sample set X by randomly permuting each feature.

This function can be executed in a supervised or unsupervised manner, depending on whether y is given.

If y is provided, the output of this function is the change in error metric when randomly permuting a feature.

If y is not provided, the output is the weighted average change in prediction when randomly permuting a feature.

Parameters:
  • X (torch.Tensor) – sample set of size [n_samples x n_features] for which to determine the feature importance.

  • y (torch.Tensor, optional) – ground truth of size [n_samples] for sample set X, defaults to None

  • n_permutations (int, optional) – number of random permutations to perform for each feature, defaults to 10

Returns:

permutation importance score per feature

Return type:

torch.Tensor

Example:

train_set = (X_train, y_train)
test_set = (X_test, y_test)
model = DistPGBM()
model.train(train_set, objective, metric)
perm_importance_supervised = model.permutation_importance(X_test, y_test)  # Supervised
perm_importance_unsupervised = model.permutation_importance(X_test)  # Unsupervised
predict(X, parallel=True)[source]

Generate point estimates/forecasts for a given sample set X.

Parameters:
  • X (torch.Tensor) – sample set of size [n_samples x n_features] for which to create the estimates/forecasts.

  • parallel (boolean, optional) – compute predictions for all trees in parallel (True) or serial (False). Use False when experiencing out-of-memory errors.

Returns:

predictions of size [n_samples]

Return type:

torch.Tensor

Example:

yhat_test = model.predict(X_test)
predict_dist(X, n_forecasts=100, parallel=True, output_sample_statistics=False)[source]

Generate probabilistic estimates/forecasts for a given sample set X.

Parameters:
  • X (torch.Tensor) – sample set of size [n_samples x n_features] for which to create the estimates/forecasts.

  • n_forecasts (int, optional) – number of estimates/forecasts to create, defaults to 100

  • parallel (boolean, optional) – compute predictions for all trees in parallel (True) or serial (False). Use False when experiencing out-of-memory errors.

  • output_sample_statistics (boolean, optional) – whether to also output the learned sample mean and variance. If True, the function will return a tuple (forecasts, mu, variance) with the latter arrays containing the learned mean and variance per sample that can be used to parameterize a distribution, defaults to False

Returns:

predictions of size [n_forecasts x n_samples]

Return type:

torch.Tensor

Example:

yhat_test = model.predict_dist(X_test)
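
When output_sample_statistics=True, the method returns the forecasts together with the learned per-sample mean and variance, which can be used to parameterize a distribution:

yhat_dist, mu, variance = model.predict_dist(X_test, n_forecasts=1000, output_sample_statistics=True)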
save(filename)[source]

Save a DistPGBM model to a file. The model parameters are saved as numpy arrays and dictionaries.

Parameters:

filename (str) – location of model file

Returns:

dictionary saved in filename

Return type:

dictionary

Example:

model = DistPGBM()
model.train(train_set, objective, metric)
model.save('model.pt')
train(train_set, objective, metric, params=None, valid_set=None, sample_weight=None, eval_sample_weight=None)[source]

Train a DistPGBM model.

Parameters:
  • train_set (tuple) – sample set (X, y) of size ([n_training_samples x n_features], [n_training_samples]) on which to train the DistPGBM model, where X contains the features of the samples and y is the ground truth.

  • objective (function) – The objective function is the loss function that will be optimized during the gradient boosting process. The function should consume a torch tensor of predictions yhat and ground truth values y and output the gradient and hessian with respect to yhat of the loss function.

  • metric (function) – The metric function is the function that generates the error metric. The evaluation metric should consume a torch tensor of predictions yhat and ground truth values y, and output a scalar loss.

  • params (dictionary, optional) – Dictionary containing the learning parameters of a DistPGBM model, defaults to None.

  • valid_set (tuple, optional) – sample set (X, y) of size ([n_validation_samples x n_features], [n_validation_samples]) on which to validate the DistPGBM model, where X contains the features of the samples and y is the ground truth, defaults to None.

  • sample_weight (torch.Tensor, optional) – sample weights for the training data, defaults to None.

  • eval_sample_weight (torch.Tensor, optional) – sample weights for the validation data, defaults to None.

Returns:

self

Return type:

DistPGBM object

Example:

# Load packages
import torch
from pgbm.torch import DistPGBM
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
#%% Objective for pgbm
def mseloss_objective(yhat, y, sample_weight=None):
    gradient = (yhat - y)
    hessian = torch.ones_like(yhat)

    return gradient, hessian

def rmseloss_metric(yhat, y, sample_weight=None):
    loss = torch.sqrt(torch.mean(torch.square(yhat - y)))

    return loss
#%% Load data
X, y = fetch_california_housing(return_X_y=True)
#%% Train pgbm
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
train_data = (X_train, y_train)
# Train on set
model = DistPGBM()
model.train(train_data, objective=mseloss_objective, metric=rmseloss_metric)
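
The optional valid_set, sample_weight and eval_sample_weight arguments can be passed in the same call. The sketch below continues the example above; the extra validation split and the uniform weights are illustrative only:

import torch

X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=0.2)
train_data = (X_train, y_train)
valid_data = (X_validation, y_validation)
sample_weight = torch.ones(len(y_train))  # placeholder: uniform training weights
model = DistPGBM()
model.train(train_data, objective=mseloss_objective, metric=rmseloss_metric,
            valid_set=valid_data, sample_weight=sample_weight)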
class pgbm.torch.PGBM[source]

Bases: object

Probabilistic Gradient Boosting Machines (PGBM) (Python class)

PGBM fits a Probabilistic Gradient Boosting Machine regression model and returns point and probabilistic predictions.

This class uses Torch as backend.

Example:

from pgbm.torch import PGBM
model = PGBM()
crps_ensemble(yhat_dist, y)[source]

Calculate the empirical Continuous Ranked Probability Score (CRPS) for a set of forecasts for a number of samples (lower is better).

Based on crps_ensemble from properscoring https://pypi.org/project/properscoring/

Parameters:
  • yhat_dist (torch.Tensor) – forecasts for each sample of size [n_forecasts x n_samples].

  • y (torch.Tensor) – ground truth value of each sample of size [n_samples].

Returns:

CRPS score for each sample

Return type:

torch.Tensor

Example:

train_set = (X_train, y_train)
test_set = (X_test, y_test)
model = PGBM()
model.train(train_set, objective, metric)
yhat_test_dist = model.predict_dist(X_test)
crps = model.crps_ensemble(yhat_test_dist, y_test)
load(filename, device=None)[source]

Load a PGBM model from a file.

Parameters:
  • filename (str) – location of model file

  • device (torch.device, optional) – torch device, defaults to torch.device('cpu')

Returns:

self

Return type:

PGBM object

Example:

model = PGBM()
model.load('model.pt')
optimize_distribution(X, y, distributions=None, tree_correlations=None)[source]

Find the distribution and tree correlation that best fits the data according to lowest CRPS score.

The parameters 'distribution' and 'tree_correlation' of a PGBM model will be adjusted to the best values after running this method.

This function returns the best found distribution and tree correlation.

Parameters:
  • X (torch.Tensor) – sample set of size [n_samples x n_features] for which to optimize the distribution.

  • y (torch.Tensor) – ground truth of size [n_samples] for sample set X

  • distributions (list, optional) – list containing distributions to choose from. Options are: normal, studentt, laplace, logistic, lognormal, gamma, gumbel, weibull, negativebinomial, poisson. Defaults to None (corresponds to iterating over all distributions)

  • tree_correlations (torch.Tensor, optional) – vector containing tree correlations to use in optimization procedure, defaults to None (corresponds to iterating over a default range).

Returns:

distribution and tree correlation that yields lowest CRPS

Return type:

tuple

Example:

train_set = (X_train, y_train)
validation_set = (X_validation, y_validation)
model = PGBM()
model.train(train_set, objective, metric)
(best_dist, best_tree_corr) = model.optimize_distribution(X_validation, y_validation)
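
The search can also be restricted to a subset of distributions and a user-defined set of tree correlations via the distributions and tree_correlations arguments; the candidate values below are illustrative only:

import torch

distributions = ['normal', 'studentt', 'laplace']
tree_correlations = torch.arange(0, 0.2, 0.01)
(best_dist, best_tree_corr) = model.optimize_distribution(X_validation, y_validation,
        distributions=distributions, tree_correlations=tree_correlations)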
permutation_importance(X, y=None, n_permutations=10, levels=None)[source]

Calculate feature importance of a PGBM model for a sample set X by randomly permuting each feature.

This function can be executed in a supervised or unsupervised manner, depending on whether y is given.

If y is provided, the output of this function is the change in error metric when randomly permuting a feature.

If y is not provided, the output is the weighted average change in prediction when randomly permuting a feature.

Parameters:
  • X (torch.Tensor) – sample set of size [n_samples x n_features] for which to determine the feature importance.

  • y (torch.Tensor, optional) – ground truth of size [n_samples] for sample set X, defaults to None

  • n_permutations (int, optional) – number of random permutations to perform for each feature, defaults to 10

Returns:

permutation importance score per feature

Return type:

torch.Tensor

Example:

train_set = (X_train, y_train)
test_set = (X_test, y_test)
model = PGBM()
model.train(train_set, objective, metric)
perm_importance_supervised = model.permutation_importance(X_test, y_test)  # Supervised
perm_importance_unsupervised = model.permutation_importance(X_test)  # Unsupervised
predict(X, parallel=True)[source]

Generate point estimates/forecasts for a given sample set X.

Parameters:
  • X (torch.Tensor) – sample set of size [n_samples x n_features] for which to create the estimates/forecasts.

  • parallel (boolean, optional) – compute predictions for all trees in parallel (True) or serial (False). Use False when experiencing out-of-memory errors.

Returns:

predictions of size [n_samples]

Return type:

torch.Tensor

Example:

yhat_test = model.predict(X_test)
predict_dist(X, n_forecasts=100, parallel=True, output_sample_statistics=False)[source]

Generate probabilistic estimates/forecasts for a given sample set X.

Parameters:
  • X (torch.Tensor) – sample set of size [n_samples x n_features] for which to create the estimates/forecasts.

  • n_forecasts (int, optional) – number of estimates/forecasts to create, defaults to 100

  • parallel (boolean, optional) – compute predictions for all trees in parallel (True) or serial (False). Use False when experiencing out-of-memory errors.

  • output_sample_statistics (boolean, optional) – whether to also output the learned sample mean and variance. If True, the function will return a tuple (forecasts, mu, variance) with the latter arrays containing the learned mean and variance per sample that can be used to parameterize a distribution, defaults to False

Returns:

predictions of size [n_forecasts x n_samples]

Return type:

torch.Tensor

Example:

yhat_test = model.predict_dist(X_test)
save(filename)[source]

Save a PGBM model to a file. The model parameters are saved as numpy arrays and dictionaries.

Parameters:

filename (str) – location of model file

Returns:

dictionary saved in filename

Return type:

dictionary

Example:

model = PGBM()
model.train(train_set, objective, metric)
model.save('model.pt')
train(train_set, objective, metric, params=None, valid_set=None, sample_weight=None, eval_sample_weight=None)[source]

Train a PGBM model.

Parameters:
  • train_set (tuple) – sample set (X, y) of size ([n_training_samples x n_features], [n_training_samples]) on which to train the PGBM model, where X contains the features of the samples and y is the ground truth.

  • objective (function) – The objective function is the loss function that will be optimized during the gradient boosting process. The function should consume a torch tensor of predictions yhat and ground truth values y and output the gradient and hessian with respect to yhat of the loss function.

  • metric (function) – The metric function is the function that generates the error metric. The evaluation metric should consume a torch tensor of predictions yhat and ground truth values y, and output a scalar loss.

  • params (dictionary, optional) – Dictionary containing the learning parameters of a PGBM model, defaults to None.

  • valid_set (tuple, optional) – sample set (X, y) of size ([n_validation_samples x n_features], [n_validation_samples]) on which to validate the PGBM model, where X contains the features of the samples and y is the ground truth, defaults to None.

  • sample_weight (torch.Tensor, optional) – sample weights for the training data, defaults to None.

  • eval_sample_weight (torch.Tensor, optional) – sample weights for the validation data, defaults to None.

Returns:

self

Return type:

PGBM object

Example:

# Load packages
import torch
from pgbm.torch import PGBM
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
#%% Objective for pgbm
def mseloss_objective(yhat, y, sample_weight=None):
    gradient = (yhat - y)
    hessian = torch.ones_like(yhat)

    return gradient, hessian

def rmseloss_metric(yhat, y, sample_weight=None):
    loss = torch.sqrt(torch.mean(torch.square(yhat - y)))

    return loss
#%% Load data
X, y = fetch_california_housing(return_X_y=True)
#%% Train pgbm
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
train_data = (X_train, y_train)
# Train on set
model = PGBM()
model.train(train_data, objective=mseloss_objective, metric=rmseloss_metric)
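
Learning parameters can be overridden through the params dictionary. The reference does not enumerate the accepted keys here, so the keys below are an assumption based on the PGBMRegressor hyperparameters documented further down; the values are illustrative:

# Assumed keys, mirroring the PGBMRegressor hyperparameters; values are illustrative.
params = {'max_leaves': 16,
          'learning_rate': 0.05,
          'n_estimators': 200,
          'min_data_in_leaf': 5}
model = PGBM()
model.train(train_data, objective=mseloss_objective, metric=rmseloss_metric, params=params)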
class pgbm.torch.PGBMRegressor(objective='mse', metric='rmse', max_leaves=32, learning_rate=0.1, n_estimators=100, min_split_gain=0.0, min_data_in_leaf=3, bagging_fraction=1, feature_fraction=1, max_bin=256, reg_lambda=1.0, random_state=2147483647, device='cpu', gpu_device_id=0, derivatives='exact', distribution='normal', checkpoint=False, tree_correlation=None, monotone_constraints=None, monotone_iterations=1, verbose=2, init_model=None)[source]

Bases: BaseEstimator

Probabilistic Gradient Boosting Machines (PGBM) Regressor (Scikit-learn wrapper)

PGBMRegressor fits a Probabilistic Gradient Boosting Machine regression model and returns point and probabilistic predictions.

This class uses Torch as backend.

Parameters:
  • objective (str or function, default='mse') –

    Objective function to minimize. If not mse, a user defined objective function should be supplied in the form of:

    def objective(y, yhat, sample_weight=None):
        [......]

        return gradient, hessian

    in which gradient and hessian are of the same shape as y.

  • metric (str or function, default='rmse') –

    Metric to evaluate predictions during training and evaluation. If not rmse, a user defined metric function should be supplied in the form of:

    def metric(y, yhat, sample_weight=None):
        [......]

        return metric

    in which metric is a scalar.

  • max_leaves (int, default=32) – The maximum number of leaves per tree. Increase this value to create more complicated trees, and reduce the value to create simpler trees (reduce overfitting).

  • learning_rate (float, default=0.1) – The learning rate of the algorithm; the amount of each new tree prediction that should be added to the ensemble.

  • n_estimators (int, default=100, constraint>0) – The number of trees to create. Setting this value higher typically improves performance, at the expense of training speed and an increased risk of overfitting. Use in conjunction with learning_rate and max_leaves; more trees generally require a lower learning_rate and/or a lower max_leaves.

  • min_split_gain (float, default = 0.0, constraint >= 0.0) – The minimum gain for a node to split when building a tree.

  • min_data_in_leaf (int, default=3, constraint>=2) – The minimum number of samples in a leaf of a tree. Increase this value to reduce overfitting.

  • bagging_fraction (float, default=1, constraint>0, constraint<=1) – Fraction of samples to use when building a tree. Set to a value between 0 and 1 to randomly select a portion of samples to construct each new tree. A lower fraction speeds up training (and can be used to deal with out-of-memory issues when training on GPU) and may reduce overfitting.

  • feature_fraction (float, default=1, constraint>0, constraint<=1) – Fraction of features to use when building a tree. Set to a value between 0 and 1 to randomly select a portion of features to construct each new tree. A lower fraction speeds up training (and can be used to deal with out-of-memory issues when training on GPU) and may reduce overfitting.

  • max_bin (int, default=256, constraint<=32,767) – The maximum number of bins used to bin continuous features. Increasing this value can improve prediction performance, at the cost of training speed and potential overfitting.

  • reg_lambda (float, default=1.0, constraint>0) – Regularization parameter.

  • random_state (int, default=2147483647) – Random seed to use for feature_fraction and bagging_fraction.

  • device (str, default=`cpu`) – Choose from cpu or gpu. Set Torch training device.

  • gpu_device_id (int, default=0) – id of gpu device in case multiple gpus are present in the system, defaults to 0.

  • derivatives (str, default=`exact`) – Choose from exact or approx. Determines whether to compute the derivatives exactly or approximately. If exact, PGBMRegressor expects a loss function that outputs a gradient and hessian vector of size [n_training_samples]. If approx, PGBMRegressor expects a loss function with a scalar output.

  • distribution (str, default=`normal`) – Choice of output distribution for probabilistic predictions. Options are: normal, studentt, laplace, logistic, lognormal, gamma, gumbel, weibull, negativebinomial, poisson.

  • checkpoint (bool, default=`False`) – Set to True to save a model checkpoint after each iteration to the current working directory.

  • tree_correlation (float, default=np.log10(n_samples_train)/100) – Tree correlation hyperparameter. This controls the amount of correlation we assume to exist between each subsequent tree in the ensemble.

  • monotone_constraints (List or torch.Tensor) – List detailing monotone constraints for each feature in the dataset, where 0 represents no constraint, 1 a positive monotone constraint, and -1 a negative monotone constraint. For example, for a dataset with 3 features, this parameter could be [1, 0, -1], corresponding to a positive, no, and negative monotone constraint on features 1, 2 and 3, respectively.

  • monotone_iterations (int, default=1) – The number of alternative splits that will be considered if a monotone constraint is violated by the current split proposal. Increase this to improve accuracy at the expense of training speed.

  • verbose (int, default=2) – Flag to output metric results for each iteration. Set to 1 to suppress output.

  • init_model (str, default=None) – Path to an initial model for which continual training is desired. The model will use the parameters from the initial model.

Returns:

self

Return type:

PGBM object

Example:

from pgbm.torch import PGBMRegressor
model = PGBMRegressor()
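
Any of the hyperparameters listed above can be set at construction; the values below are illustrative:

model = PGBMRegressor(max_leaves=16,
                      learning_rate=0.05,
                      n_estimators=200,
                      distribution='studentt',
                      monotone_constraints=[1, 0, -1])  # positive, no, negative constraint for a 3-feature dataset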
crps_ensemble(yhat_dist, y)[source]

Calculate the empirical Continuous Ranked Probability Score (CRPS) for a set of forecasts for a number of samples (lower is better).

Based on crps_ensemble from properscoring https://pypi.org/project/properscoring/

Parameters:
  • yhat_dist (np.array) – forecasts for each sample of size [n_forecasts x n_samples].

  • y (np.array) – ground truth value of each sample of size [n_samples].

Returns:

CRPS score for each sample

Return type:

np.array

Example:

model = PGBMRegressor()
model.fit(X_train, y_train)
yhat_test_dist = model.predict_dist(X_test)
crps = model.crps_ensemble(yhat_test_dist, y_test)
fit(X, y, eval_set=None, sample_weight=None, eval_sample_weight=None, early_stopping_rounds=None)[source]

Fit a PGBMRegressor model.

Parameters:
  • X (torch.Tensor) – sample set of size [n_training_samples x n_features]

  • y (torch.Tensor) – ground truth of size [n_training_samples] for sample set X

  • eval_set (tuple, optional) – validation set of size ([n_validation_samples x n_features], [n_validation_samples]), defaults to None

  • sample_weight (torch.Tensor, optional) – sample weights for the training data, defaults to None

  • eval_sample_weight (torch.Tensor, optional) – sample weights for the eval_set, defaults to None

  • early_stopping_rounds (int, optional) – stop training if metric on the eval_set has not improved for early_stopping_rounds, defaults to None

Returns:

self

Return type:

fitted PGBM object

Example:

from pgbm.torch import PGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
model = PGBMRegressor()
model.fit(X_train, y_train)
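
A validation set and early stopping can be added via the eval_set and early_stopping_rounds arguments documented above; the number of rounds below is illustrative:

model = PGBMRegressor()
model.fit(X_train, y_train, eval_set=(X_test, y_test), early_stopping_rounds=10)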
predict(X, parallel=True)[source]

Generate point estimates/forecasts for a given sample set X.

Parameters:
  • X (torch.Tensor) – sample set of size [n_samples x n_features] for which to create the estimates/forecasts.

  • parallel (boolean, optional) – compute predictions for all trees in parallel (True) or serial (False). Use False when experiencing out-of-memory errors.

Returns:

predictions of size [n_samples]

Return type:

np.array

Example:

yhat_test = model.predict(X_test)
predict_dist(X, n_forecasts=100, parallel=True, output_sample_statistics=False)[source]

Generate probabilistic estimates/forecasts for a given sample set X.

Parameters:
  • X (torch.Tensor) – sample set of size [n_samples x n_features] for which to create the estimates/forecasts.

  • n_forecasts (int, optional) – number of estimates/forecasts to create, defaults to 100

  • parallel (boolean, optional) – compute predictions for all trees in parallel (True) or serial (False). Use False when experiencing out-of-memory errors.

  • output_sample_statistics (boolean, optional) – whether to also output the learned sample mean and variance. If True, the function will return a tuple (forecasts, mu, variance) with the latter arrays containing the learned mean and variance per sample that can be used to parameterize a distribution, defaults to False

Returns:

predictions of size [n_forecasts x n_samples]

Return type:

np.array

Example:

yhat_test = model.predict_dist(X_test)
rmseloss_metric(yhat, y, sample_weight=None)[source]

Root Mean Squared Error Loss

Parameters:
  • yhat (np.array) – forecasts for each sample of size [n_samples].

  • y (np.array) – ground truth value of each sample of size [n_samples].

  • sample_weight (np.array) – sample weights of size [n_samples].

Returns:

RMSE

Return type:

float

Example:

model = PGBMRegressor()
model.fit(X_train, y_train)
yhat_test = model.predict(X_test)
rmse = model.rmseloss_metric(yhat_test, y_test)
save(filename)[source]

Save a fitted PGBM model to a file. The model parameters are saved as numpy arrays and dictionaries.

Parameters:

filename (str) – location of model file

Returns:

dictionary saved in filename

Return type:

dictionary

Example:

model = PGBMRegressor()
model.fit(X, y)
model.save('model.pt')
score(X, y, sample_weight=None, parallel=True)[source]

Compute R2 score of fitted PGBMRegressor

Parameters:
  • X (torch.Tensor) – sample set of size [n_samples x n_features] for which to create the estimates/forecasts.

  • y (torch.Tensor) – ground truth of size [n_samples]

  • sample_weight (torch.Tensor, optional) – sample weights, defaults to None

  • parallel (boolean, optional) – compute predictions for all trees in parallel (True) or serial (False). Use False when experiencing out-of-memory errors.

Returns:

R2 score

Return type:

float

Example:

model = PGBMRegressor()
model.fit(X_train, y_train)
r2_score = model.score(X_test, y_test)

Scikit-learn backend

The Scikit-learn backend is exposed by importing pgbm.sklearn.
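
A minimal import sketch, assuming the modified estimator keeps the HistGradientBoostingRegressor class name mentioned above:

from pgbm.sklearn import HistGradientBoostingRegressor

model = HistGradientBoostingRegressor()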