imbalanced

silk_ml.imbalanced.resample(X, Y, rate=0.9, strategy='hybrid')

Sampling-based methods to balance a dataset.

Parameters
  • X (pd.DataFrame) – Main dataset with the variables

  • Y (pd.Series) – Target variable

  • rate (float) – Target ratio of minority-class samples to majority-class samples after resampling (default 0.9)

  • strategy ('hybrid' | 'over_sampling' | 'under_sampling') – Strategy used to balance the dataset (default 'hybrid')
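
To make the meaning of rate concrete, the sketch below computes the class sizes that would satisfy a given rate for a binary target. It is only an illustration of the arithmetic, not silk_ml's implementation: target_counts is a hypothetical helper, truncating to whole samples is an assumption, and the 'hybrid' strategy is left out because its behavior is not described here.

```python
from collections import Counter


def target_counts(labels, rate=0.9, strategy="over_sampling"):
    """Hypothetical helper: class sizes after resampling a binary target.

    `rate` is minority count / majority count after resampling.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    # most_common() orders classes from most to least frequent.
    (majority, n_major), (minority, n_minor) = counts.most_common(2)

    if strategy == "over_sampling":
        # Grow the minority class; the majority class is left untouched.
        n_minor = max(n_minor, int(rate * n_major))
    elif strategy == "under_sampling":
        # Shrink the majority class; the minority class is left untouched.
        n_major = min(n_major, int(n_minor / rate))
    else:
        raise ValueError(f"strategy not covered by this sketch: {strategy!r}")

    return {majority: n_major, minority: n_minor}


# 100 majority samples (label 0) and 10 minority samples (label 1).
labels = [0] * 100 + [1] * 10
over = target_counts(labels, rate=0.9, strategy="over_sampling")
under = target_counts(labels, rate=0.9, strategy="under_sampling")
```

With these inputs, over-sampling raises the minority class to 90 samples against 100, while under-sampling cuts the majority class to 11 samples against 10. Both results meet the 0.9 ratio, but they differ greatly in how much data is kept.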