classification

class silk_ml.classification.Classifier(target=None, filename=None, target_name=None)

Bases: object

General tasks for classification and data analysis

Parameters
  • target (str or None) – Categorical variable to classify

  • filename (str or None) – Name with path for reading a csv file

  • target_name (str or None) – Target name for reports

read_csv(target, filename)

Reads a CSV file and separates the X and Y variables

Parameters
  • target (str) – Categorical variable to classify

  • filename (str) – Name with path for reading a csv file

Returns

X, Y, and data values

Return type

list(pd.DataFrame)
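As a rough illustration of what separating X and Y amounts to, the sketch below splits the rows of a CSV into feature columns and a target column using only the standard library. The helper name split_xy and the sample data are hypothetical. The method itself is documented as returning pandas DataFrames, and its internals are not shown in this reference.

```python
import csv
import io


def split_xy(csv_text, target):
    """Return (X, Y, data): feature rows, target values, and all rows."""
    data = list(csv.DictReader(io.StringIO(csv_text)))
    # X keeps every column except the target; Y keeps only the target.
    X = [{k: v for k, v in row.items() if k != target} for row in data]
    Y = [row[target] for row in data]
    return X, Y, data


# Hypothetical data: "churn" plays the role of the categorical target.
sample = "age,income,churn\n34,52000,no\n51,61000,yes\n"
X, Y, data = split_xy(sample, target="churn")
```

The three returned values mirror the documented order: X, Y, then the full data.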

standardize(normalizer, scaler)

Applies a normalizer and a scaler as preprocessing steps

Parameters
  • normalizer (Class.fit_transform) – Class that centers the data

  • scaler (Class.fit_transform) – Class that modifies the data boundaries
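Both arguments are described as objects exposing fit_transform, which is the interface scikit-learn transformers provide. The sketch below uses two small stand-in classes to show one plausible reading: the data is centered first and then rescaled. The class names, the helper standardize_column, and the order of application are assumptions for illustration, not the library's implementation.

```python
class MeanCenterer:
    """Hypothetical normalizer: subtracts the column mean."""

    def fit_transform(self, values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]


class MinMaxToUnit:
    """Hypothetical scaler: rescales values into [0, 1]."""

    def fit_transform(self, values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid dividing by zero on constant columns
        return [(v - lo) / span for v in values]


def standardize_column(values, normalizer, scaler):
    # Assumed order: normalizer first, then scaler, applied to one column.
    return scaler.fit_transform(normalizer.fit_transform(values))


scaled = standardize_column([2.0, 4.0, 6.0], MeanCenterer(), MinMaxToUnit())
```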

features_metrics(plot=None)

Checks, for each variable, the probability of it splitting the classes

Parameters

plot ('all' or 'categorical' or 'numerical' or None) – Plots the variables, showing the difference in the classes

Returns

Table of variables and their classification tests

Return type

pd.DataFrame

remove_features(features)

Removes features from the X values

Parameters

features (list(str)) – Names of the columns to remove

resample(rate=0.9, strategy='hybrid')

Applies sampling-based methods to balance the dataset

Parameters
  • rate (float) – Ratio of the number of samples in the minority class over the number of samples in the majority class after resampling

  • strategy ('hybrid' or 'over_sampling' or 'under_sampling') – Strategy to balance the dataset
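To make the rate definition concrete, the sketch below computes the class sizes that would satisfy minority / majority ≈ rate for the two single-direction strategies. The function resampled_counts is hypothetical and only does the arithmetic. It does not resample any data, and it leaves out 'hybrid' because this reference does not say how that strategy divides the work between over- and under-sampling.

```python
import math


def resampled_counts(n_minority, n_majority, rate, strategy):
    """Class sizes after resampling so that minority / majority is about `rate`."""
    if strategy == "over_sampling":
        # Grow the minority class; the majority class is left untouched.
        return max(n_minority, round(rate * n_majority)), n_majority
    if strategy == "under_sampling":
        # Shrink the majority class; the minority class is left untouched.
        return n_minority, min(n_majority, math.floor(n_minority / rate))
    raise ValueError("only the two single-direction strategies are sketched")


over = resampled_counts(100, 1000, rate=0.9, strategy="over_sampling")
under = resampled_counts(100, 1000, rate=0.9, strategy="under_sampling")
```

With 100 minority and 1000 majority samples, a rate of 0.9 means either raising the minority class to 900 samples or cutting the majority class to 111.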

cross_validation(models, scores, folds=30)

Validates several models and scores

Parameters
  • models (list(tuple)) – Models to evaluate

  • scores (list(tuple)) – Scores to measure the models

  • folds (int) – Number of folds in a (Stratified)KFold
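The folds parameter refers to a (Stratified)KFold split. As a minimal sketch of what stratification means, the function below deals the indices of each class round-robin across the folds so that every fold keeps roughly the overall class proportions. The name stratified_folds is hypothetical, and this is a simplified stand-in for illustration, not the splitter the library uses.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, folds):
    """Assign each sample index to a fold, spreading every class evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    assignment = [[] for _ in range(folds)]
    for indices in by_class.values():
        # Deal the indices of one class round-robin across the folds.
        for position, index in enumerate(indices):
            assignment[position % folds].append(index)
    return assignment


labels = ["a"] * 6 + ["b"] * 3
splits = stratified_folds(labels, folds=3)
# Count the classes inside each fold to check the proportions.
per_fold = [Counter(labels[i] for i in fold) for fold in splits]
```

Each fold then serves once as the validation set while the remaining folds form the training set.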

plot_corr(values=True)

Plots the correlation matrix

Parameters

values (bool) – Shows each of the correlation values
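The sketch below shows the kind of matrix such a plot displays: one correlation coefficient per pair of variables. It assumes Pearson correlation, which this reference does not confirm, and the helper names and sample columns are hypothetical. The plotting itself is left out.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict holding the coefficient for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


columns = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],  # rises with x
    "z": [4.0, 3.0, 2.0, 1.0],  # falls as x rises
}
matrix = correlation_matrix(columns)
```

With values=True, each cell of the plot would also show its coefficient as a number.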

plot_mainfold(method)

Plots the reduced space using a manifold transformation

Parameters

method (Class.fit_transform) – Manifold transformation method

plot_roc_cross_val(models)

Plots the ROC curve of each model

Parameters

models (list(tuple)) – Models to evaluate
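For readers unfamiliar with ROC curves, the sketch below computes the points of a single curve from true labels and predicted scores by lowering the decision threshold one score at a time. The function roc_points and the sample values are hypothetical. How the library combines curves across cross-validation folds is not described in this reference, so that part is not shown.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time, highest first.
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(p and t == 1 for p, t in zip(predicted, y_true))
        fp = sum(p and t == 0 for p, t in zip(predicted, y_true))
        points.append((fp / negatives, tp / positives))
    return points


# Labels are 1 for the positive class and 0 for the negative class.
curve = roc_points([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.2])
```

The curve starts at (0, 0), where nothing is predicted positive, and ends at (1, 1), where everything is.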