Lazy Estimator

lazygrid.lazy_estimator

class lazygrid.lazy_estimator.LazyPipeline(steps, database: str = './database/', verbose: bool = False)

A LazyPipeline estimator.

A lazy pipeline is a scikit-learn-like pipeline that memoizes its fitted steps. Once the pipeline has been fitted, its steps are pickled and stored in a local database. When the program runs again, the pipeline fetches its fitted steps from the database and skips the fit operation.
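The memoization idea can be illustrated with a minimal, standard-library-only sketch. It is not lazygrid's actual storage code: the helper names (`fit_mean`, `fit_or_load`), the hash-based file naming, and the one-pickle-file-per-step layout are all assumptions made for illustration.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Counts how often the (pretend expensive) fit really runs.
stats = {"fit_calls": 0}


def fit_mean(values):
    """Toy 'fit': learn the mean of the training values."""
    stats["fit_calls"] += 1
    return {"mean": sum(values) / len(values)}


def fit_or_load(step_name, values, database):
    """Return the fitted state, reusing a pickled copy when one exists."""
    # Hypothetical cache key: hash of the step name and the training data.
    key = hashlib.sha256(pickle.dumps((step_name, list(values)))).hexdigest()
    path = Path(database) / f"{key}.pkl"
    if path.exists():
        with path.open("rb") as handle:
            return pickle.load(handle)  # cache hit: the fit is skipped
    fitted = fit_mean(values)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(fitted, handle)
    return fitted


database = tempfile.mkdtemp()  # stands in for the './database/' directory
data = [1.0, 2.0, 3.0]
first = fit_or_load("center", data, database)   # fits and stores
second = fit_or_load("center", data, database)  # loads from disk
```

In this sketch the second call returns the stored state without calling `fit_mean` again, which is the behaviour the paragraph above describes for a restarted program.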

Parameters:
  • steps – List of (name, transform) tuples, chained in the order given. Every step except the last must implement fit/transform; the last step is an estimator.
  • database (str, default='./database/') – Path to the database directory used to cache the fitted transformers of the pipeline. Caching is advantageous when fitting is time consuming.
  • verbose (bool, default=False) – If True, the time elapsed while fitting each step will be printed as it is completed.

named_steps

Read-only attribute to access any step parameter by its user-given name. Keys are step names and values are step parameters.

Type: Bunch object, a dictionary with attribute access
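
A "dictionary with attribute access" can be pictured with the sketch below. `AttrDict` is a hypothetical stand-in written for illustration, not the class that `named_steps` actually returns, and the string values are placeholders for real steps.

```python
class AttrDict(dict):
    """Hypothetical stand-in for a Bunch: dict keys double as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Placeholder strings stand in for the fitted steps.
named_steps = AttrDict(anova="anova step", svc="svc step")

# Key access and attribute access return the same object.
same = named_steps["anova"] is named_steps.anova
```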

Examples

>>> from sklearn import svm
>>> from sklearn.datasets import make_classification
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.feature_selection import f_regression
>>> from lazygrid.lazy_estimator import LazyPipeline
>>> import pandas as pd
>>> # generate some data to play with
>>> X, y = make_classification(
...     n_informative=5, n_redundant=0, random_state=42)
>>> X = pd.DataFrame(X)
>>> # ANOVA SVM-C
>>> anova_filter = SelectKBest(f_regression, k=5)
>>> clf = svm.SVC(kernel='linear')
>>> anova_svm = LazyPipeline([('anova', anova_filter), ('svc', clf)])
>>> # You can set the parameters using the names issued
>>> # For instance, fit using a k of 10 in the SelectKBest
>>> # and a parameter 'C' of the svm
>>> anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)
Pipeline(steps=[('anova', SelectKBest(...)), ('svc', SVC(...))])
>>> prediction = anova_svm.predict(X)
>>> anova_svm.score(X, y)
0.83
>>> # getting the selected features chosen by anova_filter
>>> anova_svm['anova'].get_support()
array([False, False,  True,  True, False, False,  True,  True, False,
       True, False,  True,  True, False,  True, False,  True,  True,
       False, False])
>>> # Another way to get selected features chosen by anova_filter
>>> anova_svm.named_steps.anova.get_support()
array([False, False,  True,  True, False, False,  True,  True, False,
       True, False,  True,  True, False,  True, False,  True,  True,
       False, False])
>>> # Indexing can also be used to extract a sub-pipeline.
>>> sub_pipeline = anova_svm[:1]
>>> sub_pipeline
Pipeline(steps=[('anova', SelectKBest(...))])
>>> coef = anova_svm[-1].coef_
>>> anova_svm['svc'] is anova_svm[-1]
True
>>> coef.shape
(1, 10)
>>> sub_pipeline.inverse_transform(coef).shape
(1, 20)
fit(X: pandas.core.frame.DataFrame, y: Iterable = None, **fit_params)

Fit the model.

Fit all the transformers one after the other, transforming the data at each step, then fit the final estimator on the transformed data.
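
The sequential fit described above can be sketched in plain Python as follows. The toy classes (`AddOne`, `Scale`, `MeanPredictor`) and the `fit_chain` helper are hypothetical and only mimic the fit/transform protocol; the sketch leaves out caching and parameter handling.

```python
class AddOne:
    """Toy transformer: learns nothing, shifts every value by one."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x + 1 for x in X]


class Scale:
    """Toy transformer: learns the maximum and divides by it."""

    def fit(self, X, y=None):
        self.max_ = max(X)
        return self

    def transform(self, X):
        return [x / self.max_ for x in X]


class MeanPredictor:
    """Toy final estimator: records the mean of the data it is fitted on."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self


def fit_chain(steps, X, y=None):
    """Fit each transformer, pass its output on, then fit the last step."""
    *transformers, (_, final_estimator) = steps
    Xt = X
    for _, transformer in transformers:
        Xt = transformer.fit(Xt, y).transform(Xt)
    final_estimator.fit(Xt, y)
    return steps


steps = [("add", AddOne()), ("scale", Scale()), ("mean", MeanPredictor())]
fit_chain(steps, [1, 2, 3])
```

Here `Scale` is fitted on the output of `AddOne` (`[2, 3, 4]`), and `MeanPredictor` is fitted on the output of `Scale` (`[0.5, 0.75, 1.0]`), so each step only ever sees data transformed by the steps before it.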

Parameters:
  • X – Training data. Must fulfill input requirements of first step of the pipeline.
  • y – Training targets. Must fulfill label requirements for all steps of the pipeline.
  • **fit_params (dict of string -> object) – Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
Returns:
  • self – This estimator.
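
The `s__p` naming convention for `**fit_params` can be illustrated with a small routing helper. `split_fit_params` is a hypothetical function written for this sketch and is not part of lazygrid; it only shows how a prefixed key maps to a step and a parameter name.

```python
def split_fit_params(step_names, fit_params):
    """Group 'step__param' keys into one dict of parameters per step."""
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        # Split on the first double underscore: 'svc__sample_weight'
        # becomes step 'svc' and parameter 'sample_weight'.
        step, sep, param = key.partition("__")
        if not sep or step not in routed:
            raise ValueError(f"cannot route fit parameter {key!r}")
        routed[step][param] = value
    return routed


routed = split_fit_params(
    ["anova", "svc"],
    {"svc__sample_weight": [1.0, 2.0]},
)
# routed["svc"] now holds the keyword arguments for the 'svc' step's fit.
```

With the pipeline from the example above, this corresponds to a call such as `anova_svm.fit(X, y, svc__sample_weight=weights)`, where `weights` is a user-supplied array.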