Lazy Estimator

lazygrid.lazy_estimator

class lazygrid.lazy_estimator.LazyPipeline(steps, database: str = './database/', verbose: bool = False)

A LazyPipeline estimator.

A lazy pipeline is a sklearn-like pipeline that follows the memoization paradigm. Once the pipeline has been fitted, its steps are pickled and stored in a local database. When the program runs again, the pipeline fetches its fitted steps from the database and skips the fit operation.
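
The memoization idea can be illustrated with a minimal, standard-library sketch. It is not lazygrid's actual implementation: the `MeanCenterer` class, the `fit_or_load` helper, and the hash-based cache key are all hypothetical, and the sketch stores only the step's fitted attributes rather than the whole step object.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class MeanCenterer:
    """Toy transformer standing in for an expensive pipeline step."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


def fit_or_load(step, X, cache_dir):
    """Fit `step` on X, or restore its fitted state from the cache."""
    # Key the cache on the step's class name, its parameters, and the data.
    payload = pickle.dumps((type(step).__name__, vars(step), X))
    path = Path(cache_dir) / (hashlib.sha256(payload).hexdigest() + ".pkl")
    if path.exists():
        # Cache hit: restore the fitted attributes and skip fitting.
        step.__dict__.update(pickle.loads(path.read_bytes()))
        return step, True
    step.fit(X)
    path.write_bytes(pickle.dumps(vars(step)))
    return step, False


cache_dir = tempfile.mkdtemp()
data = [1.0, 2.0, 3.0]
first, first_hit = fit_or_load(MeanCenterer(), data, cache_dir)
second, second_hit = fit_or_load(MeanCenterer(), data, cache_dir)
print(first_hit, second_hit)   # False True
print(second.transform(data))  # [-1.0, 0.0, 1.0]
```

The second call never runs `fit`, because the key derived from the unfitted step and the data already exists in the cache directory.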

Parameters:

steps – List of (name, transform) tuples, where each transform implements fit/transform. The steps are chained in the order given, and the last object is an estimator.
database (str, default='./database/') – Path to the database directory used to cache the fitted transformers of the pipeline. Caching is advantageous when fitting is time-consuming.
verbose (bool, default=False) – If True, the time elapsed while fitting each step will be printed as it is completed.

named_steps

Read-only attribute to access any step by its user-given name. Keys are step names and values are the corresponding step objects.

Type:	bunch object, a dictionary with attribute access
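
A "dictionary with attribute access" can be pictured with the following minimal sketch. The `Bunch` class here is a hypothetical stand-in written for illustration, not the container the library actually returns, and the step values are placeholder strings.

```python
class Bunch(dict):
    """Dictionary whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Hypothetical step names mapped to placeholder objects.
steps = Bunch(anova="selector", svc="classifier")
print(steps["anova"])  # selector
print(steps.svc)       # classifier
```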

See also

sklearn.pipeline.Pipeline

Examples

>>> from sklearn import svm
>>> from sklearn.datasets import make_classification
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.feature_selection import f_regression
>>> from lazygrid.lazy_estimator import LazyPipeline
>>> import pandas as pd
>>> # generate some data to play with
>>> X, y = make_classification(
...     n_informative=5, n_redundant=0, random_state=42)
>>> X = pd.DataFrame(X)
>>> # ANOVA SVM-C
>>> anova_filter = SelectKBest(f_regression, k=5)
>>> clf = svm.SVC(kernel='linear')
>>> anova_svm = LazyPipeline([('anova', anova_filter), ('svc', clf)])
>>> # You can set the parameters using the names issued
>>> # For instance, fit using a k of 10 in the SelectKBest
>>> # and a parameter 'C' of the svm
>>> anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)
Pipeline(steps=[('anova', SelectKBest(...)), ('svc', SVC(...))])
>>> prediction = anova_svm.predict(X)
>>> anova_svm.score(X, y)
0.83
>>> # getting the selected features chosen by anova_filter
>>> anova_svm['anova'].get_support()
array([False, False,  True,  True, False, False,  True,  True, False,
       True, False,  True,  True, False,  True, False,  True,  True,
       False, False])
>>> # Another way to get selected features chosen by anova_filter
>>> anova_svm.named_steps.anova.get_support()
array([False, False,  True,  True, False, False,  True,  True, False,
       True, False,  True,  True, False,  True, False,  True,  True,
       False, False])
>>> # Indexing can also be used to extract a sub-pipeline.
>>> sub_pipeline = anova_svm[:1]
>>> sub_pipeline
Pipeline(steps=[('anova', SelectKBest(...))])
>>> coef = anova_svm[-1].coef_
>>> anova_svm['svc'] is anova_svm[-1]
True
>>> coef.shape
(1, 10)
>>> sub_pipeline.inverse_transform(coef).shape
(1, 20)

fit(X: pandas.core.frame.DataFrame, y: Iterable = None, **fit_params)

Fit the model.

Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.
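
The fitting order described above can be sketched with toy steps. This is an illustration of the general chaining logic only; it leaves out caching, and the `AddOne`, `Double`, `MeanPredictor`, and `fit_chain` names are hypothetical.

```python
class AddOne:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x + 1 for x in X]


class Double:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x * 2 for x in X]


class MeanPredictor:
    """Toy final estimator that records the data it was fitted on."""

    def fit(self, X, y):
        self.seen_ = list(X)
        self.mean_ = sum(y) / len(y)
        return self


def fit_chain(steps, X, y):
    """Fit each transformer in order, then fit the final estimator."""
    transformers, (_, final) = steps[:-1], steps[-1]
    Xt = X
    for _, transformer in transformers:
        # Each transformer is fitted on the output of the previous one.
        Xt = transformer.fit(Xt, y).transform(Xt)
    final.fit(Xt, y)
    return steps


steps = [("add", AddOne()), ("double", Double()), ("model", MeanPredictor())]
fit_chain(steps, [1, 2, 3], [0, 1, 1])
print(steps[-1][1].seen_)  # [4, 6, 8]
```

The final estimator sees the data only after every preceding transform has been applied, which is why it records `[4, 6, 8]` and not the raw input.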

Parameters:

X – Training data. Must fulfill input requirements of first step of the pipeline.
y – Training targets. Must fulfill label requirements for all steps of the pipeline.
**fit_params (dict of string -> object) – Parameters passed to the `fit` method of each step, where each parameter name is prefixed such that parameter `p` for step `s` has key `s__p`.

Returns:	This estimator
Return type:	self
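
The `s__p` naming convention for `fit_params` can be illustrated with a small routing sketch. The `route_fit_params` helper is hypothetical and only shows how prefixed keys could be split per step; it is not taken from the library's source.

```python
def route_fit_params(step_names, fit_params):
    """Split keys of the form 'step__param' into one dict per step."""
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        if "__" not in key:
            raise ValueError(f"Expected a key of the form 'step__param', got {key!r}")
        # Split on the first double underscore only.
        step, param = key.split("__", 1)
        routed[step][param] = value
    return routed


routed = route_fit_params(
    ["anova", "svc"],
    {"svc__sample_weight": [1.0, 2.0]},
)
print(routed)  # {'anova': {}, 'svc': {'sample_weight': [1.0, 2.0]}}
```

With this convention, a call such as `fit(X, y, svc__sample_weight=w)` would hand `sample_weight=w` to the `fit` method of the step named `svc` and nothing extra to the other steps.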