Lazy Estimator
class lazygrid.lazy_estimator.LazyPipeline(steps, database: str = './database/', verbose: bool = False)
A LazyPipeline estimator.
A lazy pipeline is a scikit-learn-style pipeline that follows the memoization paradigm. Once the pipeline has been fitted, its steps are pickled and stored in a local database. When the program runs again, the pipeline fetches its fitted steps from the database and skips the fit operation.
Parameters:
- steps – List of (name, transform) tuples, where each transform implements fit/transform. The steps are chained in the order given, and the last object is an estimator.
- database (str, default='./database/') – Path to the database directory used to cache the fitted transformers of the pipeline. Caching the transformers is advantageous when fitting is time consuming.
- verbose (bool, default=False) – If True, the time elapsed while fitting each step is printed as the step completes.
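The memoization idea described above can be illustrated with a minimal sketch that uses only the standard library. It is not lazygrid's implementation: `MeanCenterer`, `cache_key`, and `lazy_fit` are hypothetical names, the cache key scheme is an assumption, and for simplicity the sketch pickles only the fitted attributes (those ending in `_`) rather than the whole step.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class MeanCenterer:
    """Toy transformer (hypothetical): subtracts the mean learned in fit()."""

    def __init__(self):
        self.fit_calls = 0  # counts how many times fit() actually ran

    def fit(self, X):
        self.fit_calls += 1
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


def cache_key(step, X):
    """Assumed key scheme: hash of the step's class name and the training data."""
    payload = pickle.dumps((type(step).__name__, X))
    return hashlib.sha256(payload).hexdigest()


def lazy_fit(step, X, database):
    """Fit `step` on X, or restore its fitted attributes from `database`.

    Returns the step and a flag telling whether the cache was hit.
    """
    database = Path(database)
    database.mkdir(parents=True, exist_ok=True)
    path = database / (cache_key(step, X) + ".pkl")
    if path.exists():
        # Cache hit: restore the fitted attributes and skip fit().
        with path.open("rb") as fh:
            vars(step).update(pickle.load(fh))
        return step, True
    step.fit(X)
    # Store only the fitted attributes (names ending in "_").
    state = {k: v for k, v in vars(step).items() if k.endswith("_")}
    with path.open("wb") as fh:
        pickle.dump(state, fh)
    return step, False


with tempfile.TemporaryDirectory() as db:
    X = [1.0, 2.0, 3.0]
    first, hit_first = lazy_fit(MeanCenterer(), X, db)
    second, hit_second = lazy_fit(MeanCenterer(), X, db)
    print(hit_first, hit_second)  # False True
    print(second.fit_calls)       # 0, fit() was skipped on the second call
    print(second.transform(X))    # [-1.0, 0.0, 1.0]
```

The second call plays the role of "the program runs again": a fresh, unfitted object finds its fitted state on disk and never executes `fit`.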
named_steps
Read-only attribute to access any step parameter by user-given name. Keys are step names and values are step parameters.
Type: bunch object, a dictionary with attribute access
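To make "a dictionary with attribute access" concrete, here is a minimal stand-in for such a bunch object. It is an illustrative sketch, not the class lazygrid or scikit-learn actually uses (scikit-learn ships its own `sklearn.utils.Bunch`), and the string values are placeholders for real step objects.

```python
class Bunch(dict):
    """Dictionary whose keys can also be read as attributes (sketch)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict
        # methods such as .keys() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Placeholder strings stand in for the fitted step objects.
steps = [("anova", "SelectKBest placeholder"), ("svc", "SVC placeholder")]
named_steps = Bunch(steps)

print(named_steps["anova"] is named_steps.anova)  # True
print(list(named_steps))                          # ['anova', 'svc']
```

This is why both `anova_svm['anova']` and `anova_svm.named_steps.anova` can reach the same step.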
Examples
>>> from sklearn import svm
>>> from sklearn.datasets import make_classification
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.feature_selection import f_regression
>>> from lazygrid.lazy_estimator import LazyPipeline
>>> import pandas as pd
>>> # generate some data to play with
>>> X, y = make_classification(
...     n_informative=5, n_redundant=0, random_state=42)
>>> X = pd.DataFrame(X)
>>> # ANOVA SVM-C
>>> anova_filter = SelectKBest(f_regression, k=5)
>>> clf = svm.SVC(kernel='linear')
>>> anova_svm = LazyPipeline([('anova', anova_filter), ('svc', clf)])
>>> # You can set the parameters using the names issued
>>> # For instance, fit using a k of 10 in the SelectKBest
>>> # and a parameter 'C' of the svm
>>> anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)
Pipeline(steps=[('anova', SelectKBest(...)), ('svc', SVC(...))])
>>> prediction = anova_svm.predict(X)
>>> anova_svm.score(X, y)
0.83
>>> # getting the selected features chosen by anova_filter
>>> anova_svm['anova'].get_support()
array([False, False,  True,  True, False, False,  True,  True, False,
        True, False,  True,  True, False,  True, False,  True,  True,
       False, False])
>>> # Another way to get selected features chosen by anova_filter
>>> anova_svm.named_steps.anova.get_support()
array([False, False,  True,  True, False, False,  True,  True, False,
        True, False,  True,  True, False,  True, False,  True,  True,
       False, False])
>>> # Indexing can also be used to extract a sub-pipeline.
>>> sub_pipeline = anova_svm[:1]
>>> sub_pipeline
Pipeline(steps=[('anova', SelectKBest(...))])
>>> coef = anova_svm[-1].coef_
>>> anova_svm['svc'] is anova_svm[-1]
True
>>> coef.shape
(1, 10)
>>> sub_pipeline.inverse_transform(coef).shape
(1, 20)
fit(X: pandas.core.frame.DataFrame, y: Iterable = None, **fit_params)
Fit the model.
Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.
Parameters:
- X – Training data. Must fulfill input requirements of first step of the pipeline.
- y – Training targets. Must fulfill label requirements for all steps of the pipeline.
- **fit_params (dict of string -> object) – Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
Returns: This estimator
Return type: self
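The fitting order and the `s__p` naming rule for `fit_params` can be sketched with plain Python. This is an illustration under stated assumptions, not lazygrid's code: `split_fit_params`, `fit_pipeline`, `Scale`, and `MeanPredictor` are hypothetical names, and the sketch works on lists rather than DataFrames.

```python
def split_fit_params(step_names, fit_params):
    """Route keys of the form 's__p' to step 's' as parameter 'p'."""
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        if "__" not in key:
            raise ValueError(f"expected a key of the form step__param, got {key!r}")
        step, param = key.split("__", 1)
        if step not in routed:
            raise ValueError(f"unknown step {step!r} in fit parameter {key!r}")
        routed[step][param] = value
    return routed


class Scale:
    """Toy transformer: multiplies every value by a factor given at fit time."""

    def fit(self, X, y=None, factor=1.0):
        self.factor_ = factor
        return self

    def transform(self, X):
        return [x * self.factor_ for x in X]


class MeanPredictor:
    """Toy final estimator: remembers the mean of its input plus an offset."""

    def fit(self, X, y=None, offset=0.0):
        self.value_ = sum(X) / len(X) + offset
        return self


def fit_pipeline(steps, X, y=None, **fit_params):
    """Fit and apply each transform in turn, then fit the final estimator."""
    routed = split_fit_params([name for name, _ in steps], fit_params)
    Xt = X
    for name, transformer in steps[:-1]:
        Xt = transformer.fit(Xt, y, **routed[name]).transform(Xt)
    final_name, final_estimator = steps[-1]
    final_estimator.fit(Xt, y, **routed[final_name])
    return steps


steps = [("scale", Scale()), ("mean", MeanPredictor())]
fit_pipeline(steps, [1.0, 2.0, 3.0], scale__factor=2.0, mean__offset=0.5)
print(steps[0][1].factor_)  # 2.0
print(steps[1][1].value_)   # 4.5
```

Here `scale__factor` reaches only the `scale` step and `mean__offset` only the `mean` step; the final estimator sees the data after it has been transformed by every earlier step.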