Datasets
lazygrid.datasets.fetch_datasets(output_dir: str = './data', update_data: bool = False, min_classes: int = 0, task: str = 'classification', max_samples: int = inf, max_features: int = inf) → pandas.core.frame.DataFrame
Load OpenML data sets that meet the given requirements.
Parameters: - output_dir – Directory where the .csv file will be stored
- update_data – If True, delete cached data sets and download their latest versions; otherwise, load the data sets recorded in the cache
- min_classes – Minimum number of classes required for each data set
- task – Task type: classification or regression
- max_samples – Maximum number of samples allowed in each data set
- max_features – Maximum number of features allowed in each data set
Returns: Information required to load the latest version of each data set
Return type: pandas.DataFrame
Examples
>>> import lazygrid as lg
>>>
>>> datasets = lg.datasets.fetch_datasets(task="classification", min_classes=2, max_samples=1000, max_features=10)
>>> datasets.loc["iris"]
version        45
did         42098
n_samples     150
n_features      4
n_classes       3
Name: iris, dtype: int64
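The way the three numeric filters combine can be sketched without network access. The snippet below is an illustration only, not lazygrid's implementation: `CATALOG` and `filter_catalog` are hypothetical names, only the "iris" values come from the example above, and treating the bounds as inclusive is an assumption.

```python
import math

# Hypothetical catalog rows. The "iris" values come from the example above;
# the other two entries are made up for illustration.
CATALOG = {
    "iris": {"n_samples": 150, "n_features": 4, "n_classes": 3},
    "toy_binary": {"n_samples": 5000, "n_features": 8, "n_classes": 2},
    "toy_wide": {"n_samples": 300, "n_features": 120, "n_classes": 5},
}


def filter_catalog(catalog, min_classes=0, max_samples=math.inf, max_features=math.inf):
    """Keep entries that satisfy all three bounds (assumed to be inclusive)."""
    return {
        name: info
        for name, info in catalog.items()
        if info["n_classes"] >= min_classes
        and info["n_samples"] <= max_samples
        and info["n_features"] <= max_features
    }


selected = filter_catalog(CATALOG, min_classes=2, max_samples=1000, max_features=10)
print(sorted(selected))  # ['iris']
```

With the default arguments every entry passes, which mirrors the permissive defaults in the signature (`min_classes=0`, unbounded `max_samples` and `max_features`).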
lazygrid.datasets.load_npy_dataset(path_x: str, path_y: str) -> (numpy.ndarray, numpy.ndarray, int)
Load a data set stored in .npy files.
Parameters: - path_x – Path to data matrix
- path_y – Path to data labels
Returns: Data matrix, data labels, and number of classes
Return type: Tuple
Examples
>>> import os
>>> from sklearn.datasets import make_classification
>>> import numpy as np
>>> import lazygrid as lg
>>>
>>> x, y = make_classification(random_state=42)
>>>
>>> path_x, path_y = "x.npy", "y.npy"
>>> np.save(path_x, x)
>>> np.save(path_y, y)
>>>
>>> x, y, n_classes = lg.datasets.load_npy_dataset(path_x, path_y)
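A quick sanity check on the returned triple might look like the sketch below. `check_dataset` is a hypothetical helper, not part of lazygrid, and it assumes that the number of classes equals the number of distinct values in a one-dimensional label vector, which the documentation above does not state explicitly.

```python
def check_dataset(x, y, n_classes):
    """Return True if the (x, y, n_classes) triple is internally consistent."""
    # Accept both NumPy arrays and plain Python lists of labels.
    labels = y.tolist() if hasattr(y, "tolist") else list(y)
    same_length = len(x) == len(labels)
    # Assumption: n_classes is the count of distinct label values.
    return same_length and len(set(labels)) == n_classes


x = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
y = [0, 1, 1, 0]
print(check_dataset(x, y, 2))  # True
print(check_dataset(x, y, 3))  # False
```

The same function can be applied to the arrays returned by the loader, since `len` on a two-dimensional NumPy array gives the number of rows.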
lazygrid.datasets.load_openml_dataset(data_id: int = None, dataset_name: str = None) -> (numpy.ndarray, numpy.ndarray, int)
Load an OpenML data set.
Parameters: - data_id – Data set identifier
- dataset_name – Data set name
Returns: Data matrix, data labels, and number of classes
Return type: Tuple
Examples
>>> import lazygrid as lg
>>>
>>> x, y, n_classes = lg.datasets.load_openml_dataset(dataset_name="iris")
>>> n_classes
3
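Since the frame returned by fetch_datasets carries a `did` column and load_openml_dataset accepts a `data_id`, the two can plausibly be chained. The sketch below shows one way to do that; `load_all` and `fake_loader` are hypothetical names, and the stand-in loader exists only so the snippet runs offline.

```python
def load_all(rows, loader):
    """Load every data set listed in `rows` with `loader`, keyed by name."""
    loaded = {}
    for name, row in rows:
        # `did` is the identifier column shown in the fetch_datasets example.
        x, y, n_classes = loader(data_id=int(row["did"]))
        loaded[name] = (x, y, n_classes)
    return loaded


def fake_loader(data_id=None, dataset_name=None):
    """Offline stand-in with the same keyword arguments and return shape."""
    return [[0.0, 1.0]], [0], 1


rows = [("iris", {"did": 42098})]
loaded = load_all(rows, fake_loader)
print(loaded["iris"][2])  # 1
```

With lazygrid installed, a call such as `load_all(datasets.iterrows(), lg.datasets.load_openml_dataset)` should work in the same way, where `datasets` is the frame returned by fetch_datasets; this has not been verified against the library.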