ML-Ensemble – high-performance ensemble learning

Overview




A Python library for high-performance ensemble learning

ML-Ensemble combines a Scikit-learn-style high-level API with a low-level computational graph framework to build memory-efficient, maximally parallelized ensemble networks in as few lines of code as possible.

ML-Ensemble is thread-safe as long as its base learners are, and can fall back on memory-mapped multiprocessing for memory-neutral, process-based concurrency. For tutorials and full documentation, visit the project website.

Ensembles as computational graphs

An ensemble is built on top of a computational graph, allowing users great design freedom. Ensembles can be built with recursion, dynamic evaluation (e.g. if-else) and much more. A high-level API wraps common ensemble architectures into Scikit-learn estimators.



Example computational graph of a layer in an ensemble

Memory-efficient parallelized learning

ML-Ensemble is optimized for speed and minimal memory consumption. No serialization of data takes place, regardless of whether multithreading or multiprocessing is used. Additionally, multithreading is pickle-free.
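
Concurrency can also be tuned per instance. A minimal sketch, assuming the backend and n_jobs keywords that likewise appear in the issue threads below:

from mlens.ensemble import SuperLearner

# Pickle-free thread-based concurrency (the global default)
ensemble = SuperLearner(backend='threading', n_jobs=-1)

# Memory-mapped process-based concurrency
ensemble = SuperLearner(backend='multiprocessing', n_jobs=-1)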

Ease of use

Ready-made ensembles are built by adding layers to an instance. No matter how complex the ensemble, training it is a single call to the fit method:

from mlens.ensemble import Subsemble

ensemble = Subsemble()

# First layer
ensemble.add(list_of_estimators)

# Second layer
ensemble.add(list_of_estimators)

# Final meta estimator
ensemble.add_meta(estimator)

# Train ensemble
ensemble.fit(X, y)
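
For reference, a complete runnable sketch under stated assumptions: scikit-learn base learners and the iris data, with an illustrative layer composition:

import numpy as np

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

from mlens.ensemble import Subsemble

seed = 2017
X, y = load_iris(return_X_y=True)
idx = np.random.RandomState(seed).permutation(len(y))
X, y = X[idx], y[idx]

ensemble = Subsemble(random_state=seed)

# First layer of base learners
ensemble.add([RandomForestClassifier(random_state=seed), SVC()])

# Final meta estimator
ensemble.add_meta(LogisticRegression())

# Train on one split, predict on the rest
ensemble.fit(X[:100], y[:100])
preds = ensemble.predict(X[100:])

# Fit and prediction times, per the data attribute described below
print(ensemble.data)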

Similarly, it's straightforward to modify an existing ensemble:

# Remove layer
ensemble.remove(2)

# Replace a layer's estimators
ensemble.replace(0, new_list_of_estimators)

And to create differentiated preprocessing pipelines for different subsets of estimators within a given layer, simply pass a mapping to the add method:

preprocessing = {'pipeline-1': list_of_transformers_1,
                 'pipeline-2': list_of_transformers_2}

estimators = {'pipeline-1': list_of_estimators_1,
              'pipeline-2': list_of_estimators_2}

ensemble.add(estimators, preprocessing)
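
For instance, a hedged sketch with scikit-learn transformers (the particular estimators and transformers are illustrative):

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

# Each named pipeline is a list of transformers applied in sequence
preprocessing = {'pipeline-1': [StandardScaler()],
                 'pipeline-2': [MinMaxScaler()]}

# Each estimator subset is keyed to the pipeline that feeds it
estimators = {'pipeline-1': [SVC(), LogisticRegression()],
              'pipeline-2': [RandomForestClassifier()]}

ensemble.add(estimators, preprocessing)

Estimators in 'pipeline-1' are then fitted on standardized features, while the 'pipeline-2' estimator sees min-max-scaled features.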

Dedicated diagnostics

ML-Ensemble implements a dedicated diagnostics and model selection suite for intuitive and speedy ensemble evaluation. The Evaluator allows you to evaluate several preprocessing pipelines and several estimators in one go, giving you a bird's-eye view of how different candidates for the ensemble perform.

Moreover, entire ensembles can be used as preprocessing pipelines, to leverage model selection for higher-level layers. Simply set model_selection to True on the ensemble (don't forget to turn it off when done); see the sketch after the results table below.

preprocessing_dict = {'pipeline-1': list_of_transformers_1,
                      'pipeline-2': list_of_transformers_2}


from mlens.model_selection import Evaluator

evaluator = Evaluator(scorer=score_func)
evaluator.fit(X, y, list_of_estimators, param_dicts=param_dists_dict,
              preprocessing=preprocessing_dict)
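
A runnable sketch under stated assumptions: iris data, scikit-learn estimators, and illustrative parameter distributions (the param_dicts keyword follows the usage shown in the issue threads below):

from scipy.stats import randint

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

from mlens.metrics import make_scorer
from mlens.model_selection import Evaluator

X, y = load_iris(return_X_y=True)

scorer = make_scorer(accuracy_score, greater_is_better=True)

# Candidate estimators, a parameter distribution for the knn candidate,
# and two preprocessing cases (one with no transformers)
estimators = [('gnb', GaussianNB()), ('knn', KNeighborsClassifier())]
param_dicts = {'knn': {'n_neighbors': randint(2, 20)}}
preprocessing = {'pipeline-1': [StandardScaler()],
                 'pipeline-2': []}

evaluator = Evaluator(scorer, cv=5, random_state=2017)
evaluator.fit(X, y, estimators, param_dicts=param_dicts,
              preprocessing=preprocessing, n_iter=10)

print(evaluator.results)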

All ensembles and model selection instances provide summary statistics in tabular format. You can find fit and prediction times for any ensemble through its data attribute, along with cv-scores if you passed a scorer. For the model selection suite, the results attribute gives you the outcome of an evaluation:

              fit_time_mean  fit_time_std  train_score_mean  train_score_std  test_score_mean  test_score_std               params
prep-1 est-1       0.001353      0.001316          0.957037         0.005543         0.960000        0.032660                   {}
       est-2       0.000447      0.000012          0.980000         0.004743         0.966667        0.033333  {'n_neighbors': 15}
prep-2 est-1       0.001000      0.000603          0.957037         0.005543         0.960000        0.032660                   {}
       est-2       0.000448      0.000036          0.965185         0.003395         0.960000        0.044222   {'n_neighbors': 8}
prep-3 est-1       0.000735      0.000248          0.791111         0.019821         0.780000        0.133500                   {}
       est-2       0.000462      0.000143          0.837037         0.014815         0.800000        0.126491   {'n_neighbors': 9}
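
As for the ensemble-as-preprocessor pattern mentioned above, a hedged sketch following the usage in the model-selection issue thread below (base_learners, meta_learners, and param_dicts are placeholders, and toggling the attribute after fitting is an assumption):

from mlens.ensemble import SuperLearner

# An ensemble acting as a preprocessing step during model selection
in_layer = SuperLearner(model_selection=True).add(base_learners, proba=True)

# Evaluate candidate meta learners on the in-layer ensemble's output
preprocess = {'in-layer': [('layer-1', in_layer)]}
evaluator.fit(X, y, meta_learners, param_dicts=param_dicts,
              preprocessing=preprocess, n_iter=10)

# Assumption: model selection is toggled off via the same parameter
in_layer.model_selection = False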

Install

PyPI

ML-Ensemble is available on PyPI. Install with

pip install mlens

Bleeding edge

Clone the GitHub repository and install from source:

git clone https://github.com/flennerhag/mlens.git; cd mlens;
python setup.py install

Citation

For scientific publications, ML-Ensemble can be cited as:

@misc{flennerhag:2017mlens,
  author = {Flennerhag, Sebastian},
  title  = {ML-Ensemble},
  month  = nov,
  year   = 2017,
  doi    = {10.5281/zenodo.1042144},
  url    = {https://dx.doi.org/10.5281/zenodo.1042144}
}

Questions

Please see the issue tracker.

Contribute

ML-Ensemble is an open-source project that welcomes contributions, small as well as large. Bug fixes and minor improvements can be pulled as is; larger PRs need to be unit tested. We generally follow the PEP-8 style guide.

License

MIT License

Copyright (c) 2017–2020 Sebastian Flennerhag

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Comments
  • [WIP] 0.2.0


    ML-Ensemble 0.2.0

    Changes:

    • Low-level computational graph API
    • New Learner and Transformer classes as nodes
    • New Group instance to bind learners and transformers horizontally
• New Layer API based on a stack of groups
    • ParallelProcessing context manager
    • Generalized backend to generic stack and map operations. Learner, Transformer, Group, Layer and Sequential are acceptable inputs.
    • Data class for metrics collected during fitting
    • EnsembleTransformer deprecated; any ensemble can now be used as-is via the model_selection parameter
    opened by flennerhag 21
  • KerasClassifier "can't pickle _thread.RLock objects" message when predicting

    I'm able to fit a model that includes KerasClassifier as a model in the ensemble. However, at prediction time I get the following error. I've tried changing the backend as well as the number of jobs but to no avail. Any ideas?

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    in ()
    ----> 1 preds = ensemble.predict_proba(X[294:])

    D:\Continuum\anaconda3\lib\site-packages\mlens\ensemble\base.py in predict_proba(self, X, **kwargs)
        634         kwargs.pop('proba', None)
    --> 635         return self.predict(X, proba=True, **kwargs)

    D:\Continuum\anaconda3\lib\site-packages\mlens\ensemble\base.py in predict(self, X, **kwargs)
        614         X, _ = check_inputs(X, check_level=self.array_check)
    --> 615         return self._backend.predict(X, **kwargs)

    (recursive fitted-property checks through mlens\parallel\base.py and
    mlens\parallel\handles.py elided)

    D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\learner.py in fitted(self)
        744         fitted = self.learner + self.sublearners
    --> 745         fitted_params = fitted[0].estimator.get_params(deep=True)

    D:\Continuum\anaconda3\lib\site-packages\mlens\parallel\learner.py in estimator(self)
         65         """Deep copy of estimator"""
    ---> 66         return deepcopy(self._estimator)

    (repeated deepcopy/_reconstruct frames in D:\Continuum\anaconda3\lib\copy.py elided)

    TypeError: can't pickle _thread.RLock objects

    opened by onacrame 12
  • Kernel keeps dying when running lightGBM

    I ran the sample kernel script from kaggle: https://www.kaggle.com/flennerhag/ml-ensemble-scikit-learn-style-ensemble-learning

    The script runs without issue. But when I add lightGBM to the base learners, the script runs for hours without finishing the calculation. I changed nthread=1 (from -1), set the Evaluator to backend='threading', n_jobs=1, and removed xgboost from the base learners, but the kernel dies whenever the fit method runs.

    Here are the parameters for the lightgbm:

    lgb = LGBMRegressor(objective='regression', nthread=1, seed=SEED)

    'lgb': {'learning_rate': uniform(0.02, 0.04),
            'num_leaves': randint(50, 60),
            'n_estimators': randint(150, 200),
            'min_child_weight': randint(30, 60)}

    setup for the Evaluator:

    evl = Evaluator(scorer, cv=2, random_state=SEED, verbose=5,
                    backend='threading', n_jobs=1)

    Please let me know if I need to provide any more info. Best, Mike

    opened by TengGao 11
  • Error when propagating features from sparse matrix

    I'm trying to use mlens in a system I'm developing but, based on the documentation and the code, it's not really clear to me what propagate_features values I should use given my data. Could you offer a bit of additional explanation in the tutorial so I know what should go in?

    priority 
    opened by jattenberg 11
  • Does it support DataFrame as input?

    The estimator I am trying to fit accepts a pandas DataFrame as input in the fit method, using the column labels. However, when using the SuperLearner, the data is converted to a numpy.ndarray before being passed to the estimator's fit method. Is there a way to preserve the column label data?

    opened by 26345211 10
  • Final K-Fold Score

    One thing I have been unable to figure out is how to get a k-fold cross validation score for the whole ensemble.

    I have used sklearn's built-in cross_val_score but this is very slow (I think because it ends up doing CV whilst in another CV loop!).

    How can I get a final k-fold cross validation score for the final ensemble please? (great package btw :) )

    opened by JoshuaC3 10
  • Modifying estimator/learner after fitting

    Hi,

    First of all, let me say that I really like your library: thank you very much for building it. I am trying to modify an individual estimator/learner (LinearRegression) after calling the 'fit' method. Specifically, I have a 3-layer SuperLearner, and in the second layer I am using the following code:

    ## 2ND LAYER
    # Build the second layer (potentially add more)
    ests_2 = [
        ('gbr', GradientBoostingRegressor()),
        ('rfr', RandomForestRegressor()),
        ('lrg', LinearRegression()),
        ('lrr', LinearRegression()),
        ('mlp', MLPRegressor()),
        ('knn', KNeighborsRegressor()),
        ('xgb', XGBRegressor()),
        ('ada', AdaBoostRegressor())
    ]
    pars_2_1 = {'random_state': seed}
    pars_2_2 = {'max_depth': 15}
    prms_2 = {'gbr': pars_2_1, 'rfr': pars_2_2}
    ensemble.add(ests_2, folds=10, proba=False)

    opened by CirdanCapital 8
  • Type Error When Running Example

    Hi,

    I'm trying to run the Getting Started example and am hitting the following error, code and trace below:

    Code:

    import numpy as np
    from pandas import DataFrame
    from mlens.metrics import make_scorer
    from sklearn.metrics import f1_score
    from sklearn.datasets import load_iris

    seed = 2017
    np.random.seed(seed)

    f1 = make_scorer(f1_score, average='micro', greater_is_better=True)

    data = load_iris()
    idx = np.random.permutation(150)
    X = data.data[idx]
    y = data.target[idx]

    from mlens.ensemble import SuperLearner
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC

    # --- Build ---

    # Passing a scorer will create cv scores during fitting
    ensemble = SuperLearner(scorer=f1, random_state=seed)

    # Build the first layer
    ensemble.add([RandomForestClassifier(random_state=seed), SVC()])

    # Attach the final meta estimator
    ensemble.add_meta(LogisticRegression())

    # --- Use ---

    # Fit ensemble
    ensemble.fit(X[:75], y[:75])

    # Predict
    preds = ensemble.predict(X[75:])

    Error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    in ()
         19 # Fit ensemble
    ---> 20 ensemble.fit(X[:75], y[:75])

    C:\Anaconda2\lib\site-packages\mlens\ensemble\base.pyc in fit(self, X, y)
        714         X, y = X[idx], y[idx]
    --> 716         self.scores_ = self.layers.fit(X, y)

    (frames through mlens\parallel\manager.pyc and mlens\parallel\estimation.pyc elided)

    C:\Anaconda2\lib\site-packages\mlens\parallel\estimation.pyc in _build_scores(self, s)
        127         for k, v in scores.items():
    --> 128             scores[k] = (np.mean(v), np.std(v))

    C:\Anaconda2\lib\site-packages\numpy\core\fromnumeric.pyc in mean(a, axis, dtype, out, keepdims)
       2941         return _methods._mean(a, axis=axis, dtype=dtype,
    -> 2942                               out=out, **kwargs)

    C:\Anaconda2\lib\site-packages\numpy\core\_methods.pyc in _mean(a, axis, dtype, out, keepdims)
    ---> 65         ret = umr_sum(arr, axis, dtype, out, keepdims)

    TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'

    opened by Mikesev 8
  • check_params fails on dynamic parameter updates

    First of all, thank you so much for creating such an awesome package for the stacking technique. I created a stack and used SuperLearner to fit the input data, but received an error while trying to predict with the fitted model. Below is my code.

    from mlens.ensemble import SuperLearner

    sl = SuperLearner(folds=10, random_state=SEED, verbose=2,
                      backend="multiprocessing")

    sl.add(list(base_learners.values()), proba=True)
    sl.add_meta(meta_learner, proba=True)

    sl.fit(xtrain, ytrain)

    p_sl = sl.predict_proba(xtest)

    The model was fitted properly, but below is the error I received while predicting on the test set.

    ~\AppData\Local\Continuum\Anaconda3\lib\site-packages\mlens\parallel\_base_functions.py in check_params(lpar, rpar)
        286     if isinstance(lpar, (int, float)):
        287         if np.isnan(lpar):
    --> 288             _pass = np.isnan(rpar)
        289         elif np.isinf(lpar):
        290             _pass = np.isinf(rpar)

    TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

    I tried several alternatives to mitigate this issue but to no avail. Could you please help here?

    bug 
    opened by nitin194 6
  • AttributeError: 'SuperLearner' object has no attribute 'scores_'

    Hi, I have found an issue about SuperLearner.scores_.

    I have fitted a SuperLearner ensemble and I wanted to check the CV scores of the base learners by typing pd.DataFrame(ensemble.scores_). However, an error occurs: AttributeError: 'SuperLearner' object has no attribute 'scores_'

    This is weird: 1) I have checked my instantiated and fitted ensemble, and there is indeed no scores_ attribute; 2) I've never seen this issue before.

    (And what crushes me is that I spent a long time fitting this ensemble, only to find I can't see how my base learners behave...)

    Anyway, here is my code:

    ensemble = SuperLearner(scorer=mean_absolute_error, folds=5, random_state=seed, n_jobs=-1, shuffle=True)
    
    ensemble.add([('et01', ExtraTreesRegressor(n_estimators=..., max_depth=..., n_jobs=-1)),
                  ('et02', ExtraTreesRegressor(n_estimators=..., max_depth=..., n_jobs=-1)),
                  ('et03', ExtraTreesRegressor(n_estimators=..., max_depth=..., n_jobs=-1)),
                  ('xgb01', XGBRegressor(n_estimators=..., max_depth=..., learning_rate=..., nthread=20)),
                  ('xgb02', XGBRegressor(n_estimators=..., max_depth=..., learning_rate=..., nthread=20)),
                  ('xgb03', XGBRegressor(n_estimators=..., learning_rate=..., max_depth=..., gamma=..., nthread=20)),
                  ('rf01', RandomForestRegressor(n_estimators=..., n_jobs=-1)),
                  ('rf02', RandomForestRegressor(n_estimators=..., n_jobs=-1)),
                  ('rf03', RandomForestRegressor(n_estimators=..., n_jobs=-1)),
                  ('ridge01', Ridge(alpha=...)),
                  ('ridge02', Ridge(alpha=...)),
                  ('ridge03', Ridge(alpha=...)),
                  ('lasso01', Lasso(alpha=...)),
                  ('lasso02', Lasso(alpha=...)),
                  ('lasso03', Lasso(alpha=...)),
                  ('lgbm01', LGBMRegressor(n_estimators=..., learning_rate=...)),
                  ('lgbm02', LGBMRegressor(n_estimators=..., learning_rate=...)),
                  ('lgbm03', LGBMRegressor(n_estimators=..., learning_rate=...)),
                  ('mlp01', MLPRegressor(hidden_layer_sizes=(...,))),
                  ('mlp02', MLPRegressor(hidden_layer_sizes=(...,)))
                 ])
    
    ensemble.add_meta(Ridge(alpha=..., fit_intercept=False))
    
    ensemble.fit(X, y)
    
    print pd.DataFrame(ensemble.scores_)
    

    So, my question is:

    Q1. Is this a problem with my code or with mlens? I'm using version 0.1.6.

    Q2. If it is a problem with my code, what should I change?

    opened by jackmiemie 5
  • Stacking of Classifiers that Operate on Different Feature Subsets

    I have a dataset with, say, 200 features. What I want is to give 30 features to one classifier, 90 to another, and 80 to another in one layer of ensembled classifiers, and then feed their outputs to a meta classifier. I believe this is achievable via the Subset class available in your library, but I can't figure out the right way. I have found a similar approach in another library, mlxtend, the code for which is below. However, I'd like to do my work via your library. Thanking you in anticipation.

    from sklearn.datasets import load_iris
    from mlxtend.classifier import StackingCVClassifier
    from mlxtend.feature_selection import ColumnSelector
    from sklearn.pipeline import make_pipeline
    from sklearn.linear_model import LogisticRegression
    
    iris = load_iris()
    X = iris.data
    y = iris.target
    
    pipe1 = make_pipeline(ColumnSelector(cols=(0, 2)),
                          LogisticRegression())
    pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)),
                          LogisticRegression())
    
    sclf = StackingCVClassifier(classifiers=[pipe1, pipe2], 
                                meta_classifier=LogisticRegression(),
                                random_state=42)
    
    sclf.fit(X, y)
    
    opened by m-mohsin-zafar 4
  • Error when using preprocessing per case in model selection

    Hello,

    I followed this notebook to create an ensemble.

    First I tune the base learners with a preprocessing pipeline:

    preprocessing = {'sc': [StandardScaler()]}
    # Fit the base learners
    evl.fit(
        x_train, y_train,
        estimators=base_learners,
        param_dicts=param_dicts,
        preprocessing=preprocessing,
        n_iter=1
    )
    

    In the notebook, the model selection is done like this:

    in_layer_proba = SuperLearner(model_selection=True).add(base_learners,
                                                            proba=True)
    in_layer_class = SuperLearner(model_selection=True).add(base_learners,
                                                            proba=False)
    
    

    This works fine, but I think the preprocessing part is missing here (the standard scaler), is it not? So I did the following:

    in_layer_proba = SuperLearner(model_selection=True).add(
                    estimators_per_case, preprocessing_cases, proba=True
    )
    in_layer_class = SuperLearner(model_selection=True).add(
            estimators_per_case, preprocessing_cases, proba=False
    )
    preprocess = {'proba': [('layer-1', in_layer_proba)],
                  'class': [('layer-1', in_layer_class)]}
    evl.fit(
        x_train, y_train,
        meta_learners,
        param_dicts=param_dicts,
        preprocessing=preprocess,
        n_iter=1
    )
    

    And I get the following error. I am not sure if I am doing something wrong or if it is a bug?

    opened by bastian-f 2
  • Bug fix: mlens.external.sklearn.type_of_target.py

    Since Python 3.10, the Sequence class of the collections module is located at collections.abc. The fix is to check the Python version and import from the correct location.

    opened by Mrjoeybux 0
  • Apply preprocessing to target variable as well

    Dear Flennerhag and community,

    Thank you for your amazing work!

    I have been using it for some work, and I stumbled upon a case where I need to subset rows of my data. I have successfully coded a custom transformer to select only a subset of the rows of my features X. However, this transformation does not get applied to the target variable y when fitting my training data.

    Do you have any idea how to implement such a thing?

    Thanks in advance for your help.

    opened by ogrnz 0
  • confirmation

    I want to know the data flow (or diagram) of the SuperLearner stacking in the mlens package. If I have a dataset, I split it into training and test sets, then use k-folds on the training set, and then use the predictions of the base models to train the next level. So what about the test set?

    opened by Mohammedsaied89 0
Releases(0.2.3)
  • 0.2.3(Oct 30, 2018)

  • 0.2.2(Feb 20, 2018)

  • 0.2.1(Nov 5, 2017)

    Introducing the computational graph backend. Version 0.2.0 implements the Learner-Transformer API, which generalizes the backend and expands the low-level API.

    Version 0.2.1 includes a critical patch for model selection.

  • 0.1.5.2(Jul 27, 2017)

  • 0.1.5.1(Jul 25, 2017)

  • 0.1.5(Jul 18, 2017)

    • Possible to set environment variables
    • multiprocessing as default backend
    • spawn as default start method for parallel jobs (w. multiprocessing)
    • Possible to specify y as partition input in clustered subsemble partitioning
    • Minor bug fixes
    • Refactored backend for streamlined front-end feature development
  • 0.1.4(Jul 13, 2017)

    Updates

    • Prediction array dtype option (default=float32)
    • Feature propagation
    • Clustered subsemble partitioning
    • No memmaps passed to estimators (only ndarray views)
    • Threading as default global backend (changeable through mlens.config.BACKEND)
    • Global configuration (mlens.config)
    • Optional specification of temporary directory
    • Scoring exception handling
  • 0.1.3(May 30, 2017)

  • 0.1.2(May 18, 2017)

  • 0.1.0(Apr 9, 2017)

Owner
Sebastian Flennerhag
PhD candidate in Machine Learning, currently at @deepmind