pandas, scikit-learn, xgboost and seaborn integration

Overview

pandas-ml

pandas, scikit-learn and xgboost integration.

Installation

$ pip install pandas_ml

Documentation

http://pandas-ml.readthedocs.org/en/stable/

Example

>>> import pandas_ml as pdml
>>> import sklearn.datasets as datasets

# create ModelFrame instance from sklearn.datasets
>>> df = pdml.ModelFrame(datasets.load_digits())
>>> type(df)
<class 'pandas_ml.core.frame.ModelFrame'>

# binarize data (features), not touching target
>>> df.data = df.data.preprocessing.binarize()
>>> df.head()
   .target  0  1  2  3  4  5  6  7  8 ...  54  55  56  57  58  59  60  61  62  63
0        0  0  0  1  1  1  1  0  0  0 ...   0   0   0   0   1   1   1   0   0   0
1        1  0  0  0  1  1  1  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
2        2  0  0  0  1  1  1  0  0  0 ...   1   0   0   0   0   1   1   1   1   0
3        3  0  0  1  1  1  1  0  0  0 ...   1   0   0   0   1   1   1   1   0   0
4        4  0  0  0  1  1  0  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
[5 rows x 65 columns]

# split to training and test data
>>> train_df, test_df = df.model_selection.train_test_split()

# create estimator (accessor is mapped to sklearn namespace)
>>> estimator = df.svm.LinearSVC()

# fit to training data
>>> train_df.fit(estimator)

# predict test data
>>> test_df.predict(estimator)
0     4
1     2
2     7
...
448    5
449    8
Length: 450, dtype: int64

# Evaluate the result
>>> test_df.metrics.confusion_matrix()
Predicted   0   1   2   3   4   5   6   7   8   9
Target
0          52   0   0   0   0   0   0   0   0   0
1           0  37   1   0   0   1   0   0   3   3
2           0   2  48   1   0   0   0   1   1   0
3           1   1   0  44   0   1   0   0   3   1
4           1   0   0   0  43   0   1   0   0   0
5           0   1   0   0   0  39   0   0   0   0
6           0   1   0   0   1   0  35   0   0   0
7           0   0   0   0   2   0   0  42   1   0
8           0   2   1   0   1   0   0   0  33   1
9           0   2   1   2   0   0   0   0   1  38
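
For comparison, the same workflow can be sketched in plain scikit-learn and pandas, without ModelFrame. This is not pandas-ml code: the `threshold`, `random_state` and `max_iter` values are assumptions chosen for illustration, so the counts will differ from the output above.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import binarize
from sklearn.svm import LinearSVC

digits = load_digits()

# binarize the features only; the target is left untouched
X = binarize(digits.data, threshold=0.0)
y = digits.target

# split into training and test data (the default test share is 25%)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# fit on the training data, then predict the test data
estimator = LinearSVC(max_iter=10000)
estimator.fit(X_train, y_train)
predicted = estimator.predict(X_test)

# wrap the confusion matrix in a DataFrame so both axes are labelled
labels = list(range(10))
cm = pd.DataFrame(
    confusion_matrix(y_test, predicted, labels=labels),
    index=labels,
    columns=labels,
)
cm.index.name = "Target"
cm.columns.name = "Predicted"
print(cm)
```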

Supported Packages

  • scikit-learn
  • patsy
  • xgboost
Comments
  • Fixed imports of deprecated modules which were removed in pandas 0.24.0

    Certain functions were deprecated in a previous version of pandas and moved to a different module (see #117). This PR fixes the imports of those functions.

    opened by kristofve 8
  • REL: v0.4.0

    • [x] Compat/test for sklearn 0.18.0 (#81)
      • [x] initial fix (#81)
      • [x] wrapper for cross validation classes (re-enable skipped tests) (#85)
      • [x] tests for multioutput (#86)
      • [x] Update doc
    • [x] Compat/test for pandas 0.19.0 (#83)
    • [x] Update release note (#88)
    opened by sinhrks 4
  • Importation error

    I tried to import pandas_ml, but it raised this error:

    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    I'm running Python 3.8.1 and installed pandas_ml via pip (version 20.0.2).

    I dug into the code; the error is at line 80 of series.py:

    @Appender(pd.core.generic.NDFrame.groupby.__doc__)

    Here pandas is imported at the top of the file with the usual import pandas as pd.

    I guess there is a problem with the versions...

    Thanks in advance for any help

    opened by ierezell 2
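
The failing decorator looks up `groupby` on `NDFrame`; in the pandas versions that produce this error, `groupby` appears to be defined on `Series` and `DataFrame` instead. A minimal sketch of a version-tolerant lookup, assuming the only goal is to fetch the docstring. `groupby_doc` is a hypothetical name, not part of pandas-ml.

```python
import pandas as pd
from pandas.core.generic import NDFrame

# Prefer NDFrame.groupby when this pandas version still defines it,
# otherwise fall back to Series.groupby.
owner = NDFrame if hasattr(NDFrame, "groupby") else pd.Series

# hypothetical variable holding the docstring the decorator would reuse
groupby_doc = owner.groupby.__doc__
print(owner.__name__)
```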
  • Confusion Matrix no accessible

    Hi,

    I've been using confusion_matrix since it was an independent package. I've installed pandas_ml to continue using the package, but it seems that the setup.py script does not install the package.

    Could it be an issue with the find_packages function?

    opened by mmartinortiz 2
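
When only a labelled confusion matrix is needed, `pandas.crosstab` can build one without pandas_ml. A minimal sketch; the `actual` and `predicted` values are made-up illustration data, and this does not reproduce the other statistics that `ConfusionMatrix` offered.

```python
import pandas as pd

# made-up labels for illustration
actual = pd.Series([0, 1, 2, 2, 1, 0, 2], name="Target")
predicted = pd.Series([0, 2, 2, 2, 1, 0, 1], name="Predicted")

# rows are the true labels, columns are the predicted labels
cm = pd.crosstab(actual, predicted)
print(cm)
```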
  • Seaborn Scatterplot matrix / pairplot integration

    import seaborn as sns
    sns.set()
    
    df = sns.load_dataset("iris")
    sns.pairplot(df, hue="species")
    

    displays

    iris_scatter_matrix

    but pairplot doesn't work the same way with ModelFrame

    import pandas as pd
    pd.set_option('display.max_rows', 10)
    import sklearn.datasets as datasets
    import pandas_ml as pdml  # https://github.com/pandas-ml/pandas-ml
    import seaborn as sns
    import matplotlib.pyplot as plt
    df = pdml.ModelFrame(datasets.load_iris())
    sns.pairplot(df, hue=".target")
    

    iris_modelframe

    There are some useless subplots.

    opened by scls19fr 2
  • Error while running train.py from speech commands in tensorflow examples.

    Have the following error:

      File "train.py", line 27, in <module>
        from callbacks import ConfusionMatrixCallback
      File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module>
        from pandas_ml import ConfusionMatrix
      File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module>
        from pandas_ml.core import ModelFrame, ModelSeries # noqa
      File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module>
        from pandas_ml.core.frame import ModelFrame # noqa
      File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module>
        from pandas_ml.core.series import ModelSeries
      File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module>
        class ModelSeries(ModelTransformer, pd.Series):
      File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries
        @Appender(pd.core.generic.NDFrame.groupby.__doc__)
    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    Happening with both version 5 and 6.1

    opened by ayush7 1
  • error for example https://pandas-ml.readthedocs.io/en/latest/xgboost.html

    Code from the example at https://pandas-ml.readthedocs.io/en/latest/xgboost.html:

    import pandas_ml as pdml
    import sklearn.datasets as datasets
    df = pdml.ModelFrame(datasets.load_digits())
    train_df, test_df = df.cross_validation.train_test_split()
    estimator = df.xgboost.XGBClassifier()
    train_df.fit(estimator)
    predicted = test_df.predict(estimator)
    q=1
    test_df.metrics.confusion_matrix()
    train_df.xgboost.plot_importance()

    tuned_parameters = [{'max_depth': [3, 4]}]
    cv = df.grid_search.GridSearchCV(df.xgb.XGBClassifier(), tuned_parameters, cv=5)

    df.fit(cv)
    df.grid_search.describe(cv)
    q=1

    It gives this error:

      File "E:\Pandas\my_code\S_pandas_ml_feb27.py", line 10, in <module>
        train_df.xgboost.plot_importance()
      File "C:\Users\sndr\Anaconda3\Lib\site-packages\pandas_ml\xgboost\base.py", line 61, in plot_importance
        return xgb.plot_importance(self._df.estimator.booster(),

    builtins.TypeError: 'str' object is not callable

    I use Windows and Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)].

    opened by Sandy4321 1
  • pandas 0.24.0 has deprecated pandas.util.decorators

    See https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#deprecations

    This causes the import statement in https://github.com/pandas-ml/pandas-ml/blob/master/pandas_ml/core/frame.py to break.

    Looks like just need to change it to 'from pandas.utils'

    opened by usul83 1
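
In recent pandas releases the decorator helpers appear to live in the private module `pandas.util._decorators` (note `util`, not `utils`). A hedged sketch of an import that tolerates both locations. Because the module is private, it may move again, and the `example` function is purely illustrative.

```python
try:
    # private location used by newer pandas releases
    from pandas.util._decorators import Appender
except ImportError:
    # public location used by older pandas releases
    from pandas.util.decorators import Appender


@Appender("Extra documentation appended by the decorator.")
def example():
    """Original docstring. """


print(example.__doc__)
```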
  • 'mean_absoloute_error

    from sklearn import metrics
    print('MAE:',metrics.mean_absoloute_error(y_test,y_pred))

    module 'sklearn.metrics' has no attribute 'mean_absoloute_error'

    This error occurs. Is there a solution?

    opened by vikramk1507 0
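
The attribute name in the report is misspelled; the scikit-learn function is `mean_absolute_error`. A minimal sketch with made-up `y_test` and `y_pred` values:

```python
from sklearn import metrics

# made-up values for illustration
y_test = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# note the spelling: mean_absolute_error
mae = metrics.mean_absolute_error(y_test, y_pred)
print("MAE:", mae)
```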
  • AttributeError: type object 'NDFrame' has no attribute 'groupby'

    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    from pandas_ml import ConfusionMatrix
    cm = ConfusionMatrix(actu, pred)
    cm.print_stats()

    AttributeError                            Traceback (most recent call last)
    ----> 1 from pandas_ml import confusion_matrix
          2 
          3 cm = ConfusionMatrix(actu, pred)
          4 cm.print_stats()

    /usr/local/lib/python3.8/site-packages/pandas_ml/__init__.py in <module>
          1 #!/usr/bin/env python
          2 
    ----> 3 from pandas_ml.core import ModelFrame, ModelSeries # noqa
          4 from pandas_ml.tools import info # noqa
          5 from pandas_ml.version import version as __version__ # noqa

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/__init__.py in <module>
          1 #!/usr/bin/env python
          2 
    ----> 3 from pandas_ml.core.frame import ModelFrame # noqa
          4 from pandas_ml.core.series import ModelSeries # noqa

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/frame.py in <module>
         16 from pandas_ml.core.accessor import _AccessorMethods
         17 from pandas_ml.core.generic import ModelPredictor, _shared_docs
    ---> 18 from pandas_ml.core.series import ModelSeries
         19 
         20 

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in <module>
          9 
         10 
    ---> 11 class ModelSeries(ModelTransformer, pd.Series):
         12     """
         13     Wrapper for pandas.Series to support sklearn.preprocessing

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in ModelSeries()
         78         return df
         79 
    ---> 80     @Appender(pd.core.generic.NDFrame.groupby.__doc__)
         81     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
         82                 group_keys=True, squeeze=False):

    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    opened by gfranco008 5
  • AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score'

    I am using scikit-learn version 0.23.1 and I get the following error: AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score' when calling the function ConfusionMatrix.

    opened by petraknovak 11
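
`jaccard_similarity_score` was deprecated in scikit-learn 0.21 and removed in 0.23; `jaccard_score` is its replacement. The two are not drop-in equivalents for multiclass input, where `jaccard_score` needs an explicit `average` argument. A minimal sketch with made-up labels, showing the replacement call rather than a patch for pandas_ml:

```python
from sklearn.metrics import jaccard_score

# made-up multiclass labels for illustration
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

# per-class Jaccard index, averaged with equal weight per class
score = jaccard_score(y_true, y_pred, average="macro")
print(score)
```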
  • Error while running train.py from speech commands in tensorflow examples. AttributeError: type object 'NDFrame' has no attribute 'groupby'

    Have the following error:

      File "train.py", line 27, in <module>
        from callbacks import ConfusionMatrixCallback
      File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module>
        from pandas_ml import ConfusionMatrix
      File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module>
        from pandas_ml.core import ModelFrame, ModelSeries # noqa
      File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module>
        from pandas_ml.core.frame import ModelFrame # noqa
      File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module>
        from pandas_ml.core.series import ModelSeries
      File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module>
        class ModelSeries(ModelTransformer, pd.Series):
      File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries
        @Appender(pd.core.generic.NDFrame.groupby.__doc__)
    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    Happening with both version 5 and 6.1

    opened by ayush7 3
  • Pandas 1.0.0rc0/0.6.1 module 'sklearn.preprocessing' has no attribute 'Imputer'

    SKLEARN

    sklearn.preprocessing.Imputer Warning DEPRECATED

    class sklearn.preprocessing.Imputer(*args, **kwargs)[source] Imputation transformer for completing missing values.

    Read more in the User Guide.

    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-1-e0471065d85c> in <module>
          1 import pandas as pd
          2 import numpy as np
    ----> 3 import pandas_ml as pdml
          4 a1 = np.random.randint(0,2,size=(100,2))
          5 df = pd.DataFrame(a1,columns=['i1','i2'])
    
    C:\g\test\lib\pandas_ml\__init__.py in <module>
          1 #!/usr/bin/env python
          2 
    ----> 3 from pandas_ml.core import ModelFrame, ModelSeries       # noqa
          4 from pandas_ml.tools import info                         # noqa
          5 from pandas_ml.version import version as __version__     # noqa
    
    C:\g\test\lib\pandas_ml\core\__init__.py in <module>
          1 #!/usr/bin/env python
          2 
    ----> 3 from pandas_ml.core.frame import ModelFrame       # noqa
          4 from pandas_ml.core.series import ModelSeries     # noqa
    
    C:\g\test\lib\pandas_ml\core\frame.py in <module>
          8 
          9 import pandas_ml.imbaccessors as imbaccessors
    ---> 10 import pandas_ml.skaccessors as skaccessors
         11 import pandas_ml.smaccessors as smaccessors
         12 import pandas_ml.snsaccessors as snsaccessors
    
    C:\g\test\lib\pandas_ml\skaccessors\__init__.py in <module>
         17 from pandas_ml.skaccessors.neighbors import NeighborsMethods                      # noqa
         18 from pandas_ml.skaccessors.pipeline import PipelineMethods                        # noqa
    ---> 19 from pandas_ml.skaccessors.preprocessing import PreprocessingMethods              # noqa
         20 from pandas_ml.skaccessors.svm import SVMMethods                                  # noqa
    
    C:\g\test\lib\pandas_ml\skaccessors\preprocessing.py in <module>
         11     _keep_col_classes = [pp.Binarizer,
         12                          pp.FunctionTransformer,
    ---> 13                          pp.Imputer,
         14                          pp.KernelCenterer,
         15                          pp.LabelEncoder,
    
    AttributeError: module 'sklearn.preprocessing' has no attribute 'Imputer'
    
    opened by apiszcz 11
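
`sklearn.preprocessing.Imputer` was removed in scikit-learn 0.22 after a deprecation period; `sklearn.impute.SimpleImputer` covers the same use case. A minimal sketch with made-up data, showing only the replacement class and not a fix for pandas_ml itself:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# made-up data with one missing value per column
X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, np.nan]])

# replace each missing value with the mean of its column
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)
```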
Releases(v0.6.1)