An alternative, if you do want to encode multiple categorical features, is to use a Pipeline with a FeatureUnion and a couple of custom transformers.
First you need two transformers: one that selects a single column, and one that makes LabelEncoder usable in a Pipeline. (LabelEncoder's fit_transform accepts only a single argument, but a Pipeline calls fit_transform(X, y), so the wrapper has to accept an optional y.)
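
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.preprocessing import LabelEncoder


    class SingleColumnSelector(TransformerMixin, BaseEstimator):
        """Select one column of a 2-D array and return it as a column vector."""

        def __init__(self, column):
            self.column = column

        def fit(self, X, y=None):
            return self

        def transform(self, X, y=None):
            return X[:, self.column].reshape(-1, 1)


    class PipelineAwareLabelEncoder(TransformerMixin, BaseEstimator):
        """Wrap LabelEncoder so that fit and transform accept (X, y=None)."""

        def fit(self, X, y=None):
            # Learn the label mapping once, so transform reuses it on new data
            # instead of refitting (and possibly renumbering) on every call.
            self.encoder_ = LabelEncoder().fit(X.ravel())
            return self

        def transform(self, X, y=None):
            return self.encoder_.transform(X.ravel()).reshape(-1, 1)
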
Next, create a Pipeline (or just a FeatureUnion) with two branches, one for each categorical column. Each branch selects one column, encodes its labels as integers, and then one-hot encodes them.

    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder, FunctionTransformer
    from sklearn.pipeline import Pipeline, make_pipeline, FeatureUnion

    pipeline = Pipeline([
        ('encoded_features', FeatureUnion([
            ('countries', make_pipeline(
                SingleColumnSelector(0),
                PipelineAwareLabelEncoder(),
                OneHotEncoder()
            )),
            ('names', make_pipeline(
                SingleColumnSelector(1),
                PipelineAwareLabelEncoder(),
                OneHotEncoder()
            ))
        ]))
    ])

Finally, run the full DataFrame through the Pipeline. It one-hot encodes each column separately and concatenates the results at the end.

    df = pd.DataFrame([["AUS", "Sri"], ["USA", "Vignesh"], ["IND", "Pechi"], ["USA", "Raj"]], columns=['Country', 'Name'])
    X = df.values
    transformed_X = pipeline.fit_transform(X)
    print(transformed_X.toarray())

This prints the following (the first three columns are the countries, the last four are the names):

    [[ 1.  0.  0.  0.  0.  1.  0.]
     [ 0.  0.  1.  0.  0.  0.  1.]
     [ 0.  1.  0.  1.  0.  0.  0.]
     [ 0.  0.  1.  0.  1.  0.  0.]]

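
To read that matrix, it helps to know which category each column stands for. The following is only a rough sketch, not part of the pipeline itself: it assumes the columns within each branch follow the sorted order of the unique values (which is how LabelEncoder orders its `classes_`), and the names `rows`, `feature_names` and `column_labels` are made up for illustration.

```python
# Rows from the example DataFrame above.
rows = [["AUS", "Sri"], ["USA", "Vignesh"], ["IND", "Pechi"], ["USA", "Raj"]]
feature_names = ["Country", "Name"]

column_labels = []
for index, feature in enumerate(feature_names):
    # Sorted unique values mirror the order of LabelEncoder.classes_.
    categories = sorted({row[index] for row in rows})
    column_labels.extend(f"{feature}={category}" for category in categories)

print(column_labels)
```

For this data the labels come out as Country=AUS, Country=IND, Country=USA, Name=Pechi, Name=Raj, Name=Sri, Name=Vignesh, which lines up with the output above: the first row (AUS, Sri) has its ones in the first and sixth columns.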