python – How to create a for loop for OneHotEncoder – Stack Overflow


I have a list of categorical columns in my DataFrame that I am trying to one-hot encode. The code below works when I run it for one column at a time, but I can't figure out how to loop over my categoricals list to do the same for every column. Does anyone know how to do this?

categoricals = ['bedrooms', 'bathrooms', 'floors', 'condition', 'grade', 
                'yr_built']

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

bedrooms = df[['bedrooms']]

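# sparse_output=False returns a dense array (scikit-learn >= 1.2; older versions use sparse=False)
bed = OneHotEncoder(categories="auto", sparse_output=False, handle_unknown="ignore")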

bed.fit(bedrooms)

bed_encoded = bed.transform(bedrooms)

bed_encoded = pd.DataFrame(
    bed_encoded,
    columns=bed.categories_[0],
    index=df.index
)

df.drop("bedrooms", axis=1, inplace=True)

df = pd.concat([df, bed_encoded], axis=1)


Method 1:

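Load your data into a DataFrame first (called data below). You can apply an ordinal encoder such as LabelEncoder to the categorical columns and then one-hot encode them. The LabelEncoder step was required before scikit-learn 0.20, when OneHotEncoder only accepted integers; current versions accept string categories directly, so that step is optional.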

categorical_cols = ['bedrooms', 'bathrooms', 'floors', 'condition', 'grade', 
            'yr_built'] 

import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# instantiate the LabelEncoder object
# (optional with scikit-learn >= 0.20, which accepts strings directly)
le = LabelEncoder()

# apply le to each categorical column; fit_transform refits on every column,
# so a single object can be reused
# data is the DataFrame
data[categorical_cols] = data[categorical_cols].apply(lambda col: le.fit_transform(col))

# sparse_output=False returns a dense array instead of the default sparse matrix
# (scikit-learn >= 1.2; older versions use sparse=False)
ohe = OneHotEncoder(sparse_output=False)

# One-hot encode all the categorical columns at once
array_hot_encoded = ohe.fit_transform(data[categorical_cols])

# Convert the array to a DataFrame with readable column names
data_hot_encoded = pd.DataFrame(array_hot_encoded,
                                columns=ohe.get_feature_names_out(),
                                index=data.index)

# Keep only the columns that don't need to be encoded
data_numeric_cols = data.drop(columns=categorical_cols)

# Concatenate the two DataFrames
data_out = pd.concat([data_hot_encoded, data_numeric_cols], axis=1)

You could also use pd.factorize() instead of LabelEncoder to map each categorical column to integer (ordinal) codes.
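A small sketch of pd.factorize on a hypothetical grade column. It returns a pair of (codes, uniques); by default codes follow the order in which values first appear, while sort=True numbers them in sorted order of the unique values.

```python
import pandas as pd

# Hypothetical toy column with string categories.
data = pd.DataFrame({"grade": ["average", "good", "average", "excellent"]})

# pd.factorize returns (codes, uniques); codes follow order of first appearance.
codes, uniques = pd.factorize(data["grade"])
print(codes.tolist())    # [0, 1, 0, 2]
print(list(uniques))     # ['average', 'good', 'excellent']

# sort=True assigns codes in sorted order of the unique values instead.
sorted_codes, sorted_uniques = pd.factorize(data["grade"], sort=True)
print(sorted_codes.tolist())  # [0, 2, 0, 1]
```

To convert several columns, the same pattern as the LabelEncoder line works: data[cols].apply(lambda col: pd.factorize(col)[0]).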

Method 2:

Use pd.get_dummies() to one-hot encode the raw data directly, without converting it to ordinal codes first. Passing columns= encodes only the listed columns and leaves the others unchanged.

import pandas as pd
df = pd.get_dummies(data, columns = categorical_cols)
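One caveat worth sketching: unlike a fitted OneHotEncoder, pd.get_dummies does not remember the categories it saw, so encoding a training set and a test set separately can produce different columns. A minimal sketch, using hypothetical train and test frames, aligns the test columns to the training columns with DataFrame.reindex. dtype=int is passed so the dummies are 0/1 integers rather than booleans.

```python
import pandas as pd

# Hypothetical train/test splits; the test set lacks the 4-bedroom category.
train = pd.DataFrame({"bedrooms": [2, 3, 4], "price": [180000, 221900, 604000]})
test = pd.DataFrame({"bedrooms": [3, 2], "price": [510000, 257500]})

train_enc = pd.get_dummies(train, columns=["bedrooms"], dtype=int)

# Align the test columns to the training columns: missing dummy columns are
# added and filled with 0, and any category unseen in training is dropped.
test_enc = pd.get_dummies(test, columns=["bedrooms"], dtype=int).reindex(
    columns=train_enc.columns, fill_value=0
)
print(train_enc.columns.tolist())  # ['price', 'bedrooms_2', 'bedrooms_3', 'bedrooms_4']
print(test_enc["bedrooms_4"].tolist())  # [0, 0]
```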
