Delete emojis or replace them with text using regex in pandas

I have this toy dataset:

import pandas as pd

df = pd.DataFrame({'id':[1,2,3,4,5,6],
                   'text':['Oh no Monday','Oh no Monday','Gotcha 🇫🇷!',
                           'Coffee, please','Coffee, please','Mails 🌎'],
                   'dates':['2019-05-30T17:48:45+0000','2019-05-30T17:48:45+0000',
                           '2019-05-25T19:40:43+0000','2019-03-30T14:41:20+0000',
                           '2019-03-30T14:41:20+0000','2019-04-10T19:50:49+0000'],
                   'group':['meme','humour','meme','gif','gif','meme'],
                   'theme':['light','light','funny','dark','sad','funny']})

   id  text            dates                     group   theme
0   1  Oh no Monday    2019-05-30T17:48:45+0000  meme    light
1   2  Oh no Monday    2019-05-30T17:48:45+0000  humour  light
2   3  Gotcha 🇫🇷!      2019-05-25T19:40:43+0000  meme    funny
3   4  Coffee, please  2019-03-30T14:41:20+0000  gif     dark
4   5  Coffee, please  2019-03-30T14:41:20+0000  gif     sad
5   6  Mails 🌎        2019-04-10T19:50:49+0000  meme    funny

I want to either delete the emojis or replace them with text. I tried this:

df['clean_text'] = df['text'].str.extract(r'(^[a-zA-Z]+)')

But it truncates my longer texts. For example, I want Oh no Monday but I get only Oh:


   id  text            dates                     group   theme  clean_text
0   1  Oh no Monday    2019-05-30T17:48:45+0000  meme    light  Oh
1   2  Oh no Monday    2019-05-30T17:48:45+0000  humour  light  Oh
2   3  Gotcha 🇫🇷!      2019-05-25T19:40:43+0000  meme    funny  Gotcha
3   4  Coffee, please  2019-03-30T14:41:20+0000  gif     dark   Coffee
4   5  Coffee, please  2019-03-30T14:41:20+0000  gif     sad    Coffee
5   6  Mails 🌎        2019-04-10T19:50:49+0000  meme    funny  Mails

Please, any help or guidance will be greatly appreciated.

You can delete emojis using regex:

>>> pat = r'[\U0001F600-\U0001F64F]|[\U0001F300-\U0001F5FF]|[\U0001F680-\U0001F6FF]|[\U0001F1E0-\U0001F1FF]'
>>> df['text'].str.replace(pat, '', regex=True)
0      Oh no Monday
1      Oh no Monday
2          Gotcha !
3    Coffee, please
4    Coffee, please
5            Mails
Name: text, dtype: object
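
The question also asks about replacing emojis with text rather than deleting them. One possible approach, shown here only as a minimal sketch, is to substitute each matched code point with its official Unicode name from the standard-library unicodedata module. The names EMOJI_RE and emoji_to_text and the :NAME: output format are illustrative assumptions, not part of the original answer, and the pattern covers only the four ranges used above.

```python
import re
import unicodedata

# Hypothetical pattern: the same four ranges as the short pattern above
# (emoticons, symbols & pictographs, transport & map symbols, flags).
EMOJI_RE = re.compile(
    "[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF"
    "\U0001F680-\U0001F6FF\U0001F1E0-\U0001F1FF]"
)

def emoji_to_text(text):
    """Replace every matched code point with ':UNICODE NAME:'."""
    def describe(match):
        # A default is passed because unicodedata.name raises ValueError
        # for code points that have no name.
        name = unicodedata.name(match.group(0), "UNKNOWN")
        return ":" + name + ":"
    return EMOJI_RE.sub(describe, text)

print(emoji_to_text("Mails \U0001F30E"))  # Mails :EARTH GLOBE AMERICAS:
```

With pandas, this should be applicable per row via df['text'].map(emoji_to_text). Note that a flag such as 🇫🇷 is made of two regional-indicator code points, so it comes out as two names rather than one country name.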

A more complete pattern, covering additional Unicode blocks:

import re

# https://en.wikipedia.org/wiki/Unicode_block
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F700-\U0001F77F"  # alchemical symbols
    "\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
    "\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
    "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
    "\U0001FA00-\U0001FA6F"  # Chess Symbols
    "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
    "\U00002702-\U000027B0"  # Dingbats
    "\U000024C2-\U0001F251"  # broad catch-all; also spans many non-emoji characters (e.g. CJK)
    "]+"
)

df['text'].str.replace(EMOJI_PATTERN, '', regex=True)
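
Removing an emoji leaves the surrounding whitespace behind (for example 'Mails ' with a trailing space). A minimal sketch of one way to tidy that up, using plain re so it can be tried without a DataFrame; DEMO_EMOJI_RE and remove_emojis_and_tidy are hypothetical names, and the demo pattern uses only the four ranges from the short pattern above:

```python
import re

# Hypothetical demo pattern: only the four ranges from the short pattern.
DEMO_EMOJI_RE = re.compile(
    "[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF"
    "\U0001F680-\U0001F6FF\U0001F1E0-\U0001F1FF]+"
)

def remove_emojis_and_tidy(text):
    """Drop matched emojis, then collapse whitespace runs and trim the ends."""
    without_emojis = DEMO_EMOJI_RE.sub("", text)
    return re.sub(r"\s+", " ", without_emojis).strip()

print(repr(remove_emojis_and_tidy("Mails \U0001F30E")))              # 'Mails'
print(repr(remove_emojis_and_tidy("Gotcha \U0001F1EB\U0001F1F7!")))  # 'Gotcha !'
```

The pandas equivalent would presumably chain .str.replace(r'\s+', ' ', regex=True).str.strip() after the emoji replacement. The space before the exclamation mark is kept, since the sketch only normalizes whitespace and does not touch punctuation.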