Aggregate values in pandas dataframe based on lists of indices in a pandas series

Solution for Aggregate values in pandas dataframe based on lists of indices in a pandas series
is Given Below:

Suppose you have a dataframe with an “id” column and a column of values:

df1 = pd.DataFrame({'id': ['a', 'b', 'c'] , 'vals': [1, 2, 3]})
df1
    id  vals
0   a   1
1   b   2
2   c   3

You also have a series that contains lists of “id” values that correspond to those in df1:

df2 = pd.Series([['b', 'c'], ['a', 'c'], ['a', 'b']])
df2
    id    
0   [b, c] 
1   [a, c] 
2   [a, b] 

Now, you need a computationally efficient method for taking the mean of the “vals” column in df1 using the corresponding ids in df2 and creating a new column in df1. For instance, for the first row (index=0) we would take the mean of the values for ids “b” and “c” in df1 (since these are the id values in df2 for index=0):

    id  vals avg_vals
0   a   1    2.5
1   b   2    2.0
2   c   3    1.5

You could do it this way:

df1['avg_vals'] = df2.apply(lambda x: df1.loc[df1['id'].isin(x), 'vals'].mean())
df1
  id  vals  avg_vals
0  a     1       2.5
1  b     2       2.0
2  c     3       1.5

…but suppose it is too slow for your purposes. I.e., I need something much more computationally efficient if possible! Thanks for your help in advance.

Let us try

df1['new'] = pd.DataFrame(df2.tolist()).replace(dict(zip(df1.id,df1.vals))).mean(1)
df1
Out[109]: 
  id  vals  new
0  a     1  2.5
1  b     2  2.0
2  c     3  1.5

Try something like:

df1['avg_vals'] = (df2.explode()
                      .map(df1.set_index('id')['vals'])
                      .groupby(level=0)
                      .mean()
                  )

output:

  id  vals  avg_vals
0  a     1       2.5
1  b     2       2.0
2  c     3       1.5

Thanks to @Beny and @mozway for their answers. But, these still were not performing as efficiently as I needed. I was able to take some of mozway’s answer and add a merge and groupby to it which sped things up:

    df1 = pd.DataFrame({'id': ['a', 'b', 'c'] , 'vals': [1, 2, 3]})
    df2 = pd.Series([['b', 'c'], ['a', 'c'], ['a', 'b']])
    df2 = df2.explode().reset_index(drop=False)

    df1['avg_vals'] = pd.merge(df1, df2, left_on='id', right_on=0, how='right').groupby('index').mean()['vals'] 

    df1
    id  vals  avg_vals
    0  a     1       2.5
    1  b     2       2.0
    2  c     3       1.5