Solution for How to iterate one dataframe based on a mapping file in R?
is Given Below:
|Serial No.||Company 1||Company 2||Company 3|
I have a data structure like this where the column headings represent some companies and the row headings represents consumers who buy the products. ‘NA’ representing no purchase for that company’s products by the consumer.
I have a second mapping file where the companies are represented as row headings as follows –
The data set is for over 10000 consumers and 1000 companies. I’m getting the market share for different countries and categories using the aggregate function and mapping file.
I want to make a look to iterate values in the first data-frame to change the share for different countries and categories. The idea is to make a loop where I can choose which country’s (or category) share needs to be changed along with the share and then to use the mapping file to iterate values for companies in those countries (or category). The values need to be changes for only those consumers who buy the products from companies belonging to that country (or category).
Can someone suggest how can this be done in R (preferably) or Python?
Before iteration I will use the aggregate function in R to get the shares for a country (or category) like this –
In the loop I want to be able to specify the share for some country (say UK) to whatever is required (say 0.5). The mapping file will be used to iterate values to the first data structure where people have bought products from companies in UK.
The final output will be something like this.
Here’s a guess: ultimately, this is a combination of reshape from wide to long, then
merge/join, and finally aggregation/summarizing by group. If you need more information for either operation, using those key-words (on SO) will provide very useful information.
base R (and
## reshape dat1melted <- reshape2::melt(dat1, "Serial No.", variable.name = "Company") dat1melted$Company <- as.character(dat1melted$Company) dat1melted <- dat1melted[!is.na(dat1melted$value),] dat1melted # Serial No. Company value # 2 02 Company 1 2 # 4 04 Company 1 1 # 6 01 Company 2 2 # 10 05 Company 2 4 # 12 02 Company 3 5 # 13 03 Company 3 4 ## merge dat1merged <- merge(dat1melted, dat2, by = "Company", all.x = TRUE) dat1merged # Company Serial No. value Country Category # 1 Company 1 02 2 UK FMCG # 2 Company 1 04 1 UK FMCG # 3 Company 2 01 2 UK FMCG # 4 Company 2 05 4 UK FMCG # 5 Company 3 02 5 India FMCG # 6 Company 3 03 4 India FMCG ## aggregate by group aggregate(value ~ Country, data = dat1merged, FUN = sum) # Country value # 1 India 9 # 2 UK 9
library(dplyr) # library(tidyr) # pivot_longer dat1 %>% ## reshape tidyr::pivot_longer(-`Serial No.`, names_to = "Company") %>% filter(!is.na(value)) %>% ## merge left_join(., dat2, by = "Company") %>% ## aggregate by group group_by(Country) %>% summarize(value = sum(value)) # # A tibble: 2 x 2 # Country value # <chr> <int> # 1 India 9 # 2 UK 9