Employing R to Serve Excel Spreadsheet: Pivot Table Generation, Multiple Worksheets Creation, and Graph Insertion

Martinqiu
Geek Culture
Published in
5 min readApr 26, 2021

--

R users still use Excel, particularly when working with people who solely rely on Excel. The task to share analytical results generated by R to Excel is frequent in many organizations. Hence, a convenient and versatile passage between R and Excel is highly desirable. Here is how to establish such a passage.

Although R allows exporting results to csv format, which Excel can open, several major challenges remain. First, you can insert only one sheet to a csv file, no more. Second, all non-standard characters (e.g., Chinese or Japanese) are not displayed properly because csv files cannot deal with Unicode/UTF-8. Third, graphs cannot be exported to csv files.

Some R packages allow exporting to Excel files, but they require the installation of Java, which creates additional hurdles. I witnessed someone failed to install Java on his system, and as a result, he had to replace his computer.

Thanks to the developers of openxlsx, all the three challenges mentioned above are now gone for people working between R and Excel. Java is not required either. Below I use the famous and free diamonds data offered in ggplot2 package for an illustration of what I promised in the title: Pivot Table Generation, Multiple Worksheets Creation, and Graph Insertion.

For those who are unfamiliar with diamond data: the dataset stores information for more than 50 thousand diamonds, including three features in the format of categorical variables, cut, clarity, and color, and two numerical parts, carat, and price.

We can choose any two categorical features, let’s say cut and clarity, to create a pivot table, which shows the average price or stone size given the combo of cut and clarity. This job is not easy in Excel as VBA is required. But in R, it is done in two lines with the help of pipe operations (%>%).

require(ggplot2)
require(dplyr)
diamonds %>% group_by(cut,color) %>% summarize(mean.value=mean(price)) %>% spread(cut,mean.value)

The mean function can be replaced with other functions, median, sum, count, and sd for the chosen summary statistics.

Below are the result of cut-color-mean price. We create a nice table summarizing the average price for each cut and color combination. I want to export tables like this to a spreadsheet.

## # A tibble: 7 x 6
## color Fair Good `Very Good` Premium Ideal
## <ord> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 D 4291. 3405. 3470. 3631. 2629.
## 2 E 3682. 3424. 3215. 3539. 2598.
## 3 F 3827. 3496. 3779. 4325. 3375.
## 4 G 4239. 4123. 3873. 4501. 3721.
## 5 H 5136. 4276. 4535. 5217. 3889.
## 6 I 4685. 5079. 5256. 5946. 4452.
## 7 J 4976. 4574. 5104. 6295. 4918.

To export the result you have to save the result it first. You need to assign it to a dataframe; call it table1.

table1= diamonds %>% group_by(cut,color) %>% summarize(mean.value=mean(price)) %>% spread(cut,mean.value)

To create more tables like the one above, we can repeat the command using different diamond features.

table2= diamonds %>% group_by(clarity,cut) %>% summarize(mean.value=mean(carat)) %>% spread(clarity,mean.value)

But given the repetitive nature of the table creation, let’s write a function to do the job; call it excel.func.

excel.func=function(x,y,z){
diamonds %>% group_by({{x}},{{y}}) %>% summarize(mean.value=mean({{z}})) %>% spread({{x}},mean.value)
}

Note that to refer to the function argument, you shall use {{x}}. I may explain this in another article later.

Now we apply the function excel.func to create three different tables for different combo of cut, color and clarity, with the mean value of either price or carat.

table1=excel.func(cut,color,price)
table2=excel.func(clarity,cut,carat)
table3=excel.func(color,clarity,price)

Next, we use openxlsx package to export them to a spreadsheet.

The first common scenario is creating a new spreadsheet file with each table written to a separate sheet. The steps are as follows.

Use createWorkbook to create a workbook object in R, named diamond.wb, and add three sheets to it using addWorksheet. The three sheets are called my sheet 1 to 3 (you can call it whatever you want), respectively.

require(openxlsx)
diamond.wb <- createWorkbook()
addWorksheet(diamond.wb,"my sheet 1")
addWorksheet(diamond.wb,"my sheet 2")
addWorksheet(diamond.wb,"my sheet 3")

The next step is to use writeData to hook the data.frames, table 1 to 3, to the three sheets in diamond.wb, respectively.

writeData(diamond.wb,"my sheet 1",table1)
writeData(diamond.wb,"my sheet 2",table2)
writeData(diamond.wb,"my sheet 3",table3)

The above repetitive work can be simplified using sapply. That will be a different topic.

The three tables are now saved to the workbook diamond.wb in R. The final step is easy: export diamond.wb to an Excel file on your computer (I put it under F drive for quick verification).

saveWorkbook(diamond.wb, file = "F:/diamonds_summary.xlsx", overwrite = TRUE)

The result is as follows:

Another common scenario is to add a new table to an existing spreadsheet without overwriting.

Let’s create another table on cut-color-mean carat, call it table4, and save it to “my sheet 1” where we already stored information of cut-color-mean price of table1.

table4=excel.func(cut,color,carat)
writeData(diamond.wb,"my sheet 1",table4,startCol = 1,startRow =11)

Note that in writeData function, I designate the position of the table4 in my sheet 1 at row 11, column 1, so it will not overwrite existing information there.

The final step is to save to the same spreadsheet file. But before you proceed, make sure you have closed the Excel file you opened for review. Otherwise, the saving won’t work, and no error messages will alert you. Of course, you can always save to a new Excel file with all four tables by changing the file argument below.

saveWorkbook(diamond.wb, file = "F:/diamonds_summary.xlsx", overwrite = TRUE)

The result is as follows:

The openxlsx Package also allows you to export a plot created in R (e.g., by ggplot2) to a spreadsheet. As illustrated below, I use inserPlot to do the job. Note that inserPlot will insert only the most current plot displayed in R (can be loaded by print function).

addWorksheet(diamond.wb,"my plot")plot1=ggplot(diamonds)+geom_bar(aes(x=clarity,fill=color))
print(plot1)
insertPlot(diamond.wb,"my plot",startCol = 3,startRow =4, width = 16, height = 10, fileType = "png", units = "cm")saveWorkbook(diamond.wb, file = "F:/diamonds_summary.xlsx", overwrite = TRUE)

The result is as follows:

You can get more help on this package to customize your R->excel operations by checking the help of addWorkshee and writeData.

Thanks for Reading.

This article is my first post. I am sharing solutions to some common issues that my students of business analytics encountered. I hope you will also find they are helpful.

LinkedIn: https://www.linkedin.com/in/martin-qiu-46a583b/

--

--

Martinqiu
Geek Culture

I am a marketing professor and I teach BDMA (big data and marketing analytics) at Lazaridis School of Business, Wilfrid Laurier University, in Waterloo, Canada.