4.4 Exporting to and Importing from Excel Files
Now, you know how to export to and import from delimited files. In this section, you will learn how to export to and import from Excel files with extensions .xls and .xlxs.
4.4.1 Export data into Excel files
To export data into an Excel file, you can use the writexl package. Let’s first install the package.
Now, we can load the writexl package and use the write_xlsx()
function to write data into an Excel file with extension .xlsx. Let’s first load the data from “gm_small.csv” and write it to “gm_small.xlsx”.
library(writexl)
library(readr)
gm_small <- read_csv("data/gm_small.csv")
write_xlsx(gm_small, "data/gm_small.xlsx")
By default, the column names of the data frame/tibble will be written to the first row of the Excel file. To skip the column names, you can set col_names = FALSE
in the write_xlsx()
function.
In addition to writing a single data frame to an Excel file, write_xlsx()
can also write multiple data frames into a single Excel file, with each Excel sheet containing one data frame. To do that, you need to supply a list of data frames as the first argument in write_xlsx()
. Let’s take a look at the following example which write both gm_small
and sahp
(a tibble in the r02pro package introduced in Section 4.1.2) into an Excel file named “two_data.xlsx”.
library(r02pro)
two_data <- list(gapminder = gm_small, sahp = sahp)
write_xlsx(two_data, "data/two_data.xlsx")
You can open this file with Excel and verify its contents.
4.4.2 Import Excel Files (.xls and .xlsx ) using read_excel()
After learning how to export data into an Excel file, let’s see how to read an existing Excel file into R. We can use the read_excel()
function in the readxl
package to import Excel files. Here, readxl
is another subpackage in the tidyverse package. Thus we can directly load the package if the tidyverse package is already installed.
Let’s import the Excel file “two_data.xlsx” we just created into R.
library(readxl)
my_df <- read_excel("data/two_data.xlsx")
head(my_df)
#> # A tibble: 5 × 4
#> country year gender continent
#> <chr> <dbl> <chr> <chr>
#> 1 Albania 2004 female Europe
#> 2 Andorra 2004 female <NA>
#> 3 United Arab Emirates 2004 female Asia
#> 4 Argentina 2004 female Americas
#> 5 Armenia 2004 female Asia
We can see from the result that although the file contains two sheets, the function will import the first sheet by default. To import the second sheet, you can set the sheet
argument to the sheet name (sahp
) or the position of the sheet (2).
sahp_1 <- read_excel("data/two_data.xlsx", sheet = "sahp")
# sahp_1 <- read_excel('two_data.xlsx', sheet = 2) #same result as the previous
# line
head(sahp_1)
#> # A tibble: 6 × 12
#> dt_sold bedroom bathroom gar_car oa_qual liv_area lot_area house…¹
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 2010-03-25 00:00:00 3 2.5 2 6 1479 13517 2Story
#> 2 2009-04-10 00:00:00 4 3.5 2 7 2122 11492 2Story
#> 3 2010-01-15 00:00:00 3 2 1 5 1057 7922 1Story
#> 4 2010-04-19 00:00:00 3 2.5 2 5 1444 9802 2Story
#> 5 2010-03-22 00:00:00 3 2 2 6 1445 14235 1.5Fin
#> 6 2010-06-06 00:00:00 2 2.5 2 6 1888 16492 1Story
#> # … with 4 more variables: kit_qual <chr>, heat_qual <chr>, central_air <chr>,
#> # sale_price <dbl>, and abbreviated variable name ¹house_style
If we only want to import a portion of the data, say the first 5 rows and the first 4 columns, then we can set the argument range = "A1:D5"
, just like the range in an excel file. Note that the index starts with the first row, which may correspond to the column names.
sahp_2 <- read_excel("data/two_data.xlsx", sheet = "sahp", range = "A1:D5")
sahp_2
#> # A tibble: 4 × 4
#> dt_sold bedroom bathroom gar_car
#> <dttm> <dbl> <dbl> <dbl>
#> 1 2010-03-25 00:00:00 3 2.5 2
#> 2 2009-04-10 00:00:00 4 3.5 2
#> 3 2010-01-15 00:00:00 3 2 1
#> 4 2010-04-19 00:00:00 3 2.5 2
Note that read_excel()
can read both .xls and xlsx file types.
4.4.4 Exercises
Use R to create the following data frame and assign it to the name df1
.
#> word1 number1
#> 1 one 1
#> 2 two NA
#> 3 <NA> 3
Then, use R to create the following data frame and assign it to the name df2
.
#> word2 number2
#> 1 three 3
#> 2 <NA> 4
#> 3 five 5
Then create a list named my_list
with df1
as the first element and df2
as the second element.
Write R code to set working directory to the desktop, then export
my_list
into an excel file namedlist.xlsx
. How many sheets are there in the excel file? What are the contents in each sheet?Write R code to import the first two rows and the first column of the second sheet from the excel file you just created. And verify the object value.