Chapter 9 Tidy Data
Learning Objectives
After completing this chapter, you will be able to:
- Recognize whether a dataset is in tidy format
- Reshape data between long and wide formats using pivot_longer() and pivot_wider()
- Separate and combine columns with separate() and unite()
- Handle missing values with replace_na() and fill()
At a Glance: Chapter Roadmap
Section 9.1. Pivoting: Reshape data between long and wide formats to achieve a tidy structure.
Section 9.2. Splitting & Merging: Separate one column into multiple columns, or unite multiple columns into one.
Section 9.3. Missing Values: Master advanced techniques for identifying, filling, and handling NA values.
In the past several chapters, you have learned a lot about data visualization, data import and export, and data manipulation. All the data you have seen so far share a very attractive property: they are all tidy. So, what is tidy data? Following the definition in Wickham and Grolemund (2016), tidy data has the following three interrelated properties:
- Each variable must have its own column.
- Each observation must have its own row.
- Each value must have its own cell.
These properties of tidy data enable efficient data manipulation and visualization, because the tools introduced in earlier chapters expect each variable to be stored in its own column. Note that in practical applications, much of the data collected is untidy. Although untidy data can be useful for reporting and is often more visually intuitive, we recommend tidying it before applying the tools covered in this book.
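To make the three properties concrete, the following minimal sketch uses table4a, a small example dataset that ships with the tidyr package (it assumes tidyr 1.0.0 or later is installed, since that release introduced pivot_longer()). In table4a, the values of the variable year (1999 and 2000) are stored as column names, so year has no column of its own and each row holds two observations. Reshaping it so that year and cases each get their own column yields one country-year observation per row. The reshaping step itself is covered in Section 9.1; here it only serves to show what tidy output looks like.

```r
# Assumes tidyr (>= 1.0.0) is installed; table4a ships with tidyr
library(tidyr)

# Untidy: one row per country, with the year values 1999 and 2000
# used as column names and the case counts spread across them
print(table4a)

# Tidy: gather the two year columns into a `year` column and their
# values into a `cases` column, giving one row per country-year
tidy_cases <- pivot_longer(
  table4a,
  cols = c(`1999`, `2000`),
  names_to = "year",
  values_to = "cases"
)
print(tidy_cases)
```

The tidy version has six rows (three countries times two years) and three columns, country, year, and cases, so each variable has its own column and each observation its own row. Because year was taken from column names, it comes out as a character column.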