Chapter 9 Tidy Data

Learning Objectives

After completing this chapter, you will be able to:

  • Recognize whether a dataset is in tidy format
  • Reshape data between long and wide formats using pivot_longer() and pivot_wider()
  • Separate and combine columns with separate() and unite()
  • Handle missing values with replace_na() and fill()

At a Glance: Chapter Roadmap

Section 9.1. Pivoting: Reshape data between long and wide formats to achieve a tidy structure.
Section 9.2. Splitting & Merging: Separate one column into several, or unite several columns into one.
Section 9.3. Missing Values: Master techniques for identifying, filling, and handling NA values.

In the past several chapters, you have learned a lot about data visualization, data import and export, and data manipulation. All the data you have seen so far share a very attractive property: they are all tidy. So what exactly is tidy data? Following the definition in Wickham and Grolemund (2016), tidy data has the following three interrelated properties.

  1. Each variable must have its own column.
  2. Each observation must have its own row.
  3. Each value must have its own cell.

These properties make data manipulation and visualization efficient. In practice, however, much of the data you collect will be untidy. Although untidy data can be useful for reporting and is often easier to read at a glance, we recommend tidying it before applying the tools covered in this book.
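To make the three properties concrete, here is a minimal sketch in R, assuming the tidyr and tibble packages are installed. The sales table, its column names, and its numbers are hypothetical, made up purely for illustration. The wide table stores the values of the variable year as column names, which breaks the first property; pivot_longer() reshapes it so that each variable has its own column and each store-year observation has its own row.

```r
library(tibble)
library(tidyr)

# Hypothetical yearly sales figures for two stores (made-up data).
# The years 2021 and 2022 appear as column names, so "year" is a
# variable without its own column: this table is untidy.
sales_wide <- tibble(
  store  = c("A", "B"),
  `2021` = c(100, 150),
  `2022` = c(120, 160)
)

# Gather every column except store into name/value pairs:
# the old column names go into "year", the cell values into "sales".
sales_long <- pivot_longer(
  sales_wide,
  cols      = -store,
  names_to  = "year",
  values_to = "sales"
)
# sales_long has 4 rows (one per store-year observation)
# and 3 columns: store, year, sales.
print(sales_long)

# Spread the year values back out into columns, e.g. for a compact
# report; the result has the same untidy layout as sales_wide.
sales_report <- pivot_wider(
  sales_long,
  names_from  = year,
  values_from = sales
)
print(sales_report)
```

The round trip through pivot_wider() mirrors the trade-off described above: the wide layout may suit a printed report, while the long, tidy layout is the one to work with when manipulating or plotting the data.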

References

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media, Inc.
