6 👀 Visualizing One-Variable Data

You can find more examples of charts and graphics at R Charts

Now that we have talked about summarizing data, let’s talk about visualizing it.

6.1 Histograms

You can use the built-in hist() function in order to create histograms, very easily. Below, I will graph the TAVG, average temperature, from the dca_weather dataset in February, when month equals 2.

feb_temps<- dca_weather$TAVG[dca_weather$month %in% 2]
hist(feb_temps, 
     col= "skyblue", 
     main= "Average Daily Temperature at DCA in February 2022", 
     xlab= "Temperature (Fahrenheit)", 
     ylab="Number of Days")

When we use any plotting function in R, we always pass some extra commands to customize the chart called a parameter.

You can type all the parameters in a single line, but it is usually cleaner to make a new line for each parameter. As long as you follow each parameter with a comma , and the parameters are all inside the parentheses () of your plot command, you will be OK.

6.1.1 Graphical Parameters

Here’s a breakdown of the parameters in the histogram above:

6.2 Histogram Exercise

Your turn! In the box below is a histogram of march_temps at DCA. Label the graph appropriately, and change the color to darkseagreen4.

6.3 Dotplots

Unfortunately, there’s no built in function to create dotplots in R. We have to repurpose the stripchart command in order to make something closest to a dotplot.

First, I will store TAVG, average temperature, from the dca_weather dataset in June, when month equals 6. I will save this data into a variable called june_temps.

june_temps<- dca_weather$TAVG[dca_weather$month %in% 6]

Here’s the code for creating a dotplot. Notice that june_temps is actually a vector (a list of many numbers).

stripchart(june_temps, method = "stack", 
           offset = .5, 
           at = 0, 
           pch = 19,
           col = "steelblue", 
           main = "Average June Temperatures at DCA", 
           xlab = "Temperature (Fahrenheit)")

As before, you pass several parameters into stripchart(). You can safely ignore stack=, offset=, and at=. Just include them, any time you need a dotplot.

pch= is a parameter that you can feel free to modify. It specifies the style of each dot. For a list of different dot styles you can use, click here.

6.4 Boxplots

6.4.1 Single Boxplots

Let’s start off with the most simple case: making one boxplot with one variable. I’ll make a boxplot of June Temperatures in DCA:

boxplot(june_temps,
        col="pink",
        main= "Boxplot of June Temperatures at DCA Airport",
        xlab= "Temperature (Fahrenheit)",
        horizontal= TRUE)

By default, the boxplot() command will create boxplots in the vertical orientation (more on this later). If you want to have the boxplots become horizontal, set the horizontal= parameter to TRUE.

6.4.2 Grouped Boxplots

Normally, you don’t just make one boxplot, but several boxplots to compare variance within groups. To do this, we have to give the boxplot() command a second grouping variable, which we indicate with the ~ operator.

Usually, both the variable you want to graph and the variable you want to group by are in the same dataset, so we specify it with the data= parameter. Below are the bosplots for TAVG with the dca_weather dataset, grouped by month.

boxplot(TAVG~month, data=dca_weather,
        col="skyblue",
        main= "Daily Average Temperatures at DCA Airport, by Month",
        xlab= "Month",
        ylab="Temperature (Fahrenheit)")

The ~ variable effectively tells R which axis to to place each variable in, based on y~x. In other words, since the first variable is TAVG, that goes on the y-axis, and month goes on the x-axis. We have effectively set month as our grouping variable.

Look at what happens when you switch the axes around:

boxplot(month~TAVG, data=dca_weather)

Yep. It’s not pretty. Be careful of which variable goes where.

6.4.3 Customizing boxplot()

Here’s an idea of the things you can do with boxplots:

boxplot(TAVG~month, data=dca_weather,
        col=c("skyblue", "darkcyan", "cadetblue1", "chartreuse", "forestgreen", "darkorange"),
        main= "Daily Average Temperatures at DCA Airport, by Month",
        xlab= "Month",
        ylab="Temperature (Fahrenheit)",
        names=c("Jan", "Feb", "Mar", "Apr", "May", "Jun"))
  • In our original graph, Months were labeled as their numbers. If you want to give custom names for each group, use the names= parameter. Remember to set the names= parameter to a vector equal to the number of boxplots there are, otherwise, R will return an error.
  • If you want to give your boxplots different colors, set col= to a character vector of color names. Remember the length of this vector also needs to be equal to the number of boxplots.
  • R automatically calculates what the appropriate x and y axes values are. If you want to change them, specify an xlim= and ylim= parameter. Each parameter should take the form of c(min, max), where min and max are the numeric values for the start of the axis and the end of the axis, respectively.

6.5 Boxplot Exercise

Your turn! The following boxplot was supposed to graph the PRCP variable (representing amount of snow/rainfall in inches), grouped by month, but is completely messed up. Fix it so that:

  • You graph PRCP, grouped by month
  • You have an appropriate main title, x-axis, and y-axis labels
  • The “Months” are labeled as “Jan”, “Feb”, “Mar”, etc. etc.
  • You add some pretty colors to your boxplots.
  • You have a y-axis that goes from 0 to 100.

6.6 Exporting your graphs

You can find your plots under the Plots tab in the lower right.

There are a couple of ways to export your graphs in RStudio.

  1. Right click on the plot, and select “Save Image As”, or “Copy to Clipboard”.
  1. Click on Export on the top, and then click “Save As Image”, or “Copy to Clipboard”.

6.6.1 Blurry Graphs?

Blurry graphs happen because your plot window is too small. For some reason, it hapens more on the FCPS provided computers than on personal laptops. To fix:

  1. Under the plots tab, click “Zoom”. A new window will pop open.
  2. This window will have a larger version of your graph. Use the right click method to save your graph.

6.7 More graphing options

6.7.1 Using par()

If you run par() before the command for any graph, you can stack multiple graphs into the same window. Here’s an example:

Switch the 1 and 2 around in the mfrow= parameter so it says par(mfrow=c(2,1)). Then, rerun the code. What happens?

As you can see, passing the mfrow= parameter allows you to stack multiple graphs based on the a certain number of rows and columns. This will be helpful, especially when you have multiple graphs of the same variable.

6.7.2 Overlaying a five number summary onto a boxplot

Every graph in R can be thought of as a layer– if you want to add anything on top of your graph, it is represented as a subsequent call after your plot command.

For example, if I wanted to overlay a five number summary on top of each part of the boxplot, I run the text() command. Inside the text command, I tell R what labels= to place, and at which x= value they should be placed at.

boxplot(june_temps,
        col="pink",
        main= "Boxplot of June Temperatures at DCA Airport",
        xlab= "Temperature (Fahrenheit)",
        horizontal= TRUE)
text(labels=fivenum(june_temps), x=fivenum(june_temps),  y=1.3)