2 🔢 Creating Dataframes

2.1 Combining Vectors into Dataframes

In the last chapter, you learned how to create vectors of elements using c(), seq(), or the : operator.

Each vector can be considered a column within a table. So if you want to create a table, you just have to combine the vectors together, right? Let’s combine the friend_names, friend_ages, and lives_in_dc vectors from the previous chapter:

friends <- c(friend_names, friend_ages, lives_in_dc)
friends
##  [1] "Abram"    "Bryant"   "Colleen"  "David"    "Esther"  
##  [6] "Jeremiah" "34"       "35"       "32"       "29"      
## [11] "30"       "30"       "TRUE"     "FALSE"    "FALSE"   
## [16] "TRUE"     "FALSE"    "TRUE"

As you can see, all this does is combine the three vectors into a single, longer vector, and it converts every value to text along the way. It does not give us the table with multiple rows and columns that we wanted.
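A quick check makes the problem visible. The sketch below rebuilds the three vectors from the values printed in the output above (it assumes these match the definitions from the previous chapter) and then inspects what c() produced:

```r
# Values copied from the printed output above; assumed to match the previous chapter
friend_names <- c("Abram", "Bryant", "Colleen", "David", "Esther", "Jeremiah")
friend_ages <- c(34, 35, 32, 29, 30, 30)
lives_in_dc <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)

combined <- c(friend_names, friend_ages, lives_in_dc)

class(combined)         # the single data type shared by every element
length(combined)        # 6 + 6 + 6 elements in one long vector
is.null(dim(combined))  # a plain vector has no rows or columns
```

Because a vector can hold only one type of data, R converts the ages and the TRUE/FALSE values to text, which is why every element in the output appears in quotation marks.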

To combine vectors of information into a single data frame, you can use the data.frame() function:

friends <- data.frame(names = friend_names,
                      ages = friend_ages,
                      DC_Resident = lives_in_dc,
                      fav_number = friend_fav_number)
friends
##      names ages DC_Resident fav_number
## 1    Abram   34        TRUE       1.00
## 2   Bryant   35       FALSE       2.17
## 3  Colleen   32       FALSE      26.00
## 4    David   29        TRUE       7.00
## 5   Esther   30       FALSE      10.00
## 6 Jeremiah   30        TRUE       9.00

Here, we have created a names variable in the friends data frame that corresponds to the values in the friend_names vector, and similarly an ages variable in friends that corresponds to the values in friend_ages.
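To see the link between the vectors and the columns, you can ask the data frame for its column names, or pull a single column back out with $. This is a minimal sketch that rebuilds friends from the values printed above, so it runs on its own:

```r
# Values copied from the printed table above
friends <- data.frame(
  names = c("Abram", "Bryant", "Colleen", "David", "Esther", "Jeremiah"),
  ages = c(34, 35, 32, 29, 30, 30),
  DC_Resident = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE),
  fav_number = c(1, 2.17, 26, 7, 10, 9)
)

names(friends)  # the column (variable) names
friends$ages    # one column, returned as an ordinary vector
dim(friends)    # number of rows, then number of columns
```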

Note that all of the vectors that we used were of the same length. You can check the length of a vector by using the length() function.

c(length(friend_names), length(friend_ages), length(lives_in_dc))
## [1] 6 6 6
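If you have many vectors, comparing the lengths by eye gets tedious. One possible shortcut, sketched here with the same example values, is to put the vectors in a list and let R compare the lengths for you. The variable names vector_lengths and all_same_length are just illustrative.

```r
# Example values copied from the output shown earlier in this chapter
friend_names <- c("Abram", "Bryant", "Colleen", "David", "Esther", "Jeremiah")
friend_ages <- c(34, 35, 32, 29, 30, 30)
lives_in_dc <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)

# lengths() returns the length of each element of a list
vector_lengths <- lengths(list(names = friend_names,
                               ages = friend_ages,
                               DC_Resident = lives_in_dc))
vector_lengths

# TRUE only when every vector has the same length
all_same_length <- length(unique(vector_lengths)) == 1
all_same_length
```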

2.1.1 Troubleshooting data.frame

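If your vectors are not all of equal length, R will usually return an error. (The exception is a shorter vector whose length divides evenly into the longest one; R silently repeats it to fill the column.)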

In my previous example, I had six friends: Abram, Bryant, Colleen, David, Esther, and Jeremiah. Let’s say that I don’t know whether Jeremiah lives in DC, so my lives_in_dc vector has a length of only 5. I’m overwriting my old lives_in_dc vector with the one below.

lives_in_dc <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

c(length(friend_names), length(friend_ages), length(lives_in_dc))
## [1] 6 6 5
# Combine the vectors into a data frame, just like before.
friends <- data.frame(names = friend_names,
                      ages = friend_ages,
                      DC_Resident = lives_in_dc,
                      stringsAsFactors = FALSE)
## Error in data.frame(names = friend_names, ages = friend_ages, DC_Resident = lives_in_dc, : arguments imply differing number of rows: 6, 5

At this point, you have a decision to make. You can either fix the mistake by adding the missing data, or you can fill the gap with NA. NA is a special value that R uses to mark missing data, which we will talk about later.

# Force your vector to have a length of 6
lives_in_dc <- c(TRUE, TRUE, FALSE, TRUE, FALSE, NA)

c(length(friend_names), length(friend_ages), length(lives_in_dc))
## [1] 6 6 6
# Combine the vectors into a data frame, just like before.
friends <- data.frame(names = friend_names,
                      ages = friend_ages,
                      DC_Resident = lives_in_dc,
                      stringsAsFactors = FALSE)

friends
##      names ages DC_Resident
## 1    Abram   34        TRUE
## 2   Bryant   35        TRUE
## 3  Colleen   32       FALSE
## 4    David   29        TRUE
## 5   Esther   30       FALSE
## 6 Jeremiah   30          NA
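If you would rather not retype the whole vector, base R can also pad a short vector with NA for you. This is an alternative sketch, not the approach used above: assigning a larger value to length() extends the vector and fills the new positions with NA.

```r
# The five known values from the example above
lives_in_dc <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# Extend the vector to six elements; the new sixth element becomes NA
length(lives_in_dc) <- 6

lives_in_dc
sum(is.na(lives_in_dc))  # how many values are missing
```

This only helps when the missing value belongs at the end of the vector, as it does here for Jeremiah. If the gap is in the middle, you need to type the NA in the right position yourself.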

2.2 Importing an Excel or CSV File into R as a Data Frame

Many datasets are stored as XLSX or CSV files. Both usually open in the Excel app, and you can recognize them by the Excel icon:


FIGURE 2.1: Excel File.

You can open these files in Excel to preview the data and perform basic analysis, but for more serious analysis, and for better control over charts and graphics, you can import them into R.

Before you continue, run this command:

getwd()
## [1] "/Users/kaisamng/Github/RGuides"

Whenever you work with a file in R, R assumes that the file is in your working directory. Think of the working directory as your desk. If you left your math homework on your desk at home, and you need your mom to take a picture of your work, you need to tell her where you left it.

Unless you give the exact path to a file, R will always assume you are referencing a file in the working directory. For example, on FCPS computers, your default working directory is Documents. If you download a CSV file named data.csv, it automatically goes to your Downloads folder instead. To import it into R using just the name data.csv, you first have to move it into your Documents folder.
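You can check from the console whether R can see a file before you try to import it. In this sketch, data.csv is the example file name from the paragraph above, and the Downloads path is only an assumption about where your browser saved the file:

```r
# Looks for data.csv in the current working directory
in_working_dir <- file.exists("data.csv")
in_working_dir

# An exact path works no matter what the working directory is.
# "~" stands for your home folder; the Downloads location is an assumption.
downloads_path <- path.expand(file.path("~", "Downloads", "data.csv"))
in_downloads <- file.exists(downloads_path)
in_downloads
```

If the first check returns FALSE and the second returns TRUE, the file is still sitting in Downloads and needs to be moved, or referenced by its exact path.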

To set your working directory, go to the RStudio menu > Session > Set Working Directory > Choose Directory.

We strongly suggest that, once you download your data file, you move it into a folder dedicated to that purpose (call it “IBET Project” or “RS1 Unit 1 Project” or whatever you like) inside your Documents folder.
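Once that folder exists, you can also point R at it from the console with setwd() instead of using the menu. The folder name below is hypothetical; replace it with the folder you actually created. The sketch changes directory only if the folder exists, so it will not raise an error on a computer that lacks it.

```r
# Hypothetical project folder inside Documents; change this to match your own
project_dir <- path.expand(file.path("~", "Documents", "IBET Project"))

if (dir.exists(project_dir)) {
  setwd(project_dir)  # same effect as Session > Set Working Directory
} else {
  message("Folder not found: ", project_dir)
}

getwd()  # confirm which folder R is using now
```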

Do not keep this folder in OneDrive or Google Drive. These syncing services can mess up your data.

Importing the dataset itself is easy.

  • XLSX: Go to the menu > File > Import Dataset > From Excel.
  • CSV: Go to the menu > File > Import Dataset > From Text (readr).

FIGURE 2.2: Navigating to the Import Menu.

  1. In this case, I am using the anime dataset. Click on “browse” to find the file that you want to import into R.

FIGURE 2.3: The Import Data Window.

  2. If everything looks good, click Import at the bottom right of the window.

    1. If you want to change the name of the dataset, do that before you click Import, at the bottom left of the window under “Name.”
  3. You should now have your dataset imported. Verify that it was imported correctly by looking for it in the Environment pane at the top right of RStudio.
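The same import can also be done in code. The menu options above rely on the readr and readxl packages; to keep this sketch runnable without installing anything, it uses base R's read.csv() instead, and it first writes a small made-up file to a temporary folder. The file name friends.csv and its contents are illustrative only and are not the anime dataset.

```r
# Create a tiny example CSV so the sketch runs on any computer
toy <- data.frame(names = c("Abram", "Bryant"), ages = c(34, 35))
csv_path <- file.path(tempdir(), "friends.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Import the CSV file into R as a data frame
friends_imported <- read.csv(csv_path, stringsAsFactors = FALSE)

# Verify the import: column names, data types, and the first few values
str(friends_imported)
```

With your own data, you would replace csv_path with the name of your file (if it is in the working directory) or with its exact path.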