Mapping Student Enrollment Data with R

By Elizabeth Hoff, Institutional Research Analyst, and Steve Wilkerson, Associate Vice Provost, The University of Texas at San Antonio

Our office recently experimented with mapping student enrollment data using readily available packages for R, the open-source statistical software. The full R code appears at the end of this tip. The map displays our fall 2016 in-state student population by county of residence.

Our first step was to create a dataset containing each in-state student’s ID and county of residence, along with the number of students residing in each county. The resulting dataset structure is shown below (Note: These are not actual student IDs):

tt1R.PNG
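A dataset with this structure could be assembled along the following lines. This is a minimal sketch in base R, not the code we ran: the column names (student_id, county, county_count) and the five records are made up for illustration.

```r
# Hypothetical toy records: one row per in-state student (IDs are made up).
students <- data.frame(
  student_id = c("A001", "A002", "A003", "A004", "A005"),
  county     = c("Bexar", "Bexar", "Travis", "Bexar", "Harris"),
  stringsAsFactors = FALSE
)

# Attach to every row the number of students residing in that row's county.
# ave() applies length() within each county group and recycles the result.
students$county_count <- ave(
  seq_along(students$county),
  students$county,
  FUN = length
)

print(students)
```

Keeping one row per student with the county count repeated is only one possible layout; a table with one row per county would work equally well for the mapping steps that follow.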
 

Once the dataset was imported into R, we loaded the graphing package, ggplot2, as well as the R mapping packages: ggmap, maps, and mapdata.

The mapping packages we loaded provide a built-in state dataset containing the boundary coordinates (latitude and longitude) of U.S. states. We stored this as a data frame and then selected a subset, named ‘tx_state’, containing only our state of interest, Texas. This subset is used to create the base map of Texas.

The built-in county dataset is similar in structure to the state dataset, but covers counties rather than states. We likewise stored the county dataset as a data frame and selected only the counties in Texas, in a subset named ‘tx_county’.
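The two subsetting steps might look like the sketch below. It assumes the data frames are pulled with ggplot2's map_data() function, which reads the "state" and "county" boundary databases supplied by the maps package; the intermediate names states and counties are illustrative.

```r
library(ggplot2)  # provides map_data()
library(maps)     # supplies the "state" and "county" boundary databases

# Boundary coordinates for states, as a data frame with the columns
# long, lat, group, order, region, and subregion.
states <- map_data("state")

# State names are stored in lowercase in the `region` column.
tx_state <- subset(states, region == "texas")

# Same structure, but one set of polygons per county; the county name
# is stored in lowercase in the `subregion` column.
counties  <- map_data("county")
tx_county <- subset(counties, region == "texas")

head(tx_state)
head(tx_county)
```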

Next, we used the ggplot2 package and the ‘tx_state’ subset to draw a map of the state of Texas.

tt2R.PNG
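A base map like this one can be drawn with a single polygon layer. The sketch below is one plausible version, not necessarily the exact code we used; the object name tx_base, the colors, and the 1.3 aspect ratio are assumptions chosen for illustration.

```r
library(ggplot2)
library(maps)

# Texas boundary coordinates (state names are lowercase in `region`).
tx_state <- subset(map_data("state"), region == "texas")

# group = group keeps each polygon separate so ggplot2 does not draw
# connecting lines between unrelated pieces of the boundary.
tx_base <- ggplot(tx_state, aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "gray") +
  coord_fixed(1.3)  # fixed aspect ratio so the state is not stretched

tx_base
```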
 

We then used the ‘tx_county’ subset to add county outlines to the map. We also created an object called ditch_the_axes that strips the axes, borders, gridlines, and similar elements from the map.

tt3R.PNG
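The county layer and the axis-removing object could be sketched as follows. Here ditch_the_axes is assumed to be a ggplot2 theme() object that blanks the relevant elements; the name tx_counties_map and the colors are illustrative.

```r
library(ggplot2)
library(maps)

tx_state  <- subset(map_data("state"),  region == "texas")
tx_county <- subset(map_data("county"), region == "texas")

# A reusable theme fragment that blanks axes, borders, and gridlines.
ditch_the_axes <- theme(
  axis.text    = element_blank(),
  axis.line    = element_blank(),
  axis.ticks   = element_blank(),
  axis.title   = element_blank(),
  panel.border = element_blank(),
  panel.grid   = element_blank()
)

tx_counties_map <- ggplot(tx_state, aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "gray") +                 # state fill
  geom_polygon(data = tx_county, fill = NA, color = "white") +   # county outlines
  geom_polygon(color = "black", fill = NA) +                     # state border redrawn on top
  coord_fixed(1.3) +
  theme_bw() +
  ditch_the_axes  # added after theme_bw() so its settings take precedence

tx_counties_map
```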
 

To incorporate our student enrollment data into the map, we joined our imported student dataset to the county dataset using the dplyr package. For the join to succeed, the county names in our student dataset had to match those in the county dataset exactly, including spelling and letter case.
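The name-matching step can be illustrated with a small stand-in for the county boundary data. In data frames returned by map_data("county"), county names are lowercase and stored in a column called subregion, so the sketch below lowercases and trims the enrollment names before joining. The enrollment figures, the n_students column, and the four-county stand-in table are all made up for illustration.

```r
library(dplyr)

# Hypothetical per-county enrollment counts with inconsistent formatting.
enroll <- data.frame(
  county     = c("Bexar ", "TRAVIS", "Harris"),
  n_students = c(3, 1, 1),
  stringsAsFactors = FALSE
)

# Stand-in for a few rows of county boundary data.
tx_county <- data.frame(
  subregion = c("bexar", "bexar", "travis", "harris", "comal"),
  long      = c(-98.5, -98.4, -97.7, -95.4, -98.2),
  lat       = c(29.4, 29.5, 30.3, 29.8, 29.8),
  stringsAsFactors = FALSE
)

# Normalize the names so they match the boundary data exactly.
enroll_clean <- enroll %>%
  mutate(subregion = tolower(trimws(county))) %>%
  select(subregion, n_students)

# left_join() keeps every boundary row, in its original order;
# counties with no students get NA in n_students.
tx_enroll <- left_join(tx_county, enroll_clean, by = "subregion")

print(tx_enroll)
```

A left join is used here so that counties without enrolled students stay in the data and can still be drawn on the map.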

We then built the student enrollment map by taking the Texas county base map created earlier and using the enrollment count for each county as the fill.

tt4R.PNG
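A filled county map of this kind might be produced as sketched below. The enrollment numbers are invented, and the names enroll, n_students, and enroll_map are illustrative rather than taken from our code.

```r
library(ggplot2)
library(maps)
library(dplyr)

tx_county <- subset(map_data("county"), region == "texas")

# Made-up enrollment counts keyed by lowercase county name.
enroll <- data.frame(
  subregion  = c("bexar", "travis", "harris"),
  n_students = c(3000, 400, 250),
  stringsAsFactors = FALSE
)

# Keep every boundary row; counties without students get NA.
tx_enroll <- left_join(tx_county, enroll, by = "subregion")

enroll_map <- ggplot(tx_enroll, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = n_students), color = "white") +
  coord_fixed(1.3) +
  theme_bw()

enroll_map
```

With ggplot2's default continuous fill scale, polygons whose fill value is NA are drawn in gray, which is why counties with no enrolled students appear gray.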

As you can see, the majority of our student population is from Bexar County (in blue), where the university is located. The gray areas indicate counties with no enrolled students.

To increase the color variation in the final map, we customized the legend scale by using a log transformation and our official university colors. In addition, we customized the legend breaks and title to improve readability.
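The legend customization could be sketched as follows. The enrollment numbers, break points, legend title, and hex colors are placeholders (the hex codes are not official university values), and the transform argument assumes ggplot2 3.5.0 or later; older versions call the same argument trans.

```r
library(ggplot2)
library(maps)
library(dplyr)

tx_county <- subset(map_data("county"), region == "texas")

# Made-up enrollment counts; values must be positive for a log scale.
enroll <- data.frame(
  subregion  = c("bexar", "travis", "harris", "comal"),
  n_students = c(3000, 400, 250, 40),
  stringsAsFactors = FALSE
)

tx_enroll <- left_join(tx_county, enroll, by = "subregion")

enroll_map_log <- ggplot(tx_enroll, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = n_students), color = "white") +
  scale_fill_gradient(
    name      = "Enrolled students",  # legend title (placeholder wording)
    low       = "#f9c9b6",            # placeholder light color
    high      = "#0c2340",            # placeholder dark color
    transform = "log10",              # spreads out the low counts
    breaks    = c(10, 100, 1000),     # legend breaks in original units
    na.value  = "gray80"              # counties with no students
  ) +
  coord_fixed(1.3) +
  theme_bw()

enroll_map_log
```

The log transformation matters here because one county holds most of the students; on a linear scale nearly every other county would collapse into the same shade.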

The result is an attractive, publishable figure of our enrolled in-state students by county of residence. Thanks, R!

tt5R.PNG

 

 Full Code:

tt6R.PNG

 

 
