Random Samples in Excel Using the RAND Function

By Angela Henderson, Director of Institutional Research and Effectiveness, Stetson University

This tip provides two methods for selecting a simple random sample from an Excel data file using the RAND() function. This tip is particularly helpful if you need to determine survey or research samples from student, staff, or faculty populations. Two simple ways to accomplish this goal based on an example scenario are provided below.

Scenario: Create a sample of 25% of undergraduate students enrolled in Fall 2016.

Data file: Fall 2016 undergraduate census data file (containing all undergraduate students).

Data file sample

Random1.JPG

Method 1
 
This method uses the formula = RAND() to generate a numeric value between 0 and 1 for each record.
 
For this example, enter the formula: =RAND() in cell E2 to generate a random number for the first record.
 
Random2.JPG
 
Click on cell E2 and then copy the formula down and create a random value for each record.
 
Random3.JPG
To prevent the random values from changing when the worksheet recalculates, select and copy all values in column E. With column E cells still highlighted, right-click and select Values under Paste Options. This replaces the formulas in column E with the calculated values and prevents the data from changing. To prevent the random values from changing when the worksheet recalculates, select and copy all values in column E. With column E cells still highlighted, right-click and select Values under Paste Options. This replaces the formulas in column E with the calculated values and prevents the data from changing.
 
Random4.JPG
Select all the data columns and click on “Sort & Filter” on the Home tab in the Ribbon. Select Filter to apply filters to all the selected columns.
 
Once filters are applied, click the filter arrow in the header of the random value column (column E) and select Sort Smallest to Largest.
 
Ramdom5.JPG

As shown below, records are now sorted according to the random value generated. Select the number of records corresponding to your sample needs. For example, if your data file contains 1,000 records, simply select the first 250 records for a random sample of 25% of the population.
 
Random6.JPG
Method 2
 
A second, but somewhat less exact method, can be used to generate a TRUE/FALSE indicator for each associated record. Using the formula = RAND()<0.255 returns TRUE for about 25% of the records selected. As such, some variance in the percentage of results returned as TRUE can be expected. Using = RAND()<0.255 (as opposed to = RAND()<0.25) helps reduce the variance in the percentage of records returned.
 
If seeking a different sample size, for example, 50%, simply change the value in the formula to reflect the desired proportion: = RAND()<0.505.
 
Using this method creating a random sample list of 25% of the population, enter the formula
= RAND()<0.255 in cell E2.
 
Random7.JPG

Click on cell E2 and then copy the formula down and create an indicator for each record.
 
Random8.JPG

As in the prior method, to prevent TRUE/FALSE indicators from changing when the worksheet recalculates, select and copy all values in column E. With column E cells still highlighted, right-click and select Values under Paste Options. This replaces the formulas in column E with the calculated values and prevents the data from changing.
 
To add filters to the columns, select all the data columns and click on “Sort & Filter” on the Home tab in the Ribbon. Select Filter to apply filters to all the selected columns.
 
Once filters are applied, click the filter arrow in the header of the TRUE/FALSE indicator column (column E) and uncheck FALSE and (Blanks) so only TRUE remains selected.
 
Random9.JPG

As shown below, results are now filtered to show only records with a TRUE indicator that should be included in the sample.

Random10.JPG

 

 

 Comments

 
To add a comment, Sign In
There are no comments.