SAS Enterprise Guide: Reading, Importing, and Appending Multiple Text Files
In many circumstances, different text files are created with the same formats representing different groups or time periods. When multiple text files are held in a specific location, whether they are on a server, cloud, or desktop computer; importing all of them independently and appending them is time consuming. SAS Enterprise Guide has an ability within the code to import multiple text files from the same location with a common structure and append them simultaneously, given that there is a common naming convention.
SAS Enterprise Guide is a hybrid between the traditional SAS programming language and a visual, point-and-click interface. The software bridges the gap for users not as familiar with base SAS, its programming language, and a more guided infrastructure. Enterprise Guide has code created behind the scenes and it allows users to create and integrate their own code into governed programs already generated by the software.
Enterprise Guide is project based and allows users to store and maintain different types of files. The files are stored in one location and can include imported datasets, codes, queries, notes, analysis, results, and reports. The import wizard within SAS Enterprise Guide allows formats and informats to be recommended and generated, and you can change and amend the format if needed. For example, the [COHORT20111.txt] text file below has a field STUID which SAS recognizes as a numeric field. However, it holds no numerical value.
The import wizard in Enterprise Guide allows you to rapidly change the type of variable from a numeric field and convert it to a string. Once this occurs, the corresponding format and informat changes with the field type.
After the import wizard has been executed, SAS Enterprise Guide creates code behind the scenes that you can copy, amend, or use in another project. To review the code generated with this import wizard specific to the text file mentioned above, you can right-click on the import step and select “Extract Code.” The code Enterprise Guide generated to import this data presents itself within a screen that can either be copied, saved, or amended.
The code created from the Import Wizard in SAS Enterprise Guide is much more detailed than you would ever need if the code was written separately from Enterprise Guide’s import process.
Enterprise Guide has a solution for importing multiple text files that share common formats and a naming convention. In the following example, a local directory has 30 different files stored with a common naming convention and common format. The files were created by the same source and all fields across the files are the same. These files all have the exact same structure and every field is standardized and has a common format. In order to read and import these independent files into SAS, 30 independent import steps would be created followed by the Append function joining all the data. See the example below.
In this example, all 30 text files were imported independently and joined using an Append function. However, the following is a process to read all files in one import step and append them at the same time.
First, you would create the import step for one of the data sets using the import wizard. During Step 1 of the Import Wizard, there is an option called “Performance.” Once “Performance” is selected, there is another option to “Bypass the data cleansing process,” which must be selected. The cleansing process that SAS Enterprise Guide performs creates a temporary file and stores it in a directory location. Some may feel selecting this option is counterintuitive, however, if this option is not selected the code cannot be generalized over multiple sets of data.
In Step 2 of the Import Wizard, SAS recognizes that the data has a row header. Renaming the columns to comply with SAS naming convention is also recommended and labels can be amended in Step 3 if needed. The import step also recognizes the data fields are separated by a common delimiter, the vertical bar.
In Step 3, you can change the variable type, format, and label if needed. At this point there is nothing critical that would prevent the import of multiple sets of data.
In Step 4, the option to “Generalize import step to run outside SAS Enterprise Guide” must be selected.
Now, the data for one text file will be imported into SAS Enterprise Guide using the format specified with code generated in the background. The next steps require changes in the code itself. In order to access the code for the import procedure, double-click on the import node and select the code tab.
At this point, the code will be much more specific than needed. Enterprise Guide leaves no stone unturned. All fields are clearly accounted for in the code. Note that the INFILE data step needs to read the exact location of the stored data. If there is a location stated other than the exact location directed by the INFILE statement, go back to the import step and confirm that the cleaning process is bypassed.
In order to change the code, you can either copy the code or simply click in the code and begin typing. Once this occurs, a warning displays requiring that you create a copy of the code for modification purposes. Click Yes.
For purposes of this tutorial, the point-and-click option will be sufficient. Once the code is editable, search for the INFILE statement and change the latter part of the file name. Replace the unique portion of the filename with an asterisk, which will direct SAS to import all datasets with a non-unique portion of the filename. In the example provided, all files share a common naming convention starting with CLACOHORT. The unique portion in the file name is numerical, and in this example it is the semester.
You can (and should) also change the final imported data’s name of the file created by SAS to a more general name. This has to be redirected from the original import step, otherwise a critical error may occur. In this example, the final data is renamed AllCohortData.
At this point, the new code needed to import multiple files has been created and the program can be run. Once the program is run, a new code will be created in the Process Flow map within SAS Enterprise Guide. If the name of the dataset created does not differ from the original, the new run will override and replace prior data or a critical error may occur.
For more information, contact me at mjohnston@pensacolastate.edu.