Converting Text Columns to Fixed-Width Text

By Kathy Coy, Director, Institutional Research, William Rainey Harper College

The “CONCATENATE” Function in Excel

The January 2017 eAIR article, Using CONCATENATE Function in Excel to Create Long STATA Commands, did a nice job of explaining what the CONCATENATE function is - a way to combine the content of multiple cells in Excel - and what it is used for. This Tech Tip covers using CONCATENATE, REPT, and LEN Functions to convert text columns from an Excel file to fixed-width text.

Properties of a Fixed-Width Text File

Data in a fixed-width text file is arranged in rows and columns, with one entry per row. Each row contains one or many pieces of data arranged by columns or fields. Each data column has a defined width, specified in characters, which determines the maximum amount of data it can contain (column width for each column is the same for all rows). If the data value does not use all the space allotted for the column, the balance is often padded with spaces before the next column of data is presented.

Problem to Solve

The Illinois Community College Board (ICCB), a governing official to which Harper College reports, only accepts submissions in the form of text files comprised of a series of fixed-width fields. Some of the fields contain padding because the information contained is of different lengths (e.g., name, address) or the information is missing entirely (e.g., null values). Because making multiple edits and sorting in the text file format is next to impossible, we use Excel for the data cleaning process. This leaves us with the challenge of getting the file back into the fixed width text format prior to submission. To solve this problem, we use CONCATENATE, REPT, and LEN functions in Excel. We use a two-step process.

Before you begin, create a copy of the tab that has the cleaned submission data. It is always a good practice to save a version of your data before making any big changes—you never know when you will need to revert to the data before those changes were implemented. Because functions in Excel can be touchy, I recommend that you convert all numeric fields to text.

Step One – Use CONCATENATE, REPT, and LEN to create “lost” padding in column where the data has different lengths

Identify which columns need padding. For this example, they are Columns D, E, F, G, H, and I. The values in Columns A, B, C, and J use all the allotted characters.

Len1.JPG
Insert a column to the right of the first column that needs padding for the submission file. Name the column using the item name or number from the submission instructions with a dash flat at the end (-flat). Adding the column next to the submission column will make the last step easier. In the above example, the first column that needs padding is D (Item 9 or FirstName).

Len2.JPG
Use the format cells command to format the flat column to “General.” This will ensure that the formula works properly. You and CONCATENATE are now best friends, but that doesn’t mean CONCATENATE won’t ever act up over formatting differences.

Use the CONCATENATE formula in Excel. The general syntax for CONCATENATE is CONCATENATE(text1, [text2], …) , although there are many modifications you can add to the formula to make it better suited for submissions, REPT and LEN functions are friends of your new best friend – so all functions get along nicely. In the function line, paste the following formula (editing the black/bold text portions as appropriate). Code: =CONCATENATE(D2,REPT(" ",16-LEN(D2)))

Len3.JPG
This creates a fixed-width column in which spaces are added to the data in column D such that the overall width is 16 characters or spaces. To copy the formula to the end of the column, mouse over it until the pointer becomes a cross, then double-click.

In the above example, the number value (16-LEN) in the middle of the formula sets the width of the column. Adjusting LENgth of columns to match the submission fields you are required to use. Many of ICCB fields have required field lengths, regardless of the value – the LEN function added to our CONCATENATE helps us take into consideration the required length of these fields. The following snippet illustrates how the two different lengths in the code create different column widths. The REPT function adds space where we need it to take into account the length requirement.

Code for field 10: =CONCATENATE(F2,REPT(" ",30-LEN(F2)))

Code for field 11: =CONCATENATE(H2,REPT(" ",16-LEN(H2))) 

Len4.JPG

Repeat steps for all applicable columns. Anything you need to pad, create those flat-copy columns and apply the CONCATENTATE/REPT/LEN formula as needed.

Step Two – Bring it all together in one column

Insert a column at the beginning of the worksheet. Alternately, use the first open column at the end of the file. Format the new column as “General.”

Concatenate the columns sequentiallyselecting the flat column rather than the original data column. Now we use one giant CONCATENATE formula to combine all our child flat-columns. Think of this giant CONCATENATE column as the parent to all those child formulas, gathering them and forcing them to sit comfortably in one column for a family photo.

Len5.JPG

Step Three - Copy and paste final column into Notebook for submissions

Copy the entire concatenated column and paste it into a blank notebook file. Remove any extraneous rows from the beginning and end of the file. The “padding” created in step one is evident in the text file:

Len6.JPG

Name the text file using the naming convention the agency or governing body requires (in our case, ICCB). Congratulations you are ready to submit your fixed-width text file!

 

 

 Comments

 
To add a comment, Sign In
Total Comments: 1
 
Cal posted on 4/13/2017 3:57 PM
And if you don't like "concatenate", you can use the &
=concatenate(a1,b1)
is the same as
a1&b1