Combining Multiple Dat Files

Fair Finance Watch Resources: Combining multiple ‘dat’ files.

Cabrini-Green, while it still survives. Chicago, IL, March 2006.

April 3, 2006, shortly after midnight

Matthew,

I’ve revised the code as we discussed. As I inspected the output, I encountered a subtle error that demonstrates the limits

of automation. Please inspect each set of tables from this code very carefully, to avoid possible misinterpretations. First,

here is the blank Excel worksheet you should keep open while running this SAS file. You will get results that look like this.

Now page over to Sheet 9 of the workbook, for Tables 9a and 9b -- racial/ethnic breakdown for high-cost and non high-

cost first-lien home purchase originations. Look carefully at Table 9a: There are no high-cost originations for those reporting

as “4 Black, Ethnicity Unknown.” Therefore, SAS refuses to paste in a column for this category. Therefore, the template

we used to create the Excel worksheet performs the wrong calculations to derive the disparity ratios. If you look at the formulas

in the disparity ratios at the very bottom, you will see how each is calculated, and you’ll see how things must be fixed manually

to ensure that the ratios are correct. You can insert an extra row in Table 9a, but then you will still have to fix the disparity

ratio formulas in the bottom rows of the worksheet. I’m sorry, but I do not know how to force SAS to produce the extra

row when there are no observations in the respective category. You should also note that the same thing can happen

when there is an unexpected *column* that is not in the template.

For this reason, I offer the following advice. First, run this code on an institution that you know fairly well. Then sit down,

print out all nine worksheets, and spend a solid block of time looking for logical inconsistencies -- comparing the number of

originations shown on various tables, for example, to ensure that everything sums correctly. Second, if this code has no

logical inconsistencies in the worksheets in produces, then exercise caution when analyzing multiple lenders in a single

sitting, or when working with smaller lenders; the smaller the number of total records for a particular lender, the higher

probability that one of the row or column entries will include no records -- and therefore throw the Excel template into a tizzy.

I’m sorry for these complications. But it seems the main problem here is in *presentation.* SAS can bring over the actual

tabulations perfectly -- that is, if you look at Table 9a and Table 9b, they provide accurate tabulations of these different

categories. But once we try to automate that final step of the process -- the nicely formatted disparity ratios at the bottom --

this means we have to format Excel with the expectation of a certain number of rows and a certain number of columns.

Therefore, the safest path is to use SAS to bring things over into Excel, and only then should you sit down and calculate

the disparity ratios using the formula functions in Excel, looking carefully at the labels to ensure that you’ve got all the

conceptual categories right.

best,

elvin

April 2, 2006, evening

Matthew. I’ve revised the files to produce a parallel set of tables that include all conventional loans -- including sub-

ordinate liens as well as those with no information on lien status. Keep this file open while you run this SAS file, and

you should get the numbers you need.

best,

elvin

April 2, 2006, afternoon

Matthew,

The revised Excel worksheet is here. If you keep this file open on your system while you run this SAS file, you should get

the numbers you need. I’ll work now to revise to include income in the rate-spread worksheet.

best,

elvin

March 30, 2006, afternoon

Matthew,

I’ve revised the code to automate the process of crosstabulations into Excel. Keep this file open on your system, while you

excecute this SAS file. The result should look like this.

best,

elvin

March 30, 2006

Matthew,

I’ve prepared sas code that combines three different dat files, and produces two distinct cross-tabulations for the combined files.

Simply change the names to correspond with whatever set of dat files you’re trying to combine: cut and paste as many of the

‘raw’ data sets, and then make changes to the data step that creates “icp.analyze.” I’ve fixed the code so that it avoids reading

in the first line of each data file -- the line with identifying information for the reporting entity. The SAS code is here, and it

should produce comma delimited files in your c directory that look like this (for actions taken) and this (for the distribution of

rate spread and non rate-spread originations).

best,

elvin

Copy Left 2006, Elvin K. Wyly.