Fair
Finance Watch Resources: Combining
multiple ‘dat’ files.
Cabrini-Green,
while it still survives. Chicago, IL, March 2006.
April
3, 2006, shortly after midnight
Matthew,
I’ve revised the
code as we discussed. As I inspected the
output, I encountered a subtle error that demonstrates the limits
of
automation. Please inspect each set of
tables from this code very carefully, to avoid possible misinterpretations. First,
here is the blank Excel worksheet you should keep
open while running this SAS file. You will get results that look
like this.
Now page over to
Sheet 9 of the workbook, for Tables 9a and 9b -- racial/ethnic breakdown for
high-cost and non high-
cost first-lien
home purchase originations. Look
carefully at Table 9a: There are no
high-cost originations for those reporting
as “4 Black,
Ethnicity Unknown.” Therefore, SAS
refuses to paste in a column for this category.
Therefore, the template
we used to
create the Excel worksheet performs the wrong calculations to derive the
disparity ratios. If you look at the
formulas
in the disparity
ratios at the very bottom, you will see how each is calculated, and you’ll see
how things must be fixed manually
to ensure that
the ratios are correct. You can insert
an extra row in Table 9a, but then you will still have to fix the disparity
ratio formulas
in the bottom rows of the worksheet. I’m
sorry, but I do not know how to force SAS to produce the extra
row when there
are no observations in the respective category.
You should also note that the same thing can happen
when there is an
unexpected *column* that is not in the template.
For this reason,
I offer the following advice. First, run
this code on an institution that you know fairly well. Then sit down,
print out all
nine worksheets, and spend a solid block of time looking for logical
inconsistencies -- comparing the number of
originations
shown on various tables, for example, to ensure that everything sums
correctly. Second, if this code has no
logical
inconsistencies in the worksheets in produces, then exercise caution when
analyzing multiple lenders in a single
sitting, or when
working with smaller lenders; the smaller the number of total records for a
particular lender, the higher
probability that
one of the row or column entries will include no records -- and therefore throw
the Excel template into a tizzy.
I’m sorry for
these complications. But it seems the
main problem here is in *presentation.*
SAS can bring over the actual
tabulations
perfectly -- that is, if you look at Table 9a and Table 9b, they provide
accurate tabulations of these different
categories. But once we try to automate that final step
of the process -- the nicely formatted disparity ratios at the bottom --
this means we
have to format Excel with the expectation of a certain number of rows and a
certain number of columns.
Therefore, the
safest path is to use SAS to bring things over into Excel, and only then should
you sit down and calculate
the disparity
ratios using the formula functions in Excel, looking carefully at the labels to
ensure that you’ve got all the
conceptual
categories right.
best,
elvin
April
2, 2006, evening
Matthew. I’ve revised the files to produce a parallel set
of tables that include all conventional loans -- including sub-
ordinate liens
as well as those with no information on lien status. Keep this file open
while you run this SAS file, and
you should get
the numbers you need.
best,
elvin
April
2, 2006, afternoon
Matthew,
The revised
Excel worksheet is here. If you keep this file open on your system
while you run this SAS file, you should get
the numbers you
need. I’ll work now to revise to include
income in the rate-spread worksheet.
best,
elvin
March
30, 2006, afternoon
Matthew,
I’ve revised the
code to automate the process of crosstabulations into Excel. Keep this file open
on your system, while you
excecute this SAS file.
The result should look like this.
best,
elvin
March
30, 2006
Matthew,
I’ve prepared
sas code that combines three different dat files, and produces two distinct
cross-tabulations for the combined files.
Simply change
the names to correspond with whatever set of dat files you’re trying to
combine: cut and paste as many of the
‘raw’ data sets,
and then make changes to the data step that creates “icp.analyze.” I’ve fixed the code so that it avoids reading
in the first
line of each data file -- the line with identifying information for the
reporting entity. The SAS code is here, and it
should produce
comma delimited files in your c directory that look like this
(for actions taken) and this (for the distribution of
rate spread and non rate-spread originations).
best,
elvin
Copy Left 2006, Elvin K. Wyly.