From the last post, the formula for A-squared, the test statistic for the Anderson-Darling normality test, is:
Here the part in the red box is the cumulative distribution function (CDF) for the normal distribution. This is the area under the normal curve from negative infinity to Yi. It represents the probability that a variable takes on a value less than or equal to Yi, so it will always be a value between 0 and 1.
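Written out as text, the usual textbook form of the statistic looks like the sketch below. The notation is an assumption and may differ from the figure in the previous post: n is the sample size, the Yi are the standardized values sorted in ascending order, and Φ is the standard normal CDF (the red-box term).

```latex
A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)
      \left[ \ln \Phi(Y_i) + \ln\bigl(1 - \Phi(Y_{n+1-i})\bigr) \right]
```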
The values in the CDF table we loaded last time represent the area under the curve from 0 to Yi (or x, depending on how you want to name it), for a distribution with a mean of 0 and a standard deviation of 1. So to get the area from negative infinity, an additional calculation is made. If x > 0, then 0.5 is added to the table value. If x is less than 0, the table value for |x| is looked up, 0.5 is added, and the result is subtracted from 1 (which is the same as 0.5 minus the table value). At x = 0 the table value is 0, so the CDF is exactly 0.5. These examples will clarify:
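The adjustment can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the table value is computed with math.erf as a stand-in for reading the CDF table file.

```python
import math

def table_value(z):
    """Area under the standard normal curve from 0 to |z|,
    i.e. the number a printed z-table lists (stand-in for the table file)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

def cdf_from_table(z):
    """Turn a 0-to-|z| table value into the area from negative infinity to z."""
    t = table_value(z)
    if z >= 0:
        return 0.5 + t            # left half of the curve plus the table area
    return 1.0 - (0.5 + t)        # mirror image; same as 0.5 - t

for z in (1.0, -1.0, 0.0):
    print(z, round(cdf_from_table(z), 4))
```

For z = 1 and z = -1 the two results add up to 1, which is a quick way to check that the negative branch is handled correctly.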
This would be great if every distribution had a mean of 0 and a standard deviation of 1. Luckily, there is an approach that allows us to use the table to test any sample of data, and it is described in the green box above: Yi = (xi - xbar) / standard deviation
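As a small illustration of the standardization step, here is a Python sketch. The sample data is made up, and using the sample standard deviation (n - 1 divisor) is an assumption, not something confirmed by the original procedure.

```python
import statistics

def standardize(sample):
    """Return Yi = (xi - xbar) / std for every xi in the sample."""
    xbar = statistics.mean(sample)
    std = statistics.stdev(sample)   # assumed: sample standard deviation (n - 1)
    return [(x - xbar) / std for x in sample]

sample = [2.0, 4.0, 4.0, 5.0, 7.0]   # made-up illustrative data
y = standardize(sample)
print([round(v, 3) for v in y])
```

After this step the Yi have a mean of 0 and a standard deviation of 1, so the standard normal table applies to them directly.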
Using the table created in the last post, here is the procedure (in the file STATAD.P) that looks up the table value for a data point X, from a sample with a mean of XBAR and a standard deviation of STD, and then calculates the CDF value. The value is returned in the variable LU, from a call LOOKUPCDF(x, xbar, std, lu);
A quick code review
Green arrow: standardize the value of X so the table can be used
Red arrow: multiply R by 100 to convert it to the table index, and store the result in J
(note: the table is indexed from 0 and is single-dimensioned, unlike the printed table, which is read using two lookup values)
Blue arrow: determine whether J is greater or less than 0 and set the pointer used to look up the value in the table
Yellow arrow: read the CDF.TBL file using the index K
Gray arrow: using the value looked up and the sign of J, make the appropriate adjustment to the table value (add 0.5 if J > 0; if J < 0, subtract that sum from 1, which is the same as 0.5 minus the table value)
Then the calling procedure can use the value returned in the LU position in the call.
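The steps above can be sketched in Python as follows. This is not the STATAD.P code: the record layout (one 8-byte float per entry), the table range of 0.00 to 3.99, the rounding of the index, the clamping of large values, and the file name cdf_demo.tbl are all assumptions made for the example. The stand-in table is generated with math.erf rather than typed in.

```python
import math
import os
import struct
import tempfile

RECORD = struct.Struct("<d")   # assumed layout: one 8-byte float per table entry
MAX_INDEX = 399                # assumed range: z = 0.00, 0.01, ..., 3.99

def build_table(path):
    """Write a stand-in table file holding the area from 0 to z for each entry."""
    with open(path, "wb") as f:
        for k in range(MAX_INDEX + 1):
            area = 0.5 * math.erf((k / 100.0) / math.sqrt(2.0))
            f.write(RECORD.pack(area))

def lookup_cdf(x, xbar, std, path):
    """Look up the normal CDF for x, given the sample mean and standard deviation."""
    r = (x - xbar) / std               # green arrow: standardize
    j = int(round(r * 100))            # red arrow: table index, sign still attached
    k = min(abs(j), MAX_INDEX)         # blue arrow: non-negative pointer, clamped
    with open(path, "rb") as f:        # yellow arrow: read a single record
        f.seek(k * RECORD.size)
        (t,) = RECORD.unpack(f.read(RECORD.size))
    if j >= 0:                         # gray arrow: adjust for the sign
        return 0.5 + t
    return 1.0 - (0.5 + t)

path = os.path.join(tempfile.mkdtemp(), "cdf_demo.tbl")  # hypothetical file name
build_table(path)
print(round(lookup_cdf(12.0, 10.0, 2.0, path), 4))
```

The file is opened and one record is read on every call, so only a single table value is ever held in memory at a time.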
One note here: this method of opening and reading the table on every lookup does slow down the calculation, but on the other hand it does not require the table to take up valuable memory. Such were the challenges of our retro computing days: sometimes we had to choose between speed and memory…
The next post will show how this value is used to calculate A-Squared, A-Squared Prime and the P-value for the Anderson-Darling Test.