This is part 1 of a three-part post on coding the Anderson-Darling normality test in STATS, written in Kyan Pascal for the Atari 400/800 computers.
The Anderson-Darling test checks whether a sample could plausibly have come from a population that is normally distributed (think bell curve). It is one of several normality tests, but it is often the first one tried because it is a somewhat conservative test. The reason to care is that if the data are consistent with a normal distribution, you can use the more powerful statistical methods that assume normality.
Using this test is a three-step process:
- Calculate the test statistic A-squared
- Calculate the A-squared prime statistic
- Determine the p-value used to test the hypothesis that the data come from a normal distribution
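For readers who want to check results on a modern machine, the three steps can be sketched in a few lines of Python. This is not the Kyan Pascal code from this series. It assumes the commonly published form of the test: the usual A-squared sum, the small-sample adjustment factor (1 + 0.75/n + 2.25/n^2), and the piecewise p-value approximation usually credited to D'Agostino and Stephens. It also computes the CDF with `math.erf` instead of a lookup table, and the function names are made up for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via erf (stands in for the table lookup)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def anderson_darling(sample):
    """Return (A2, A2_prime, p_value) for a normality test on `sample`."""
    n = len(sample)
    xs = sorted(sample)  # the formula needs the data in ascending order
    mean = sum(xs) / n
    # Assumption: sample standard deviation with the n - 1 divisor.
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    cdf = [normal_cdf((x - mean) / sd) for x in xs]

    # Step 1: A-squared. cdf[n - i] is the CDF of Y(n+1-i) with 1-based i.
    total = 0.0
    for i in range(1, n + 1):
        total += (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
    a2 = -n - total / n

    # Step 2: A-squared prime (small-sample adjustment).
    a2_prime = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)

    # Step 3: p-value from the piecewise approximation.
    a = a2_prime
    if a >= 0.6:
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a * a)
    elif a > 0.34:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a * a)
    elif a > 0.2:
        p = 1.0 - math.exp(-8.318 + 42.796 * a - 59.938 * a * a)
    else:
        p = 1.0 - math.exp(-13.436 + 101.14 * a - 223.73 * a * a)
    return a2, a2_prime, p

if __name__ == "__main__":
    print(anderson_darling([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

A small p-value (say below 0.05) is evidence against normality. A large one only means the test found no reason to reject it.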
To calculate A-squared, the formula calls for the CDF (cumulative distribution function), that is, the area under the curve from -infinity to x. If you are still reading, the picture below will give some context.
The easiest way to get that area (and easy is relative here) is to use a table like the one below, which is set up for the standard normal curve (mean = 0 and standard deviation = 1).
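If you want to see where the numbers in such a table come from, here is a minimal Python sketch that rebuilds one. It assumes the usual printed layout, where each row steps z by 0.1 and each column by 0.01, and that each entry is the area between 0 and z. The function names are hypothetical, and the layout is a guess about the table used in this post.

```python
import math

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def build_table(rows=41, cols=10):
    """Rows step z by 0.1 and columns by 0.01, covering z = 0.00 to 4.09."""
    return [[area_zero_to_z(r / 10.0 + c / 100.0) for c in range(cols)]
            for r in range(rows)]

table = build_table()
# Row 19, column 6 is z = 1.96, the familiar 95% two-sided cutoff.
print(f"{table[19][6]:.4f}")  # prints 0.4750
```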
So before I could even think about how to implement the test in Pascal, I needed to build a file to house the table, plus a lookup procedure. The above table was copied and pasted into the Kyan Pascal Editor as a text file. See below:
Next, the file needed to be converted from text to a file of real numbers in which values can be looked up after normalizing each sample value: Yi = (Xi – Xbar) / Std Deviation, as shown above.
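The normalization step is easy to try out on its own. The Python sketch below is not the Pascal code from this post. It assumes the sample standard deviation (n - 1 divisor), and it shows one way to turn a table entry into a full CDF value if the table stores only the area between 0 and z, as many printed tables do. Both function names are made up for the example.

```python
import math

def standardize(sample):
    """Return Yi = (Xi - Xbar) / s for every value in the sample."""
    n = len(sample)
    mean = sum(sample) / n
    # Assumption: sample standard deviation, dividing by n - 1.
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return [(x - mean) / sd for x in sample]

def cdf_from_half_area(z, area_0_to_abs_z):
    """Convert a 0-to-|z| table area into the area from -infinity to z."""
    if z >= 0:
        return 0.5 + area_0_to_abs_z
    return 0.5 - area_0_to_abs_z  # the curve is symmetric about 0

ys = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(ys)
```

After standardizing, the values have mean 0 and standard deviation 1, so one table for the standard normal curve works for any sample.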
Here is the code to convert the text file to a file of REAL that can be used as a random-access file.
This first image gives the program documentation in the header comments.
The program first initializes the input file (CDFTBL.TXT), the text file described above. The command RESET(CDFIN, ‘CDFTBL.TXT’) opens the file CDFTBL.TXT, associates it with CDFIN, and sets the read pointer to the beginning of the file. The output file is created with the REWRITE(CDFOUT, ‘CDF.TBL’); command and is named CDF.TBL on the disk.
The text file holds 41 lines of data, each with 10 values that are 7 characters long (410 values in all). Each value is read as a string, converted to a real number, and then written to the output file. Each of these values is the area under the standard normal curve from 0 to z, where z is the normalized sample value.
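The same read, convert, and write idea can be sketched in Python for comparison. This is not a port of the Kyan Pascal program. It assumes the ten 7-character fields sit back to back on each line with no separators, and it uses a 4-byte float as a stand-in for a Pascal REAL (the real on-disk size on the Atari is not assumed here). The function names and file paths are hypothetical.

```python
import struct

VALUES_PER_LINE = 10
FIELD_WIDTH = 7               # assumption: fixed-width fields, no separators
RECORD = struct.Struct("<f")  # 4-byte float standing in for a Pascal REAL

def parse_line(line):
    """Cut one text line into ten fixed-width fields and convert each."""
    return [float(line[i * FIELD_WIDTH:(i + 1) * FIELD_WIDTH])
            for i in range(VALUES_PER_LINE)]

def convert(text_path, binary_path):
    """Write every table value as one fixed-size record; return the count."""
    count = 0
    with open(text_path) as src, open(binary_path, "wb") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            for value in parse_line(line):
                dst.write(RECORD.pack(value))
                count += 1
    return count

def read_record(binary_path, index):
    """Random access: jump straight to record `index` (0-based)."""
    with open(binary_path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))[0]
```

Because every record has the same size, the value for any row and column can be fetched by seeking to record number row * 10 + column, without reading the rest of the file.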
The next post will describe how a value from the table is looked up and then used in the calculations needed for the Anderson-Darling normality test.