a. Complete the Dorsett Educational program (cassette s/w with voice) on Statistics – 16 cassette files downloaded from archive.org – and document it at traidna.wordpress.com

b. Write a statistics program in Kyan Pascal on the Atari 800 (emulator) that will compute descriptive stats, the Anderson-Darling test for normality, one- and two-sample t-tests, one- and two-proportion tests, etc.

c. Create YouTube videos on creating the statistics program, as there do not appear to be any for Kyan Pascal on Atari

d. Blog on the process at traidna.wordpress.com, and update the kyanpascal.wordpress.com blog on creating programs

Now it is time to recap the final results against those goals.

Starting with the 16 Statistics tapes from the Dorsett Educational Program:

All 16 tapes were downloaded, along with the master cartridge needed to use them. I ran through each of the tapes except one, where the download had a duplicate of one of the other tapes. I earned my graduate certificate in Applied Statistics through Penn State World Campus, where all classes were virtual and web-based, and I can only think that the Dorsett tape approach was a predecessor to this type of remote learning.

For Part B, creating a statistics program in Kyan Pascal: I’m calling this a success. While I didn’t implement hypothesis testing, I did complete descriptive stats, normality tests, and plotting of normal curves, plus additional features such as colors and loading data files. What was accomplished is definitely a proof of concept that the features of modern stats programs such as Minitab can be developed on the Atari 400/800 in Pascal, though of course the power of the 8-bit systems limits the speed and the amount of data that can be handled. That said, Kyan Pascal’s ability to chain executable programs allows for lots of functionality in the space available. Finally, a self-running demo feature was an interesting exercise for extra credit.

For Part C, I’ll call this partially complete. I did create two YouTube videos that show the self-running demo; given infinite time and space I would have liked to create some Kyan Pascal tutorial videos as well.

Video #1 : https://www.youtube.com/watch?v=oDuF0EkyX3A&t=1s

Video #2 : https://www.youtube.com/watch?v=p0bu5M6qt_E

For Part D, I made 12 posts, about one every third day. These posts had a good mix of samples of the Dorsett tapes, Kyan Pascal coding features and techniques, and running, debugging and validating STATS against modern software. I’ll call this section successful.

In conclusion, I’ve enjoyed this round of Retrochallenge and have learned a significant amount about the Atari 8-bits through use of the Altirra emulator. I never had an Atari (I had a Timex Sinclair 1000, Vic 20, Apple II and PC 286 in my formative years), but I would certainly love to see STATS run on actual hardware at some point. While I was successful in implementing statistics programming, there are many improvements that could be made to the algorithms I created that would either speed up processing or reduce disk space. The methods I used to create, read and write the CDF table could absolutely be replicated for t-tables and F-tables, for hypothesis testing and for calculating confidence intervals of means, medians, etc. While I just scratched the surface of graphics programming, it’s clear that run charts, box plots, dot plots, individual value plots and control charts could all be implemented.

So it’s been a successful run, and if I make additional enhancements to STATS for Atari I will continue to post here.

]]>

See the self-running demo on YouTube: http://youtu.be/p0bu5M6qt_E

Two new options have been added to the main menu: [C] to change colors and [R] to read data from a file.

Starting with the read-file selection: when R and <enter> are entered, the user is prompted for a file name and for the data set into which the data should be loaded.

When STATS is run from the distribution disk on drive 1, the default drive for the data file is drive 1, as shown above. The file name can also be given with a drive designation: if the system has multiple drives and the data file is on drive 2, then “D2:WTDATA.TXT” could be entered. Kyan Pascal allows for drives 1 and 2. This is useful because, while there is some room on the program disk, having drive 2 available allows for any number of data files.

The format of a data file is simply one number per line, with a carriage return after the final entry. Here are the last 20 or so entries in the WTDATA.TXT file.

As the data loads, the program shows a counter of the number of data points read and stored, then pauses after loading and asks for a [C] to be entered to continue. Up to 4 data sets can be loaded.

Once loaded, the data set can be used by other functions in the program. Below, the descriptive statistics for the WTDATA.TXT data are shown.
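As a rough illustration of that file format, here is a minimal sketch in Python rather than Kyan Pascal. It is not the STATS loader itself: the function name `load_data_set` and the sample values are hypothetical, and the only thing taken from the description above is the one-number-per-line layout with a trailing return.

```python
import io

def load_data_set(lines):
    """Parse one number per line; blank lines (such as a trailing return) are skipped."""
    values = []
    for line in lines:
        text = line.strip()
        if text:
            values.append(float(text))
    return values

# Hypothetical file contents, not the real WTDATA.TXT.
sample = io.StringIO("10.5\n11.0\n9.75\n")
data = load_data_set(sample)
print(len(data), "data points read")
```

Skipping blank lines is a design choice that makes the final carriage return harmless; it does not add the file-exists or bad-input checks mentioned later in this post.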

In an earlier post I discussed the specifics of setting colors on the Atari 400/800, and I have now added the ability for the user to select the color to use and either bright text on a dark background or dark text on a light background. When the user selects [C] from the main menu they are presented with the color menu and are asked to provide the number of the color and the light/dark scheme that they wish to use.

In the example above the user picked color 5, PINK and light text on dark background, resulting in the screen below.

To be transparent about the code, there is not much error checking for any feature in STATS. In a market-ready program, things like checking whether a data file exists and checking for valid responses to menu prompts should be done. Within the timeline of this RetroChallenge I’ve been more interested in seeing what I could get to work than in all the checks expected of a final software product.

My next post will be a final wrap up of my RetroChallenge project – Dorsett STATS tapes and STATS for Atari 400/800 in Kyan Pascal.

]]>

Here is a graphics demo program to show these two issues. The program includes the GRAPHICS.I, PLOT.I and DRAWTO.I program extensions. Then I wrote a simple procedure that draws a box on the graphics part of the screen in graphics mode 7. After a short delay come two lines of code that initialize the REAL variable R and add 0.1 to it, then another short delay, then a WRITE to the text portion of the screen.

Here is what the screen looks like after the call to GRAPHICS(7) and drawing of the box, everything as expected :

After the short delay, the two lines of code that initialize R to 1.00 and add 0.1 result in the following changes to the screen; several stray pixels can now be seen:

And finally, when the text ‘BOX DEMO’ is written to the screen, several more stray pixels are turned on. I also tested with only the writing of the text and the issue persists, so it’s not a residual from the math issue.

I am not sure of the cause of the issue. It may be specific to the library routines provided with Kyan Pascal; as I understand it, these were provided as additional utilities after the release of Kyan Pascal, but documentation for them seems to be scarce. Or perhaps there is an emulator issue? However, I did test the same idea in BASIC and did not have an issue.

Heading into the last week of the RetroChallenge I hope to add the ability to load data from a text file and add that to the menu, and provide a final demo video on youtube.com.

]]>

From the main splash screen for STATS there are three options, Demo, Menu and Quit.

When M is entered on this screen the following menu is presented:

The selections at this time are limited, but are sufficient to demonstrate how one program can call another, already compiled program. Shown below is the code that is executed when [N] is entered to run the Normal Distribution Plot program. The current program terminates and starts the executable named in the CHAIN procedure, ‘STATNP’ in this case.

STATNP is passed the variable MODE from the calling program. All variables declared by the initial VAR command are available to the new executable, as long as the VAR declarations in the called program match in variable type.

So in this case the CHAR variable MODE in STATS.P is passed into the CHAR variable CH in STATSNP.P. The STATNP program checks whether CH is ‘D’ for demo; if so it will provide a default set of plots, otherwise the user is prompted for means and standard deviations for three plots.

When STATSNP is selected from the menu the user will enter information for three normal plots. In the example below, the first plot has a mean of 0 and a standard deviation of 1. The second curve has a different mean, 3, and will show as a curve of the same shape, just shifted right. The third curve has the same mean but a small standard deviation, and will plot as a taller, narrower curve.

Here are the resulting plots, plot one in red, plot two in green and plot three in white:

Plotting on the graphics screen is pretty straightforward once one gets comfortable with 0,0 being in the upper left corner. What is not easy is trying to add labels to the plot, and I ran into issues using a graphics mode that has the split screen with graphics and text modes (I will cover that in a future post).

As for the formulas and code to produce a normal distribution plot: the Normal Probability Density Function is used to create the curve. From Wikipedia, here is the notation and formula for calculating the Normal PDF:

and this is translated into Pascal below and converted to integers for plotting :
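For readers without the images, the normal PDF is f(x) = 1 / (σ·√(2π)) · exp(−(x − μ)² / (2σ²)), with mean μ and standard deviation σ. Below is a rough sketch of the same idea in Python, not the Kyan Pascal source. The 160 x 96 plotting area, the x range of −4 to 8, the `Y_MAX` scale and all function names are assumptions made for illustration only.

```python
import math

def normal_pdf(x, mean, std):
    """Normal probability density at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Assumed plotting area and axis ranges (not taken from the STATS source).
WIDTH, HEIGHT = 160, 96
X_MIN, X_MAX = -4.0, 8.0
Y_MAX = 1.0  # tallest density value the plot area can show

def to_pixel(x, y):
    """Convert a point on the curve to integer screen coordinates.
    Row 0 is the top of the screen, so the y value is flipped."""
    col = int(round((x - X_MIN) / (X_MAX - X_MIN) * (WIDTH - 1)))
    row = (HEIGHT - 1) - int(round(min(y, Y_MAX) / Y_MAX * (HEIGHT - 1)))
    return col, row

def curve_points(mean, std):
    """One (column, row) pair per screen column for the given curve."""
    points = []
    for col in range(WIDTH):
        x = X_MIN + col * (X_MAX - X_MIN) / (WIDTH - 1)
        points.append(to_pixel(x, normal_pdf(x, mean, std)))
    return points

print(curve_points(0.0, 1.0)[:3])
```

The flip in `to_pixel` is the part that takes getting used to: because 0,0 is the upper left corner, a larger density has to map to a smaller row number.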

Below is the code to set the graphics mode to 23, set up the background color and three foreground colors. There is an interesting mapping from the SETCOLOR(x, command to using the color in a PLOT or other graphics routine, in that the value of the color is x+1 from the SETCOLOR call.

In my next post I will describe two issues I had with the graphics mode: trying to use a combined text and graphics mode, and doing real number calculations once in graphics mode.

]]>

In case statistics is confusing to you, or you wonder what’s so important about this p-value, let me try to give some very quick background. For a normality test we start by assuming that the data comes from a normally distributed population, and look for evidence that it does not. The p-value is the probability of seeing a sample that departs from normality at least this much if the population really were normal. So if the p-value is small (less than 0.05), I can conclude that the data does not come from a normally distributed population.

The previous posts covered the work to determine the value of the CDF; now the test statistics can be calculated: A-Squared, A-Squared Prime, and the P-value.

The code block below is the implementation of the A-Squared statistic.

Red arrow – the FOR loop that does the summation over all values (NN) in the sample

Yellow arrow – call to the CDF lookup in STATAD.P

Blue arrow – calculation for this data point

Green arrow – finishes the calculation after the summing is complete

With the A-Squared value, A-Squared Prime and the P-value can be calculated; the source code is shown in the two images below:
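Since the source images are not reproduced here, the following Python sketch shows one common way to compute the three values. It is not the STATS code: it uses `math.erf` in place of the table lookup, and the small-sample adjustment and the piecewise p-value approximation are the commonly published ones, which I am assuming (not confirming) match what STATS implements.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via math.erf (stands in for the table lookup)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def anderson_darling(sample):
    """Return (A-squared, A-squared prime, p-value) for a normality test."""
    n = len(sample)
    xs = sorted(sample)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

    # Summation over the sorted sample, pairing point i with point n+1-i.
    total = 0.0
    for i in range(1, n + 1):
        f_low = normal_cdf((xs[i - 1] - mean) / std)
        f_high = normal_cdf((xs[n - i] - mean) / std)
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    a2 = -n - total / n

    # Small-sample adjustment (assumed form).
    a2_prime = a2 * (1.0 + 0.75 / n + 2.25 / (n * n))

    # Piecewise p-value approximation (assumed coefficients).
    if a2_prime >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_prime + 0.0186 * a2_prime ** 2)
    elif a2_prime > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_prime - 1.38 * a2_prime ** 2)
    elif a2_prime > 0.2:
        p = 1.0 - math.exp(-8.318 + 42.796 * a2_prime - 59.938 * a2_prime ** 2)
    else:
        p = 1.0 - math.exp(-13.436 + 101.14 * a2_prime - 223.73 * a2_prime ** 2)
    return a2, a2_prime, p

# Hypothetical evenly spaced sample, not the weight data from the post.
print(anderson_darling([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]))
```

Note that the sample must be sorted before the summation, which is why a sorted copy of the data is kept alongside the time-ordered one.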

I validated my code with four sets of data which I also ran through Minitab Express (a commercial stats program from Minitab Inc.), and the results are shown below. While I was testing I also looked at the descriptive stats from earlier posts:

Here are the four sets of data (as a reminder, this is my daily weight over four weeks in March 2017) shown both in Minitab Express and my STATS program.

Here are the results for all four sets of data, and except for some small rounding differences, all looks good.

As an aside, we cannot conclude that any of the four data sets comes from a non-normal population; however, 7 data points is a fairly small sample.

Next on the agenda is adding the ability to load data sets from text files and some menu options, and then at least one graph or plot to take advantage of the graphics capabilities of the Atari 800 and Kyan Pascal.

]]>

The part in the red box is the cumulative distribution function (CDF) for the normal distribution. This is the area under the normal curve from negative infinity to Yi. It represents the probability that a variable takes on a value less than or equal to Yi, so it will be a value between 0 and 1.

The values in the CDF table we loaded last time represent the area under the curve from 0 to Yi (or x, depending on how you want to name it), for a distribution with a mean of 0 and standard deviation of 1. So to get the area from negative infinity an additional calculation is made. If x > 0, then 0.5 is added to the table value; if x is less than 0, the table value for |x| is looked up, 0.5 is added, and the result is subtracted from 1. These examples will clarify:

So this would be great if every distribution had a mean of 0 and standard deviation of 1. Luckily there is an approach that allows us to use the table to test any sample of data, and it is described in the green box above: Yi = (xi – xbar) / standard deviation

Using the table created in the last post here is the procedure (in the file STATAD.P) to lookup the table value of a data point X, from a sample with a mean of XBAR and standard deviation of STD, and then calculate the CDF value. The value is returned in the variable LU, from a call LOOKUPCDF(x, xbar, std, lu);

A quick code review

Green arrow : standardize the value of X to use the table

Red arrow : multiply R by 100 to convert to the table index and store in J

(note: the table is indexed from 0 and is single-dimensioned, unlike the physical table, which has the two lookup values)

Blue arrow : determine if J is greater or less than 0 and set the pointer to look up in the table

Yellow arrow : read the CDF.TBL file using the index K

Gray arrow : using the value looked up and whether J is positive or negative, make the appropriate adjustment to the table value (add 0.5, and if J < 0 also subtract the result from 1)

Then the calling procedure can use the value returned in the LU position in the call.
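The steps in the code review can be sketched in Python as follows. This is an illustration, not the STATAD.P procedure: the table is generated with `math.erf` as a stand-in for reading CDF.TBL from disk, and rounding to the nearest index and clamping values beyond the end of the table are my assumptions.

```python
import math

# Stand-in for CDF.TBL: area under the standard normal curve from 0 to z
# for z = 0.00, 0.01, ..., 4.09 (41 rows x 10 columns = 410 entries).
TABLE = [0.5 * math.erf((k / 100.0) / math.sqrt(2.0)) for k in range(410)]

def lookup_cdf(x, xbar, std):
    """Return the normal CDF at x using the half-table and the sign rule."""
    z = (x - xbar) / std              # standardize the value
    j = int(round(z * 100.0))         # signed table index
    k = min(abs(j), len(TABLE) - 1)   # table holds only z >= 0; clamp past 4.09
    area = TABLE[k]                   # area from 0 to |z|
    if j >= 0:
        return 0.5 + area
    return 1.0 - (0.5 + area)

print(lookup_cdf(1.0, 0.0, 1.0))
print(lookup_cdf(-1.0, 0.0, 1.0))
```

Keeping only the positive half of the table works because the normal curve is symmetric, so the area to the left of −z equals the area to the right of +z.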

One note here is that this method of opening and reading the table does slow down the calculation, but on the other hand it does not require the table to take up valuable memory. Such were the challenges of our retro computing days: sometimes we had to choose between speed and memory…

The next post will show how this value is used to calculate A-Squared, A-Squared Prime and the P-value for the Anderson-Darling Test.

]]>

The Anderson-Darling test is used to test whether a sample may come from a population that is normally distributed (think bell curve). It is one of several tests that can be used, but is often the first one tried as it is a somewhat conservative test. The reason one would care whether a sample comes from a normal distribution is that it allows the use of the more powerful statistical methods that assume normal data.

There is a three step process to use this test :

- Calculate the test statistic A-Squared
- Calculate the A-Squared Prime statistic
- Determine the p-value used to test the hypothesis that the data comes from a normal distribution

In order to calculate A-Squared the formula calls for the CDF function or the area under the curve from -infinity to x. If you are still reading this the picture below will give some context.

The easiest way (and easy is relative here) is to use a table like the one below, which is set up for the standard normal curve (mean = 0 and standard deviation = 1).

So before I could even think about how to implement the test in Pascal I needed to build a file to house the table and lookup procedure. The above table was copied and pasted into the Kyan Pascal Editor as a text file. See Below :

Next the file needed to be converted from text to a file of real numbers where the values could be looked up, after normalizing the sample value Yi = (Xi – Xbar) / Std Deviation as shown above.

Here is the code to convert the text file to file of REAL that can be used as a random access file.

This first image gives the program documentation in the header comments.

The program first initializes the input file (CDFTBL.TXT), the text file described above. The command RESET(CDFIN, ‘CDFTBL.TXT’) opens the file CDFTBL.TXT, associates it with CDFIN, and sets the read pointer to the beginning of the file. The output file is created with the REWRITE(CDFOUT, ‘CDF.TBL’); command and is named CDF.TBL on the disk.

The text file has 41 lines of data, each holding 10 values that are 7 characters long. Each value is read as a string, converted to a real number, then written to the output file. Each of these values is the area under the normal curve from 0 to Xi.
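The conversion can be sketched in Python along these lines. This is not the Kyan Pascal converter: I am assuming the 7-character fields sit back to back on each line, an 8-byte float stands in for Kyan Pascal's REAL, and `convert_table` and `read_record` are hypothetical names. The demo row is the familiar first row of a standard normal table (z = 0.00 to 0.09).

```python
import os
import struct
import tempfile

FIELD_WIDTH = 7               # characters per value (from the description above)
VALUES_PER_LINE = 10
RECORD = struct.Struct("<d")  # one 8-byte float per record (assumed layout)

def convert_table(text_path, binary_path):
    """Read fixed-width text values and write them as fixed-size binary records."""
    count = 0
    with open(text_path) as src, open(binary_path, "wb") as dst:
        for line in src:
            line = line.rstrip("\n")
            for col in range(VALUES_PER_LINE):
                field = line[col * FIELD_WIDTH:(col + 1) * FIELD_WIDTH].strip()
                if field:
                    dst.write(RECORD.pack(float(field)))
                    count += 1
    return count

def read_record(binary_path, index):
    """Random access: seek straight to record `index` (0-based) and read it."""
    with open(binary_path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))[0]

# One-line demo file; the real CDFTBL.TXT has 41 such lines.
row = [0.0000, 0.0040, 0.0080, 0.0120, 0.0160, 0.0199, 0.0239, 0.0279, 0.0319, 0.0359]
with tempfile.TemporaryDirectory() as tmp:
    txt = os.path.join(tmp, "CDFTBL.TXT")
    tbl = os.path.join(tmp, "CDF.TBL")
    with open(txt, "w") as f:
        f.write("".join(f"{v:.4f}".ljust(FIELD_WIDTH) for v in row) + "\n")
    written = convert_table(txt, tbl)
    sixth = read_record(tbl, 5)
print(written, sixth)
```

Because every record has the same size, the position of any value is just its index times the record size, which is what makes the file usable for random access.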

The next post will describe how a value from the table is looked up and then used in the calculations needed for the Anderson-Darling normality test.

]]>

This post will include the formulas for Mean, Standard Deviation, Variance, Skewness and Kurtosis (the peaked-ness of a distribution), along with some quick and dirty Pascal code to implement them as a sample/demo.

Shown below are the first two screens of the documentation:

Here is the output of the demo code, which will be presented in this and following posts.

which is calculated from the following sample data, which is a daily measurement of my body weight for a week. (as an aside I have lost 157 pounds over the past year and I will be using data I have collected during that time as sample data for this program)

Here is the hard-coded assignment of the data (in future coding the data will be read from a text file and converted to real numbers).

The data is stored in an ARRAY of Real – called SD[], SD[0] – holds the sample size and SD[1] to SD[7] holds the data in time ordered fashion.

I will later provide a full listing of the source code; in the meantime here are the code snippets for calculating the sum of the data and then the mean, along with displaying them on the screen. The GOTOXY(X,Y) procedure is in the include file CONIO.P, and the REALTOSTR() function converts a real number from scientific notation to a decimal string format for easier viewing. To calculate the sum, a FOR loop from one to n=7 is used and each element of the array is added to the running total. An array SA[] is also copied from SD[]; it will be sorted later and used for finding the minimum, maximum and median, and in the Anderson-Darling normality test.

Then from the sum the mean (xbar) is calculated by dividing the sum by n (7 in this case) and all are printed to the screen, using the GOTOXY and REALTOSTR calls.
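The loop described above can be sketched in Python as follows. The values are hypothetical placeholders, not the weight data from the post, and `print` with fixed formatting stands in for the GOTOXY and REALTOSTR calls.

```python
# SD[0] holds the sample size; SD[1]..SD[n] hold the data (hypothetical values).
SD = [7.0, 10.0, 12.0, 11.0, 13.0, 9.0, 12.0, 10.0]
n = int(SD[0])

SA = [0.0] * (n + 1)   # copy of the data, to be sorted later
SA[0] = SD[0]

total = 0.0
for i in range(1, n + 1):
    total += SD[i]     # running sum
    SA[i] = SD[i]      # copy each element as we go

xbar = total / n       # mean
print(f"SUM  = {total:.2f}")
print(f"MEAN = {xbar:.2f}")
```

Storing the count in element 0 means a single array carries both the data and its size, which is convenient when only a few variables are passed between programs.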

The next post will pick up with sorting the array, calculation of Standard Deviation, Variance, Skewness and Kurtosis.

]]>

This will be the last post specific to the tapes and I’ll be moving on to the coding of the STATS program. Here are a few images from the tapes:

Tape 5 : Probability

Tape 6 : Probability Continued

Tape 7 : Probability Distributions

Tape 14 : Hypothesis Testing

Tape 15 : Hypothesis testing continued

Tape 16 : Statistics Review

This weekend I will pick up with Pascal coding of the STATS program!!!

]]>

I intend for a short running demo to show what STATS can do, along with the program menu and the ability to quit from the start-up screen. So let’s take a look at the code. As we do, it’s worthwhile to note that Kyan Pascal also supports program chaining, and I intend to use the feature to build out the STATS program so that the data, calculations, graphs and plots can be built up individually and pass data via the chaining.

The source file is STATS.P, and here is the top portion of the program.

Focusing on the code in the red box, we see the comment at the top of the file enclosed in (* *). It lists the source file name, STATS.P, and a brief description of what this file is. Next comes the PROGRAM statement to start the program, with the program name “STATS;”

One variable “CH” is declared as a character, and will be used to capture the menu selection when that code is added. Then two library source files are included with the #I directive. “CONIO.P” is a library of console IO procedures I will build up as I go through this project, and at this time has two procedures, “CLS” to clear the screen and “GOTOXY” to position the cursor on the screen for printing. I wrote the CLS code, the GOTOXY code is borrowed from one of the added features disks, and was renamed from the POSITION procedure which you can see is in assembly code. Here’s what they look like :

Moving down to the bottom of the STATS.P code, where the main BEGIN END. block resides, we see a call to a procedure SPLASH, a READ statement, and a call to CLS in CONIO.P to clear the screen. SPLASH is the procedure that draws the screen, STATS logo and menu.

Here is the code for the SPLASH procedure, in the image below. It starts with some comments and the declaration with the PROCEDURE statement. That is followed by three set color statements. The SETCOLOR procedure is in the include file SETCOLOR.I we mentioned above. The calls are of the form SETCOLOR(register, color, hue); In text mode (aka graphics 0) register 4 sets the border color and hue, in this case 0,0 for black and the darkest hue. Register 2 sets the background color and hue; 12 is green and 0 again is the darkest hue. Register 1 only sets the hue for the printed characters; 14 is the brightest hue. For now these are hard coded; perhaps, time permitting, I will add a menu to allow the user to change the colors.
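To make the register, color and hue values more concrete, here is a small Python sketch of how the Atari packs the last two into a single byte: the color number goes in the upper four bits and the brightness (what this post calls hue) in the lower four. How SETCOLOR.I does this internally is not shown in the post, so treat the function `color_byte` and the register constants as illustrative assumptions.

```python
# Color registers as described above for text mode (graphics 0).
BORDER, BACKGROUND, TEXT = 4, 2, 1

def color_byte(color, brightness):
    """Pack a color number (0-15) and brightness (0-14, even steps) into one byte."""
    if not 0 <= color <= 15:
        raise ValueError("color must be 0-15")
    if not 0 <= brightness <= 15:
        raise ValueError("brightness must be 0-15")
    return color * 16 + (brightness & 0x0E)   # lowest bit is not used

# The three calls described for the splash screen.
registers = {
    BORDER: color_byte(0, 0),        # black, darkest
    BACKGROUND: color_byte(12, 0),   # green, darkest
    TEXT: color_byte(0, 14),         # brightness only matters for text
}
print(registers)
```

Seen this way, swapping bright text on a dark background for dark text on a light one is just a matter of exchanging the brightness values of registers 1 and 2.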

Next comes a call to clear the screen with CLS, then calls to GOTOXY to position the cursor and begin to print text to the screen, along with the logo.

Finally the menu at the bottom of the screen is displayed and control returns to the main program block, and the READ(CH); is executed. Once a character and return are typed the screen clears and program ends.

So the project is off to a good start. Some additional utilities will need to be built, such as converting a real number to an integer, and reading data from a text file and converting it to numeric data. The data structure will also need to be addressed, and then finally some stats calculations, graphs and plots can be coded.

Stay tuned.

]]>