RetroChallenge 04/2017 – Post 12 – Wrap Up

When I set out for the April 2017 round of the RetroChallenge, these were my aspirations:

a. Complete the Dorsett Education program (cassette s/w with voice) on Statistics – 16 cassette files downloaded from archive.org – and document it at traidna.wordpress.com

b. Write a statistics program in Kyan Pascal on the Atari 800 (emulator) that will compute
descriptive stats, the Anderson-Darling test for normality, one- and two-sample t-tests, one- and two-proportion tests, etc.

c. Create YouTube videos on creating the statistics program, as there do not appear to be any for Kyan Pascal on the Atari

d. Blog on the process at traidna.wordpress.com, and update the kyanpascal.wordpress.com blog on creating programs

Now it is time to recap the final results against those goals.

Starting with the 16 Statistics tapes from the Dorsett Educational Program:

All 16 tapes were downloaded, along with the master cartridge needed to use them.  I ran through each of the tapes except one, where the download turned out to be a duplicate of another tape.  I earned my graduate certificate in Applied Statistics through Penn State World Campus, where all classes were virtual and web-based, and I can only think that the Dorsett tape approach was a predecessor to this type of remote learning.

tape16 Review - Title

For Part B – creating a statistics program in Kyan Pascal.  I am calling this a success.  While I didn’t implement hypothesis testing, I did complete descriptive stats, normality tests, and plotting of normal curves, plus additional features such as colors and loading data files.  What was accomplished is a proof of concept that the kind of analysis done in modern stats programs such as Minitab can be developed on the Atari 400/800 in Pascal, although the power of the 8-bit systems does limit the speed and the amount of data that can be handled.  That said, Kyan Pascal’s ability to chain executable programs allows for a lot of functionality in the space available.  Finally, a self-running demo feature was an interesting exercise for extra credit.

For Part C.  I’ll call this partially complete.  I did create two YouTube videos that show the self-running demo; given infinite time and space I would have liked to create some Kyan Pascal tutorial videos as well.

Video #1 : https://www.youtube.com/watch?v=oDuF0EkyX3A&t=1s

Video #2 : https://www.youtube.com/watch?v=p0bu5M6qt_E

For Part D.  I made 12 posts, about one every third day.  These posts had a good mix of samples of the Dorsett tapes, Kyan Pascal coding features and techniques, and running, debugging and validating STATS against modern software.  I’ll call this section successful.

In conclusion, I’ve enjoyed this round of RetroChallenge, and have learned a significant amount about the Atari 8-bits through use of the Altirra emulator.  I never had an Atari (I had a Timex Sinclair 1000, Vic 20, Apple II and PC 286 in my formative years), but I certainly would love to see STATS run on actual hardware at some point.  While I was successful in implementing statistics programming, there are many improvements that could be made to the algorithms I created that would either speed up processing or reduce disk space.  The methods I used to create, read and write the table for the CDF could be replicated for t-tables and F-tables for hypothesis testing and for calculating confidence intervals of means, medians, etc.  While I just scratched the surface of graphics programming, it’s clear that run charts, box plots, dot plots, individual value plots and control charts could all be implemented.

So it’s been a successful run, and if I make additional enhancements to STATS for Atari I will continue to post here.

RetroChallenge Day 30 – Post 11 – Reading Data Files – Colors

I’ve added two final features to STATS for Atari 400/800 to round out the program and exercise Kyan Pascal for the April 2017 RetroChallenge: the ability to read data from text files and a menu-driven utility to change colors.

See the self-running demo on YouTube: http://youtu.be/p0bu5M6qt_E

On the main menu two new options have been added: [C] to change colors and [R] to read data from a file.

menu1

Starting with the read-file selection: when R and <enter> are entered, the user is prompted for a file name and for the data set into which the data should be loaded.

read1

When STATS is run from the distribution disk on drive 1, the default drive for the data file is drive 1, as shown above.  The file name can also be given with a drive designation.  If the system has multiple drives and the data file is on drive 2, then “D2:WTDATA.TXT” could be entered.  Kyan Pascal allows for drives 1 and 2.  This is useful because, while there is some room on the program disk, having drive 2 available allows for a practically unlimited number of data files.

The format of a data file is simply one number per line, with a final carriage return after the last entry.  Here are the last 20 or so entries in the WTDATA.TXT file.

wtdata

As the data loads, the program shows a counter of the number of data points read and stored.  After loading it pauses and asks for a [C] to be entered to continue.  Up to 4 data sets can be loaded.

Once loaded, the data set can be used by the other functions in the program.  Below, the descriptive statistics for the WTDATA.TXT data are shown.
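The one-number-per-line format is easy to try out away from the Atari. Below is a minimal Python sketch (not the Kyan Pascal code from STATS) of a loader that counts points as it reads them. The `MAX_POINTS` limit, the `load_data` name and the sample values are all assumptions made for illustration; the real capacity of a STATS data set is not stated in this post.

```python
import io

MAX_POINTS = 100  # hypothetical capacity; the post does not state the real limit


def load_data(lines, max_points=MAX_POINTS):
    """Read one number per line, skipping blank lines, up to max_points values."""
    values = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # tolerate the blank line left by a trailing carriage return
        if len(values) >= max_points:
            break  # stop quietly once the data set is full
        values.append(float(text))
    return values


# Made-up numbers standing in for a data file such as WTDATA.TXT.
sample_file = io.StringIO("180.2\n179.8\n181.0\n180.4\n")
data = load_data(sample_file)
print(len(data), "data points read")
```

Like STATS itself, this sketch does no error checking: a line that is not a number will raise an exception rather than being reported politely.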

read2

In an earlier post I discussed the specifics of setting colors on the Atari 400/800, and I have now added the ability for the user to select their color and choose either bright text on a dark background or dark text on a light background.  When the user selects [C] from the main menu they are presented with the color menu and asked for the number of the color and the light/dark scheme they wish to use.

color1

In the example above the user picked color 5, PINK, and light text on a dark background, resulting in the screen below.

color2

To be transparent about the code, there is not much error checking for any feature in STATS.  A market-ready program should do things like checking whether a data file exists and validating responses to questions and menu options.  In the timeline of this RetroChallenge I’ve been more interested in seeing what I could get to work than in all the checks expected of a final software product.

My next post will be a final wrap up of my RetroChallenge project – Dorsett STATS tapes and STATS for Atari 400/800 in Kyan Pascal.

RetroChallenge Day 23 – Post 10 – Graphics Issues with Kyan Pascal

In my last post I mentioned some issues with the graphics utilities in Kyan Pascal; in this post I will review what I have found.  The two issues I have run into are that stray pixels are plotted to the graphics screen when REAL number math code is executed, and when text is written to the text part of the screen in a combined graphics and text screen mode.

Here is a graphics demo program to show these two issues.  The program includes the GRAPHICS.I, PLOT.I and DRAWTO.I program extensions.  I wrote a simple procedure that draws a box on the graphics part of the screen in graphics mode 7.  After a short delay come two lines of code to initialize the REAL number variable R and add 0.1 to it, then another short delay, then a WRITE to the text portion of the screen.

gdemo1gdemo2

Here is what the screen looks like after the call to GRAPHICS(7) and the drawing of the box, everything as expected:

gdemo3

After the short delay, the two lines of code that initialize R to 1.00 and add 0.1 result in the following changes to the screen; several stray pixels can now be seen:

gdemo4

And finally, when the text ‘BOX DEMO’ is written to the screen, several more stray pixels are turned on.  I also tested with only the writing of the text and the issue persists, so it’s not a residual effect of the math issue.

gdemo5

I am not sure of the cause.  It may be specific to the library routines provided with Kyan Pascal; as I understand it, these were provided as additional utilities after the release of Kyan Pascal, and documentation for them seems to be scarce.  Or perhaps there is an emulator issue?  However, I did test the same idea in BASIC and did not have a problem.

Heading into the last week of the RetroChallenge I hope to add the ability to load data from a text file and add that to the menu, and provide a final demo video on youtube.com.

 

RetroChallenge Day 22 – Post 9 – Using Graphics to Plot Normal Distributions and Menus

During this past week I have been learning to use graphics in Kyan Pascal on the Atari 800 emulator Altirra.  I wrote a program that plots three overlapping normal distribution curves, which can illustrate a shift in mean or a difference in spread, given the mean and standard deviation of each distribution.  I also added a short menu system to the STATS program, which uses the “CHAIN” feature of Kyan Pascal.  These three topics are discussed below.

From the main splash screen for STATS there are three options, Demo, Menu and Quit.

stats menu1

When M is entered on this screen the following menu is presented:

stats menu

The selections at this time are limited, but are sufficient to demonstrate how one program can call another already compiled program.  Shown below is the code that is executed when [N] is entered to run the Normal Distribution Plot program.  The current program terminates and starts the executable named in the CHAIN procedure, ‘STATNP’ in this case.

statsmenu 3

STATNP is passed the variable MODE from the calling program.  All variables declared by the initial VAR command are available to the new executable, so long as the VAR declarations in the called program match in variable type.

So in this case the CHAR variable MODE in STATS.P is passed into the CHAR variable CH in STATSNP.P.  The STATNP program checks whether CH is ‘D’ for demo; if so it provides a default set of plots, otherwise the user is prompted for means and standard deviations for three plots.

 

When STATSNP is selected from the menu the user enters information for three normal plots.  In the example below, the first plot has a mean of 0 and a standard deviation of 1.  The second curve has a different mean, 3, and will show as a curve of the same shape, just shifted right.  The third curve has the same mean but a smaller standard deviation, and will plot as a taller, narrower curve.

statsnp1

Here are the resulting plots, plot one in red, plot two in green and plot three in white:

statsnp2

Plotting on the graphics screen is pretty straightforward once one gets comfortable with 0,0 being in the upper left corner.  What is not easy is adding labels to the plot, and I ran into issues using a graphics mode that has the split screen with graphics and text (I will cover that in a future post).

As for the formulas and code to produce a normal distribution plot, the Normal Probability Density Function is used to create the curve.  From Wikipedia, here is the notation and formula for calculating the Normal PDF:

normalpdf

 

and this is translated into Pascal below and converted to integers for plotting :

statsnp3
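For readers without the screenshot, here is a minimal Python sketch of the same idea: evaluate the normal PDF for each screen column and convert the result to integer pixel coordinates with 0,0 in the upper left corner. It is not the Kyan Pascal code from STATNP. The 160 by 96 grid is the resolution of full-screen graphics mode 7, while the x range of -6 to 6, the `y_max` scale and the function names are assumptions chosen for illustration.

```python
import math

SCREEN_W, SCREEN_H = 160, 96  # full-screen graphics mode 7 resolution


def normal_pdf(x, mean, std):
    """Normal probability density at x for the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2.0 * std ** 2)) / (std * math.sqrt(2.0 * math.pi))


def curve_pixels(mean, std, x_min=-6.0, x_max=6.0, y_max=1.0):
    """Return one (column, row) pixel per screen column for a normal curve.

    x_min, x_max and y_max are assumed plot limits, not values from the post.
    """
    pixels = []
    for col in range(SCREEN_W):
        x = x_min + (x_max - x_min) * col / (SCREEN_W - 1)
        height = normal_pdf(x, mean, std) / y_max  # fraction of screen height
        # Row 0 is the top of the screen, so flip the y axis.
        row = (SCREEN_H - 1) - int(round(height * (SCREEN_H - 1)))
        row = max(0, min(SCREEN_H - 1, row))  # clip tall, narrow curves
        pixels.append((col, row))
    return pixels


pixels = curve_pixels(mean=0.0, std=1.0)
print(pixels[0], pixels[79])
```

A smaller standard deviation gives larger PDF values near the mean, which is why the third curve in the example plots taller and narrower; the clipping line keeps such a curve on the screen.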

Below is the code to set the graphics mode to 23 and set up the background color and three foreground colors.  There is an interesting mapping between the SETCOLOR(x, ...) command and the color used in a PLOT or other graphics routine: the color value to use is x+1, where x is the value given in the SETCOLOR call.

 

statsnp44

In my next post I will describe two issues I had with the graphics mode: trying to use a combined text and graphics mode, and doing real number calculations once in graphics mode.

RetroChallenge Day 16 – Post 8 – Anderson-Darling Test Part 3

In this post the Pascal code to implement the Anderson-Darling normality test will be discussed, along with a set of sample runs compared to the results from Minitab Express statistical software.  The complete set of formulas for calculating the Anderson-Darling test statistic for normality and the associated p-value are:

 

andersondarling formulas

In case statistics is confusing to you, or you wonder what is so important about this p-value, let me try to give some very quick background.  For a normality test we start by assuming that the data comes from a normally distributed population, and look for evidence that it does not.  The p-value is the probability of seeing a test statistic at least as extreme as ours if the data really did come from a normal distribution.  So if the p-value is small (less than 0.05), I can conclude that the data does not appear to come from a normally distributed population.

The previous posts covered the work to determine the value of the CDF; now the test statistics can be calculated: A-Squared, A-Squared Prime, and the p-value.

The code block below is the implementation of the A-Squared statistic.

Red arrow – the FOR loop that does the summation over all values (NN) in the sample
Yellow arrow – call to the CDF lookup in STATAD.P
Blue arrow – calculation for this data point
Green arrow – finishes the calculation after the summing is complete

ad asquared

With the A-Squared value, A-Squared Prime and the p-value can be calculated.  The source code is shown in the two images below:

ad asquaredprimead pvalue
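Since the formulas and the Pascal source above are only available as images, here is a minimal Python sketch of the three steps. It assumes the commonly published form of the test: the A-Squared sum over the sorted, standardized sample, the small-sample adjustment A² × (1 + 0.75/n + 2.25/n²), and the piecewise p-value approximation usually attributed to D'Agostino and Stephens. Whether STATS uses exactly these constants is an assumption, and the sketch computes the normal CDF with `math.erf` instead of the CDF.TBL lookup.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function (stands in for the table lookup)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def anderson_darling(sample):
    """Return (A-Squared, A-Squared Prime, p-value) for a normality test."""
    data = sorted(sample)
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample std dev
    cdf = [normal_cdf((x - mean) / std) for x in data]

    # Summation over all data points, pairing the i-th smallest with the i-th largest.
    total = 0.0
    for i in range(1, n + 1):
        total += (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
    a2 = -n - total / n

    # Small-sample adjustment (assumed form, see the lead-in).
    a2_prime = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)

    # Piecewise p-value approximation (assumed constants, see the lead-in).
    if a2_prime >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_prime + 0.0186 * a2_prime ** 2)
    elif a2_prime > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_prime - 1.38 * a2_prime ** 2)
    elif a2_prime > 0.2:
        p = 1.0 - math.exp(-8.318 + 42.796 * a2_prime - 59.938 * a2_prime ** 2)
    else:
        p = 1.0 - math.exp(-13.436 + 101.14 * a2_prime - 223.73 * a2_prime ** 2)
    return a2, a2_prime, p


# Made-up samples: one roughly bell shaped, one with an extreme outlier.
symmetric = [-1.5, -1.0, -0.5, -0.2, 0.0, 0.2, 0.5, 1.0, 1.5]
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]
print(anderson_darling(symmetric))
print(anderson_darling(skewed))
```

The symmetric sample should give a large p-value (no evidence against normality), while the outlier sample should give a p-value far below 0.05.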

I validated my code with four sets of data which I also ran through Minitab Express (a commercial stats program from Minitab Inc.); the results are shown below.  While I was testing I also looked at the descriptive stats from earlier posts.

Here are the four sets of data (as a reminder, this is my daily weight over four weeks in March 2017) shown both in Minitab Express and in my STATS program.

sampleallmt

Here are the results for all four sets of data, and except for some small rounding differences, all looks good.

sample1sample1mt

sample2sample2mtsample3sample3mtsample4.

sample4mt

As an aside, we cannot show that any of the four data sets does not come from a normal population; however, 7 data points is a fairly small sample.

Next on the agenda is adding the ability to load data sets from text files and some menu options, and then at least one graph or plot to take advantage of the graphics capabilities of the Atari 800 and Kyan Pascal.

RetroChallenge Day 15 – Post 7 – Anderson-Darling Test Part 2 – CDF Look Up

From the last post, the formula for A-Squared, the test statistic for the Anderson-Darling normality test, is:

cdfmtbl9

The part in the red box is the cumulative distribution function (CDF) for the normal distribution.  This is the area under the normal curve from negative infinity to Yi.  It represents the probability that a variable takes on a value less than or equal to Yi, so it will be a value between 0 and 1.

cdfmtbl6

The values in the CDF table we loaded last time represent the area under the curve from 0 to Yi (or x, depending on how you want to name it), for a distribution with a mean of 0 and standard deviation of 1.  So to get the area from negative infinity an additional calculation is made.  If x > 0 then 0.5 is added to the table value.  If x is less than 0, the table value for |x| is looked up and 0.5 plus that value is subtracted from 1, which is the same as 0.5 minus the table value.  These examples will clarify:

cdfmtbl10

cdfmtbl8

This would be great if every distribution had a mean of 0 and standard deviation of 1.  Luckily there is an approach that allows us to use the table to test any sample of data, and it is described in the green box above:  Yi = (xi – xbar) / standard deviation

Using the table created in the last post, here is the procedure (in the file STATAD.P) to look up the table value for a data point X, from a sample with a mean of XBAR and standard deviation of STD, and then calculate the CDF value.  The value is returned in the variable LU, from a call LOOKUPCDF(x, xbar, std, lu);

lookup1

A quick code review

Green arrow: standardize the value of X to use the table
Red arrow: multiply R by 100 to convert to the table index and store in J
(note: the table is indexed from 0 and is single-dimensioned, unlike the physical table, which has the two lookup values)
Blue arrow: determine whether J is greater or less than 0 and set the pointer to look up in the table
Yellow arrow: read the CDF.TBL file using the index K
Gray arrow: using the value looked up and whether J is positive or negative, make the appropriate adjustment to the table value (add 0.5 if J > 0, or subtract from 1 if J < 0)

lookup2

Then the calling procedure can use the value returned in the LU position in the call.
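The lookup steps above can be sketched in Python for anyone who wants to check the arithmetic. This is not the STATAD.P source. The table is rebuilt in memory with `math.erf` rather than read from CDF.TBL, the 410-entry size follows from the 41 lines of 10 values described in the table post, and the use of rounding (rather than truncation) to form the index is an assumption. For negative values the sketch uses 0.5 minus the table value, which is the area from negative infinity to x.

```python
import math

TABLE_SIZE = 410  # 41 lines of 10 values, z = 0.00 to 4.09 in steps of 0.01

# Stand-in for CDF.TBL: area under the standard normal curve from 0 to z.
HALF_TABLE = [0.5 * math.erf((k / 100.0) / math.sqrt(2.0)) for k in range(TABLE_SIZE)]


def lookup_cdf(x, xbar, std):
    """Return the normal CDF at x for a sample with mean xbar and std deviation std."""
    r = (x - xbar) / std            # standardize so the mean-0, std-1 table applies
    j = int(round(r * 100))         # signed table index (rounding is an assumption)
    k = min(abs(j), TABLE_SIZE - 1) # table only holds non-negative z, capped at 4.09
    value = HALF_TABLE[k]
    if j >= 0:
        return 0.5 + value          # left half of the curve plus the area from 0 to z
    return 0.5 - value              # left half minus the area from z up to 0


print(round(lookup_cdf(1.0, 0.0, 1.0), 4), round(lookup_cdf(-1.0, 0.0, 1.0), 4))
```

Keeping the table in a list, as done here, is the memory-hungry choice; reading one record from disk per lookup trades that memory for speed.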

One note here is that this method of opening and reading the table does slow down the calculation, but on the other hand it does not require the table to take up valuable memory.  Such were the challenges of our retro computing days: sometimes we had to choose between speed and memory…

The next post will show how this value is used to calculate A-Squared, A-Squared Prime and the P-value for the Anderson-Darling Test.

statad01

RetroChallenge Day 14 – Post 6 – Anderson-Darling Normality Test Part I

This is part 1 of a three part post on coding the Anderson-Darling normality test in STATS, written in Kyan Pascal for the Atari 400/800 computers.

The Anderson-Darling test is used to test whether a sample may come from a population that is normally distributed (think bell curve).  It is one of several tests that can be used, but it is often the first one tried as it is a somewhat conservative test.  The reason one would care whether a sample comes from a normal distribution is that it allows the use of statistical methods that assume normal data.

There is a three-step process to use this test:

  1. Calculate the test statistic A-Squared
  2. Calculate the A-Squared Prime stat
  3. Determine the p-value used to test the hypothesis that the data comes from a normal distribution

ad1

In order to calculate A-Squared the formula calls for the CDF function, or the area under the curve from -infinity to x.  If you are still reading this, the picture below will give some context.

 

ad2

The easiest way (and easy is relative here) is to use a table like the one below, which is set up for the standard normal curve (mean = 0 and standard deviation = 1).

ad3.jpg

 

So before I could even think about how to implement the test in Pascal, I needed to build a file to house the table, plus a lookup procedure.  The above table was copied and pasted into the Kyan Pascal Editor as a text file.  See below:

cdfmtbl5

Next the file needed to be converted from text to a file of real numbers where the values could be looked up, after standardizing the sample value Yi = (Xi – Xbar) / Std Deviation as shown above.

Here is the code to convert the text file to a file of REAL that can be used as a random access file.

This first image gives the program documentation in the header comments.

cdfmtbl1

cdfmtbl2

The program first initializes the input file (CDFTBL.TXT), the text file described above.  The command RESET(CDFIN, ‘CDFTBL.TXT’) opens the file CDFTBL.TXT, associates it with CDFIN, and sets the read pointer to the beginning of the file.  The output file is created with the REWRITE(CDFOUT, ‘CDF.TBL’); command and is named CDF.TBL on the disk.

cdfmtbl3

The text file has 41 lines of data, each holding 10 values that are 7 characters long.  Each value is read as a string, converted to a real number, and then written to the output file.  Each of these values is the area under the normal curve from 0 to Xi.

cdfmtbl4
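As a rough modern equivalent of the conversion, the Python sketch below slices each text line into fixed-width fields, packs every value as a binary record, and then reads a single record back by its index, which is the random access idea behind CDF.TBL. It is not the Kyan Pascal program shown in the images. The exact column layout (a leading space followed by six characters), the 8-byte record size and the function names are assumptions, and an in-memory buffer stands in for the disk file.

```python
import io
import struct

FIELD_WIDTH = 7       # characters per value, as described in the post
VALUES_PER_LINE = 10  # values on each line of the text table
RECORD = struct.Struct("<d")  # assumption: 8-byte floats stand in for Pascal REALs


def convert_table(text_lines, out_file):
    """Write every fixed-width value in text_lines to out_file as a binary record."""
    count = 0
    for line in text_lines:
        for col in range(VALUES_PER_LINE):
            field = line[col * FIELD_WIDTH:(col + 1) * FIELD_WIDTH]
            out_file.write(RECORD.pack(float(field)))  # string -> real -> record
            count += 1
    return count


def read_record(table_file, index):
    """Random access: seek straight to record number index (counting from 0)."""
    table_file.seek(index * RECORD.size)
    return RECORD.unpack(table_file.read(RECORD.size))[0]


# Two illustrative lines in the assumed layout (z = 0.00 to 0.19).
sample_lines = [
    " 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359",
    " 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753",
]
buffer = io.BytesIO()  # stands in for the CDF.TBL file on disk
written = convert_table(sample_lines, buffer)
print(written, "values written; record 11 =", read_record(buffer, 11))
```

Because every record has the same size, the position of any value is just its index times the record size, so a lookup never has to read the values before it.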

The next post will describe how a value from the table is looked up and then used in the calculations needed for the Anderson-Darling normality test.