RetroChallenge Day 23 – Post 10 – Graphics Issues with Kyan Pascal

In my last post I mentioned some issues with the Kyan Pascal graphics utilities; in this post I will review what I have found. I have run into two issues. Stray pixels are plotted to the graphics screen when REAL number math code is executed. They also appear when text is written to the text portion of the screen in a split graphics-and-text mode.

Here is a graphics demo program that shows both issues. The program includes the GRAPHICS.I, PLOT.I and DRAWTO.I program extensions. I then wrote a simple procedure that draws a box on the graphics part of the screen in graphics mode 7. After a short delay, two lines of code initialize the REAL variable R and add 0.1 to it. After another short delay, a WRITE sends text to the text portion of the screen.

gdemo1 gdemo2

Here is what the screen looks like after the call to GRAPHICS(7) and the drawing of the box. Everything is as expected:

gdemo3

After the short delay, the two lines of code that initialize R to 1.00 and add 0.1 produce the following changes to the screen. Several stray pixels can now be seen:

gdemo4

Finally, when the text ‘BOX DEMO’ is written to the screen, several more stray pixels are turned on. I also tested with only the text being written, and the issue persists, so it is not a residual effect of the math issue.

gdemo5

I am not sure of the cause. It may be specific to the graphics library routines provided with Kyan Pascal. As I understand it, these were released as additional utilities after Kyan Pascal itself, and documentation for them seems to be scarce. Or perhaps it is an emulator issue? However, I tested the same idea in BASIC and did not see the problem.

Heading into the last week of the RetroChallenge, I hope to add the ability to load data from a text file, add that option to the menu, and post a final demo video on YouTube.

 

RetroChallenge Day 22 – Post 9 – Using Graphics to Plot Normal Distributions and Menus

During this past week I have been learning to use graphics in Kyan Pascal on Altirra, the Atari 800 emulator. I wrote a program that plots three overlapping normal distribution curves. Given the mean and standard deviation of each distribution, it illustrates a shift in mean or a difference in spread. I also added a short menu system to the STATS program, which uses the CHAIN feature of Kyan Pascal. These three topics will be discussed below.

From the main splash screen for STATS there are three options: Demo, Menu and Quit.

stats menu1

When M is entered on this screen the following menu is presented:

stats menu

The selections at this time are limited, but they are sufficient to demonstrate how one program can call another already compiled program. Shown below is the code that is executed when [N] is entered to run the Normal Distribution Plot program. The current program terminates and starts the executable named in the CHAIN procedure, ‘STATNP’ in this case.

statsmenu 3

STATNP is passed the variable MODE from the calling program. All variables declared by the initial VAR statement are available to the new executable, as long as the VAR declarations in the called program match them in type.

So in this case the CHAR variable MODE in STATS.P is passed into the CHAR variable CH in STATSNP.P. The STATNP program checks whether CH is ‘D’ for demo. If so, it provides a default set of plots; otherwise the user is prompted for the means and standard deviations of three plots.

 

When STATSNP is selected from the menu, the user enters information for three normal plots. In the example below, the first plot has a mean of 0 and a standard deviation of 1. The second curve has a different mean, 3, and shows as a curve of the same shape, just shifted right. The third curve has the same mean but a smaller standard deviation, and plots as a taller, narrower curve.

statsnp1

Here are the resulting plots: plot one in red, plot two in green and plot three in white:

statsnp2

Plotting on the graphics screen is pretty straightforward once one gets comfortable with (0,0) being in the upper left corner. What is not easy is adding labels to the plot. I also ran into issues using a graphics mode that splits the screen between graphics and text (I will cover that in a future post).

As for the formulas and code to produce a normal distribution plot: the normal probability density function (PDF) is used to create the curve. From Wikipedia, here is the notation and formula for calculating the normal PDF:

normalpdf
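For readers who cannot see the image, this is the standard form of the normal PDF for a distribution with mean μ and standard deviation σ:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$

The peak height at x = μ is 1/(σ√(2π)). This is why the plot with the smaller standard deviation comes out taller and narrower.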

 

Here it is translated into Pascal, with the results converted to integers for plotting:

statsnp3
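The Pascal listing is only available as an image, so here is a minimal sketch of the same idea in Python, for illustration only. It evaluates the PDF across the screen width and converts each value to an integer screen row. The 160 × 96 screen size (full-screen mode 23) and the plot limits `x_min`, `x_max` and `y_max` are assumptions of this sketch, not values taken from STATNP. The function names are also hypothetical.

```python
import math

# Assumption: full-screen 160 x 96 graphics, with (0, 0) in the upper-left corner.
WIDTH, HEIGHT = 160, 96

def normal_pdf(x, mean, std):
    """Normal probability density function."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))

def plot_points(mean, std, x_min=-4.0, x_max=8.0, y_max=0.8):
    """Return integer (column, row) pairs for one curve.

    x_min, x_max and y_max are hypothetical plot limits chosen so that
    several curves fit on the same screen.
    """
    points = []
    for col in range(WIDTH):
        # Map the screen column to a value on the x axis.
        x = x_min + (x_max - x_min) * col / (WIDTH - 1)
        y = normal_pdf(x, mean, std)
        # Scale the density to a row; rows grow downward, so flip the axis.
        row = HEIGHT - 1 - round(y / y_max * (HEIGHT - 1))
        row = max(0, min(HEIGHT - 1, row))  # clamp to the screen
        points.append((col, row))
    return points
```

Because row 0 is the top of the screen, larger densities give smaller row numbers. For example, `plot_points(0, 1)` places the peak of the standard normal curve near the middle of the screen, while `plot_points(0, 0.5)` reaches the top row.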

Below is the code to set the graphics mode to 23 and set up the background color and three foreground colors. There is an interesting mapping between the register x in a SETCOLOR(x, ...) call and the color used in a PLOT or other graphics routine: the color number is x+1.

 

statsnp44
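The x+1 rule can be written down as a small lookup, sketched here in Python for illustration. This assumes the four-color mapping that Atari BASIC uses in graphics modes 3, 5 and 7 (and their full-screen +16 variants): color 0 draws with the background register 4, and colors 1 to 3 draw with registers 0 to 2. I am assuming, not confirming, that the Kyan Pascal graphics library follows the same convention. The function name is hypothetical.

```python
# Assumption: Atari BASIC's four-color mode mapping (GRAPHICS 3, 5, 7 and +16 variants).
BACKGROUND_REGISTER = 4

def color_for_register(register):
    """Return the color number to use in PLOT/DRAWTO for a given SETCOLOR register."""
    if register == BACKGROUND_REGISTER:
        return 0  # color 0 is the background
    if 0 <= register <= 2:
        return register + 1  # foreground registers 0-2 map to colors 1-3
    raise ValueError("register is not used by a four-color mode")
```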

In my next post I will describe two issues I had with graphics: using a combined text and graphics mode, and doing REAL number calculations once in graphics mode.

RetroChallenge Day 16 – Post 8 – Anderson-Darling Test Part 3

In this post I will discuss the Pascal code that implements the Anderson-Darling normality test, along with a set of sample runs compared against results from the Minitab Express statistical software. The complete set of formulas for calculating the Anderson-Darling test statistic for normality and the associated p-value is:

 

andersondarling formulas
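For reference in text form, here is the widely published version of these formulas; the notation in the image may differ slightly. Here Y_1 ≤ … ≤ Y_n are the sorted, standardized data values, F is the standard normal CDF, and A^{2*} is the statistic called A-Squared Prime in this post.

$$
A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\left[\ln F(Y_i) + \ln\left(1 - F(Y_{n+1-i})\right)\right]
$$

$$
A^{2*} = A^2\left(1 + \frac{0.75}{n} + \frac{2.25}{n^2}\right)
$$

$$
p \approx
\begin{cases}
\exp\left(1.2937 - 5.709\,A^{2*} + 0.0186\,(A^{2*})^2\right) & A^{2*} \ge 0.6 \\
\exp\left(0.9177 - 4.279\,A^{2*} - 1.38\,(A^{2*})^2\right) & 0.34 \le A^{2*} < 0.6 \\
1 - \exp\left(-8.318 + 42.796\,A^{2*} - 59.938\,(A^{2*})^2\right) & 0.2 \le A^{2*} < 0.34 \\
1 - \exp\left(-13.436 + 101.14\,A^{2*} - 223.73\,(A^{2*})^2\right) & A^{2*} < 0.2
\end{cases}
$$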

In case statistics is confusing to you, or you wonder what is so important about this p-value, let me try to give some very quick background. For a normality test we start by assuming that the data comes from a normally distributed population, and then look for evidence that it does not. The p-value is the probability of getting a test statistic at least as extreme as the one observed if the data really did come from a normal distribution. So if the p-value is small (less than 0.05), I can conclude that the data likely does not come from a normal distribution.

The previous posts covered determining the value of the CDF. Now the test statistics can be calculated: A-Squared, A-Squared Prime, and the p-value.

The code block below is the implementation of the A-Squared statistic.

Red arrow – the FOR loop that does the summation over all values (NN) in the sample
Yellow arrow – call to the CDF lookup in STATAD.P
Blue arrow – calculation for this data point
Green arrow – finishes the calculation after the summing is complete

ad asquared
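Since the Pascal code is shown only as an image, here is a rough Python sketch of the same A-Squared calculation. It uses `math.erf` in place of the CDF.TBL lookup, and the sample standard deviation (n - 1 denominator). Both are assumptions of the sketch, and the function names are hypothetical. The loop mirrors the arrows above: summation over all points, a CDF call for each point, and the final calculation after the sum.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via math.erf (stands in for the CDF.TBL lookup)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def a_squared(data):
    """Anderson-Darling A^2 statistic for normality."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (n - 1 denominator).
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    y = sorted((x - mean) / std for x in data)  # standardized, ascending
    total = 0.0
    for i in range(1, n + 1):  # i runs 1..n as in the formula
        f_low = normal_cdf(y[i - 1])   # F(Y_i)
        f_high = normal_cdf(y[n - i])  # F(Y_{n+1-i}), 0-based index n - i
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    # Finish the calculation once the summing is complete.
    return -n - total / n
```

Because the data is standardized first, shifting or rescaling the sample does not change A-Squared.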

With the A-Squared value, A-Squared Prime and the p-value can be calculated. The source code is shown in the two images below:

ad asquaredprime
ad pvalue
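A minimal Python sketch of these two steps follows, assuming the small-sample adjustment and the piecewise p-value approximation that are commonly published for the Anderson-Darling normality test. The function names are hypothetical, and the constants in the author's images may be rounded differently.

```python
import math

def a_squared_prime(a2, n):
    """Small-sample adjustment of A^2 (A-Squared Prime)."""
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)

def ad_p_value(a2p):
    """Approximate p-value from A-Squared Prime using a piecewise formula."""
    if a2p >= 0.6:
        return math.exp(1.2937 - 5.709 * a2p + 0.0186 * a2p ** 2)
    if a2p >= 0.34:
        return math.exp(0.9177 - 4.279 * a2p - 1.38 * a2p ** 2)
    if a2p >= 0.2:
        return 1.0 - math.exp(-8.318 + 42.796 * a2p - 59.938 * a2p ** 2)
    return 1.0 - math.exp(-13.436 + 101.14 * a2p - 223.73 * a2p ** 2)
```

As a sanity check, an adjusted statistic of about 0.752 gives a p-value close to 0.05, the usual cut-off for rejecting normality.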

I validated my code with four sets of data, which I also ran through Minitab Express (a commercial stats program from Minitab Inc.); the results are shown below. While I was testing, I also looked at the descriptive stats from earlier posts:

Here are the four sets of data (as a reminder, this is my daily weight over four weeks in March 2017), shown both in Minitab Express and in my STATS program.

sampleallmt

Here are the results for all four sets of data. Except for some small rounding differences, everything looks good.

sample1 sample1mt

sample2 sample2mt

sample3 sample3mt

sample4 sample4mt

As an aside, we cannot show that any of the four data sets do not come from a normal population. However, 7 data points is a fairly small sample.

Next on the agenda is adding the ability to load data sets from text files and some menu options, and then at least one graph or plot to take advantage of the graphics capabilities of the Atari 800 and Kyan Pascal.

 

 

RetroChallenge Day 15 – Post 7 – Anderson-Darling Test Part 2 – CDF Look Up

From the last post, the formula for A-squared, the test statistic for the Anderson-Darling normality test, is:

cdfmtbl9

The part in the red box is the cumulative distribution function (CDF) of the normal distribution. This is the area under the normal curve from negative infinity to Yi. It represents the probability that a variable takes on a value less than or equal to Yi, so it is always a value between 0 and 1.

cdfmtbl6

The values in the CDF table we loaded last time represent the area under the curve from 0 to Yi (or x, depending on how you want to name it) for a distribution with a mean of 0 and a standard deviation of 1. To get the area from negative infinity, an additional calculation is made. If x > 0, then 0.5 is added to the table value. If x is less than 0, the value for |x| (table value plus 0.5) is subtracted from 1, which is the same as 0.5 minus the table value. These examples will clarify:

cdfmtbl10

cdfmtbl8

This would be great if every distribution had a mean of 0 and a standard deviation of 1. Luckily, there is an approach that allows us to use the table to test any sample of data. It is described in the green box above: Yi = (xi – xbar) / standard deviation.
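Here is a short worked example of the standardization and the sign rule. T(z) is the table value (the area from 0 to z), using the standard table entry T(1.00) = 0.3413. The sample mean of 100 and standard deviation of 10 are hypothetical numbers, chosen only to keep the arithmetic easy.

$$
\Phi(Y) =
\begin{cases}
0.5 + T(Y) & Y \ge 0 \\
1 - \left(0.5 + T(|Y|)\right) = 0.5 - T(|Y|) & Y < 0
\end{cases}
$$

$$
x = 110:\quad Y = \frac{110 - 100}{10} = 1.00,\qquad \Phi(1.00) = 0.5 + 0.3413 = 0.8413
$$

$$
x = 90:\quad Y = \frac{90 - 100}{10} = -1.00,\qquad \Phi(-1.00) = 1 - 0.8413 = 0.1587
$$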

Using the table created in the last post, here is the procedure (in the file STATAD.P) that looks up the table value of a data point X from a sample with mean XBAR and standard deviation STD, and then calculates the CDF value. The value is returned in the variable LU from a call LOOKUPCDF(x, xbar, std, lu);

lookup1

A quick code review:

Green arrow: standardize the value of X to use the table
Red arrow: multiply R by 100 to convert it to the table index and store it in J
(note: the table is indexed from 0 and is single-dimensioned, unlike the printed table, which is looked up by row and column)
Blue arrow: determine whether J is greater or less than 0 and set the pointer for the table lookup
Yellow arrow: read the CDF.TBL file using the index K
Gray arrow: using the value looked up, and whether J is positive or negative, make the appropriate adjustment to the table value (add 0.5 if J > 0, or subtract from 1 if J < 0)

lookup2

The calling procedure can then use the value returned in LU.

One note here: opening and reading the table this way does slow down the calculation, but it does not require the table to take up valuable memory. Such were the challenges of our retro computing days; sometimes we had to choose between speed and memory…

The next post will show how this value is used to calculate A-Squared, A-Squared Prime and the P-value for the Anderson-Darling Test.

statad01

RetroChallenge Day 14 – Post 6 – Anderson-Darling Normality Test Part I

This is part 1 of a three-part post on coding the Anderson-Darling normality test in STATS, written in Kyan Pascal for the Atari 400/800 computers.

The Anderson-Darling test is used to test whether a sample set may come from a population that is normally distributed (think bell curve). This is one of several tests that can be used, but it is often the first one tried, as it is a somewhat conservative test. One would care whether a sample comes from a normal distribution because that allows the use of the more powerful statistical methods designed for normal data.

Using this test is a three-step process:

  1. Calculate the test statistic A-squared
  2. Calculate the A-Squared Prime stat
  3. Determine the p-value used to test the hypothesis that the data comes from a normal distribution

ad1

To calculate A-Squared, the formula calls for the CDF, the area under the curve from -infinity to x. If you are still reading, the picture below will give some context.

 

ad2

The easiest way (and easy is relative here) is to use a table like the one below, which is set up for the standard normal curve (mean = 0 and standard deviation = 1).

ad3.jpg

 

So before I could even think about how to implement the test in Pascal, I needed to build a file to house the table and a lookup procedure. The above table was copied and pasted into the Kyan Pascal editor as a text file; see below:

cdfmtbl5

Next, the file needed to be converted from text to a file of real numbers. Values can then be looked up after standardizing each sample value, Yi = (Xi – Xbar) / Std Deviation, as shown above.

Here is the code to convert the text file to a file of REAL that can be used as a random access file.

This first image gives the program documentation in the header comments.

cdfmtbl1

cdfmtbl2

The program first initializes the input file (CDFTBL.TXT), the text file described above. The command RESET(CDFIN, ‘CDFTBL.TXT’) opens the file CDFTBL.TXT and associates it with CDFIN, and it sets the read pointer to the beginning of the file. The output file is created with the REWRITE(CDFOUT, ‘CDF.TBL’); command and is named CDF.TBL on the disk.

cdfmtbl3

The text file has 41 lines of data, each holding 10 values that are 7 characters long. Each value is read as a string, converted to a real number, and then written to the output file. Each of these values is the area under the normal curve from 0 to Xi.

cdfmtbl4

The next post will describe how a value from the table is looked up and then used in the calculations needed for the Anderson-Darling normality test.

RetroChallenge Day 7 – Post 5 Documenting formulas, Sample STATS Code

Moving on from the Dorsett statistics tapes to coding the STATS program in Pascal on the emulated Atari 400/800, I am starting by documenting stats formulas in a text file, using the ED editor that comes with Kyan Pascal. I will need to take some liberties when typing them in, as the available characters are limited. Perhaps this file can later be tied into the program as help.

This post will include the formulas for Mean, Standard Deviation, Variance, Skewness and Kurtosis (the peakedness of a distribution), along with some quick and dirty Pascal code implementing them as a sample/demo.
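Since the documentation screens are images, here is one common set of sample formulas in text form. The skewness and kurtosis shown are the forms used by Excel's SKEW and KURT functions. The formulas typed into the STATS help file may use different versions, particularly for skewness and kurtosis.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
s = \sqrt{s^2}
$$

$$
\text{Skewness} = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3
$$

$$
\text{Kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}
$$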

Shown below are the first two screens of the documentation:

statdoc01 statdoc02

Here is the output of the demo code, which will be presented in this and following posts.

stat01-1

It is calculated from the following sample data, a daily measurement of my body weight for a week. (As an aside, I have lost 157 pounds over the past year, and I will be using data I collected during that time as sample data for this program.)

Here is the hard-coded assignment of the data (in future versions the data will be read from a text file and converted to real numbers):

stat01-2 data

The data is stored in an ARRAY of REAL called SD[]. SD[0] holds the sample size, and SD[1] to SD[7] hold the data in time order.

I will provide a full listing of the source code later. In the meantime, here are the code snippets for calculating the sum of the data and then the mean, along with displaying them on the screen. GOTOXY(X,Y) is in the include file CONIO.P. The REALTOSTR() function converts a real number from scientific notation to a decimal string for easier viewing. To calculate the sum, a FOR loop from 1 to n = 7 adds each element of the array to a running total. The array SA[] is also copied from SD[]; it will later be sorted to find the minimum, maximum and median, and used in the Anderson-Darling normality test.

Then the mean (xbar) is calculated by dividing the sum by n (7 in this case), and all values are printed to the screen using the GOTOXY and REALTOSTR calls.

stat01-3 - sum
stat01-4 - mean
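The sum-and-mean step can be sketched in Python for illustration, keeping the same array layout as SD[]: element 0 holds the sample size n, and elements 1 to n hold the data. The function name `sum_and_mean` and the sample values in the test are hypothetical; they are not the weight data from the post.

```python
def sum_and_mean(sd):
    """Sum and mean of sd[1..n], where sd[0] holds the sample size n (as in SD[])."""
    n = int(sd[0])
    total = 0.0
    sa = [0.0] * (n + 1)       # copy kept for sorting later (min, max, median, A-D test)
    sa[0] = sd[0]
    for i in range(1, n + 1):  # like FOR I := 1 TO N
        total += sd[i]         # running total
        sa[i] = sd[i]
    xbar = total / n
    return total, xbar, sa
```

Keeping n in element 0 means the same array can be passed around without a separate size variable, at the cost of starting the data at index 1.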

The next post will pick up with sorting the array, calculation of Standard Deviation, Variance, Skewness and Kurtosis.

 

 

 

 

RetroChallenge Days 3-6 – Post 4 Dorsett Stats Tapes 4 to 16

Over the course of the last few days I viewed the majority of the Dorsett Statistics Tapes. Each is 10-15 minutes long, and the topics covered include graphical analysis, distributions and hypothesis testing. The topics are covered at a high level with some minimal discussion of formulas and testing. However, some topics are not covered, such as p-values and goodness-of-fit tests for distributions. The statistics program I am developing in Kyan Pascal on the Atari 800 will have the Anderson-Darling test for normality (fit to a normal distribution) and some level of p-values for hypothesis testing. I did find one issue with the Dorsett tapes: the file ST10.wav has the same content as ST9.wav. That is not a big issue, and I'm sure that when Kevin Savetz digitized all of the Dorsett tapes it was quite an exercise in converting, cataloging and zipping.

I am still curious how there are multiple colors on the text screen. My understanding is that Graphics mode 0 has two colors (well, really 1 1/2): a color and hue for the background and text. I will check on text windows in graphics mode 0, which might be how this is being done.

This will be the last post specific to the tapes and I’ll be moving on to the coding of the STATS program.  Here are a few images from the tapes:

Tape 5 : Probability

 

Tape 6 : Probability Continued

 

 

Tape 7 : Probability Distributions

 

Tape 14 : Hypothesis Testing

 

Tape 15 : Hypothesis testing continued

 

Tape 16 : Statistics Review

 

This weekend I will pick up with Pascal coding of the STATS program!!!