The following program finds the best-fitting linearized regression and calculates its statistical coefficients for models of the form:
H(Y) = A + B F(X)
Where X is the independent variable and Y is the dependent variable. In addition, H() and F() are transformation functions for the regression variables. The program also calculates the coefficient of determination R-Square.
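As a rough illustration of the model above, here is a minimal Python sketch (not the author's BASIC code) that fits A and B by least squares from running sums and computes R-Square. The function name fit_line and the test data are assumptions made for this example; the formulas are algebraically the same as the ones in the listing further down.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x from running sums.

    Returns (intercept, slope, r_squared). xs and ys are the
    already-transformed values F(X) and H(Y).
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sxx = n * sum_x2 - sum_x ** 2
    syy = n * sum_y2 - sum_y ** 2
    sxy = n * sum_xy - sum_x * sum_y
    slope = sxy / sxx
    intercept = sum_y / n - slope * sum_x / n
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

# Hypothetical data generated from Y = exp(1 + 0.5 X), so that
# H(Y) = LOG(Y) and F(X) = X give an exact straight line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [math.exp(1 + 0.5 * x) for x in xs]
a, b, r2 = fit_line(xs, [math.log(y) for y in ys])
```

Here the fit is applied to LOG(Y) against X, so the recovered A and B describe the transformed variables, not the raw data.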
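The program applies the same set of eight transformations to each variable: no transformation, natural logarithm (LOG), square root (SQR), reciprocal square root, reciprocal, square, reciprocal square, and cube.
The program attempts to fit a total of 64 different curves, one for each pairing of a transformation of X with a transformation of Y. For data that have only positive values, the program succeeds in calculating all 64 models. Negative values and zeros reduce the number of models, because transformations such as the logarithm, square root, and reciprocal cannot be applied to them.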
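The effect of zeros and negative values on the model count can be sketched in Python as follows. This is an illustration, not the author's code: the names TRANSFORMS, usable, and count_models are made up for the example, and it only counts transformations that can be evaluated (the BASIC program also discards a model if the regression arithmetic itself fails).

```python
import math

# The eight transformations, in the order of the codes 1 to 8 used by the listing.
TRANSFORMS = [
    ("V", lambda v: v),
    ("LOG(V)", lambda v: math.log(v)),
    ("SQR(V)", lambda v: math.sqrt(v)),
    ("1/SQR(V)", lambda v: 1 / math.sqrt(v)),
    ("1/V", lambda v: 1 / v),
    ("V^2", lambda v: v ** 2),
    ("1/V^2", lambda v: 1 / v ** 2),
    ("V^3", lambda v: v ** 3),
]

def usable(values):
    """Return the labels of the transformations that work for every value."""
    ok = []
    for label, func in TRANSFORMS:
        try:
            for v in values:
                func(v)
        except (ValueError, ZeroDivisionError):
            continue  # at least one value cannot be transformed
        ok.append(label)
    return ok

def count_models(xs, ys):
    """Number of (X transformation, Y transformation) pairs that can be fitted."""
    return len(usable(xs)) * len(usable(ys))
```

With all-positive data, count_models returns 64. A single zero in X leaves only four usable X transformations (none, square root, square, cube), so the count drops to 32.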
The program displays the following simple menu:
BEST LINEAR REGRESSION
=======================
0) QUIT
1) KEYBOARD INPUT
2) FILE INPUT
3) FIND BEST FIT
SELECT CHOICE BY NUMBER:
In option 1, the program prompts you to enter the number of observations and then to type in the data for X and Y.
In option 2, the program prompts you for the name of the input text file. This file (which has each value on a separate line) specifies the number of observations and then lists the observations for the variables X and Y.
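The file layout described above (a count, followed by alternating X and Y values) could be read with a Python sketch like the following. The function name parse_observations and the three-observation sample are assumptions for illustration only; the BASIC program reads the file with its own INPUT statements.

```python
def parse_observations(text):
    """Parse a count followed by alternating X and Y values.

    Values are split on any whitespace, so one value per line works.
    """
    tokens = text.split()
    n = int(tokens[0])
    values = [float(t) for t in tokens[1:1 + 2 * n]]
    if len(values) != 2 * n:
        raise ValueError("expected %d values, found %d" % (2 * n, len(values)))
    xs = values[0::2]  # 1st, 3rd, 5th, ... value after the count
    ys = values[1::2]  # 2nd, 4th, 6th, ... value after the count
    return xs, ys

# Hypothetical three-observation file, one value per line.
sample = "3\n10\n50\n25\n77\n30\n86\n"
xs, ys = parse_observations(sample)
```

The sketch raises an error when the file holds fewer values than the count promises, which is roughly the case the BASIC program reports as a file read failure.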
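Option 3 causes the program to calculate the best fit. It fits all 64 combinations of transformations, sorts the results in descending order of R-Square, and displays the five best models on the screen, showing for each one the R-Square value, the fitted equation, and the means and standard deviations of the transformed variables. It then writes the full sorted list of models to a report file whose name is the data source name followed by _REPORT.TXT.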
Here is a sample session that fits the data in the following table:
| X | Y |
| 100 | 212 |
| 10 | 50 |
| 25 | 77 |
| 30 | 86 |
| 35 | 95 |
| 40 | 104 |
The above data can be read from a text file that looks like this:
6
100
212
10
50
25
77
30
86
35
95
40
104
The top ten models that fit the above data are:
R^2 = 1
Y = ( 32 ) + ( 1.8 ) * X
MeanX = 40 MeanY = 104
SdevX = 31.144823 SdevY = 56.060681

R^2 = .99935625
1/SQR(Y) = ( .21511956 ) + (-3.1685186e-2 ) * LOG(X)
MeanX = 3.4620093 MeanY = .10542515
SdevX = .74485911 SdevY = .0236086

R^2 = .99841951
Y^3 = ( 360453.48 ) + ( 9.1831683 ) * X^3
MeanX = 191750 MeanY = 2121326
SdevX = 396560.22 SdevY = 3644560.5

R^2 = .99813172
Y^3 = (-214475.78 ) + ( 969.88309 ) * X^2
MeanX = 2408.3333 MeanY = 2121326
SdevX = 3754.2198 SdevY = 3644560.5

R^2 = .99785498
Y^2 = ( 3378.0973 ) + ( 4.1758765 ) * X^2
MeanX = 2408.3333 MeanY = 13435
SdevX = 3754.2198 SdevY = 15694.

R^2 = .99622634
1/Y^2 = (-8.3885371e-6 ) + ( 4.1354627e-3 ) * 1/X
MeanX = 3.9484127e-2 MeanY = 1.548966e-4
SdevX = 3.1300032e-2 SdevY = 1.2968504e-4

R^2 = .99622573
LOG(Y) = ( 3.2928317 ) + ( .20925336 ) * SQR(X)
MeanX = 5.9800231 MeanY = 4.5441716
SdevX = 2.2554798 SdevY = .47285993

R^2 = .99555317
SQR(Y) = ( 3.2993261 ) + ( 1.11005 ) * SQR(X)
MeanX = 5.9800231 MeanY = 9.9374506
SdevX = 2.2554798 SdevY = 2.5092807

R^2 = .98888781
1/Y = (-1.4552693e-3 ) + ( 6.9457301e-2 ) * 1/SQR(X)
MeanX = .18765778 MeanY = 1.1578934e-2
SdevX = 7.1571091e-2 SdevY = 4.9989872e-3

R^2 = .98791158
SQR(Y) = ( 6.7342628 ) + ( 8.0079697e-2 ) * X
MeanX = 40 MeanY = 9.9374506
SdevX = 31.144823 SdevY = 2.5092807
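The first model in the list can be cross-checked with a few lines of Python. This is only a sketch that mirrors the sums and formulas of the BASIC listing for the untransformed case (Y against X); the variable names are chosen for the example.

```python
import math

# Sample data from the table above.
xs = [100, 10, 25, 30, 35, 40]
ys = [212, 50, 77, 86, 95, 104]
n = len(xs)

# Running sums, as accumulated in the BASIC listing.
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

mean_x = sum_x / n
mean_y = sum_y / n
# Sample standard deviations (divisor n - 1).
sdev_x = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
sdev_y = math.sqrt((sum_y2 - sum_y ** 2 / n) / (n - 1))

slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = mean_y - slope * mean_x
r2 = ((n * sum_xy - sum_x * sum_y) / (n * (n - 1) * sdev_x * sdev_y)) ** 2

print("slope =", slope, "intercept =", intercept, "R^2 =", r2)
print("MeanX =", mean_x, "MeanY =", mean_y)
print("SdevX =", sdev_x, "SdevY =", sdev_y)
```

The sample data lie exactly on the line Y = 32 + 1.8 X, which is why the untransformed model ranks first with R-Square equal to 1.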
Here is the BASIC listing:
! PROGRAM TO FIND BEST LINEARIZED REGRESSION
OPTION TYPO
OPTION NOLET
DECLARE NUMERIC MAX_CURVES
DECLARE NUMERIC ITX, ITY, NDATA, CH, I, K
DECLARE NUMERIC SumX, SumX2, SumY, SumY2, SumXY, Yt, Xt
DECLARE STRING A$, R$, D$
DIM R2(64), Slope(64), Intercept(64), MeanX(64), MeanY(64), SdevX(64), SdevY(64), TX(64), TY(64)
DIM X(1), Y(1)
MAX_CURVES = 64
SUB InitStatArrays
LOCAL I
FOR I = 1 to MAX_CURVES
R2(I) = 0
Slope(I) = 0
Intercept(I) = 0
MeanX(I) = 0
MeanY(I) = 0
SdevX(I) = 0
SdevY(I) = 0
TX(I) = 0
TY(I) = 0
NEXT I
END SUB
SUB SortResults
LOCAL I, J, BUFF
FOR I = 1 TO MAX_CURVES - 1
FOR J = I+1 TO MAX_CURVES
IF R2(I) < R2(J) THEN
BUFF = R2(I)
R2(I) = R2(J)
R2(J) = BUFF
BUFF = Slope(I)
Slope(I) = Slope(J)
Slope(J) = BUFF
BUFF = Intercept(I)
Intercept(I) = Intercept(J)
Intercept(J) = BUFF
BUFF = MeanX(I)
MeanX(I) = MeanX(J)
MeanX(J) = BUFF
BUFF = MeanY(I)
MeanY(I) = MeanY(J)
MeanY(J) = BUFF
BUFF = SdevX(I)
SdevX(I) = SdevX(J)
SdevX(J) = BUFF
BUFF = SdevY(I)
SdevY(I) = SdevY(J)
SdevY(J) = BUFF
BUFF = TX(I)
TX(I) = TX(J)
TX(J) = BUFF
BUFF = TY(I)
TY(I) = TY(J)
TY(J) = BUFF
END IF
NEXT J
NEXT I
END SUB
DEF SayTransf$(TI, V$)
LOCAL B$
SELECT CASE TI
CASE 1
B$ = V$
CASE 2
B$ = "LOG(" & V$ &")"
CASE 3
B$ = "SQR(" & V$ & ")"
CASE 4
B$ = "1/SQR(" & V$ & ")"
CASE 5
B$ = "1/" & V$
CASE 6
B$ = V$ & "^2"
CASE 7
B$ = "1/" & V$ & "^2"
CASE 8
B$ = V$ & "^3"
CASE ELSE
B$ = V$
END SELECT
SayTransf$ = B$
END DEF
DO
PRINT
PRINT TAB(20);"BEST LINEAR REGRESSION"
PRINT TAB(20);"======================"
PRINT "0) QUIT"
PRINT "1) KEYBOARD INPUT"
PRINT "2) FILE INPUT"
PRINT "3) FIND BEST FIT"
INPUT PROMPT "SELECT CHOICE BY NUMBER:":CH
IF CH=0 THEN
PRINT "BYE!"
ELSEIF CH=1 THEN
A$ = "KEYBOARD"
INPUT PROMPT "ENTER NUMBER OF OBSERVATIONS: ": NDATA
MAT REDIM X(NDATA), Y(NDATA)
FOR I = 1 TO NDATA
PRINT "X(";I;")";
INPUT X(I)
PRINT "Y(";I;")";
INPUT Y(I)
NEXT I
ELSEIF CH=2 THEN
INPUT PROMPT "ENTER FILENAME? ":A$
WHEN ERROR IN
OPEN #1: NAME A$, ORG TEXT, CREATE OLD, ACCESS INPUT
INPUT #1: NDATA
MAT REDIM X(NDATA), Y(NDATA)
FOR I = 1 TO NDATA
INPUT #1: X(I)
INPUT #1: Y(I)
NEXT I
CLOSE #1
USE
PRINT "COULD NOT OPEN OR READ FROM FILE ";A$
END WHEN
ELSEIF CH=3 THEN
CALL InitStatArrays
K = 0
FOR ITX = 1 TO 8
FOR ITY = 1 to 8
SumX = 0
SumY = 0
SumX2 = 0
SumY2 = 0
SumXY = 0
K = K + 1
TX(K) = ITX
TY(K) = ITY
WHEN ERROR IN
FOR I = 1 TO NDATA
SELECT CASE ITX
CASE 1
Xt = X(I)
CASE 2
Xt = LOG(X(I))
CASE 3
Xt = SQR(X(I))
CASE 4
Xt = 1/SQR(X(I))
CASE 5
Xt = 1/X(I)
CASE 6
Xt = X(I)^2
CASE 7
Xt = 1/X(I)^2
CASE 8
Xt = X(I)^3
CASE ELSE
Xt = X(i)
END SELECT
SELECT CASE ITY
CASE 1
Yt = Y(I)
CASE 2
Yt = LOG(Y(I))
CASE 3
Yt = SQR(Y(I))
CASE 4
Yt = 1/SQR(Y(I))
CASE 5
Yt = 1/Y(I)
CASE 6
Yt = Y(I)^2
CASE 7
Yt = 1/Y(I)^2
CASE 8
Yt = Y(I)^3
CASE ELSE
Yt = Y(I)
END SELECT
SumX = SumX + Xt
SumX2 = SumX2 + Xt^2
SumY = SumY + Yt
SumY2 = SumY2 + Yt^2
SumXY = SumXY + Xt * Yt
NEXT I
MeanX(K) = SumX / NDATA
MeanY(K) = SumY / NDATA
SdevX(K) = Sqr((SumX2 - SumX^2/NDATA)/(NDATA-1))
SdevY(K) = Sqr((SumY2 - SumY^2/NDATA)/(NDATA-1))
Slope(K) = (NDATA * SumXY - SumX * SumY) / (NDATA * SumX2 - SumX ^ 2)
Intercept(K) = MeanY(K) - Slope(K) * MeanX(K)
R2(K) = ((NDATA * SumXY - SumX * SumY) / (NDATA * (NDATA - 1) * SdevX(K) * SdevY(K))) ^ 2
USE
MeanX(K) = 0
MeanY(K) = 0
SdevX(K) = 0
SdevY(K) = 0
Slope(K) = 0
Intercept(K) = 0
R2(K) = 0
END WHEN
NEXT ITY
NEXT ITX
CALL SortResults
PRINT
PRINT "TOP 5 CURVES"
! Show the top 5 curve fits
FOR I = 1 TO 5
PRINT "R^2 = ";R2(I)
PRINT SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";Slope(I);") * "; SayTransf$(TX(I), "X")
PRINT "MeanX = "; MeanX(I);" MeanY = ";MeanY(I)
PRINT "SdevX = "; SdevX(I);" SdevY = ";SdevY(I)
PRINT
NEXT I
I = POS(A$, ".")
IF I > 0 THEN
R$ = A$[1:I-1] & "_REPORT.TXT"
ELSE
R$ = A$ & "_REPORT.TXT"
END IF
OPEN #1: NAME R$, ORG TEXT, CREATE NEWOLD, ACCESS OUTIN
ERASE #1
PRINT #1: "DATA SOURCE ";A$
D$ = DATE$
PRINT #1: D$[5:6] & "/" & D$[7:8] & "/" & D$[1:4] & " " & TIME$
PRINT #1: ""
FOR I = 1 TO MAX_CURVES
IF R2(I) <= 0 THEN EXIT FOR
PRINT #1: "R^2 = ";R2(I)
PRINT #1: SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";Slope(I);") * "; SayTransf$(TX(I), "X")
PRINT #1: "MeanX = "; MeanX(I);" MeanY = ";MeanY(I)
PRINT #1: "SdevX = "; SdevX(I);" SdevY = ";SdevY(I)
PRINT #1: ""
NEXT I
CLOSE #1
PRINT "FULL LIST OF CURVE FITS WAS WRITTEN TO FILE ";R$
ELSE
PRINT "INVALID CHOICE"
END IF
LOOP UNTIL CH = 0
END
Copyright (c) Namir Shammas. All rights reserved.