The following program calculates the the best model and statistical coefficients for the following model:
H(Y) = A + B F(X) + C G(X)
Where X is the independent variable and Y is the dependent variable. In addition, H(), F(), and G() are transformation functions for the regression variables. The program also calculates the coefficient of determination R-Square.
The program performs different transformations on all the variables. These transformations include:
The program attempts to fit all the possible different curves. For data that have only positive values, the program succeeds in calculating the full set of possible models. The presence of negative values and zeros will reduce the number of models.
The program displays the following simple menu:
BEST MULTIPLE LINEAR REGRESSION (2 VARS)
========================================
0) QUIT
1) KEYBOARD INPUT
2) FILE INPUT
3) FIND BEST FIT
SELECT CHOICE BY NUMBER:
In option 1 the program prompts you to enter the number of observations and then type in the data for X, and Y.
In option 2, the program prompts you for the name of the input text file. This file (which has each value on a separate line) specifies the number of observations and then lists the observations for the variables X and Y.
Option 3 causes the program to calculate the best fit and performs the following tasks:
Here is a sample session that fits the data in the following table:
| X | Y |
| 1 | 7 |
| 2 | 8 |
| 3 | 13 |
| 4 | 22 |
| 5 | 35 |
| 6 | 52 |
The above data can be read from a text file that looks like this:
6 1 7 2 8 3 13 4 22 5 35 6 52
The top ten models that fit the above data are:
R^2 = 1 Y = ( 10. ) + (-5 ) * X1 + ( 2 ) * X2^2 R^2 = .99996571 SQR(Y) = ( .93514887 ) + ( 1.7141375 ) * X1 + (-2.2358857 ) * LOG(X2) R^2 = .9999189 Y = ( 16.387089 ) + (-10.995891 ) * SQR(X1) + ( 1.7338791 ) * X2^2 R^2 = .99982379 LOG(Y) = (-8.1673155 ) + ( 4.4660914 ) * LOG(X1) + ( 10.112393 ) * 1/SQR(X2) R^2 = .99979627 SQR(Y) = ( 5.6943149 ) + ( 2.5478097 ) * X1 + (-5.6105579 ) * SQR(X2) R^2 = .99973459 LOG(Y) = (-1.0388005 ) + ( 2.0308004 ) * SQR(X1) + ( .95445884 ) * 1/X2^2 R^2 = .99969114 SQR(Y) = (-3.3140124 ) + ( 1.4420207 ) * X1 + ( 4.5327844 ) * 1/SQR(X2) R^2 = .99968287 Y = ( 5.6070019 ) + (-6.4580511 ) * LOG(X1) + ( 1.6027157 ) * X2^2 R^2 = .99959842 1/SQR(Y) = (-.25855689 ) + ( .99222751 ) * 1/SQR(X1) + (-.35571662 ) * 1/X2^2 R^2 = .99959659 LOG(Y) = (-1.8024133 ) + ( 2.8587806 ) * LOG(X1) + ( 3.7525629 ) * 1/X2
Here is the BASIC listing:
! PROGRAM TO FIND BEST MULTIPLE LINEARIZED REGRESSION FOR 2 VARIABLES
!
! F(Y) = A + B G(X) + C H(X)
OPTION TYPO
OPTION NOLET
DECLARE NUMERIC MAX_CURVES
DEClARE NUMERIC ITX1, ITX2, ITY, NDATA, CH, I, K, A, B, T1, T2, T3
DECLARE NUMERIC SumX1, SumX1Sqr, SumY, SumYSqr, SumX1Y, Yt, X1t, X2t
DECLARE NUMERIC SumX2, SumX2Sqr, SumX1X2, SumX2Y, MeanX1, MeanX2, MeanY
DECLARE STRING A$, R$, D$
DIM R2(512), SlopeX1(512), SlopeX2(512), Intercept(512), TX1(512), TY(512), TX2(512)
DIM X(1), Y(1)
SUB InitStatArrays
LOCAL I
FOR I = 1 to 512
R2(I) = 0
SlopeX1(I) = 0
SlopeX1(I) = 0
Intercept(I) = 0
TX1(I) = 0
TX2(I) = 0
TY(I) = 0
NEXT I
END SUB
SUB SortResults
LOCAL I, J, BUFF
FOR I = 1 TO MAX_CURVES - 1
FOR J = I+1 TO MAX_CURVES
IF R2(I) < R2(J) THEN
BUFF = R2(I)
R2(I) = R2(J)
R2(J) = BUFF
BUFF = SlopeX1(I)
SlopeX1(I) = SlopeX1(J)
SlopeX1(J) = BUFF
BUFF = SlopeX2(I)
SlopeX2(I) = SlopeX2(J)
SlopeX2(J) = BUFF
BUFF = Intercept(I)
Intercept(I) = Intercept(J)
Intercept(J) = BUFF
BUFF = TX1(I)
TX1(I) = TX1(J)
TX1(J) = BUFF
BUFF = TY(I)
TY(I) = TY(J)
TY(J) = BUFF
BUFF = TX2(I)
TX2(I) = TX2(J)
TX2(J) = BUFF
END IF
NEXT J
NEXT I
END SUB
DEF SayTransf$(TI, V$)
LOCAL B$
SELECT CASE TI
CASE 1
B$ = V$
CASE 2
B$ = "LOG(" & V$ &")"
CASE 3
B$ = "SQR(" & V$ & ")"
CASE 4
B$ = "1/SQR(" & V$ & ")"
CASE 5
B$ = "1/" & V$
CASE 6
B$ = V$ & "^2"
CASE 7
B$ = "1/" & V$ & "^2"
CASE 8
B$ = V$ & "^3"
CASE ELSE
B$ = V$
END SELECT
SayTransf$ = B$
END DEF
DO
PRINT
PRINT TAB(20);"BEST MULTIPLE LINEAR REGRESSION (2 VARS)"
PRINT TAB(20);"========================================"
PRINT "0) QUIT"
PRINT "1) KEYBOARD INPUT"
PRINT "2) FILE INPUT"
PRINT "3) FIND BEST FIT"
INPUT PROMPT "SELECT CHOICE BY NUMBER: ":CH
IF CH=0 THEN
PRINT "BYE!"
ELSEIF CH=1 THEN
A$ = "KEYBOARD"
INPUT PROMPT "ENTER NUMBER OF OBSERVATIONS: ": NDATA
MAT REDIM X(NDATA), Y(NDATA)
FOR I = 1 TO NDATA
PRINT "X(";I;")";
INPUT X(I)
PRINT "Y(";I;")";
INPUT Y(I)
NEXT I
ELSEIF CH=2 THEN
INPUT PROMPT "ENTER FILENAME? ":A$
WHEN ERROR IN
OPEN #1: NAME A$, ORG TEXT, CREATE OLD, ACCESS INPUT
INPUT #1: NDATA
MAT REDIM X(NDATA), Y(NDATA)
FOR I = 1 TO NDATA
INPUT #1: X(I)
INPUT #1: Y(I)
NEXT I
CLOSE #1
USE
PRINT "COULD NOT OPEN OR READ FROM FILE ";A$
END WHEN
ELSEIF CH=3 THEN
CALL InitStatArrays
K = 0
FOR ITX1 = 1 TO 7
FOR ITX2 = ITX1+1 TO 8
FOR ITY = 1 to 8
SumX1 = 0
SumX2 = 0
SumY = 0
SumX1Sqr = 0
SumX2Sqr = 0
SumYSqr = 0
SumX1Y = 0
SumX2Y = 0
SumX1X2 = 0
K = K + 1
TX1(K) = ITX1
TX2(K) = ITX2
TY(K) = ITY
WHEN ERROR IN
FOR I = 1 TO NDATA
SELECT CASE ITX1
CASE 1
X1t = X(I)
CASE 2
X1t = LOG(X(I))
CASE 3
X1t = SQR(X(I))
CASE 4
X1t = 1/SQR(X(I))
CASE 5
X1t = 1/X(I)
CASE 6
X1t = X(I)^2
CASE 7
X1t = 1/X(I)^2
CASE 8
X1t = X(I)^3
CASE ELSE
X1t = X(i)
END SELECT
SELECT CASE ITX2
CASE 1
X2t = X(I)
CASE 2
X2t = LOG(X(I))
CASE 3
X2t = SQR(X(I))
CASE 4
X2t = 1/SQR(X(I))
CASE 5
X2t = 1/X(I)
CASE 6
X2t = X(I)^2
CASE 7
X2t = 1/X(I)^2
CASE 8
X2t = X(I)^3
CASE ELSE
X2t = X(i)
END SELECT
SELECT CASE ITY
CASE 1
Yt = Y(I)
CASE 2
Yt = LOG(Y(I))
CASE 3
Yt = SQR(Y(I))
CASE 4
Yt = 1/SQR(Y(I))
CASE 5
Yt = 1/Y(I)
CASE 6
Yt = Y(I)^2
CASE 7
Yt = 1/Y(I)^2
CASE 8
Yt = Y(I)^3
CASE ELSE
Yt = Y(I)
END SELECT
SumX1 = SumX1 + X1t
SumX2 = SumX2 + X2t
SumY = SumY + Yt
SumX1Sqr = SumX1Sqr + X1t^2
SumX2Sqr = SumX2Sqr + X2t^2
SumYSqr = SumYSqr + Yt^2
SumX1Y = SumX1Y + X1t * Yt
SumX2Y = SumX2Y + X2t * Yt
SumX1X2 = SumX1X2 + X1t * X2t
NEXT I
MeanX1 = SumX1 / NDATA
MeanX2 = SumX2 / NDATA
MeanY = SumY / NDATA
A = (NDATA * SumX1Sqr - SumX1 ^ 2) * (NDATA * SumX2Y - SumX2 * SumY)
B = (NDATA * SumX1X2 - SumX1 * SumX2) * (NDATA * SumX1Y - SumX1 * SumY)
T1 = NDATA * SumX1Sqr - SumX1 ^ 2
T2 = NDATA * SumX2Sqr - SumX2 ^ 2
T3 = NDATA * SumX1X2 - SumX1 * SumX2
SlopeX2(K) = (A - B) / (T1 * T2 - T3 ^ 2)
T1 = NDATA * SumX1Y - SumX1 * SumY
T2 = NDATA * SumX1X2 - SumX1 * SumX2
T3 = NDATA * SumX1Sqr - SumX1 ^ 2
SlopeX1(K) = (T1 - SlopeX2(K) * T2) / T3
Intercept(K) = MeanY - SlopeX1(K) * MeanX1 - SlopeX2(K) * MeanX2
T1 = SumY ^ 2 / NDATA
R2(K) = (Intercept(K) * SumY + SlopeX1(K) * SumX1Y + SlopeX2(K) * SumX2Y - T1) / (SumYSqr - T1)
USE
SlopeX1(K) = 0
SlopeX2(K) = 0
Intercept(K) = 0
R2(K) = 0
END WHEN
NEXT ITY
NEXT ITX2
NEXT ITX1
MAX_CURVES = K
CALL SortResults
PRINT
PRINT "TOP 10 CURVES"
! Show top 10 best cyrve fits
FOR I = 1 TO 10
PRINT "R^2 = ";R2(I)
PRINT SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";SlopeX1(I);") * "; SayTransf$(TX1(I), "X");" + (";SlopeX2(I);") * "; SayTransf$(TX2(I), "X")
PRINT
NEXT I
I = POS(A$, ".")
IF I > 0 THEN
R$ = A$[1:I-1] & "_REPORT.TXT"
ELSE
R$ = A$ & "_REPORT.TXT"
END IF
OPEN #1: NAME R$, ORG TEXT, CREATE NEWOLD, ACCESS OUTIN
ERASE #1
PRINT #1: "DATA SOURCE ";A$
D$ = DATE$
PRINT #1: D$[5:6] & "/" & D$[7:8] & "/" & D$[1:4] & " " & TIME$
PRINT #1: ""
FOR I = 1 TO MAX_CURVES
IF R2(I) <= 0 THEN EXIT FOR
PRINT #1: "R^2 = ";R2(I)
PRINT #1: SayTransf$(TY(I), "Y");" = (";Intercept(I);") + (";SlopeX1(I);") * "; SayTransf$(TX1(I), "X1");" + (";SlopeX2(I);") * "; SayTransf$(TX2(I), "X2")
PRINT #1: ""
NEXT I
CLOSE #1
PRINT "EXPANDED LIST OR CURVE FITS WAS WRITTEN TO FILE ";R$
ELSE
PRINT "INVALID CHOICE"
END IF
LOOP UNTIL CH = 0
END
Copyright (c) Namir Shammas. All rights reserved.