Problem setup

For the data x, y, compute the following:

Mean
Variance
Covariance
The coefficients a, b of the regression line y = ax + b
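For reference, these are the definitions the code below implements. They are the population forms, dividing by n rather than n - 1:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s_{xy} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```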

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

x = np.array([2, 3, 4, 5, 6])
y = np.array([-6, -5, -4, -3, -1])

Solution

Mean

ave_x = sum(x)/len(x)
ave_y = sum(y)/len(y)
ave_x, ave_y

(4.0, -3.7999999999999998)
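The long decimal tail in the output above comes from floating-point representation. It is not a computation error, and the value is -3.8 up to rounding. As a quick cross-check, here is a minimal sketch that uses NumPy's built-in `np.mean`, which also computes sum / n:

```python
import numpy as np

# Same data as in the problem statement
x = np.array([2, 3, 4, 5, 6])
y = np.array([-6, -5, -4, -3, -1])

# np.mean divides the sum by the number of elements, like sum(x)/len(x)
mean_x = np.mean(x)
mean_y = np.mean(y)
```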

Variance

# Population variance: mean of the squared deviations from the mean (divides by n)
dis_x = sum([(xx - ave_x)**2 for xx in x])/len(x)
dis_y = sum([(yy - ave_y)**2 for yy in y])/len(y)
dis_x, dis_y

(2.0, 2.96)
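The same numbers can be obtained with `np.var`. Its default `ddof=0` divides by n, which matches the loop above. Passing `ddof=1` divides by n - 1 instead, giving the unbiased sample variance. A minimal sketch:

```python
import numpy as np

x = np.array([2, 3, 4, 5, 6])
y = np.array([-6, -5, -4, -3, -1])

# ddof=0 (the default): divide by n, same as the hand-written version
var_x = np.var(x)                    # 10 / 5 = 2.0
var_y = np.var(y)                    # about 2.96 (floating-point rounding)

# ddof=1: divide by n - 1 (unbiased sample variance)
var_x_unbiased = np.var(x, ddof=1)   # 10 / 4 = 2.5
```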

Covariance

# Population covariance: mean of the products of deviations (divides by n)
dis_xy = sum([(x[i]-ave_x)*(y[i]-ave_y) for i in range(len(x))])/len(x)
dis_xy

2.3999999999999999
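NumPy's `np.cov` returns the full 2x2 covariance matrix [[var(x), cov(x, y)], [cov(x, y), var(y)]]. With `bias=True` it divides by n and matches the value above. Its default divides by n - 1. A minimal sketch, also showing a vectorized equivalent of the list comprehension:

```python
import numpy as np

x = np.array([2, 3, 4, 5, 6])
y = np.array([-6, -5, -4, -3, -1])

# bias=True divides by n, matching the hand-written covariance
cov_matrix = np.cov(x, y, bias=True)
cov_xy = cov_matrix[0, 1]                 # about 2.4

# Default (bias=False) divides by n - 1
cov_xy_unbiased = np.cov(x, y)[0, 1]      # about 12 / 4 = 3.0

# Vectorized form of the loop: mean of the products of deviations
cov_xy_manual = np.mean((x - x.mean()) * (y - y.mean()))
```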

Parameters of the regression line y = ax + b

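Following http://mathtrain.jp/leastsquares, the least-squares slope a is the covariance of x and y divided by the variance of x. The intercept b is then chosen so that the line passes through the point of means (ave_x, ave_y).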
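For completeness, here is a short derivation of those two formulas. It uses the standard textbook argument and may be presented differently on the linked page. The notation: $\bar{x}, \bar{y}$ are the means, $s_x^2$ is the variance of x, and $s_{xy}$ is the covariance, all dividing by n. We minimize the sum of squared vertical errors $S(a, b)$:

```latex
S(a, b) = \sum_{i=1}^{n} (y_i - a x_i - b)^2

\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} (y_i - a x_i - b) = 0
\;\Longrightarrow\; b = \bar{y} - a \bar{x}

\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} x_i (y_i - a x_i - b) = 0
\;\Longrightarrow\; \sum_{i=1}^{n} x_i (y_i - \bar{y}) = a \sum_{i=1}^{n} x_i (x_i - \bar{x})

\sum_{i=1}^{n} x_i (y_i - \bar{y}) = n\, s_{xy}, \qquad
\sum_{i=1}^{n} x_i (x_i - \bar{x}) = n\, s_x^2
\;\Longrightarrow\; a = \frac{s_{xy}}{s_x^2}
```

With this data: $a = 2.4 / 2.0 = 1.2$ and $b = -3.8 - 1.2 \cdot 4 = -8.6$.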

a = dis_xy/dis_x
b = ave_y - a*ave_x
a, b

(1.2, -8.5999999999999996)
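As an independent check, `np.polyfit` with degree 1 solves the same least-squares problem numerically. It returns the coefficients from the highest degree down, so the result is [slope, intercept]. A minimal sketch comparing it with the closed-form formulas:

```python
import numpy as np

x = np.array([2, 3, 4, 5, 6])
y = np.array([-6, -5, -4, -3, -1])

# Closed form: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
intercept = y.mean() - slope * x.mean()

# Numerical least-squares fit of a degree-1 polynomial
a_fit, b_fit = np.polyfit(x, y, 1)
```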

Plotting the result

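f = lambda t: a*t + b            # fitted line y = ax + b
xs = np.linspace(0, 10, 1000)    # dense grid of x values for drawing the line
plt.plot(xs, f(xs))              # NumPy evaluates f on the whole array at once
plt.scatter(x, y, color="red")   # original data points
plt.show()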

Using the mean, variance, and covariance, we obtained a linear least-squares fit to the data, y = 1.2x - 8.6.
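One way to sanity-check that this really is the least-squares line is to use the two conditions from setting the partial derivatives to zero. The residuals must sum to zero and must be orthogonal to x. Nudging either coefficient should also only increase the squared error. A minimal sketch follows. It hard-codes a = 1.2 and b = -8.6, copied from the output above. The helper name `sse` is hypothetical and not part of the original notebook:

```python
import numpy as np

x = np.array([2, 3, 4, 5, 6])
y = np.array([-6, -5, -4, -3, -1])
a, b = 1.2, -8.6  # coefficients copied from the regression output above

def sse(slope, intercept):
    """Hypothetical helper: sum of squared vertical errors for y = slope*x + intercept."""
    return float(np.sum((y - (slope * x + intercept)) ** 2))

residuals = y - (a * x + b)
residual_sum = residuals.sum()              # should be ~0 (condition from dS/db = 0)
residual_dot_x = (residuals * x).sum()      # should be ~0 (condition from dS/da = 0)

best = sse(a, b)  # about 0.4
# Small perturbations of either coefficient should give a larger error
nudged = [sse(a + da, b + db) for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]]
```

Because S(a, b) is a convex quadratic in a and b, the point where both derivatives vanish is its unique minimum. The nudge test is therefore only an illustration, not a proof.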