Evaluation metrics for regression models: MSE, MSLE, MAE, coefficient of determination, and more

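Overview

This article summarizes the metrics commonly used to evaluate regression models, with the formula for each and how to compute it with scikit-learn and NumPy.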

List of regression evaluation metrics

Metric	Function
Mean squared error (MSE)	sklearn.metrics.mean_squared_error()
Root mean squared error (RMSE)	numpy.sqrt(sklearn.metrics.mean_squared_error())
Mean squared logarithmic error (MSLE)	sklearn.metrics.mean_squared_log_error()
Root mean squared logarithmic error (RMSLE)	numpy.sqrt(sklearn.metrics.mean_squared_log_error())
Mean absolute error (MAE)	sklearn.metrics.mean_absolute_error()
Coefficient of determination ($R^2$)	sklearn.metrics.r2_score()
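
As a rough sketch of how the entries in the table relate to one another, the helper below computes all six metrics with NumPy alone. `regression_report` is a hypothetical name chosen for illustration, not a scikit-learn function, and the sketch assumes 1-D inputs of equal length whose values are all greater than -1 (so that the logarithm is defined).

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Return the six metrics from the table as a dict (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    sq_err = (y_true - y_pred) ** 2
    # log1p(x) computes log(1 + x); assumes every value is greater than -1
    sq_log_err = (np.log1p(y_true) - np.log1p(y_pred)) ** 2

    mse = sq_err.mean()
    msle = sq_log_err.mean()
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MSLE": msle,
        "RMSLE": np.sqrt(msle),
        "MAE": np.abs(y_true - y_pred).mean(),
        "R2": 1 - sq_err.sum() / ((y_true - y_true.mean()) ** 2).sum(),
    }


y_true = [1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0]
y_pred = [1.0, 1.2, 1.3, 1.5, 1.5, 1.9, 1.9, 2.0]
report = regression_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.6f}")
```

RMSE and RMSLE are simply the square roots of MSE and MSLE, which is why the helper computes the squared versions once and reuses them.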

Mean squared error (MSE)

Mean squared error (MSE) is defined by the following formula. It can be computed with scikit-learn's sklearn.metrics.mean_squared_error().

$$ \text{MSE} = \frac{1}{N} \sum_{i = 1}^N (y_i - \hat{y}_i)^2 $$

$N$: number of samples
$y_i$: target value of the $i$-th sample
$\hat{y}_i$: predicted value of the $i$-th sample

In [1]:

import numpy as np
from sklearn.metrics import mean_squared_error

# y_true holds the true values, y_pred the predicted values
y_true = np.array([1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0])
y_pred = np.array([1.0, 1.2, 1.3, 1.5, 1.5, 1.9, 1.9, 2.0])

# Compute with scikit-learn
mse = mean_squared_error(y_true, y_pred)
print(mse)

# Compute with NumPy
mse = np.mean((y_true - y_pred) ** 2)
print(mse)

0.008749999999999997
0.008749999999999997
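
For the sample arrays above, the result can be checked by hand. Only four of the eight residuals $y_i - \hat{y}_i$ are non-zero ($-0.1$, $-0.1$, $0.1$, and $-0.2$), so:

$$ \text{MSE} = \frac{(-0.1)^2 + (-0.1)^2 + 0.1^2 + (-0.2)^2}{8} = \frac{0.07}{8} = 0.00875 $$

The trailing digits in the printed value are floating-point rounding, not a difference between the two methods.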

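Root mean squared error (RMSE)

Root mean squared error (RMSE) is the square root of the MSE, as defined by the following formula. It can be computed by taking the square root of the value returned by scikit-learn's sklearn.metrics.mean_squared_error().

$$ \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^N (y_i - \hat{y}_i)^2} $$

$N$: number of samples
$y_i$: target value of the $i$-th sample
$\hat{y}_i$: predicted value of the $i$-th sample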

In [2]:

import numpy as np
from sklearn.metrics import mean_squared_error

# y_true holds the true values, y_pred the predicted values
y_true = np.array([1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0])
y_pred = np.array([1.0, 1.2, 1.3, 1.5, 1.5, 1.9, 1.9, 2.0])

# Compute with scikit-learn (square root of the MSE)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)

# Compute with NumPy
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)

0.09354143466934851
0.09354143466934851
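
Because of the square root, RMSE is expressed in the same units as the target, whereas MSE is in squared units. The sketch below illustrates this by rescaling the sample data by a hypothetical factor of 1000 (for example, metres to millimetres). The helper names `mse` and `rmse` are local to this example.

```python
import numpy as np

y_true = np.array([1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0])
y_pred = np.array([1.0, 1.2, 1.3, 1.5, 1.5, 1.9, 1.9, 2.0])


def mse(a, b):
    return np.mean((a - b) ** 2)


def rmse(a, b):
    return np.sqrt(mse(a, b))


scale = 1000.0  # hypothetical unit change, e.g. metres -> millimetres

# MSE grows with the square of the scale factor, RMSE linearly
mse_ratio = mse(scale * y_true, scale * y_pred) / mse(y_true, y_pred)
rmse_ratio = rmse(scale * y_true, scale * y_pred) / rmse(y_true, y_pred)
print(mse_ratio)
print(rmse_ratio)
```

This is one reason RMSE is often easier to interpret than MSE when reporting an error alongside the target values.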

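Mean squared logarithmic error (MSLE)

Mean squared logarithmic error (MSLE) is the mean of the squared differences between the logarithms of the target and predicted values, as defined by the following formula. It can be computed with scikit-learn's sklearn.metrics.mean_squared_log_error().

$$ \text{MSLE} = \frac{1}{N} \sum_{i = 1}^N (\log (1 + y_i) - \log (1 + \hat{y}_i))^2 $$

$N$: number of samples
$y_i$: target value of the $i$-th sample
$\hat{y}_i$: predicted value of the $i$-th sample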

In [3]:

import numpy as np
from sklearn.metrics import mean_squared_log_error

# y_true holds the true values, y_pred the predicted values
y_true = np.array([1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0])
y_pred = np.array([1.0, 1.2, 1.3, 1.5, 1.5, 1.9, 1.9, 2.0])

# Compute with scikit-learn
msle = mean_squared_log_error(y_true, y_pred)
print(msle)

# Compute with NumPy (log1p(x) computes log(1 + x))
msle = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)
print(msle)

0.0013093993706169834
0.0013093993706169834
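
Since the formula compares logarithms, each term depends only on the ratio $(1 + \hat{y}_i) / (1 + y_i)$ rather than on the absolute difference. The sketch below, using made-up values and a hypothetical helper `squared_log_error`, shows two consequences: equal ratios give equal errors at any scale, and an under-prediction is penalized more than an over-prediction of the same absolute size.

```python
import numpy as np


def squared_log_error(y, y_hat):
    """Single-sample term of the MSLE sum."""
    return (np.log1p(y) - np.log1p(y_hat)) ** 2


# Same ratio (1 + y_hat) / (1 + y) = 2 at two very different scales
small = squared_log_error(9.0, 19.0)
large = squared_log_error(999.0, 1999.0)
print(small, large)

# Same absolute error (50) on either side of the target
under = squared_log_error(100.0, 50.0)   # prediction below the target
over = squared_log_error(100.0, 150.0)   # prediction above the target
print(under, over)
```

This makes MSLE a candidate when relative error matters more than absolute error, for instance when targets span several orders of magnitude.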

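Root mean squared logarithmic error (RMSLE)

Root mean squared logarithmic error (RMSLE) is the square root of the MSLE, as defined by the following formula. It can be computed by taking the square root of the value returned by scikit-learn's sklearn.metrics.mean_squared_log_error().

$$ \text{RMSLE} = \sqrt{\frac{1}{N} \sum_{i = 1}^N (\log (1 + y_i) - \log (1 + \hat{y}_i))^2} $$

$N$: number of samples
$y_i$: target value of the $i$-th sample
$\hat{y}_i$: predicted value of the $i$-th sample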

In [4]:

import numpy as np
from sklearn.metrics import mean_squared_log_error

# y_true holds the true values, y_pred the predicted values
y_true = np.array([1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0])
y_pred = np.array([1.0, 1.2, 1.3, 1.5, 1.5, 1.9, 1.9, 2.0])

# Compute with scikit-learn (square root of the MSLE)
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))
print(rmsle)

# Compute with NumPy
rmsle = np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))
print(rmsle)

0.03618562381135613
0.03618562381135613
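
Put another way, RMSLE is the RMSE computed on the transformed values $\log(1 + y)$. The following sketch checks this equivalence on the sample data. The `rmse` helper is defined locally for the example.

```python
import numpy as np

y_true = np.array([1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0])
y_pred = np.array([1.0, 1.2, 1.3, 1.5, 1.5, 1.9, 1.9, 2.0])


def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))


# RMSLE written out directly from its definition
rmsle_direct = np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

# RMSE applied to the log1p-transformed targets and predictions
rmsle_via_rmse = rmse(np.log1p(y_true), np.log1p(y_pred))

print(rmsle_direct, rmsle_via_rmse)
```

One practical reading of this: a model trained to minimize squared error on log1p-transformed targets is effectively being fitted against RMSLE on the original scale.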

Mean absolute error (MAE)

Mean absolute error (MAE) is defined by the following formula. It can be computed with scikit-learn's sklearn.metrics.mean_absolute_error().

$$ \text{MAE} = \frac{1}{N} \sum_{i = 1}^N |y_i - \hat{y}_i| $$

$N$: number of samples
$y_i$: target value of the $i$-th sample
$\hat{y}_i$: predicted value of the $i$-th sample

In [5]:

import numpy as np
from sklearn.metrics import mean_absolute_error

# y_true holds the true values, y_pred the predicted values
y_true = np.array([1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0])
y_pred = np.array([1.0, 1.2, 1.3, 1.5, 1.5, 1.9, 1.9, 2.0])

# Compute with scikit-learn
mae = mean_absolute_error(y_true, y_pred)
print(mae)

# Compute with NumPy
mae = np.mean(np.abs(y_true - y_pred))
print(mae)

0.0625
0.0625
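
MAE weights every residual linearly, while RMSE squares residuals before averaging, so a single large error moves RMSE more than MAE. The sketch below illustrates this by replacing the last prediction in the sample data with a deliberately wrong, made-up value. The helper names are local to this example.

```python
import numpy as np

y_true = np.array([1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0])
y_pred = np.array([1.0, 1.2, 1.3, 1.5, 1.5, 1.9, 1.9, 2.0])


def mae(a, b):
    return np.mean(np.abs(a - b))


def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))


# Hypothetical outlier: the last prediction is now off by 1.0
y_pred_outlier = y_pred.copy()
y_pred_outlier[-1] = 3.0

mae_base, rmse_base = mae(y_true, y_pred), rmse(y_true, y_pred)
mae_out, rmse_out = mae(y_true, y_pred_outlier), rmse(y_true, y_pred_outlier)

print("MAE :", mae_base, "->", mae_out)
print("RMSE:", rmse_base, "->", rmse_out)
```

Which behaviour is preferable depends on whether large individual errors should dominate the score.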

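Coefficient of determination ($R^2$)

The coefficient of determination ($R^2$) is defined by the following formula, where $\bar{y}$ is the mean of the target values. It can be computed with scikit-learn's sklearn.metrics.r2_score().

$$ \begin{aligned} R^2 &= 1 - \frac{\sum_{i = 1}^N (y_i - \hat{y}_i)^2}{\sum_{i = 1}^N (y_i - \bar{y})^2} \\ \bar{y} &= \frac{1}{N} \sum_{i = 1}^N y_i \end{aligned} $$

$N$: number of samples
$y_i$: target value of the $i$-th sample
$\hat{y}_i$: predicted value of the $i$-th sample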

In [6]:

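import numpy as np
from sklearn.metrics import r2_score

# y_true holds the true values, y_pred the predicted values
y_true = np.array([1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0])
y_pred = np.array([1.0, 1.2, 1.3, 1.5, 1.5, 1.9, 1.9, 2.0])

# Compute with scikit-learn
r2 = r2_score(y_true, y_pred)
print(r2)

# Compute with NumPy (the 1/N factors of the two means cancel, so this matches the ratio of sums)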
r2 = 1 - np.mean((y_true - y_pred) ** 2) / np.mean((y_true - y_true.mean()) ** 2)
print(r2)
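
To get a feel for the scale of $R^2$, the sketch below evaluates three reference cases on the same targets: a perfect prediction, a constant prediction equal to the mean of the targets, and a deliberately bad prediction (the targets in reverse order, chosen only for illustration). The function name `r2_from_definition` is hypothetical.

```python
import numpy as np


def r2_from_definition(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot


y_true = np.array([1.0, 1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.0])

# Perfect prediction: the residual sum of squares is zero
r2_perfect = r2_from_definition(y_true, y_true)

# Always predicting the mean: residual and total sums of squares coincide
r2_mean = r2_from_definition(y_true, np.full_like(y_true, y_true.mean()))

# Worse than predicting the mean: the score drops below zero
r2_reversed = r2_from_definition(y_true, y_true[::-1])

print(r2_perfect, r2_mean, r2_reversed)
```

So $R^2$ is at most 1, equals 0 for a model no better than the mean of the targets, and has no lower bound for models that are worse than that baseline.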