Problem statement: a digital media company (similar to Voot, Hotstar, Netflix, etc.) launched a show. The show initially drew a good response, but viewership then declined. The company wants to figure out what went wrong.
In [317]:
# Importing all required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [318]:
# Importing the dataset and dropping the extraneous 'Unnamed: 7' column
media = pd.read_csv('mediacompany.csv')
media = media.drop('Unnamed: 7', axis=1)
In [319]:
# Let's explore the top 5 rows
media.head()
Out[319]:
In [320]:
# Converting Date to the pandas datetime format
media['Date'] = pd.to_datetime(media['Date'])
In [321]:
media.head()
Out[321]:
In [322]:
# Deriving "days since the show started"
from datetime import date

d0 = date(2017, 2, 28)
d1 = media.Date
delta = d1 - d0
media['day'] = delta
In [323]:
media.head()
Out[323]:
In [324]:
# Cleaning days: keep only the leading day count of the timedelta string
media['day'] = media['day'].astype(str)
media['day'] = media['day'].map(lambda x: x[0:2])
media['day'] = media['day'].astype(int)
In [325]:
media.head()
Out[325]:
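The two-character string slice above is fragile (it would break past day 99); a sketch of a more direct derivation reads the day count straight off the timedelta, assuming the same start date (day_alt is a hypothetical column name):

# Hypothetical alternative: extract the integer day count directly,
# avoiding the x[0:2] string slice
media['day_alt'] = (media['Date'] - pd.Timestamp('2017-02-28')).dt.days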
In [326]:
# day vs Views_show
media.plot.line(x='day', y='Views_show')
Out[326]:
In [327]:
# Scatter plot (day vs Views_show)
colors = (0, 0, 0)
area = np.pi * 3
plt.scatter(media.day, media.Views_show, s=area, c=colors, alpha=0.5)
plt.title('Views_show over time')
plt.xlabel('Day')
plt.ylabel('Views_show')
plt.show()
In [328]:
# Plot day vs Views_show and day vs Ad_impression on twin y-axes
fig = plt.figure()
host = fig.add_subplot(111)
par1 = host.twinx()

host.set_xlabel("Day")
host.set_ylabel("Views_show")
par1.set_ylabel("Ad_impression")

color1 = plt.cm.viridis(0)
color2 = plt.cm.viridis(0.5)

p1, = host.plot(media.day, media.Views_show, color=color1, label="Views_show")
p2, = par1.plot(media.day, media.Ad_impression, color=color2, label="Ad_impression")

lns = [p1, p2]
host.legend(handles=lns, loc='best')

# Color each y-axis label to match its series
host.yaxis.label.set_color(p1.get_color())
par1.yaxis.label.set_color(p2.get_color())

plt.savefig("pyplot_multiple_y-axis.png", bbox_inches='tight')
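For reference, pandas can produce a similar twin-axis chart with far less boilerplate via its secondary_y option; a minimal sketch, assuming the same media DataFrame:

# Sketch: the same two series on twin y-axes using pandas plotting
ax = media.plot(x='day', y='Views_show')
media.plot(x='day', y='Ad_impression', secondary_y=True, ax=ax)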
In [329]:
# Derived metrics
# Weekdays are numbered so that 1 corresponds to Sunday and 7 to Saturday

# Generate the weekday variable
media['weekday'] = (media['day'] + 3) % 7
media.weekday.replace(0, 7, inplace=True)
media['weekday'] = media['weekday'].astype(int)
media.head()
Out[329]:
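The modular arithmetic above works only because the series starts on a known weekday. A sketch that derives the same Sunday=1 ... Saturday=7 code directly from the Date column (weekday_alt is a hypothetical column name):

# pandas dt.dayofweek returns Monday=0 ... Sunday=6; remap to Sunday=1 ... Saturday=7
media['weekday_alt'] = (media['Date'].dt.dayofweek + 1) % 7 + 1
# Sanity check against the arithmetic version:
# assert (media['weekday_alt'] == media['weekday']).all()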
Running the first model (lm_1): Visitors and weekday
In [330]:
# Putting the feature variables into X
X = media[['Visitors', 'weekday']]

# Putting the response variable into y
y = media['Views_show']
In [331]:
from sklearn.linear_model import LinearRegression
In [332]:
# Creating a LinearRegression object as lm
lm = LinearRegression()
In [333]:
# Fit the model to the training data
lm.fit(X, y)
Out[333]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
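sklearn keeps the fitted parameters on the estimator itself; a quick way to inspect them before moving to statsmodels:

# Intercept and per-feature coefficients of the fitted sklearn model
print(lm.intercept_, lm.coef_)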
In [334]:
import statsmodels.api as sm

# Unlike sklearn, statsmodels doesn't automatically fit a constant,
# so you need to use sm.add_constant(X) to add one.
X = sm.add_constant(X)

# Create a fitted model in one line
lm_1 = sm.OLS(y, X).fit()
print(lm_1.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:             Views_show   R-squared:                       0.485
Model:                            OLS   Adj. R-squared:                  0.472
Method:                 Least Squares   F-statistic:                     36.26
Date:                Fri, 09 Mar 2018   Prob (F-statistic):           8.01e-12
Time:                        10:27:35   Log-Likelihood:                -1042.5
No. Observations:                  80   AIC:                             2091.
Df Residuals:                      77   BIC:                             2098.
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -3.862e+04   1.07e+05     -0.360      0.720   -2.52e+05    1.75e+05
Visitors       0.2787      0.057      4.911      0.000       0.166       0.392
weekday    -3.591e+04   6591.205     -5.448      0.000    -4.9e+04   -2.28e+04
==============================================================================
Omnibus:                        2.684   Durbin-Watson:                   0.650
Prob(Omnibus):                  0.261   Jarque-Bera (JB):                2.653
Skew:                           0.423   Prob(JB):                        0.265
Kurtosis:                       2.718   Cond. No.                     1.46e+07
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.46e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
In [335]:
# Create the weekend variable: 1 on weekends (Saturday/Sunday), 0 on weekdays
def cond(i):
    if i % 7 == 5:    # Sunday in this day numbering
        return 1
    elif i % 7 == 4:  # Saturday
        return 1
    else:
        return 0

media['weekend'] = [cond(i) for i in media['day']]
In [336]:
media.head()
Out[336]:
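A vectorized sketch of the same flag, assuming the day numbering above (day % 7 equal to 4 or 5 marks Saturday/Sunday; weekend_alt is a hypothetical column name):

media['weekend_alt'] = media['day'].mod(7).isin([4, 5]).astype(int)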
Running the second model (lm_2): Visitors and weekend
In [337]:
# Putting the feature variables into X
X = media[['Visitors', 'weekend']]

# Putting the response variable into y
y = media['Views_show']
In [338]:
import statsmodels.api as sm

# Unlike sklearn, statsmodels doesn't automatically fit a constant,
# so you need to use sm.add_constant(X) to add one.
X = sm.add_constant(X)

# Create a fitted model in one line
lm_2 = sm.OLS(y, X).fit()
print(lm_2.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:             Views_show   R-squared:                       0.500
Model:                            OLS   Adj. R-squared:                  0.487
Method:                 Least Squares   F-statistic:                     38.55
Date:                Fri, 09 Mar 2018   Prob (F-statistic):           2.51e-12
Time:                        10:27:35   Log-Likelihood:                -1041.3
No. Observations:                  80   AIC:                             2089.
Df Residuals:                      77   BIC:                             2096.
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -8.833e+04   1.01e+05     -0.875      0.384   -2.89e+05    1.13e+05
Visitors       0.1934      0.061      3.160      0.002       0.071       0.315
weekend     1.807e+05   3.15e+04      5.740      0.000    1.18e+05    2.43e+05
==============================================================================
Omnibus:                        1.302   Durbin-Watson:                   1.254
Prob(Omnibus):                  0.521   Jarque-Bera (JB):                1.367
Skew:                           0.270   Prob(JB):                        0.505
Kurtosis:                       2.656   Cond. No.                     1.41e+07
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.41e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
Running the third model (lm_3): Visitors, weekend, and Character_A
In [339]:
# Putting the feature variables into X
X = media[['Visitors', 'weekend', 'Character_A']]

# Putting the response variable into y
y = media['Views_show']
In [340]:
import statsmodels.api as sm

# Unlike sklearn, statsmodels doesn't automatically fit a constant,
# so you need to use sm.add_constant(X) to add one.
X = sm.add_constant(X)

# Create a fitted model in one line
lm_3 = sm.OLS(y, X).fit()
print(lm_3.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:             Views_show   R-squared:                       0.586
Model:                            OLS   Adj. R-squared:                  0.570
Method:                 Least Squares   F-statistic:                     35.84
Date:                Fri, 09 Mar 2018   Prob (F-statistic):           1.53e-14
Time:                        10:27:35   Log-Likelihood:                -1033.8
No. Observations:                  80   AIC:                             2076.
Df Residuals:                      76   BIC:                             2085.
Df Model:                           3
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
const       -4.722e+04   9.31e+04     -0.507      0.613   -2.33e+05    1.38e+05
Visitors        0.1480      0.057      2.586      0.012       0.034       0.262
weekend      1.812e+05   2.89e+04      6.281      0.000    1.24e+05    2.39e+05
Character_A  9.542e+04   2.41e+04      3.963      0.000    4.75e+04    1.43e+05
==============================================================================
Omnibus:                        0.908   Durbin-Watson:                   1.600
Prob(Omnibus):                  0.635   Jarque-Bera (JB):                0.876
Skew:                          -0.009   Prob(JB):                        0.645
Kurtosis:                       2.488   Cond. No.                     1.42e+07
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.42e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
In [341]:
# Create a lag variable: the previous day's views
media['Lag_Views'] = np.roll(media['Views_show'], 1)
# np.roll wraps the last row's value around into the first row;
# zero out that spurious first entry
media.Lag_Views.replace(108961, 0, inplace=True)
In [342]:
media.head()
Out[342]:
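A sketch of the same lag using pandas shift, which leaves a NaN in the first row instead of wrapping around (Lag_Views_alt is a hypothetical column name):

media['Lag_Views_alt'] = media['Views_show'].shift(1).fillna(0)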
Running the fourth model (lm_4): Visitors, Character_A, Lag_Views, and weekend
In [343]:
# Putting the feature variables into X
X = media[['Visitors', 'Character_A', 'Lag_Views', 'weekend']]

# Putting the response variable into y
y = media['Views_show']
In [344]:
import statsmodels.api as sm

# Unlike sklearn, statsmodels doesn't automatically fit a constant,
# so you need to use sm.add_constant(X) to add one.
X = sm.add_constant(X)

# Create a fitted model in one line
lm_4 = sm.OLS(y, X).fit()
print(lm_4.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:             Views_show   R-squared:                       0.740
Model:                            OLS   Adj. R-squared:                  0.726
Method:                 Least Squares   F-statistic:                     53.46
Date:                Fri, 09 Mar 2018   Prob (F-statistic):           3.16e-21
Time:                        10:27:36   Log-Likelihood:                -1015.1
No. Observations:                  80   AIC:                             2040.
Df Residuals:                      75   BIC:                             2052.
Df Model:                           4
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
const        -2.98e+04   7.43e+04     -0.401      0.689   -1.78e+05    1.18e+05
Visitors        0.0659      0.047      1.394      0.167      -0.028       0.160
Character_A  5.527e+04   2.01e+04      2.748      0.008    1.52e+04    9.53e+04
Lag_Views       0.4317      0.065      6.679      0.000       0.303       0.560
weekend      2.273e+05    2.4e+04      9.467      0.000    1.79e+05    2.75e+05
==============================================================================
Omnibus:                        1.425   Durbin-Watson:                   2.626
Prob(Omnibus):                  0.491   Jarque-Bera (JB):                0.821
Skew:                          -0.130   Prob(JB):                        0.663
Kurtosis:                       3.423   Cond. No.                     1.44e+07
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.44e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
In [345]:
# Correlation heatmap across all variables
plt.figure(figsize=(20, 10))  # size of the figure
sns.heatmap(media.corr(), annot=True)
Out[345]:
<matplotlib.axes._subplots.AxesSubplot at 0x1d2cc0301d0>
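The summaries above repeatedly warn about large condition numbers; a variance-inflation-factor check makes any multicollinearity explicit. A minimal sketch, assuming statsmodels' VIF helper and the lm_6 feature set:

from statsmodels.stats.outliers_influence import variance_inflation_factor

features = sm.add_constant(media[['Visitors', 'weekend', 'Character_A']])
for i, col in enumerate(features.columns):
    print(col, variance_inflation_factor(features.values, i))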
Running the fifth model (lm_5): weekend, Character_A, and Views_platform
In [346]:
# Putting the feature variables into X
X = media[['weekend', 'Character_A', 'Views_platform']]

# Putting the response variable into y
y = media['Views_show']
In [347]:
import statsmodels.api as sm

# Unlike sklearn, statsmodels doesn't automatically fit a constant,
# so you need to use sm.add_constant(X) to add one.
X = sm.add_constant(X)

# Create a fitted model in one line
lm_5 = sm.OLS(y, X).fit()
print(lm_5.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:             Views_show   R-squared:                       0.602
Model:                            OLS   Adj. R-squared:                  0.586
Method:                 Least Squares   F-statistic:                     38.24
Date:                Fri, 09 Mar 2018   Prob (F-statistic):           3.59e-15
Time:                        10:27:37   Log-Likelihood:                -1032.3
No. Observations:                  80   AIC:                             2073.
Df Residuals:                      76   BIC:                             2082.
Df Model:                           3
Covariance Type:            nonrobust
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const          -1.205e+05   9.97e+04     -1.208      0.231   -3.19e+05    7.81e+04
weekend         1.781e+05   2.78e+04      6.410      0.000    1.23e+05    2.33e+05
Character_A     7.062e+04    2.6e+04      2.717      0.008    1.89e+04    1.22e+05
Views_platform     0.1507      0.048      3.152      0.002       0.055       0.246
==============================================================================
Omnibus:                        4.279   Durbin-Watson:                   1.516
Prob(Omnibus):                  0.118   Jarque-Bera (JB):                2.153
Skew:                           0.061   Prob(JB):                        0.341
Kurtosis:                       2.206   Cond. No.                     2.03e+07
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.03e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
Running the sixth model (lm_6): weekend, Character_A, and Visitors
In [348]:
# Putting the feature variables into X
X = media[['weekend', 'Character_A', 'Visitors']]

# Putting the response variable into y
y = media['Views_show']
In [349]:
import statsmodels.api as sm

# Unlike sklearn, statsmodels doesn't automatically fit a constant,
# so you need to use sm.add_constant(X) to add one.
X = sm.add_constant(X)

# Create a fitted model in one line
lm_6 = sm.OLS(y, X).fit()
print(lm_6.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:             Views_show   R-squared:                       0.586
Model:                            OLS   Adj. R-squared:                  0.570
Method:                 Least Squares   F-statistic:                     35.84
Date:                Fri, 09 Mar 2018   Prob (F-statistic):           1.53e-14
Time:                        10:27:37   Log-Likelihood:                -1033.8
No. Observations:                  80   AIC:                             2076.
Df Residuals:                      76   BIC:                             2085.
Df Model:                           3
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
const       -4.722e+04   9.31e+04     -0.507      0.613   -2.33e+05    1.38e+05
weekend      1.812e+05   2.89e+04      6.281      0.000    1.24e+05    2.39e+05
Character_A  9.542e+04   2.41e+04      3.963      0.000    4.75e+04    1.43e+05
Visitors        0.1480      0.057      2.586      0.012       0.034       0.262
==============================================================================
Omnibus:                        0.908   Durbin-Watson:                   1.600
Prob(Omnibus):                  0.635   Jarque-Bera (JB):                0.876
Skew:                          -0.009   Prob(JB):                        0.645
Kurtosis:                       2.488   Cond. No.                     1.42e+07
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.42e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
Running the seventh model (lm_7): weekend, Character_A, Visitors, and Ad_impression
In [350]:
# Putting the feature variables into X
X = media[['weekend', 'Character_A', 'Visitors', 'Ad_impression']]

# Putting the response variable into y
y = media['Views_show']
In [351]:
import statsmodels.api as sm

# Unlike sklearn, statsmodels doesn't automatically fit a constant,
# so you need to use sm.add_constant(X) to add one.
X = sm.add_constant(X)

# Create a fitted model in one line
lm_7 = sm.OLS(y, X).fit()
print(lm_7.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:             Views_show   R-squared:                       0.803
Model:                            OLS   Adj. R-squared:                  0.792
Method:                 Least Squares   F-statistic:                     76.40
Date:                Fri, 09 Mar 2018   Prob (F-statistic):           1.10e-25
Time:                        10:27:38   Log-Likelihood:                -1004.1
No. Observations:                  80   AIC:                             2018.
Df Residuals:                      75   BIC:                             2030.
Df Model:                           4
Covariance Type:            nonrobust
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
const        -2.834e+05   6.97e+04     -4.067      0.000   -4.22e+05   -1.45e+05
weekend       1.485e+05   2.04e+04      7.296      0.000    1.08e+05    1.89e+05
Character_A  -2.934e+04   2.16e+04     -1.356      0.179   -7.24e+04    1.38e+04
Visitors         0.0144      0.042      0.340      0.735      -0.070       0.099
Ad_impression    0.0004   3.96e-05      9.090      0.000       0.000       0.000
==============================================================================
Omnibus:                        4.808   Durbin-Watson:                   1.166
Prob(Omnibus):                  0.090   Jarque-Bera (JB):                4.007
Skew:                           0.476   Prob(JB):                        0.135
Kurtosis:                       3.545   Cond. No.                     1.32e+10
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.32e+10. This might indicate that there are
strong multicollinearity or other numerical problems.
Running the eighth model (lm_8): weekend, Character_A, and Ad_impression
In [352]:
# Putting the feature variables into X
X = media[['weekend', 'Character_A', 'Ad_impression']]

# Putting the response variable into y
y = media['Views_show']
In [353]:
import statsmodels.api as sm

# Unlike sklearn, statsmodels doesn't automatically fit a constant,
# so you need to use sm.add_constant(X) to add one.
X = sm.add_constant(X)

# Create a fitted model in one line
lm_8 = sm.OLS(y, X).fit()
print(lm_8.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:             Views_show   R-squared:                       0.803
Model:                            OLS   Adj. R-squared:                  0.795
Method:                 Least Squares   F-statistic:                     103.0
Date:                Fri, 09 Mar 2018   Prob (F-statistic):           1.05e-26
Time:                        10:27:38   Log-Likelihood:                -1004.2
No. Observations:                  80   AIC:                             2016.
Df Residuals:                      76   BIC:                             2026.
Df Model:                           3
Covariance Type:            nonrobust
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
const        -2.661e+05   4.74e+04     -5.609      0.000   -3.61e+05   -1.72e+05
weekend        1.51e+05   1.88e+04      8.019      0.000    1.14e+05    1.89e+05
Character_A   -2.99e+04   2.14e+04     -1.394      0.167   -7.26e+04    1.28e+04
Ad_impression    0.0004   3.69e-05      9.875      0.000       0.000       0.000
==============================================================================
Omnibus:                        4.723   Durbin-Watson:                   1.169
Prob(Omnibus):                  0.094   Jarque-Bera (JB):                3.939
Skew:                           0.453   Prob(JB):                        0.139
Kurtosis:                       3.601   Cond. No.                     9.26e+09
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 9.26e+09. This might indicate that there are
strong multicollinearity or other numerical problems.
In [354]:
# Express ad impressions in millions; rescaling puts the regressors on
# comparable scales (note how the condition number drops in the next models)
media['ad_impression_million'] = media['Ad_impression'] / 1000000
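Rescaling matters numerically: a quick sketch of the design-matrix condition number before and after the change (the values should be in the same ballpark as the Cond. No. figures reported in the lm_8 and lm_10 summaries):

print(np.linalg.cond(sm.add_constant(media[['weekend', 'Character_A', 'Ad_impression']]).values))
print(np.linalg.cond(sm.add_constant(media[['weekend', 'Character_A', 'ad_impression_million']]).values))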
Running the ninth model (lm_9): weekend, Character_A, ad_impression_million, and Cricket_match_india
In [355]:
# Putting the feature variables into X
X = media[['weekend', 'Character_A', 'ad_impression_million', 'Cricket_match_india']]

# Putting the response variable into y
y = media['Views_show']
In [356]:
import statsmodels.api as sm

# Unlike sklearn, statsmodels doesn't automatically fit a constant,
# so you need to use sm.add_constant(X) to add one.
X = sm.add_constant(X)

# Create a fitted model in one line
lm_9 = sm.OLS(y, X).fit()
print(lm_9.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:             Views_show   R-squared:                       0.803
Model:                            OLS   Adj. R-squared:                  0.793
Method:                 Least Squares   F-statistic:                     76.59
Date:                Fri, 09 Mar 2018   Prob (F-statistic):           1.02e-25
Time:                        10:27:39   Log-Likelihood:                -1004.0
No. Observations:                  80   AIC:                             2018.
Df Residuals:                      75   BIC:                             2030.
Df Model:                           4
Covariance Type:            nonrobust
=========================================================================================
                            coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------
const                 -2.633e+05    4.8e+04     -5.484      0.000   -3.59e+05   -1.68e+05
weekend                1.521e+05    1.9e+04      7.987      0.000    1.14e+05     1.9e+05
Character_A           -3.196e+04   2.19e+04     -1.457      0.149   -7.57e+04    1.17e+04
ad_impression_million   363.7938     37.113      9.802      0.000     289.861     437.727
Cricket_match_india   -1.396e+04   2.74e+04     -0.510      0.612   -6.85e+04    4.06e+04
==============================================================================
Omnibus:                        5.270   Durbin-Watson:                   1.161
Prob(Omnibus):                  0.072   Jarque-Bera (JB):                4.560
Skew:                           0.468   Prob(JB):                        0.102
Kurtosis:                       3.701   Cond. No.                     9.32e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 9.32e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Running the tenth model (lm_10): weekend, Character_A, and ad_impression_million
In [357]:
# Putting the feature variables into X
X = media[['weekend', 'Character_A', 'ad_impression_million']]

# Putting the response variable into y
y = media['Views_show']
In [358]:
import statsmodels.api as sm

# Unlike sklearn, statsmodels doesn't automatically fit a constant,
# so you need to use sm.add_constant(X) to add one.
X = sm.add_constant(X)

# Create a fitted model in one line
lm_10 = sm.OLS(y, X).fit()
print(lm_10.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:             Views_show   R-squared:                       0.803
Model:                            OLS   Adj. R-squared:                  0.795
Method:                 Least Squares   F-statistic:                     103.0
Date:                Fri, 09 Mar 2018   Prob (F-statistic):           1.05e-26
Time:                        10:27:39   Log-Likelihood:                -1004.2
No. Observations:                  80   AIC:                             2016.
Df Residuals:                      76   BIC:                             2026.
Df Model:                           3
Covariance Type:            nonrobust
=========================================================================================
                            coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------
const                 -2.661e+05   4.74e+04     -5.609      0.000   -3.61e+05   -1.72e+05
weekend                 1.51e+05   1.88e+04      8.019      0.000    1.14e+05    1.89e+05
Character_A            -2.99e+04   2.14e+04     -1.394      0.167   -7.26e+04    1.28e+04
ad_impression_million   364.4670     36.909      9.875      0.000     290.957     437.977
==============================================================================
Omnibus:                        4.723   Durbin-Watson:                   1.169
Prob(Omnibus):                  0.094   Jarque-Bera (JB):                3.939
Skew:                           0.453   Prob(JB):                        0.139
Kurtosis:                       3.601   Cond. No.                     9.26e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 9.26e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Making predictions with lm_10
In [359]:
# Making predictions using the model
X = media[['weekend', 'Character_A', 'ad_impression_million']]
X = sm.add_constant(X)
Predicted_views = lm_10.predict(X)
In [360]:
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(media.Views_show, Predicted_views)
r_squared = r2_score(media.Views_show, Predicted_views)
In [361]:
print('Mean_Squared_Error :', mse)
print('r_square_value :', r_squared)

Mean_Squared_Error : 4677651616.25
r_square_value : 0.802643446858
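Note these are in-sample figures: the model is scored on the same 80 days it was fit on. A sketch of a simple chronological holdout check (the 64/16 split is an arbitrary choice for illustration):

cols = ['weekend', 'Character_A', 'ad_impression_million']
train, test = media.iloc[:64], media.iloc[64:]
lm_holdout = sm.OLS(train['Views_show'], sm.add_constant(train[cols])).fit()
print(r2_score(test['Views_show'], lm_holdout.predict(sm.add_constant(test[cols]))))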
In [362]:
# Actual vs predicted
c = [i for i in range(1, 81, 1)]
fig = plt.figure()
plt.plot(c, media.Views_show, color="blue", linewidth=2.5, linestyle="-")
plt.plot(c, Predicted_views, color="red", linewidth=2.5, linestyle="-")
fig.suptitle('Actual and Predicted', fontsize=20)  # plot heading
plt.xlabel('Index', fontsize=18)                   # x-label
plt.ylabel('Views', fontsize=16)                   # y-label
Out[362]:
Text(0,0.5,'Views')
In [363]:
# Error terms
c = [i for i in range(1, 81, 1)]
fig = plt.figure()
plt.plot(c, media.Views_show - Predicted_views, color="blue", linewidth=2.5, linestyle="-")
fig.suptitle('Error Terms', fontsize=20)                 # plot heading
plt.xlabel('Index', fontsize=18)                         # x-label
plt.ylabel('Views_show-Predicted_views', fontsize=16)    # y-label
Out[363]:
Text(0,0.5,'Views_show-Predicted_views')
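The lm_10 summary reports a Durbin-Watson statistic of about 1.17, well below 2, hinting at positive autocorrelation in these residuals. A one-line check, assuming statsmodels' helper:

from statsmodels.stats.stattools import durbin_watson
print(durbin_watson(media.Views_show - Predicted_views))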
Making predictions with lm_6
In [364]:
# Making predictions using the model
X = media[['weekend', 'Character_A', 'Visitors']]
X = sm.add_constant(X)
Predicted_views = lm_6.predict(X)
In [365]:
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(media.Views_show, Predicted_views)
r_squared = r2_score(media.Views_show, Predicted_views)
In [366]:
print('Mean_Squared_Error :', mse)
print('r_square_value :', r_squared)

Mean_Squared_Error : 9815432480.45
r_square_value : 0.585873408098
In [367]:
# Actual vs predicted
c = [i for i in range(1, 81, 1)]
fig = plt.figure()
plt.plot(c, media.Views_show, color="blue", linewidth=2.5, linestyle="-")
plt.plot(c, Predicted_views, color="red", linewidth=2.5, linestyle="-")
fig.suptitle('Actual and Predicted', fontsize=20)  # plot heading
plt.xlabel('Index', fontsize=18)                   # x-label
plt.ylabel('Views', fontsize=16)                   # y-label
Out[367]:
Text(0,0.5,'Views')
In [368]:
# Error terms
c = [i for i in range(1, 81, 1)]
fig = plt.figure()
plt.plot(c, media.Views_show - Predicted_views, color="blue", linewidth=2.5, linestyle="-")
fig.suptitle('Error Terms', fontsize=20)                 # plot heading
plt.xlabel('Index', fontsize=18)                         # x-label
plt.ylabel('Views_show-Predicted_views', fontsize=16)    # y-label
Out[368]:
Text(0,0.5,'Views_show-Predicted_views')
I hope this was helpful.
I will update the link to the code and dataset soon.
Connect with me (Maddy Anand) or visit www.maddyanand.com.