

{"id":175044,"date":"2021-03-08T11:07:11","date_gmt":"2021-03-08T05:37:11","guid":{"rendered":"https:\/\/www.jigsawacademy.com\/?p=175044"},"modified":"2022-07-06T20:02:26","modified_gmt":"2022-07-06T14:32:26","slug":"blogs-ai-ml-ridge-regression","status":"publish","type":"post","link":"https:\/\/www.jigsawacademy.com\/blogs\/ai-ml\/ridge-regression","title":{"rendered":"Ridge Regression: An Interesting Overview In 2021"},"content":{"rendered":"\r\n<h2><strong>Introduction<\/strong><\/h2>\r\n\r\n\r\n\r\n<p>When one analyses data with multicollinearity, the technique of \u2018Ridge regression\u2019 is used for model tuning; it regularizes the model with an L2 penalty. It is useful because when multicollinearity occurs in the data, the least-squares estimates remain unbiased but their variances are large, so the predicted values can be far from the actual values.<\/p>\r\n\r\n\r\n\r\n<p>In the ridge regression cost function below, \u03bb is the penalty term; in scikit-learn it is set through the alpha parameter.<\/p>\r\n\r\n\r\n\r\n<p>Min(||Y \u2013 X(theta)||^2 + \u03bb||theta||^2)<\/p>\r\n\r\n\r\n\r\n<p>If alpha gets bigger, the penalty is larger, and the coefficients&#8217; magnitudes are smaller. 
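<\/p>\r\n\r\n\r\n\r\n<p>The shrinkage effect just described can be sketched with scikit-learn&#8217;s Ridge; the data below is synthetic and purely illustrative, not from any real dataset.<\/p>\r\n\r\n\r\n\r\n

```python
# A minimal sketch of ridge shrinkage (synthetic data): as alpha grows,
# the penalty gets larger and the coefficient magnitudes get smaller.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=100)

for alpha in (0.01, 1.0, 100.0):
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    # The L2 norm of the coefficient vector shrinks as alpha rises
    print(alpha, np.linalg.norm(coefs))
```

\r\n\r\n\r\n\r\n<p>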
Thus it mitigates multicollinearity through parameter shrinkage, and model complexity is also reduced as the coefficients shrink.<\/p>\r\n\r\n\r\n\r\n<p>In this article let us look at:<\/p>\r\n\r\n\r\n\r\n<ol>\r\n<li><strong><a class=\"rank-math-link\" href=\"#Ridge-Regression-Models\">Ridge Regression Models<\/a><\/strong><\/li>\r\n<li><strong><a class=\"rank-math-link\" href=\"#Standardization\">Standardization<\/a><\/strong><\/li>\r\n<li><strong><a class=\"rank-math-link\" href=\"#Assumptions-of-Ridge-Regressions\">Assumptions of Ridge Regressions<\/a><\/strong><\/li>\r\n<li><strong><a class=\"rank-math-link\" href=\"#Linear-Regression-Model\">Linear Regression Model<\/a><\/strong><\/li>\r\n<li><strong><a class=\"rank-math-link\" href=\"#Regularization\">Regularization<\/a><\/strong><\/li>\r\n<\/ol>\r\n\r\n\r\n\r\n<h2 id=\"Ridge-Regression-Models\" class=\"has-vivid-cyan-blue-color has-text-color\">1. <strong>Ridge Regression Models<\/strong><\/h2>\r\n\r\n\r\n\r\n<p>For machine learning models, the ridge regression formula\u00a0is given by<\/p>\r\n\r\n\r\n\r\n<p>Y = XB + e<\/p>\r\n\r\n\r\n\r\n<p>Here, Y is the dependent variable, X the independent variables, B the regression coefficients and e the residual errors. Once the \u03bb penalty term is added for L2 regularization, one can undertake standardization.<\/p>\r\n\r\n\r\n\r\n<h2 id=\"Standardization\" class=\"has-vivid-cyan-blue-color has-text-color\">2. <strong>Standardization<\/strong><\/h2>\r\n\r\n\r\n\r\n<p>Ridge regression\u00a0uses standardized variables. Hence, to standardize the independent and dependent variables, subtract their mean and divide by their standard deviation. However, one will need to note whether the variables have been standardized and ensure that the final displayed regression coefficients are reported on the original scale. 
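<\/p>\r\n\r\n\r\n\r\n<p>The subtract-the-mean, divide-by-the-standard-deviation step can be sketched as below; the column names and values are illustrative only.<\/p>\r\n\r\n\r\n\r\n

```python
# Z-score standardization as described above: subtract the mean and divide
# by the standard deviation. Column names here are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({"final_price": [120.0, 80.0, 100.0, 140.0],
                   "area_range": [2.0, 4.0, 6.0, 8.0]})
# ddof=0 (population std) matches what sklearn's StandardScaler uses
standardized = (df - df.mean()) / df.std(ddof=0)
print(standardized.round(3))  # each column now has mean ~0 and std ~1
```

\r\n\r\n\r\n\r\n<p>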
Thus, the ridge trace is always on a standardized scale.<\/p>\r\n\r\n\r\n\r\n<p>Bias and variance trade-off:<\/p>\r\n\r\n\r\n\r\n<p>Building a ridge regression on an actual dataset involves a trade-off between bias and variance, which follows the trends in \u03bb mentioned below.<\/p>\r\n\r\n\r\n\r\n<ul>\r\n<li>If \u03bb increases, then bias also increases.<\/li>\r\n<li>If \u03bb increases, then the variance decreases.<\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<h2 id=\"Assumptions-of-Ridge-Regressions\" class=\"has-vivid-cyan-blue-color has-text-color\">3. <strong>Assumptions of Ridge Regressions<\/strong><\/h2>\r\n\r\n\r\n\r\n<p>The ridge and linear regression models both assume constant variance, independence and linearity. However, ridge regression makes no assumption about the error distribution and therefore does not provide confidence limits.<\/p>\r\n\r\n\r\n\r\n<h2 id=\"Linear-Regression-Model\" class=\"has-vivid-cyan-blue-color has-text-color\">4. <strong>Linear Regression Model<\/strong><\/h2>\r\n\r\n\r\n\r\n<p>When should ridge regression be used? Consider the problem below to understand how ridge regression, when implemented, reduces errors.<\/p>\r\n\r\n\r\n\r\n<p>The data is from food restaurants in a particular region, and the goal is to find the best food-item combination for increased sales.<\/p>\r\n\r\n\r\n\r\n<p>The first step is to import the required libraries: numpy as np, pandas as pd, os, seaborn as sns, LinearRegression from sklearn.linear_model, and matplotlib.pyplot as plt with the classic style (plt.style.use(&#8216;classic&#8217;)). Warnings are suppressed with warnings.filterwarnings(&#8216;ignore&#8217;), and the data is loaded with df = pd.read_excel(\u201cfood.xlsx\u201d).<\/p>\r\n\r\n\r\n\r\n<p>Once the missing values have been treated and the EDA is complete, dummy variables are created. Note that the final dataset should not contain raw categorical variables. 
Hence, if cat is the list of the data set\u2019s categorical columns, we have<\/p>\r\n\r\n\r\n\r\n<p>df = pd.get_dummies(df, columns=cat, drop_first=True)<\/p>\r\n\r\n\r\n\r\n<p>This is then standardized and used as the data set for the Linear Regression method.<\/p>\r\n\r\n\r\n\r\n<p>The next step is\u00a0scaling variables, since the continuous variables are on different scales. Scaling returns the z-scores of all attributes. Start with<\/p>\r\n\r\n\r\n\r\n<p>from sklearn.preprocessing import StandardScaler<\/p>\r\n\r\n\r\n\r\n<p>std_scale = StandardScaler()<\/p>\r\n\r\n\r\n\r\n<p>df[&#8216;final_price&#8217;] = std_scale.fit_transform(df[[&#8216;final_price&#8217;]])<\/p>\r\n\r\n\r\n\r\n<p>df[&#8216;week&#8217;] = std_scale.fit_transform(df[[&#8216;week&#8217;]])<\/p>\r\n\r\n\r\n\r\n<p>df[&#8216;area_range&#8217;] = std_scale.fit_transform(df[[&#8216;area_range&#8217;]])<\/p>\r\n\r\n\r\n\r\n<p>The third step is to execute a\u00a0Train-Test Split, accomplished by the operations below.<\/p>\r\n\r\n\r\n\r\n<p># Copy predictor variables into dataframe X<\/p>\r\n\r\n\r\n\r\n<p>X = df.drop(&#8216;orders&#8217;, axis=1)\u00a0<\/p>\r\n\r\n\r\n\r\n<p># Copy target into dataframe y. 
The target variable is converted into log values: y = np.log(df[[&#8216;orders&#8217;]])<\/p>\r\n\r\n\r\n\r\n<p>Now, # Split X and y into training\/test sets in a 75:25 ratio using the import<\/p>\r\n\r\n\r\n\r\n<p>from sklearn.model_selection import train_test_split<\/p>\r\n\r\n\r\n\r\n<p>X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)<\/p>\r\n\r\n\r\n\r\n<p>The final step is applying the\u00a0Linear Regression Model.<\/p>\r\n\r\n\r\n\r\n<p># invoke the LinearRegression function and fit the best model on the training data<\/p>\r\n\r\n\r\n\r\n<p>regression_model = LinearRegression()<\/p>\r\n\r\n\r\n\r\n<p>regression_model.fit(X_train, y_train)<\/p>\r\n\r\n\r\n\r\n<p># To explore each independent attribute\u2019s coefficient, use the loop below.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>for idx, col_name in enumerate(X_train.columns):<\/p>\r\n\r\n\r\n\r\n<p>print(&#8220;The coefficient for {} is {}&#8221;.format(col_name, regression_model.coef_[0][idx]))<\/p>\r\n\r\n\r\n\r\n<p>The coefficients can be represented as\u00a0<\/p>\r\n\r\n\r\n\r\n<p>final_price -0.40354286519747384<\/p>\r\n\r\n\r\n\r\n<p>week -0.0041068045722690814<\/p>\r\n\r\n\r\n\r\n<p>area_range 0.16906454326841025<\/p>\r\n\r\n\r\n\r\n<p>website_homepage_mention_1.0 0.44689072858872664<\/p>\r\n\r\n\r\n\r\n<p>food_category_Desert 0.5722054451619581<\/p>\r\n\r\n\r\n\r\n<p>food_category_Biryani -0.10369818094671146<\/p>\r\n\r\n\r\n\r\n<p>food_category_Extras -0.22769824296095417<\/p>\r\n\r\n\r\n\r\n<p>food_category_Other Snacks -0.44682163212660775<\/p>\r\n\r\n\r\n\r\n<p>food_category_Pasta -0.7352610382529601<\/p>\r\n\r\n\r\n\r\n<p>food_category_Rice Bowl 1.640603292571774<\/p>\r\n\r\n\r\n\r\n<p>food_category_Pizza 0.499963614474803<\/p>\r\n\r\n\r\n\r\n<p>food_category_Salad 0.22723622749570868<\/p>\r\n\r\n\r\n\r\n<p>food_category_Seafood -0.07845778484039663<\/p>\r\n\r\n\r\n\r\n<p>food_category_Starters -0.3782239478810047<\/p>\r\n\r\n\r\n\r\n<p>food_category_Sandwich 
0.3733070983152591<\/p>\r\n\r\n\r\n\r\n<p>food_category_Soup -1.0586633401722432<\/p>\r\n\r\n\r\n\r\n<p>cuisine_Italian -0.03927567006223066<\/p>\r\n\r\n\r\n\r\n<p>cuisine_Indian -1.1335822602848094<\/p>\r\n\r\n\r\n\r\n<p>center_type_Noida 0.0501474731039986<\/p>\r\n\r\n\r\n\r\n<p>center_type_Gurgaon -0.16528108967295807<\/p>\r\n\r\n\r\n\r\n<p>night_service_1 0.0038398863634691582<\/p>\r\n\r\n\r\n\r\n<p>home_delivery_1.0 1.026400462237632<\/p>\r\n\r\n\r\n\r\n<p>Now, to check the magnitude of the coefficients, import Series from pandas and set predictors = X_train.columns\u00a0<\/p>\r\n\r\n\r\n\r\n<p>Here,\u00a0<\/p>\r\n\r\n\r\n\r\n<p>coef = Series(regression_model.coef_.flatten(), predictors).sort_values()<\/p>\r\n\r\n\r\n\r\n<p>plt.figure(figsize=(10,8))<\/p>\r\n\r\n\r\n\r\n<p>coef.plot(kind=&#8217;bar&#8217;, title=&#8217;Model Coefficients&#8217;)<\/p>\r\n\r\n\r\n\r\n<p>plt.show()<\/p>\r\n\r\n\r\n\r\n<p>From the diagram, the variables with positive coefficients, such as area_range, food_category_Salad, food_category_Desert, food_category_Pizza, food_category_Rice Bowl, home_delivery_1.0, website_homepage_mention_1.0 and food_category_Sandwich, are the factors that influence the model most.<\/p>\r\n\r\n\r\n\r\n<p>Since a higher beta coefficient means a higher impact in the\u00a0ridge regression equation, dishes like Rice Bowl, Pizza and Desert, together with website_homepage_mention and home delivery, play out as important drivers of the number of orders. The variables with negative coefficients, namely food_category_Pasta, cuisine_Indian, food_category_Soup and food_category_Other_Snacks, predict fewer restaurant orders.<\/p>\r\n\r\n\r\n\r\n<p>final_price is also seen to reduce the number of orders. Dishes like Pasta, Soup, Other Snacks and the Indian food categories likewise lower the restaurant\u2019s predicted orders when all other predictors are kept constant. 
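<\/p>\r\n\r\n\r\n\r\n<p>Why a ridge penalty helps with such data can be sketched on synthetic collinear data: with two nearly identical predictors, ordinary least squares can produce an inflated coefficient vector, while ridge shrinks it. The data and the alpha value below are illustrative assumptions, not taken from the food dataset.<\/p>\r\n\r\n\r\n\r\n

```python
# Sketch (synthetic data, not the food dataset): with two near-duplicate
# predictors, ridge shrinks the coefficient vector relative to plain OLS.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # near-duplicate column -> multicollinearity
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha chosen purely for illustration
print("OLS coefficients  :", ols.coef_)
print("Ridge coefficients:", ridge.coef_)  # smaller norm, weight split across the twins
```

\r\n\r\n\r\n\r\n<p>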
The model also has variables like night_service and week, which have no appreciable impact on the predicted order frequency. Thus one concludes that the continuous variables are less significant than the categorical (object-type) variables.<\/p>\r\n\r\n\r\n\r\n<h2 id=\"Regularization\" class=\"has-vivid-cyan-blue-color has-text-color\">5. <strong>Regularization<\/strong><\/h2>\r\n\r\n\r\n\r\n<p>In regularization, the Ridge hyperparameter alpha is set manually, as it is not learned automatically by the ridge regression algorithm. A grid search for the optimum alpha is executed with GridSearchCV as follows.<\/p>\r\n\r\n\r\n\r\n<p>from sklearn.model_selection import GridSearchCV<\/p>\r\n\r\n\r\n\r\n<p>from sklearn.linear_model import Ridge<\/p>\r\n\r\n\r\n\r\n<p>ridge = Ridge()<\/p>\r\n\r\n\r\n\r\n<p>parameters = {&#8216;alpha&#8217;: [1e-15, 1e-10, 1e-8, 1e-3, 1e-2, 1, 5, 10, 20, 30, 35, 40, 45, 50, 55, 100]}<\/p>\r\n\r\n\r\n\r\n<p>ridge_regressor = GridSearchCV(ridge, parameters, scoring=&#8217;neg_mean_squared_error&#8217;, cv=5)<\/p>\r\n\r\n\r\n\r\n<p>ridge_regressor.fit(X, y)<\/p>\r\n\r\n\r\n\r\n<p>Now, print(ridge_regressor.best_params_) and print(ridge_regressor.best_score_) give {&#8216;alpha&#8217;: 0.01} and -0.3751867421112124. The score is negative because the scoring metric is neg_mean_squared_error; only its magnitude matters, so the sign can be ignored.\u00a0<\/p>\r\n\r\n\r\n\r\n<p>With ridgeReg as the Ridge model refitted at the best alpha, the coefficients are plotted:<\/p>\r\n\r\n\r\n\r\n<p>predictors = X_train.columns\u00a0<\/p>\r\n\r\n\r\n\r\n<p>coef = Series(ridgeReg.coef_.flatten(), predictors).sort_values()<\/p>\r\n\r\n\r\n\r\n<p>plt.figure(figsize=(10,8))<\/p>\r\n\r\n\r\n\r\n<p>coef.plot(kind=&#8217;bar&#8217;, title=&#8217;Model Coefficients&#8217;)<\/p>\r\n\r\n\r\n\r\n<p>plt.show()<\/p>\r\n\r\n\r\n\r\n<p>Now, the final\u00a0ridge regression model\u00a0gives the prediction equation below.<\/p>\r\n\r\n\r\n\r\n<p>Orders = 4.65 + 1.02*home_delivery_1.0 + 0.46*website_homepage_mention_1.0 + (-0.40)*final_price + 0.17*area_range + 0.57*food_category_Desert + (-0.22)*food_category_Extras + (-0.73)*food_category_Pasta + 0.49*food_category_Pizza + 1.6*food_category_Rice_Bowl + 0.22*food_category_Salad + 0.37*food_category_Sandwich + (-1.05)*food_category_Soup + (-0.37)*food_category_Starters + (-1.13)*cuisine_Indian + (-0.16)*center_type_Gurgaon<\/p>\r\n\r\n\r\n\r\n<p>Here the top 5 influencing variables of the\u00a0ridge regression\u00a0model are:<\/p>\r\n\r\n\r\n\r\n<ol>\r\n<li>home_delivery_1.0<\/li>\r\n<li>food_category_Rice Bowl<\/li>\r\n<li>food_category_Desert<\/li>\r\n<li>food_category_Pizza<\/li>\r\n<li>website_homepage_mention_1<\/li>\r\n<\/ol>\r\n\r\n\r\n\r\n<h2><strong>Conclusion<\/strong><\/h2>\r\n\r\n\r\n\r\n<p>The question of why ridge regression is answered by the ridge regression solution: the higher a predictor\u2019s beta coefficient, the more significant that predictor. When tuned, this model can surface the best variables for the business problem through ridge regression analysis.<\/p>\r\n\r\n\r\n\r\n<p>There are no right or wrong ways of learning AI and ML technologies \u2013 the more, the better! These valuable resources can be the starting point for your journey on how to learn Artificial Intelligence and Machine Learning. Does pursuing AI and ML interest you? 
If you want to step into the world of emerging tech, you can accelerate your career with these\u00a0<strong><a class=\"rank-math-link\" href=\"https:\/\/www.jigsawacademy.com\/full-stack-machine-learning-artificial-intelligence\/\">Machine Learning And AI Courses<\/a>\u00a0<\/strong>by Jigsaw Academy.<\/p>\r\n\r\n\r\n\r\n<h2>ALSO READ<\/h2>\r\n\r\n\r\n\r\n<ul>\r\n<li><strong><a class=\"rank-math-link\" href=\"https:\/\/www.jigsawacademy.com\/blogs\/ai-ml\/linear-regression-in-machine-learning\">Linear Regression In Machine Learning: A Simple Overview In 4 Points<\/a><\/strong><\/li>\r\n<\/ul>\r\n","protected":false},"excerpt":{"rendered":"<p>Introduction When one analyses data with multicollinearity, the technique of \u2018Ridge regression\u2019 is used for model tuning using\u00a0ridge and lasso regression\u00a0or L2 regularization. It is useful since whenever multicollinearity happens in the data, the data exhibits large variances, and the least-squares will be unbiased. Thus the predicted values and actual values will have large variances. 
[&hellip;]<\/p>\n","protected":false},"author":188,"featured_media":175068,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1126],"tags":[7262,7261,7260,7265,7266,7264,7263,7267],"form":[1499],"acf":[],"_links":{"self":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/175044"}],"collection":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/users\/188"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/comments?post=175044"}],"version-history":[{"count":2,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/175044\/revisions"}],"predecessor-version":[{"id":238662,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/175044\/revisions\/238662"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/media\/175068"}],"wp:attachment":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/media?parent=175044"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/categories?post=175044"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/tags?post=175044"},{"taxonomy":"form","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/form?post=175044"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}