<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Model Selection | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/model-selection/</link><atom:link href="https://haobin-tan.netlify.app/tags/model-selection/index.xml" rel="self" type="application/rss+xml"/><description>Model Selection</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 07 Sep 2020 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>Model Selection</title><link>https://haobin-tan.netlify.app/tags/model-selection/</link></image><item><title>Model Selection</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/</guid><description/></item><item><title>Objective Function</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/objective-function/</link><pubDate>Mon, 06 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/objective-function/</guid><description>&lt;h2 id="how-does-the-objective-function-look-like">How does the objective function look like?&lt;/h2>
&lt;p>Objective function:&lt;/p>
$$
\operatorname{Obj}(\Theta)= \overbrace{L(\Theta)}^{\text {Training Loss}} + \underbrace{\Omega(\Theta)}_{\text{Regularization}}
$$
&lt;ul>
&lt;li>
&lt;p>Training loss: measures how well the model fits the training data (a code sketch combining both terms follows this list)
&lt;/p>
$$
L=\sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}\right)
$$
&lt;ul>
&lt;li>Square loss:
$$
l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2
$$&lt;/li>
&lt;li>Logistic loss:
$$
l(y_i, \hat{y}_i) = y_i \log(1 + e^{-\hat{y}_i}) + (1 - y_i) \log(1 + e^{\hat{y}_i})
$$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Regularization: measures how complicated the model is&lt;/p>
&lt;ul>
&lt;li>$L_2$ norm (Ridge): $\Omega(w) = \lambda \|w\|_2^2$&lt;/li>
&lt;li>$L_1$ norm (Lasso): $\Omega(w) = \lambda \|w\|_1$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
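&lt;p>As a concrete sketch of the two terms (a minimal example assuming NumPy; the function and variable names are illustrative, not from a particular library):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

def objective(w, X, y, lam):
    """Obj(w) = training loss L(w) + regularization Omega(w)."""
    y_hat = X @ w                       # linear model predictions
    loss = np.sum((y - y_hat) ** 2)     # squared training loss
    penalty = lam * np.sum(w ** 2)      # L2 penalty: lambda * ||w||^2
    return loss + penalty

rng = np.random.default_rng(0)
X, y, w = rng.normal(size=(50, 3)), rng.normal(size=50), rng.normal(size=3)
print(objective(w, X, y, lam=0.1))
&lt;/code>&lt;/pre>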
&lt;style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-fymr{border-color:inherit;font-weight:bold;text-align:left;vertical-align:top}
&lt;/style>
&lt;table class="tg">
&lt;thead>
&lt;tr>
&lt;th class="tg-0pky">&lt;/th>
&lt;th class="tg-fymr">Objective Function&lt;/th>
&lt;th class="tg-fymr">Linear model?&lt;/th>
&lt;th class="tg-fymr">Loss&lt;/th>
&lt;th class="tg-fymr">Regularization&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td class="tg-fymr">Ridge regression&lt;/td>
&lt;td class="tg-0pky">$\sum_{i=1}^{n}\left(y_{i}-w^{\top} x_{i}\right)^{2}+\lambda\|w\|^{2}$&lt;/td>
&lt;td class="tg-0pky">✅&lt;/td>
&lt;td class="tg-0pky">square&lt;/td>
&lt;td class="tg-0pky">$L_2$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-fymr">Lasso regression&lt;/td>
&lt;td class="tg-0pky">$\sum_{i=1}^{n}\left(y_{i}-w^{\top} x_{i}\right)^{2}+\lambda\|w\|$&lt;/td>
&lt;td class="tg-0pky">✅&lt;/td>
&lt;td class="tg-0pky">square&lt;/td>
&lt;td class="tg-0pky">$L_1$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-fymr">Logistic regression&lt;/td>
&lt;td class="tg-0pky">$\sum_{i=1}^{n}\left[y_{i} \cdot \ln \left(1+e^{-w^{\top} x_{i}}\right)+\left(1-y_{i}\right) \cdot \ln \left(1+e^{w^{\top} x_{i}}\right)\right]+\lambda\|w\|^{2}$&lt;/td>
&lt;td class="tg-0pky">✅&lt;/td>
&lt;td class="tg-0pky">logistic&lt;/td>
&lt;td class="tg-0pky">$L_2$&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
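&lt;p>Each row of the table maps to an off-the-shelf estimator. A minimal scikit-learn sketch (assuming scikit-learn is installed; its &lt;code>alpha&lt;/code> plays the role of $\lambda$ up to internal scaling constants, and &lt;code>C&lt;/code> is the inverse of $\lambda$ for logistic regression):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
from sklearn.linear_model import Ridge, Lasso, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)    # squared loss + L2 penalty
lasso = Lasso(alpha=0.1).fit(X, y)    # squared loss + L1 penalty
logreg = LogisticRegression(C=1.0).fit(X, (y > 0).astype(int))  # logistic loss + L2

print(ridge.coef_, lasso.coef_, logreg.coef_, sep="\n")
&lt;/code>&lt;/pre>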
&lt;h2 id="why-do-we-want-to-contain-two-component-in-the-objective">Why do we want to contain two component in the objective?&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Optimizing training loss encourages predictive models&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;em>Fitting the training data well at least gets you close to the training distribution, which is hopefully close to the underlying distribution&lt;/em>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Optimizing regularization encourages simple models&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;em>Simpler models tend to have smaller variance in future predictions, making predictions stable&lt;/em>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Bias Variance Tradeoff</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/bias-variance-tradeoff/</link><pubDate>Mon, 06 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/bias-variance-tradeoff/</guid><description>&lt;h2 id="tldr">TL;DR&lt;/h2>
&lt;style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-fymr{border-color:inherit;font-weight:bold;text-align:left;vertical-align:top}
&lt;/style>
&lt;table class="tg">
&lt;thead>
&lt;tr>
&lt;th class="tg-0pky">&lt;/th>
&lt;th class="tg-fymr">Resaon&lt;/th>
&lt;th class="tg-fymr">Example&lt;/th>
&lt;th class="tg-fymr">affect&lt;/th>
&lt;th class="tg-fymr">Model's complexity ⬆️&lt;/th>
&lt;th class="tg-fymr">Model's complexity ⬇️&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td class="tg-0pky">Bias&lt;/td>
&lt;td class="tg-0pky">wrong assumption&lt;/td>
&lt;td class="tg-0pky">assume a quadratic model to be linear&lt;/td>
&lt;td class="tg-0pky">underfitting&lt;/td>
&lt;td class="tg-0pky">⬇️&lt;/td>
&lt;td class="tg-0pky">⬆️&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-0pky">Variance&lt;/td>
&lt;td class="tg-0pky">excessive sensitivity to small variations&lt;/td>
&lt;td class="tg-0pky">high-degree polynomial model&lt;/td>
&lt;td class="tg-0pky">overfitting&lt;/td>
&lt;td class="tg-0pky">⬆️&lt;/td>
&lt;td class="tg-0pky">⬇️&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-0pky">Inreducible error&lt;/td>
&lt;td class="tg-0pky">noisy data&lt;/td>
&lt;td class="tg-0pky">&lt;/td>
&lt;td class="tg-0pky">&lt;/td>
&lt;td class="tg-0pky">&lt;/td>
&lt;td class="tg-0pky">&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200120105846503.png" alt="image-20200120105846503" style="zoom:50%;" />
&lt;h2 id="explaination">Explaination&lt;/h2>
&lt;p>A model’s generalization error can be expressed as the sum of three very different errors:&lt;/p>
&lt;h3 id="bias">Bias&lt;/h3>
&lt;p>This part of the generalization error is due to &lt;strong>wrong assumptions&lt;/strong>, such as assuming that the data is linear when it is actually quadratic.
A high-bias model is most likely to &lt;strong>underfit&lt;/strong> the training data.&lt;/p>
&lt;h3 id="variance">Variance&lt;/h3>
&lt;p>This part is due to the model’s &lt;strong>excessive sensitivity to small variations&lt;/strong> in the training data. &lt;br>
A model with many degrees of freedom (such as a high-degree polynomial model) is likely to have &lt;strong>high variance&lt;/strong>, and thus to &lt;strong>overfit&lt;/strong> the training data.&lt;/p>
&lt;h3 id="irreducible-error">Irreducible Error&lt;/h3>
&lt;p>This part is due to the &lt;strong>noisiness of the data&lt;/strong> itself.
The only way to reduce this part of the error is to &lt;strong>clean up the data&lt;/strong> (e.g., fix the data sources, such as broken sensors, or detect and remove outliers).&lt;/p>
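&lt;p>A small simulation makes the decomposition concrete (a sketch; the quadratic ground truth, noise level, and polynomial degrees are illustrative assumptions): refit models of varying complexity on many resampled training sets and measure squared bias and variance of their predictions at fixed test points.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2                     # true (quadratic) function
x_test = np.linspace(-1, 1, 50)

def fit_predict(degree):
    """Fit a polynomial of the given degree on one noisy training set."""
    x = rng.uniform(-1, 1, 30)
    y = f(x) + rng.normal(0, 0.1, 30)    # irreducible noise
    return np.polyval(np.polyfit(x, y, degree), x_test)

for degree in (1, 2, 10):                # underfit, about right, overfit
    preds = np.array([fit_predict(degree) for _ in range(200)])
    bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)
    var = preds.var(axis=0).mean()
    print(f"degree={degree}: bias^2={bias2:.4f}, variance={var:.4f}")
&lt;/code>&lt;/pre>
&lt;p>Degree 1 should show high bias (it cannot represent the quadratic), while degree 10 should show high variance (its fit swings with each resampled training set).&lt;/p>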
&lt;style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
&lt;/style>
&lt;table class="tg">
&lt;thead>
&lt;tr>
&lt;th class="tg-0pky">&lt;/th>
&lt;th class="tg-0pky">High bias&lt;/th>
&lt;th class="tg-0pky">Low bias&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td class="tg-0pky">High variance&lt;/td>
&lt;td class="tg-0pky">something is terribly wrong! 😭&lt;/td>
&lt;td class="tg-0pky">Overfitting&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-0pky">Low variance&lt;/td>
&lt;td class="tg-0pky">Underfitting&lt;/td>
&lt;td class="tg-0pky">too good to be true! 🤪&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table></description></item><item><title>Cross Validation</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/cross-validation/</link><pubDate>Mon, 06 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/cross-validation/</guid><description>&lt;img src="https://scikit-learn.org/stable/_images/grid_search_cross_validation.png" style="zoom:60%; background-color:white">
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>How it works?&lt;/th>
&lt;th style="text-align:center">Illustration&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>K-fold&lt;/strong>&lt;/td>
&lt;td>1. Create a $k$-fold partition of the dataset&lt;br />2. Estimate $k$ hold-out predictors, each using $1$ fold as the validation set and the remaining $k-1$ folds as the training set&lt;/td>
&lt;td style="text-align:center">&lt;br />&lt;img src="https://miro.medium.com/max/5535/1*QDH0DSCecArPmzQtEBh0yg.png" alt="img" style="zoom: 20%; background-color:white" />&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Leave-One-Out (LOO)&lt;/strong>&lt;/td>
&lt;td>&lt;strong>(Special case of K-fold with $k=n$)&lt;/strong> &lt;br />Estimate $n$ hold-out predictors, each using $1$ data point as the validation set and the remaining $n-1$ points as the training set&lt;/td>
&lt;td style="text-align:center">&lt;br />&lt;img src="https://miro.medium.com/max/5284/1*9bs3OMsKOJntR8blRnVE9g.png" alt="img" style="zoom:20%; background-color:white" />&lt;br />&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Random sub-sampling&lt;/strong>&lt;/td>
&lt;td>1. Randomly sample $\alpha \cdot n$ data points, $\alpha \in (0,1)$, for validation&lt;br />2. Train on the remaining points and validate; repeat $K$ times&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
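&lt;p>A minimal sketch of K-fold and leave-one-out validation with scikit-learn (assumed available; the ridge model and synthetic data are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

model = Ridge(alpha=1.0)
mse = "neg_mean_squared_error"  # single-sample folds rule out the default R^2 score

# K-fold: k hold-out estimates, one fold held out per round
kfold_cv = KFold(n_splits=5, shuffle=True, random_state=0)
kfold_scores = cross_val_score(model, X, y, cv=kfold_cv, scoring=mse)

# Leave-one-out: the special case k = n
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut(), scoring=mse)

print(-kfold_scores.mean(), -loo_scores.mean())
&lt;/code>&lt;/pre>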
&lt;h2 id="-explaination">🎥 Explaination&lt;/h2>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/fSytzGwwBVw?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div></description></item></channel></rss>