Logistic Regression: Probabilistic view
Class label:

$$
y_i \in \{0, 1\}
$$

The conditional probability distribution of the class label is
$$
\begin{aligned}
p(y=1 \mid x) &= \sigma(w^T x + b) \\
p(y=0 \mid x) &= 1 - \sigma(w^T x + b)
\end{aligned}
$$

with
$$
\sigma(x) = \frac{1}{1 + \exp(-x)}
$$

This is a conditional Bernoulli distribution. Therefore, the probability can be represented as
$$
\begin{aligned}
p(y \mid x) &= p(y=1 \mid x)^y \, p(y=0 \mid x)^{1-y} \\
&= \sigma(w^T x + b)^y \left(1 - \sigma(w^T x + b)\right)^{1-y}
\end{aligned}
$$

The conditional Bernoulli log-likelihood is (assuming the training data is i.i.d.)
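These two formulas translate directly into code. Below is a minimal NumPy sketch; the function names `sigmoid` and `bernoulli_prob` are illustrative choices, not from the source:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_prob(y, x, w, b):
    """p(y | x) = sigma(w^T x + b)^y * (1 - sigma(w^T x + b))^(1 - y)."""
    p1 = sigmoid(w @ x + b)                 # p(y = 1 | x)
    return p1 ** y * (1.0 - p1) ** (1 - y)  # y is 0 or 1
```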
$$
\begin{aligned}
\operatorname{loglik}(w, \mathcal{D}) &= \log\left(\operatorname{lik}(w, \mathcal{D})\right) \\
&= \log\Big(\prod_i p(y_i \mid x_i)\Big) \\
&= \log\Big(\prod_i \sigma(w^T x_i + b)^{y_i} \left(1 - \sigma(w^T x_i + b)\right)^{1 - y_i}\Big) \\
&= \sum_i y_i \log\left(\sigma(w^T x_i + b)\right) + (1 - y_i) \log\left(1 - \sigma(w^T x_i + b)\right)
\end{aligned}
$$
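The last line has a direct vectorized translation, reusing `sigmoid` from the sketch above. Here `X` is an $n \times d$ design matrix and `y` a length-$n$ vector of 0/1 labels (illustrative names); note that a production implementation would prefer a numerically stable formulation, since `np.log` diverges as the predicted probability approaches 0:

```python
def log_likelihood(w, b, X, y):
    """Bernoulli log-likelihood: sum_i y_i log p_i + (1 - y_i) log(1 - p_i)."""
    p = sigmoid(X @ w + b)  # p_i = p(y_i = 1 | x_i), computed for all i at once
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```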
Let

$$
\tilde{w} = \begin{bmatrix} w \\ b \end{bmatrix}, \qquad \tilde{x}_i = \begin{bmatrix} x_i \\ 1 \end{bmatrix}
$$

so that $\tilde{w}^T \tilde{x}_i = w^T x_i + b$. Then:
$$
\operatorname{loglik}(w, \mathcal{D}) = \operatorname{loglik}(\tilde{w}, \mathcal{D}) = \sum_i y_i \log\left(\sigma(\tilde{w}^T \tilde{x}_i)\right) + (1 - y_i) \log\left(1 - \sigma(\tilde{w}^T \tilde{x}_i)\right)
$$
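In code, this reparameterization amounts to appending a constant-1 feature to every input; the helper name `augment` is an illustrative choice:

```python
def augment(X):
    """Append a constant 1 to every row, so that X_tilde @ w_tilde == X @ w + b
    when w_tilde stacks the weights and the bias: w_tilde = [w, b]."""
    return np.hstack([X, np.ones((X.shape[0], 1))])
```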
Our objective is to find the $\tilde{w}^*$ that maximizes the log-likelihood, i.e.

$$
\begin{aligned}
\tilde{w}^* &= \operatorname*{argmax}_{\tilde{w}} \operatorname{loglik}(\tilde{w}, \mathcal{D}) \\
&= \operatorname*{argmin}_{\tilde{w}} -\operatorname{loglik}(\tilde{w}, \mathcal{D}) \\
&= \operatorname*{argmin}_{\tilde{w}} \underbrace{-\Big(\sum_i y_i \log\left(\sigma(\tilde{w}^T \tilde{x}_i)\right) + (1 - y_i) \log\left(1 - \sigma(\tilde{w}^T \tilde{x}_i)\right)\Big)}_{\text{cross-entropy loss}}
\end{aligned}
$$

In other words, maximizing the (log-)likelihood is the same as minimizing the cross-entropy.
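To make the equivalence concrete, here is a minimal gradient-descent sketch that fits $\tilde{w}$ by minimizing the cross-entropy loss, reusing `sigmoid` and `augment` from the sketches above. The learning rate and step count are arbitrary illustrative values, and the gradient $\tilde{X}^T\left(\sigma(\tilde{X}\tilde{w}) - y\right)$ follows from differentiating the loss with respect to $\tilde{w}$:

```python
def cross_entropy_loss(w_tilde, X_tilde, y):
    """Negative Bernoulli log-likelihood, i.e. the cross-entropy loss."""
    p = sigmoid(X_tilde @ w_tilde)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit_logistic(X, y, lr=0.1, steps=1000):
    """Maximize the likelihood by running gradient descent on the cross-entropy."""
    X_tilde = augment(X)
    w_tilde = np.zeros(X_tilde.shape[1])
    for _ in range(steps):
        p = sigmoid(X_tilde @ w_tilde)
        w_tilde -= lr * (X_tilde.T @ (p - y))  # gradient of the cross-entropy loss
    return w_tilde                             # last entry is the bias b
```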