Fitting binary regression models is done via maximum likelihood estimation. We use the Binomial (Bernoulli) likelihood and either maximise the log-likelihood or, equivalently, minimise the negative log-likelihood.
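As a minimal sketch of minimising the negative log-likelihood by gradient descent for binary logistic regression (the toy data, learning rate, and iteration count below are illustrative assumptions, not from the original):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(w, b, X, y):
    # Bernoulli negative log-likelihood for binary labels y in {0, 1}.
    p = sigmoid(X @ w + b)
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Toy data (assumed for illustration): one feature, four points.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b, lr = np.zeros(1), 0.0, 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)
    # Gradients of the negative log-likelihood (derived via the chain rule).
    w -= lr * X.T @ (p - y)
    b -= lr * np.sum(p - y)
```

Each gradient step lowers the negative log-likelihood, which is the same as raising the likelihood of the observed labels.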
To find the gradients, we use backpropagation!
We
also need to derive the gradients for each possible activation function.
| Activation | Function f(x) | Pointwise derivative f'(x) |
| --- | --- | --- |
| Sigmoid | 1 / (1 + e^(-x)) | f(x) (1 - f(x)) |
| ReLU | max(0, x) | 1 if x > 0, else 0 [Note: since ReLU output is non-negative, can test equality with 0] |
| Leaky ReLU [can change 0.01] | x if x > 0, else 0.01x | 1 if x > 0, else 0.01 |
| Tanh | tanh(x) | 1 - tanh²(x) |
| Linear | x | 1 |
| Softsign [approximate sigmoid] | x / (1 + \|x\|) | 1 / (1 + \|x\|)² |

(The original plots, with the function in red and its derivative in green, are omitted.)
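The table above can be sketched in code; the snippet below checks each pointwise derivative against a central finite difference (the dictionary layout and test points are my own illustration, chosen away from x = 0 where the ReLU-style kinks are not differentiable):

```python
import numpy as np

# Pointwise activations and their derivatives from the table above.
acts = {
    "sigmoid":    (lambda x: 1 / (1 + np.exp(-x)),
                   lambda x: 1 / (1 + np.exp(-x)) * (1 - 1 / (1 + np.exp(-x)))),
    "relu":       (lambda x: np.maximum(0.0, x),
                   lambda x: (x > 0).astype(float)),
    "leaky_relu": (lambda x: np.where(x > 0, x, 0.01 * x),
                   lambda x: np.where(x > 0, 1.0, 0.01)),
    "tanh":       (lambda x: np.tanh(x),
                   lambda x: 1 - np.tanh(x) ** 2),
    "linear":     (lambda x: x,
                   lambda x: np.ones_like(x)),
    "softsign":   (lambda x: x / (1 + np.abs(x)),
                   lambda x: 1 / (1 + np.abs(x)) ** 2),
}

# Verify each derivative numerically with a central finite difference.
x = np.array([-2.0, -0.5, 0.5, 2.0])
h = 1e-6
for name, (f, df) in acts.items():
    numeric = (f(x + h) - f(x - h)) / (2 * h)
    assert np.allclose(df(x), numeric, atol=1e-4), name
```

Writing the derivative of sigmoid as f(x)(1 - f(x)) reuses the forward-pass output, which is why frameworks cache activations during backpropagation.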
By computing gradients backwards through the chain rule, we can then derive the gradients for each weight and bias.
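As a minimal sketch of that chain-rule computation for a single sigmoid unit with cross-entropy loss (the variable names and the finite-difference check are my own illustration): the chain rule collapses to dL/dz = p - y, giving dL/dw = (p - y) x and dL/db = p - y.

```python
import numpy as np

def grads(w, b, x, y):
    # Forward pass: z = w.x + b, p = sigmoid(z),
    # loss L = -[y log p + (1 - y) log(1 - p)].
    z = w @ x + b
    p = 1 / (1 + np.exp(-z))
    dz = p - y            # dL/dp * dp/dz collapses to p - y
    return dz * x, dz     # dL/dw, dL/db

x = np.array([0.5, -1.2])
w = np.array([0.3, 0.7]); b = 0.1; y = 1.0

def loss(w, b):
    p = 1 / (1 + np.exp(-(w @ x + b)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Finite-difference check of dL/db against the backpropagated gradient.
h = 1e-6
dw, db = grads(w, b, x, y)
assert abs(db - (loss(w, b + h) - loss(w, b - h)) / (2 * h)) < 1e-5
```

In a deeper network the same pattern repeats layer by layer: each layer multiplies the incoming gradient by its local (pointwise) derivative from the table, then passes the result backwards.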
(c) Copyright Protected: Daniel Han-Chen 2020
License: All content on this page is for educational and personal purposes only. Usage of material, concepts, equations, methods, and all intellectual property on any page in this publication is forbidden for any commercial purpose, be it promotional or revenue-generating. I also claim no liability for any damages caused by my material. Knowledge and methods summarized from various sources such as papers, YouTube videos, and other mediums are protected under the original publishers' licensing arrangements.