III. Logistic regression (work in progress)

Recall that in our linear regression example, Y was made up of continuous values (exam grades). In a classification problem, Y takes class labels instead (e.g. good or bad, a color, etc.). Let’s take another example: we want to find an algorithm that uses a person’s age to predict whether they have a driving licence or not.

Our inputs are the ages of the people in our sample. The outputs are whether or not each person has a driving licence.

First, we have to convert Y into 0 or 1 so that we can work with it mathematically. Let’s say 0 when a person has no driving licence and 1 when they do.
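As a small illustration of this conversion, here is a minimal Python sketch. The ages and answers are invented for the example (they are not the data from the original sample), and the names `has_licence` and `label_to_number` are only illustrative.

```python
# Hypothetical sample: these ages and answers are made up for illustration.
ages = [16, 17, 19, 23, 35, 52]
has_licence = ["no", "no", "yes", "no", "yes", "yes"]

# Map each answer to a number: 0 for "no", 1 for "yes".
label_to_number = {"no": 0, "yes": 1}
y = [label_to_number[answer] for answer in has_licence]

print(y)  # [0, 0, 1, 0, 1, 1]
```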

Our goal is to find an algorithm that correctly classifies our x. In our example, we want an algorithm that tells us whether a person is likely to have a driving licence or not, given their age.

In logistic regression, we assume that the dependent variable is binary (also called a dummy variable). Here it means that the response can be either 0 or 1 (but not 0.3, for example). Thus, if we end up with a predicted Y equal to 0.3 for a certain x, logistic regression turns it into 0, meaning no driving licence. In logistic regression, we therefore proceed this way:

if predicted Y > 0.5 => final predicted Y = 1

and if predicted Y < 0.5 => final predicted Y = 0
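The rule above can be sketched in Python as follows. The function name `classify` is only illustrative, and since the rule does not say what happens when the predicted Y is exactly 0.5, the sketch assumes such ties go to class 1.

```python
def classify(predicted_y, threshold=0.5):
    """Turn a predicted value between 0 and 1 into a final class (0 or 1)."""
    # Assumption: a value exactly equal to the threshold is assigned to
    # class 1. The rule in the text leaves this tie case open.
    return 1 if predicted_y >= threshold else 0

print(classify(0.3))  # 0 (no driving licence)
print(classify(0.8))  # 1 (has a driving licence)
```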

Logistic regression works almost like linear regression. First, the hyperplane that separates our data into two groups is linear, so its equation looks like the one in linear regression:

Y = W0 + W1*X

The difference is that we want a predicted Y between 0 and 1 in order to make our prediction. To make this happen, logistic regression uses the logistic function (not the logarithm), which transforms the linear output into a value between 0 and 1.

This function, called the sigmoid or logistic function, is:

1 / (1 + e^(-Y)) where Y = W0 + W1*X

It looks like this:

[Figure: plot of the sigmoid function, an S-shaped curve between 0 and 1]

As you can see, every predicted Y we obtain lies between 0 and 1.
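To check this numerically, here is a minimal Python sketch of the sigmoid function applied to a few arbitrary inputs. The function name `sigmoid` and the chosen inputs are illustrative. Note that this simple version can overflow for very large negative inputs, which is enough for a demonstration but not for production code.

```python
import math

def sigmoid(y):
    """Logistic function: 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

# Whatever the input, the output stays strictly between 0 and 1.
for value in [-10, -2, 0, 2, 10]:
    print(value, sigmoid(value))

print(sigmoid(0))  # 0.5, the midpoint of the curve
```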

What we get is the probability that Y = 1 for input x, given the parameters W0 and W1.
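Putting the pieces together, the following Python sketch computes this probability for a given age and then applies the 0.5 threshold. The values of W0 and W1 are picked by hand for illustration (they are not fitted to any data), the function names are hypothetical, and a probability of exactly 0.5 is assumed to go to class 1.

```python
import math

# Hypothetical parameters, chosen by hand for illustration (not fitted).
W0 = -9.0
W1 = 0.5

def probability_of_licence(age):
    """Probability that Y = 1 for this age, given W0 and W1."""
    linear_score = W0 + W1 * age  # Y = W0 + W1*X
    return 1.0 / (1.0 + math.exp(-linear_score))

def predict(age):
    """Final class: 1 if the probability is at least 0.5, otherwise 0."""
    # Assumption: a probability of exactly 0.5 is assigned to class 1.
    return 1 if probability_of_licence(age) >= 0.5 else 0

for age in [16, 30]:
    print(age, round(probability_of_licence(age), 3), predict(age))
# 16 0.269 0
# 30 0.998 1
```

With these hand-picked values, the linear score is 0 at age 18, so that is where the probability crosses 0.5 and the predicted class switches from 0 to 1.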
