Can Linear Regression Be Used On Any Data Set

Hello friends. In my previous articles I wrote about the basics of linear regression. In this article I will show you one of the several uses of linear regression: we will learn how to do classification using linear regression in R.

Let's Begin

Pre-requisites for this tutorial:

Introduction to Linear Regression model (optional).
Linear Regression Analysis using R (you must know how to run a regression in R).

As we know, linear regression is most widely used in predictive analytics. Here in this article we will use it for classification.

First of all open R or RStudio and set your working directory.

Step 1: First install the "mlbench" package.

> install.packages("mlbench") ↵ (press enter).

After pressing Enter, R will ask for a CRAN mirror. [Just select 0-Cloud.]

Step 2: Load the installed package using the library command.

> library(mlbench) ↵ (press enter )

Step 3: Load your desired data [here I am loading PimaIndiansDiabetes2 (it is a built-in data set in the package)]

> data(PimaIndiansDiabetes2)

The data set contains records of 768 women of Pima Indian heritage who were tested for diabetes; only 392 of them have complete records.

Our goal in this article is to classify the diabetes diagnosis.

Let's look into the data.

> head(PimaIndiansDiabetes2)

Here is the structure of the data:

[Image: datastructure]

Let's get to know the variables in the data:

pregnant : Number of times she has been pregnant.

glucose : Glucose tolerance test.

pressure : Blood pressure (mm Hg).

triceps : Skin fold thickness (mm).

insulin : 2-hour serum insulin (µU/ml).

mass : Ratio of weight to height squared (BMI) [kg/m²].

pedigree : Diabetes pedigree function.

age : How old she is (years).

diabetes : Whether she tested positive or negative.

These are the variables of our data

In the above image you have seen that there are some missing values in the insulin variable. Let's look at the complete data to check the missing values. The missing values can be fixed.

Step 4: Check the complete data

> PimaIndiansDiabetes2 ↵ (press enter)

There are 768 total observations.

[Image: datastructure]

Only the last few observations are shown, to confirm the number of observations.

> pidna <- na.omit(PimaIndiansDiabetes2)

After omitting the NA values from the data set, only 392 valid observations remain.

[Image: datastructure2]

'pidna' is the new variable where we have saved the cleaned "PimaIndiansDiabetes2" data.
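To see what na.omit does on a small scale, here is a minimal sketch with a made-up data frame. The name toy and its values are illustrative assumptions and are not part of the Pima data. Any row containing at least one NA is dropped.

```r
# Toy data frame (hypothetical values) with missing entries in two rows
toy <- data.frame(
  glucose = c(89, 137, NA, 78),
  insulin = c(94, NA, 88, 50)
)

# Count missing values per column before cleaning
na_per_column <- colSums(is.na(toy))

# Drop every row that has at least one NA
toy_clean <- na.omit(toy)

print(na_per_column)
print(nrow(toy_clean))
```

On the real data, the same colSums(is.na(...)) call applied to PimaIndiansDiabetes2 would show which columns are responsible for the dropped rows.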

> pidlm <- pidna

The value of pidna is stored in pidlm

You may have noticed that all the variables have numeric values except the diabetes column, which is categorical. We should change the categorical values to numeric values. Let's have a look at how to do it.

Step 5: Change the categorical value into numeric value

> pidlm$diabetes <- as.numeric(pidna$diabetes) - 1

When we convert the categorical values to numeric values, by default negative becomes 1 and positive becomes 2. Subtracting 1 sets it up so that positive is 1 and negative is 0.

Now, let's see the corrected data again.

NOTE: Don't get confused by the response variable (diabetes) having values 0 or 1. These are not a binary class; they are just numeric values. If we wrote the command as pidlm$diabetes <- as.numeric(pidna$diabetes) - 2, the response values would be -1 or 0.
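The conversion can be checked on a tiny example. This is a minimal sketch using a made-up factor named status (an assumption for illustration) with the same two levels, neg and pos, as the diabetes column.

```r
# Toy factor with the same two levels as the diabetes column
status <- factor(c("neg", "pos", "pos", "neg"), levels = c("neg", "pos"))

# as.numeric() returns the internal level codes: neg -> 1, pos -> 2
codes <- as.numeric(status)

# Subtracting 1 shifts the codes to neg -> 0, pos -> 1
response <- codes - 1

print(codes)
print(response)
```

The codes follow the order of the factor levels, so it is worth checking levels() on your own data before relying on the subtraction.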

> head(pidlm)↵(press enter)

[Image: datastructure3]

Now we can use this data set for linear regression. From it we will make two different data sets: one is the training set and the other is the testing set.

Step 6 : We will take the first 300 records for the training set and the remaining 92 for the testing set

> train <- pidlm[(1:300),]

> test <- pidlm[(301:392),]

Note: If you get a dimension error in your R command, check the comma inside the brackets. With the comma, pidlm[(1:300),] selects rows, which is what we want here.
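The role of the comma in the split is easier to see on a small example. This is a minimal sketch with a made-up 10-row data frame; the names toy, train_toy and test_toy are assumptions for illustration.

```r
# Toy data frame with 10 rows (hypothetical values)
toy <- data.frame(x = 1:10, y = (1:10) * 2)

# Rows go before the comma, columns after it
train_toy <- toy[1:7, ]   # rows 1 to 7, all columns
test_toy  <- toy[8:10, ]  # rows 8 to 10, all columns

print(dim(train_toy))
print(dim(test_toy))
```

Splitting by position keeps the example simple. If the rows of your data are ordered in some meaningful way, a random split with sample() is a common alternative.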

Step 7 : Now let's make a linear regression model

> lm_reg <- lm(diabetes~., data=train)

Here our regression model is ready.

Step 8 : Let's have a look at the summary of lm_reg

> summary(lm_reg)

[Image: datastructure4]

In this summary, the coefficients show how each variable contributes to the predicted diabetes value. We will use the fitted model to decide who has diabetes.

Step 9: Let's make predictions with the linear regression model

> predicted <- predict(lm_reg, newdata = test)

> predicted

> TAB <- table(test$diabetes, predicted > 0.5)

> TAB

[Image: datastructure5]

We are assuming that if the predicted value is greater than 0.5, we will consider it a positive case; otherwise it will be a negative case.

From the table we can say that there are 62 cases without diabetes and 30 cases with diabetes.

57+5 women have a '0' diabetes value.

57 of them are predicted FALSE by the rule predicted > 0.5, which is correct.

9+21 women have a '1' diabetes value.

21 women are predicted TRUE by the rule predicted > 0.5, so they are successfully predicted.

9 women have a predicted value below 0.5, giving the predicted class '0' even though the original value was '1'.

The number of successful predictions is 78, while the total number of test cases is 92.
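The fit, predict and threshold steps can be tried end to end without the Pima data. This is a minimal sketch on simulated data; the variables x, y, sim, fit and the 150/50 split are assumptions made up for illustration, not the article's data or results.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated data: one predictor x and a 0/1 response y that is more
# likely to be 1 when x is large
n <- 200
x <- rnorm(n)
y <- as.numeric(x + rnorm(n) > 0)
sim <- data.frame(x = x, y = y)

sim_train <- sim[1:150, ]
sim_test  <- sim[151:200, ]

# Fit a linear regression on the numeric 0/1 response
fit <- lm(y ~ x, data = sim_train)

# The argument must be spelled newdata. An unrecognised name such as
# new_data is silently ignored, and predict() then returns the fitted
# values for the training rows instead
pred <- predict(fit, newdata = sim_test)

# Classify as positive when the prediction exceeds 0.5
pred_class <- pred > 0.5
conf_tab <- table(actual = sim_test$y, predicted = pred_class)
print(conf_tab)
```

A quick sanity check after predicting is to compare length(pred) with the number of test rows; a mismatch usually means the new data was not picked up.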

From here we can also calculate the misclassification rate.

Step 10: Computing the misclassification rate

Don't worry, we will calculate the misclassification rate from here:

> mcrate <- 1-sum(diag(TAB))/sum(TAB)

diag(TAB) = the correctly predicted counts, so sum(diag(TAB)) is the total number predicted correctly.

sum(TAB) = total number of predictions.

In words: mcrate = 1 - (total predicted correctly / total predictions). With the counts above this is 1 - 78/92, about 0.152.

We are not limited to predicted > 0.5; we can change the threshold and see how the results change.
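The calculation can be reproduced from the counts reported for the 0.5 threshold. This is a minimal sketch that rebuilds the table by hand as a matrix, so it runs without the data set; the dimnames labels are assumptions chosen for readability.

```r
# Confusion matrix rebuilt from the reported counts (filled column by column):
# rows are the actual diabetes value, columns are predicted > 0.5
TAB <- matrix(c(57, 9, 5, 21), nrow = 2,
              dimnames = list(actual = c("0", "1"),
                              predicted = c("FALSE", "TRUE")))

correct <- sum(diag(TAB))        # 57 + 21 = 78 correct predictions
total   <- sum(TAB)              # 92 test cases
mcrate  <- 1 - correct / total   # misclassification rate

print(round(mcrate, 3))
```

Note that diag() and sum() are functions, so they take round brackets; square brackets would try to index an object instead.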

Step 11: Let's check for predicted value > 0.7

> tab_high <- table(test$diabetes, predicted > 0.7) ↵ (press enter)

[Image: datastructure6]

From the table we can say that there are again 62 cases without diabetes and 30 cases with diabetes.

60+2 women have a '0' diabetes value.

60 of them are predicted FALSE by the rule predicted > 0.7, which is correct.

10+20 women have a '1' diabetes value.

20 women are predicted TRUE by the rule predicted > 0.7, so they are successfully predicted.

10 women have a predicted value below 0.7, giving the predicted class '0' even though the original value was '1'.

Likewise, if you lower the threshold below 0.5, the results will be different.

EDIT: If you have any doubts or find any mistakes, please post them in the comments and we will look into them.
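Comparing several thresholds at once can be sketched with a small helper. The function mc_rate and the toy vectors below are hypothetical, made up for illustration; they are not the Pima results.

```r
# Hypothetical helper: misclassification rate for a given cutoff
mc_rate <- function(actual, predicted, cutoff) {
  # A case is misclassified when the thresholded prediction
  # disagrees with the actual 0/1 label
  mean((predicted > cutoff) != (actual == 1))
}

# Toy values (made up for illustration)
actual    <- c(0, 0, 0, 1, 1, 1)
predicted <- c(0.10, 0.40, 0.60, 0.55, 0.65, 0.90)

cutoffs <- c(0.3, 0.5, 0.7)
rates <- sapply(cutoffs, function(k) mc_rate(actual, predicted, k))
print(data.frame(cutoff = cutoffs, mc_rate = rates))
```

In this toy example the middle cutoff gives the fewest errors, but which cutoff works best depends on the data and on how costly each kind of mistake is.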

Source: https://analyticsbuddhu.wordpress.com/2016/06/28/how-linear-regression-can-also-be-used-to-do-classification/

Posted by: motleywillynat81.blogspot.com
