
Machine Learning Linear Regression


In this post, we will learn linear regression, a machine learning technique, in Python.

Requirements

For this tutorial, the following libraries should be installed on your system:
  1. pandas
  2. quandl
  3. numpy
  4. scikit-learn (imported as sklearn)
Linear Regression: Taking continuous data and fitting the best possible linear function (a straight line or plane) to it.
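To make "fitting the best possible function" concrete, here is a minimal sketch that fits a straight line to noisy points by ordinary least squares. The data is synthetic and the true slope and intercept (2 and 1) are assumptions chosen for illustration; they are not part of the stock example used later in this post.

```python
import numpy as np

# Synthetic data (hypothetical): y is roughly 2*x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Fit a degree-1 polynomial (a straight line) by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The recovered slope and intercept should land close to the values used to generate the data, which is what "best fit" means here: the line that minimizes the sum of squared vertical distances to the points.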
Regression: identifying a data set, importing it, and converting it into a useful format.
Code snippet used in the video:
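 import pandas as pd
 import quandl
 import math

 # Download daily GOOGL stock prices from Quandl's WIKI dataset
 df = quandl.get('WIKI/GOOGL')

 # Keep only the adjusted price and volume columns
 df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]

 # Gap between the daily high and the close, as a percentage of the close
 df['HL_PCT'] = (df['Adj. High'] - df['Adj. Close']) / df['Adj. Close'] * 100.0

 # Daily move from open to close, as a percentage of the open
 df['PCT_Change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

 # Columns used in the rest of the tutorial
 df = df[['Adj. Close', 'HL_PCT', 'PCT_Change', 'Adj. Volume']]

 print(df.head())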
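The two derived columns are easier to check on a tiny table than on the downloaded data. The sketch below applies the same formulas to three made-up trading days; the prices are hypothetical and chosen only so the arithmetic is easy to verify by hand.

```python
import pandas as pd

# Hypothetical prices for three trading days (not real GOOGL data).
df = pd.DataFrame({
    'Adj. Open':   [100.0, 102.0, 101.0],
    'Adj. High':   [105.0, 104.0, 103.0],
    'Adj. Low':    [99.0, 100.0, 98.0],
    'Adj. Close':  [102.0, 101.0, 100.0],
    'Adj. Volume': [1000, 1200, 900],
})

# Same formulas as in the snippet above.
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Close']) / df['Adj. Close'] * 100.0
df['PCT_Change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

print(df[['Adj. Close', 'HL_PCT', 'PCT_Change', 'Adj. Volume']])
```

On the first day the price opens at 100 and closes at 102, so PCT_Change is 2.0, and the high of 105 sits 3/102 (about 2.94 percent) above the close.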

Regression Feature Identification:  

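 Further code:

 forecast_col = 'Adj. Close'

 # Replace missing values with an outlier sentinel instead of dropping the rows
 df.fillna(-99999, inplace=True)

 # Forecast 1% of the data set's length into the future
 forecast_out = int(math.ceil(0.01 * len(df)))

 # The label for each row is the closing price forecast_out rows later
 df['label'] = df[forecast_col].shift(-forecast_out)

 # The last forecast_out rows have no future price, so their label is NaN
 df.dropna(inplace=True)

 print(df.head())
 print(df.tail())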
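The negative shift is the step that turns a price history into a supervised learning problem. A small sketch on ten made-up closing prices shows what it does; the prices and the value forecast_out = 2 are assumptions for illustration (the post uses 1% of the data set's length).

```python
import pandas as pd

# Hypothetical closing prices for ten consecutive days (not real data).
df = pd.DataFrame({'Adj. Close': [100.0, 101.0, 102.0, 103.0, 104.0,
                                  105.0, 106.0, 107.0, 108.0, 109.0]})

forecast_out = 2  # look two rows ahead

# Each row's label is the closing price forecast_out rows later.
df['label'] = df['Adj. Close'].shift(-forecast_out)
print(df.tail(3))  # the last forecast_out rows have no future price, so label is NaN

# Rows without a label cannot be used for training.
labeled = df.dropna()
print(len(df), len(labeled))
```

Shifting by a negative amount moves values up the column, so every row is paired with a price from its own future, and the rows at the very end are left without a label.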

Regression Training and Testing:

 import pandas as pd
 import quandl
 import math
 import numpy as np  # Fast n-dimensional arrays, the input format scikit-learn expects
 from sklearn import preprocessing, model_selection, svm
 from sklearn.linear_model import LinearRegression

 df = quandl.get('WIKI/GOOGL')
 df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
 df['HL_PCT'] = (df['Adj. High'] - df['Adj. Close']) / df['Adj. Close'] * 100.0
 df['PCT_Change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0
 df = df[['Adj. Close', 'HL_PCT', 'PCT_Change', 'Adj. Volume']]

 forecast_col = 'Adj. Close'
 df.fillna(-99999, inplace=True)

 forecast_out = int(math.ceil(0.01 * len(df)))
 print(forecast_out)

 df['label'] = df[forecast_col].shift(-forecast_out)
 df.dropna(inplace=True)
 print(df.head())
 print(df.tail())

 X = np.array(df.drop(columns=['label']))  # Our features
 y = np.array(df['label'])                 # Our labels
 X = preprocessing.scale(X)                # Standardize each feature to zero mean and unit variance

 # Hold out 20% of the rows for testing
 X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)

 clf = LinearRegression(n_jobs=-1)  # n_jobs=-1 uses all available CPU cores
 clf.fit(X_train, y_train)

 accuracy = clf.score(X_test, y_test)  # score() returns R^2 (coefficient of determination), not squared error
 print(accuracy)
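For a scikit-learn regressor, score returns the coefficient of determination R^2 = 1 - SS_res / SS_tot, where 1.0 is a perfect fit. The sketch below computes that quantity by hand and compares it with score; the synthetic features and the coefficients used to generate y are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic regression problem (hypothetical): three features, linear target plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Train on the first 160 rows, test on the remaining 40.
model = LinearRegression().fit(X[:160], y[:160])
X_test, y_test = X[160:], y[160:]

# R^2 computed from its definition.
y_pred = model.predict(X_test)
ss_res = np.sum((y_test - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)  # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, model.score(X_test, y_test))
```

The two printed numbers should agree. A value near 1.0 means the model explains almost all of the variance in the test labels, and a value near 0 means it does no better than always predicting the mean.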

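Using Support Vector Regression (SVR):

 clf = svm.SVR()  # Support vector regressor with the default RBF kernel
 clf.fit(X_train, y_train)

 accuracy = clf.score(X_test, y_test)  # score() returns R^2 (coefficient of determination), not squared error
 print(accuracy)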
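SVR accepts a kernel argument, and the choice of kernel can change the score considerably. The sketch below tries three of the built-in kernels on a synthetic problem; the data, the coefficients, and the fixed random_state are assumptions for illustration, so the numbers say nothing about how each kernel would do on the stock data.

```python
import numpy as np
from sklearn import model_selection, preprocessing, svm

# Synthetic regression problem (hypothetical): linear target plus a little noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=300)

X = preprocessing.scale(X)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit one SVR per kernel and record its R^2 on the held-out rows.
scores = {}
for kernel in ('linear', 'poly', 'rbf'):
    reg = svm.SVR(kernel=kernel)
    reg.fit(X_train, y_train)
    scores[kernel] = reg.score(X_test, y_test)
    print(kernel, scores[kernel])
```

Because the target here is linear by construction, the linear kernel should score close to 1.0; on real data the ranking has to be found by trying the kernels.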
   
   



Regression Forecasting and Predicting:
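A minimal sketch of how the forecasting step might look, assuming the same shift-based label as in the earlier sections: the newest forecast_out rows have features but no label, so they are held back from training and passed to predict afterwards. The synthetic price series and the names X_lately and forecast_set are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression

# Hypothetical, steadily rising price series (not real market data).
df = pd.DataFrame({'Adj. Close': np.linspace(100.0, 200.0, 101)})
forecast_out = 5
df['label'] = df['Adj. Close'].shift(-forecast_out)

# Scale all rows together, then split off the rows that have no label yet.
X = preprocessing.scale(np.array(df.drop(columns=['label'])))
X_lately = X[-forecast_out:]   # newest rows: features only
X = X[:-forecast_out]          # rows that have a known future price
y = np.array(df['label'].dropna())

reg = LinearRegression().fit(X, y)

# Predicted prices for the forecast_out days after the end of the data.
forecast_set = reg.predict(X_lately)
print(forecast_set)
```

The price rises by exactly 1.0 per row in this made-up series, so the forecasts simply continue the line; real prices are noisy and the forecasts will be correspondingly less certain.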


