SlideShare une entreprise Scribd logo
1  sur  34
Télécharger pour lire hors ligne
Python for R Users
By
Chandan Routray
As a part of internship at
www.decisionstats.com
Basic Commands
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. i
Functions R Python
Downloading and installing a package install.packages('name') pip install name
Load a package library('name') import name as other_name
Checking working directory getwd() import os
os.getcwd()
Setting working directory setwd() os.chdir()
List files in a directory dir() os.listdir()
List all objects ls() globals()
Remove an object rm('name') del('object')
Data Frame Creation
R Python
(Using pandas package*)
Creating a data frame “df” of
dimension 6x4 (6 rows and 4
columns) containing random
numbers
A<­
matrix(runif(24,0,1),nrow=6,ncol=4)
df<­data.frame(A)
Here,
• runif function generates 24 random
numbers between 0 to 1
• matrix function creates a matrix from
those random numbers, nrow and ncol
sets the numbers of rows and columns
to the matrix
• data.frame converts the matrix to data
frame
import numpy as np
import pandas as pd
A=np.random.randn(6,4)
df=pd.DataFrame(A)
Here,
• np.random.randn generates a
matrix of 6 rows and 4 columns;
this function is a part of numpy**
library
• pd.DataFrame converts the matrix
in to a data frame
*To install Pandas library visit: http://pandas.pydata.org/; To import Pandas library type: import pandas as pd;
**To import Numpy library type: import numpy as np;
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 1
Data Frame Creation
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 2
Data Frame: Inspecting and Viewing Data
R Python
(Using pandas package*)
Getting the names of rows and
columns of data frame “df”
rownames(df)
returns the name of the rows
colnames(df)
returns the name of the columns
df.index
returns the name of the rows
df.columns
returns the name of the columns
Seeing the top and bottom “x”
rows of the data frame “df”
head(df,x)
returns top x rows of data frame
tail(df,x)
returns bottom x rows of data frame
df.head(x)
returns top x rows of data frame
df.tail(x)
returns bottom x rows of data frame
Getting dimension of data frame
“df”
dim(df)
returns in this format : rows, columns
df.shape
returns in this format : (rows,
columns)
Length of data frame “df” length(df)
returns no. of columns in data frames
len(df)
returns no. of columns in data frames
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 3
Data Frame: Inspecting and Viewing Data
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 4
Data Frame: Inspecting and Viewing Data
R Python
(Using pandas package*)
Getting quick summary(like
mean, std. deviation etc. ) of
data in the data frame “df”
summary(df)
returns mean, median , maximum,
minimum, first quarter and third quarter
df.describe()
returns count, mean, standard
deviation, maximum, minimum, 25%,
50% and 75%
Setting row names and columns
names of the data frame “df”
rownames(df)=c(“A”, ”B”, “C”, ”D”, 
“E”, ”F”)
set the row names to A, B, C, D and E
colnames=c(“P”, ”Q”, “R”, ”S”)
set the column names to P, Q, R and S
df.index=[“A”, ”B”, “C”, ”D”, 
“E”, ”F”]
set the row names to A, B, C, D and
E
df.columns=[“P”, ”Q”, “R”, ”S”]
set the column names to P, Q, R and
S
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 5
Data Frame: Inspecting and Viewing Data
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 6
Data Frame: Sorting Data
R Python
(Using pandas package*)
Sorting the data in the data
frame “df” by column name “P”
df[order(df$P),] df.sort(['P'])
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 7
Data Frame: Sorting Data
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 8
Data Frame: Data Selection
R Python
(Using pandas package*)
Slicing the rows of a data frame
from row no. “x” to row no.
“y”(including row x and y)
df[x:y,] df[x­1:y]
Python starts counting from 0
Slicing the columns name “x”,”Y”
etc. of a data frame “df”
myvars <­ c(“X”,”Y”)
newdata <­ df[myvars]
df.loc[:,[‘X’,’Y’]]
Selecting the the data from row
no. “x” to “y” and column no. “a”
to “b”
df[x:y,a:b] df.iloc[x­1:y,a­1,b]
Selecting the element at row no.
“x” and column no. “y”
df[x,y] df.iat[x­1,y­1]
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 9
Data Frame: Data Selection
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 10
Data Frame: Data Selection
R Python
(Using pandas package*)
Using a single column’s values
to select data, column name “A”
subset(df,A>0)
It will select the all the rows in which the
corresponding value in column A of that
row is greater than 0
df[df.A > 0]
It will do the same as the R function
PythonR
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 11
Mathematical Functions
Functions R Python
(import math and numpy library)
Sum sum(x) math.fsum(x)
Square Root sqrt(x) math.sqrt(x)
Standard Deviation sd(x) numpy.std(x)
Log log(x) math.log(x[,base])
Mean mean(x) numpy.mean(x)
Median median(x) numpy.median(x)
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 12
Mathematical Functions
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 13
Data Manipulation
Functions R Python
(import math and numpy library)
Convert character variable to numeric variable as.numeric(x) For a single value: int(x), long(x), float(x)
For list, vectors etc.: map(int,x), map(float,x)
Convert factor/numeric variable to character
variable
paste(x) For a single value: str(x)
For list, vectors etc.: map(str,x)
Check missing value in an object is.na(x) math.isnan(x)
Delete missing value from an object na.omit(list) cleanedList = [x for x in list if str(x) !
= 'nan']
Calculate the number of characters in character
value
nchar(x) len(x)
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 14
Date & Time Manipulation
Functions R
(import lubridate library)
Python
(import datetime library)
Getting time and date at an instant Sys.time() datetime.datetime.now()
Parsing date and time in format:
YYYY MM DD HH:MM:SS
d<­Sys.time()
d_format<­ymd_hms(d)
d=datetime.datetime.now()
format= “%Y %b %d  %H:%M:%S”
d_format=d.strftime(format)
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 15
Data Visualization
Functions R Python
(import matplotlib library**)
Scatter Plot variable1 vs variable2 plot(variable1,variable2) plt.scatter(variable1,variable2)
plt.show()
Boxplot for Var boxplot(Var) plt.boxplot(Var)
plt.show()
Histogram for Var hist(Var) plt.hist(Var)
plt.show()
Pie Chart for Var pie(Var) from pylab import *
pie(Var)
show()
** To import matplotlib library type: import matplotlib.pyplot as plt
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 16
Data Visualization: Scatter Plot
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 17
Data Visualization: Box Plot
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 18
Data Visualization: Histogram
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 19
Data Visualization: Line Plot
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 20
Data Visualization: Bubble
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 22
Data Visualization: Bar
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 21
Data Visualization: Pie Chart
R Python
Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 23
Thank You
For feedback contact
DecisionStats.com
Coming up
● Data Mining in Python and R ( see draft slides
afterwards)
Machine Learning: SVM on Iris Dataset
*To know more about svm function in R visit: http://cran.r-project.org/web/packages/e1071/
** To install sklearn library visit : http://scikit-learn.org/, To know more about sklearn svm visit: http://scikit-
learn.org/stable/modules/generated/sklearn.svm.SVC.html
R(Using svm* function) Python(Using sklearn** library)
library(e1071)
data(iris)
trainset <­iris[1:149,]
testset <­iris[150,]
svm.model <­ svm(Species ~ ., data = 
trainset, cost = 100, gamma = 1, type= 'C­
classification')
svm.pred<­ predict(svm.model,testset[­5])
svm.pred
#Loading Library
from sklearn import svm
#Importing Dataset
from sklearn import datasets
#Calling SVM
clf = svm.SVC()
#Loading the package
iris = datasets.load_iris()
#Constructing training data
X, y = iris.data[:­1], iris.target[:­1]
#Fitting SVM
clf.fit(X, y)
#Testing the model on test data
print clf.predict(iris.data[­1])
Output: Virginica Output: 2, corresponds to Virginica
Linear Regression: Iris Dataset
*To know more about lm function in R visit: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html
** ** To know more about sklearn linear regression visit : http://scikit-
learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
R(Using lm* function) Python(Using sklearn** library)
data(iris)
total_size<­dim(iris)[1]
num_target<­c(rep(0,total_size))
for (i in 1:length(num_target)){
  if(iris$Species[i]=='setosa'){num_target[i]<­0}
  else if(iris$Species[i]=='versicolor')
{num_target[i]<­1}
  else{num_target[i]<­2}
}
iris$Species<­num_target
train_set <­iris[1:149,]
test_set <­iris[150,]
fit<­lm(Species ~ 0+Sepal.Length+ Sepal.Width+ 
Petal.Length+ Petal.Width , data=train_set)
coefficients(fit)
predict.lm(fit,test_set)
from sklearn import linear_model
from sklearn import datasets
iris = datasets.load_iris()
regr = linear_model.LinearRegression()
X, y = iris.data[:­1], iris.target[:­1]
regr.fit(X, y)
print(regr.coef_)
print regr.predict(iris.data[­1])
Output: 1.64 Output: 1.65
Random forest: Iris Dataset
*To know more about randomForest package in R visit: http://cran.r-project.org/web/packages/randomForest/
** To know more about sklearn random forest visit : http://scikit-
learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
R(Using randomForest* package) Python(Using sklearn** library)
library(randomForest)
data(iris)
total_size<­dim(iris)[1]
num_target<­c(rep(0,total_size))
for (i in 1:length(num_target)){
  if(iris$Species[i]=='setosa'){num_target[i]<­0}
  else if(iris$Species[i]=='versicolor')
{num_target[i]<­1}
  else{num_target[i]<­2}}
iris$Species<­num_target
train_set <­iris[1:149,]
test_set <­iris[150,]
iris.rf <­ randomForest(Species ~ ., 
data=train_set,ntree=100,importance=TRUE,
                        proximity=TRUE)
print(iris.rf)
predict(iris.rf, test_set[­5], predict.all=TRUE)
from sklearn import ensemble
from sklearn import datasets
clf = 
ensemble.RandomForestClassifier(n_estimato
rs=100,max_depth=10)
iris = datasets.load_iris()
X, y = iris.data[:­1], iris.target[:­1]
clf.fit(X, y)
print clf.predict(iris.data[­1])
Output: 1.845 Output: 2
Decision Tree: Iris Dataset
*To know more about rpart package in R visit: http://cran.r-project.org/web/packages/rpart/
** To know more about sklearn desicion tree visit : http://scikit-
learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
R(Using rpart* package) Python(Using sklearn** library)
library(rpart)
data(iris)
sub <­ c(1:149)
fit <­ rpart(Species ~ ., data = iris, 
subset = sub)
fit
predict(fit, iris[­sub,], type = "class")
from sklearn.datasets import load_iris
from sklearn.tree import 
DecisionTreeClassifier
clf = 
DecisionTreeClassifier(random_state=0)
iris = datasets.load_iris()
X, y = iris.data[:­1], iris.target[:­1]
clf.fit(X, y)
print clf.predict(iris.data[­1])
Output: Virginica Output: 2, corresponds to virginica
Gaussian Naive Bayes: Iris Dataset
*To know more about e1071 package in R visit: http://cran.r-project.org/web/packages/e1071/
** To know more about sklearn Naive Bayes visit : http://scikit-
learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
R(Using e1071* package) Python(Using sklearn** library)
library(e1071)
data(iris)
trainset <­iris[1:149,]
testset <­iris[150,]
classifier<­naiveBayes(trainset[,1:4], 
trainset[,5]) 
predict(classifier, testset[,­5])
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
iris = datasets.load_iris()
X, y = iris.data[:­1], iris.target[:­1]
clf.fit(X, y)
print clf.predict(iris.data[­1])
Output: Virginica Output: 2, corresponds to virginica
K Nearest Neighbours: Iris Dataset
*To know more about kknn package in R visit:
** To know more about sklearn k nearest neighbours visit : http://scikit-
learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html
R(Using kknn* package) Python(Using sklearn** library)
library(kknn)
data(iris)
trainset <­iris[1:149,]
testset <­iris[150,]
iris.kknn <­ kknn(Species~., 
trainset,testset, distance = 1,            
    kernel = "triangular")
summary(iris.kknn)
fit <­ fitted(iris.kknn)
fit
from sklearn.datasets import load_iris
from sklearn.neighbors import 
KNeighborsClassifier
knn = KNeighborsClassifier()
iris = datasets.load_iris()
X, y = iris.data[:­1], iris.target[:­1]
knn.fit(X,y) 
print knn.predict(iris.data[­1])
Output: Virginica Output: 2, corresponds to virginica
Thank You
For feedback please let us know at
ohri2007@gmail.com

Contenu connexe

Tendances

相関分析と回帰分析
相関分析と回帰分析相関分析と回帰分析
相関分析と回帰分析大貴 末廣
 
査読コメントに回答する時の、3つの黄金律
査読コメントに回答する時の、3つの黄金律査読コメントに回答する時の、3つの黄金律
査読コメントに回答する時の、3つの黄金律英文校正エディテージ
 
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京Koichi Hamada
 
データサイエンス概論第一 5 時系列データの解析
データサイエンス概論第一 5 時系列データの解析データサイエンス概論第一 5 時系列データの解析
データサイエンス概論第一 5 時系列データの解析Seiichi Uchida
 
機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual Talks機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual TalksYuya Unno
 
SMOTE resampling method slides 02-19-2018
SMOTE resampling method slides 02-19-2018SMOTE resampling method slides 02-19-2018
SMOTE resampling method slides 02-19-2018Shuma Ishigami
 
Chp2 - Les Entrepôts de Données
Chp2 - Les Entrepôts de DonnéesChp2 - Les Entrepôts de Données
Chp2 - Les Entrepôts de DonnéesLilia Sfaxi
 
線形?非線形?
線形?非線形?線形?非線形?
線形?非線形?nishio
 
scikit-learnを用いた機械学習チュートリアル
scikit-learnを用いた機械学習チュートリアルscikit-learnを用いた機械学習チュートリアル
scikit-learnを用いた機械学習チュートリアル敦志 金谷
 
実践で学ぶネットワーク分析
実践で学ぶネットワーク分析実践で学ぶネットワーク分析
実践で学ぶネットワーク分析Mitsunori Sato
 
Python基礎その1
Python基礎その1Python基礎その1
Python基礎その1大貴 末廣
 
差分プライバシーとは何か? (定義 & 解釈編)
差分プライバシーとは何か? (定義 & 解釈編)差分プライバシーとは何か? (定義 & 解釈編)
差分プライバシーとは何か? (定義 & 解釈編)Kentaro Minami
 
PRML輪読#1
PRML輪読#1PRML輪読#1
PRML輪読#1matsuolab
 
Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1AjayRawat971036
 
Présentation Big Data DFCG
Présentation Big Data DFCGPrésentation Big Data DFCG
Présentation Big Data DFCGMicropole Group
 
Spark RDD : Transformations & Actions
Spark RDD : Transformations & ActionsSpark RDD : Transformations & Actions
Spark RDD : Transformations & ActionsMICHRAFY MUSTAFA
 
Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Simplilearn
 

Tendances (20)

MICの解説
MICの解説MICの解説
MICの解説
 
相関分析と回帰分析
相関分析と回帰分析相関分析と回帰分析
相関分析と回帰分析
 
査読コメントに回答する時の、3つの黄金律
査読コメントに回答する時の、3つの黄金律査読コメントに回答する時の、3つの黄金律
査読コメントに回答する時の、3つの黄金律
 
主成分分析
主成分分析主成分分析
主成分分析
 
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京
 
データサイエンス概論第一 5 時系列データの解析
データサイエンス概論第一 5 時系列データの解析データサイエンス概論第一 5 時系列データの解析
データサイエンス概論第一 5 時系列データの解析
 
機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual Talks機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual Talks
 
SMOTE resampling method slides 02-19-2018
SMOTE resampling method slides 02-19-2018SMOTE resampling method slides 02-19-2018
SMOTE resampling method slides 02-19-2018
 
Chp2 - Les Entrepôts de Données
Chp2 - Les Entrepôts de DonnéesChp2 - Les Entrepôts de Données
Chp2 - Les Entrepôts de Données
 
線形?非線形?
線形?非線形?線形?非線形?
線形?非線形?
 
scikit-learnを用いた機械学習チュートリアル
scikit-learnを用いた機械学習チュートリアルscikit-learnを用いた機械学習チュートリアル
scikit-learnを用いた機械学習チュートリアル
 
実践で学ぶネットワーク分析
実践で学ぶネットワーク分析実践で学ぶネットワーク分析
実践で学ぶネットワーク分析
 
Python基礎その1
Python基礎その1Python基礎その1
Python基礎その1
 
差分プライバシーとは何か? (定義 & 解釈編)
差分プライバシーとは何か? (定義 & 解釈編)差分プライバシーとは何か? (定義 & 解釈編)
差分プライバシーとは何か? (定義 & 解釈編)
 
PRML輪読#1
PRML輪読#1PRML輪読#1
PRML輪読#1
 
Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1Apache spark-the-definitive-guide-excerpts-r1
Apache spark-the-definitive-guide-excerpts-r1
 
Rの高速化
Rの高速化Rの高速化
Rの高速化
 
Présentation Big Data DFCG
Présentation Big Data DFCGPrésentation Big Data DFCG
Présentation Big Data DFCG
 
Spark RDD : Transformations & Actions
Spark RDD : Transformations & ActionsSpark RDD : Transformations & Actions
Spark RDD : Transformations & Actions
 
Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...
 

Similaire à Python for R Users

PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using PythonNishantKumar1179
 
R for Pythonistas (PyData NYC 2017)
R for Pythonistas (PyData NYC 2017)R for Pythonistas (PyData NYC 2017)
R for Pythonistas (PyData NYC 2017)Christopher Roach
 
pandas dataframe notes.pdf
pandas dataframe notes.pdfpandas dataframe notes.pdf
pandas dataframe notes.pdfAjeshSurejan2
 
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Edureka!
 
Python Pandas
Python PandasPython Pandas
Python PandasSunil OS
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11Henry Schreiner
 
Python-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptxPython-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptxParveenShaik21
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine LearningAmanBhalla14
 
PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)Hansol Kang
 
Poetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePoetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePeter Solymos
 
PPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini RatrePPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini RatreRaginiRatre
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemSages
 
python-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxpython-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxAkashgupta517936
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in RFlorian Uhlitz
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonRalf Gommers
 

Similaire à Python for R Users (20)

Python for R users
Python for R usersPython for R users
Python for R users
 
R language introduction
R language introductionR language introduction
R language introduction
 
R Introduction
R IntroductionR Introduction
R Introduction
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using Python
 
R for Pythonistas (PyData NYC 2017)
R for Pythonistas (PyData NYC 2017)R for Pythonistas (PyData NYC 2017)
R for Pythonistas (PyData NYC 2017)
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
pandas dataframe notes.pdf
pandas dataframe notes.pdfpandas dataframe notes.pdf
pandas dataframe notes.pdf
 
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
 
Python Pandas
Python PandasPython Pandas
Python Pandas
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11
 
Python-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptxPython-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptx
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)
 
Poetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePoetry with R -- Dissecting the code
Poetry with R -- Dissecting the code
 
PPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini RatrePPT ON MACHINE LEARNING by Ragini Ratre
PPT ON MACHINE LEARNING by Ragini Ratre
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
 
python-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxpython-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptx
 
Lecture 9.pptx
Lecture 9.pptxLecture 9.pptx
Lecture 9.pptx
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 

Plus de Ajay Ohri

Introduction to R ajay Ohri
Introduction to R ajay OhriIntroduction to R ajay Ohri
Introduction to R ajay OhriAjay Ohri
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RAjay Ohri
 
Social Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionSocial Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionAjay Ohri
 
Download Python for R Users pdf for free
Download Python for R Users pdf for freeDownload Python for R Users pdf for free
Download Python for R Users pdf for freeAjay Ohri
 
Install spark on_windows10
Install spark on_windows10Install spark on_windows10
Install spark on_windows10Ajay Ohri
 
Ajay ohri Resume
Ajay ohri ResumeAjay ohri Resume
Ajay ohri ResumeAjay Ohri
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientistsAjay Ohri
 
National seminar on emergence of internet of things (io t) trends and challe...
National seminar on emergence of internet of things (io t)  trends and challe...National seminar on emergence of internet of things (io t)  trends and challe...
National seminar on emergence of internet of things (io t) trends and challe...Ajay Ohri
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data scienceAjay Ohri
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessAjay Ohri
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data ScienceAjay Ohri
 
Software Testing for Data Scientists
Software Testing for Data ScientistsSoftware Testing for Data Scientists
Software Testing for Data ScientistsAjay Ohri
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in PythonAjay Ohri
 
How does cryptography work? by Jeroen Ooms
How does cryptography work?  by Jeroen OomsHow does cryptography work?  by Jeroen Ooms
How does cryptography work? by Jeroen OomsAjay Ohri
 
Using R for Social Media and Sports Analytics
Using R for Social Media and Sports AnalyticsUsing R for Social Media and Sports Analytics
Using R for Social Media and Sports AnalyticsAjay Ohri
 
Kush stats alpha
Kush stats alpha Kush stats alpha
Kush stats alpha Ajay Ohri
 
Analyze this
Analyze thisAnalyze this
Analyze thisAjay Ohri
 

Plus de Ajay Ohri (20)

Introduction to R ajay Ohri
Introduction to R ajay OhriIntroduction to R ajay Ohri
Introduction to R ajay Ohri
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Social Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionSocial Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 Election
 
Pyspark
PysparkPyspark
Pyspark
 
Download Python for R Users pdf for free
Download Python for R Users pdf for freeDownload Python for R Users pdf for free
Download Python for R Users pdf for free
 
Install spark on_windows10
Install spark on_windows10Install spark on_windows10
Install spark on_windows10
 
Ajay ohri Resume
Ajay ohri ResumeAjay ohri Resume
Ajay ohri Resume
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientists
 
National seminar on emergence of internet of things (io t) trends and challe...
National seminar on emergence of internet of things (io t)  trends and challe...National seminar on emergence of internet of things (io t)  trends and challe...
National seminar on emergence of internet of things (io t) trends and challe...
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help business
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data Science
 
Tradecraft
Tradecraft   Tradecraft
Tradecraft
 
Software Testing for Data Scientists
Software Testing for Data ScientistsSoftware Testing for Data Scientists
Software Testing for Data Scientists
 
Craps
CrapsCraps
Craps
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in Python
 
How does cryptography work? by Jeroen Ooms
How does cryptography work?  by Jeroen OomsHow does cryptography work?  by Jeroen Ooms
How does cryptography work? by Jeroen Ooms
 
Using R for Social Media and Sports Analytics
Using R for Social Media and Sports AnalyticsUsing R for Social Media and Sports Analytics
Using R for Social Media and Sports Analytics
 
Kush stats alpha
Kush stats alpha Kush stats alpha
Kush stats alpha
 
Analyze this
Analyze thisAnalyze this
Analyze this
 

Dernier

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 

Dernier (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 

Python for R Users

  • 1. Python for R Users By Chandan Routray As a part of internship at www.decisionstats.com
  • 2. Basic Commands Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. i Functions R Python Downloading and installing a package install.packages('name') pip install name Load a package library('name') import name as other_name Checking working directory getwd() import os os.getcwd() Setting working directory setwd() os.chdir() List files in a directory dir() os.listdir() List all objects ls() globals() Remove an object rm('name') del('object')
  • 3. Data Frame Creation R Python (Using pandas package*) Creating a data frame “df” of dimension 6x4 (6 rows and 4 columns) containing random numbers A<­ matrix(runif(24,0,1),nrow=6,ncol=4) df<­data.frame(A) Here, • runif function generates 24 random numbers between 0 to 1 • matrix function creates a matrix from those random numbers, nrow and ncol sets the numbers of rows and columns to the matrix • data.frame converts the matrix to data frame import numpy as np import pandas as pd A=np.random.randn(6,4) df=pd.DataFrame(A) Here, • np.random.randn generates a matrix of 6 rows and 4 columns; this function is a part of numpy** library • pd.DataFrame converts the matrix in to a data frame *To install Pandas library visit: http://pandas.pydata.org/; To import Pandas library type: import pandas as pd; **To import Numpy library type: import numpy as np; Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 1
  • 4. Data Frame Creation R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 2
  • 5. Data Frame: Inspecting and Viewing Data R Python (Using pandas package*) Getting the names of rows and columns of data frame “df” rownames(df) returns the name of the rows colnames(df) returns the name of the columns df.index returns the name of the rows df.columns returns the name of the columns Seeing the top and bottom “x” rows of the data frame “df” head(df,x) returns top x rows of data frame tail(df,x) returns bottom x rows of data frame df.head(x) returns top x rows of data frame df.tail(x) returns bottom x rows of data frame Getting dimension of data frame “df” dim(df) returns in this format : rows, columns df.shape returns in this format : (rows, columns) Length of data frame “df” length(df) returns no. of columns in data frames len(df) returns no. of columns in data frames Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 3
  • 6. Data Frame: Inspecting and Viewing Data R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 4
  • 7. Data Frame: Inspecting and Viewing Data R Python (Using pandas package*) Getting quick summary(like mean, std. deviation etc. ) of data in the data frame “df” summary(df) returns mean, median , maximum, minimum, first quarter and third quarter df.describe() returns count, mean, standard deviation, maximum, minimum, 25%, 50% and 75% Setting row names and columns names of the data frame “df” rownames(df)=c(“A”, ”B”, “C”, ”D”,  “E”, ”F”) set the row names to A, B, C, D and E colnames=c(“P”, ”Q”, “R”, ”S”) set the column names to P, Q, R and S df.index=[“A”, ”B”, “C”, ”D”,  “E”, ”F”] set the row names to A, B, C, D and E df.columns=[“P”, ”Q”, “R”, ”S”] set the column names to P, Q, R and S Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 5
  • 8. Data Frame: Inspecting and Viewing Data R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 6
  • 9. Data Frame: Sorting Data R Python (Using pandas package*) Sorting the data in the data frame “df” by column name “P” df[order(df$P),] df.sort(['P']) Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 7
  • 10. Data Frame: Sorting Data R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 8
  • 11. Data Frame: Data Selection R Python (Using pandas package*) Slicing the rows of a data frame from row no. “x” to row no. “y”(including row x and y) df[x:y,] df[x­1:y] Python starts counting from 0 Slicing the columns name “x”,”Y” etc. of a data frame “df” myvars <­ c(“X”,”Y”) newdata <­ df[myvars] df.loc[:,[‘X’,’Y’]] Selecting the the data from row no. “x” to “y” and column no. “a” to “b” df[x:y,a:b] df.iloc[x­1:y,a­1,b] Selecting the element at row no. “x” and column no. “y” df[x,y] df.iat[x­1,y­1] Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 9
  • 12. Data Frame: Data Selection R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 10
  • 13. Data Frame: Data Selection R Python (Using pandas package*) Using a single column’s values to select data, column name “A” subset(df,A>0) It will select the all the rows in which the corresponding value in column A of that row is greater than 0 df[df.A > 0] It will do the same as the R function PythonR Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 11
  • 14. Mathematical Functions Functions R Python (import math and numpy library) Sum sum(x) math.fsum(x) Square Root sqrt(x) math.sqrt(x) Standard Deviation sd(x) numpy.std(x) Log log(x) math.log(x[,base]) Mean mean(x) numpy.mean(x) Median median(x) numpy.median(x) Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 12
  • 15. Mathematical Functions R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 13
  • 16. Data Manipulation Functions R Python (import math and numpy library) Convert character variable to numeric variable as.numeric(x) For a single value: int(x), long(x), float(x) For list, vectors etc.: map(int,x), map(float,x) Convert factor/numeric variable to character variable paste(x) For a single value: str(x) For list, vectors etc.: map(str,x) Check missing value in an object is.na(x) math.isnan(x) Delete missing value from an object na.omit(list) cleanedList = [x for x in list if str(x) ! = 'nan'] Calculate the number of characters in character value nchar(x) len(x) Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 14
  • 17. Date & Time Manipulation Functions R (import lubridate library) Python (import datetime library) Getting time and date at an instant Sys.time() datetime.datetime.now() Parsing date and time in format: YYYY MM DD HH:MM:SS d<­Sys.time() d_format<­ymd_hms(d) d=datetime.datetime.now() format= “%Y %b %d  %H:%M:%S” d_format=d.strftime(format) Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 15
  • 18. Data Visualization Functions R Python (import matplotlib library**) Scatter Plot variable1 vs variable2 plot(variable1,variable2) plt.scatter(variable1,variable2) plt.show() Boxplot for Var boxplot(Var) plt.boxplot(Var) plt.show() Histogram for Var hist(Var) plt.hist(Var) plt.show() Pie Chart for Var pie(Var) from pylab import * pie(Var) show() ** To import matplotlib library type: import matplotlib.pyplot as plt Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 16
  • 19. Data Visualization: Scatter Plot R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 17
  • 20. Data Visualization: Box Plot R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 18
  • 21. Data Visualization: Histogram R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 19
  • 22. Data Visualization: Line Plot R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 20
  • 23. Data Visualization: Bubble R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 22
  • 24. Data Visualization: Bar R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 21
  • 25. Data Visualization: Pie Chart R Python Dec 2014 Copyrigt www.decisionstats.com Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 23
  • 26. Thank You For feedback contact DecisionStats.com
  • 27. Coming up ● Data Mining in Python and R ( see draft slides afterwards)
  • 28. Machine Learning: SVM on Iris Dataset *To know more about svm function in R visit: http://cran.r-project.org/web/packages/e1071/ ** To install sklearn library visit : http://scikit-learn.org/, To know more about sklearn svm visit: http://scikit- learn.org/stable/modules/generated/sklearn.svm.SVC.html R(Using svm* function) Python(Using sklearn** library) library(e1071) data(iris) trainset <­iris[1:149,] testset <­iris[150,] svm.model <­ svm(Species ~ ., data =  trainset, cost = 100, gamma = 1, type= 'C­ classification') svm.pred<­ predict(svm.model,testset[­5]) svm.pred #Loading Library from sklearn import svm #Importing Dataset from sklearn import datasets #Calling SVM clf = svm.SVC() #Loading the package iris = datasets.load_iris() #Constructing training data X, y = iris.data[:­1], iris.target[:­1] #Fitting SVM clf.fit(X, y) #Testing the model on test data print clf.predict(iris.data[­1]) Output: Virginica Output: 2, corresponds to Virginica
  • 29. Linear Regression: Iris Dataset *To know more about lm function in R visit: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html ** ** To know more about sklearn linear regression visit : http://scikit- learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html R(Using lm* function) Python(Using sklearn** library) data(iris) total_size<­dim(iris)[1] num_target<­c(rep(0,total_size)) for (i in 1:length(num_target)){   if(iris$Species[i]=='setosa'){num_target[i]<­0}   else if(iris$Species[i]=='versicolor') {num_target[i]<­1}   else{num_target[i]<­2} } iris$Species<­num_target train_set <­iris[1:149,] test_set <­iris[150,] fit<­lm(Species ~ 0+Sepal.Length+ Sepal.Width+  Petal.Length+ Petal.Width , data=train_set) coefficients(fit) predict.lm(fit,test_set) from sklearn import linear_model from sklearn import datasets iris = datasets.load_iris() regr = linear_model.LinearRegression() X, y = iris.data[:­1], iris.target[:­1] regr.fit(X, y) print(regr.coef_) print regr.predict(iris.data[­1]) Output: 1.64 Output: 1.65
  • 30. Random forest: Iris Dataset *To know more about randomForest package in R visit: http://cran.r-project.org/web/packages/randomForest/ ** To know more about sklearn random forest visit : http://scikit- learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html R(Using randomForest* package) Python(Using sklearn** library) library(randomForest) data(iris) total_size<­dim(iris)[1] num_target<­c(rep(0,total_size)) for (i in 1:length(num_target)){   if(iris$Species[i]=='setosa'){num_target[i]<­0}   else if(iris$Species[i]=='versicolor') {num_target[i]<­1}   else{num_target[i]<­2}} iris$Species<­num_target train_set <­iris[1:149,] test_set <­iris[150,] iris.rf <­ randomForest(Species ~ .,  data=train_set,ntree=100,importance=TRUE,                         proximity=TRUE) print(iris.rf) predict(iris.rf, test_set[­5], predict.all=TRUE) from sklearn import ensemble from sklearn import datasets clf =  ensemble.RandomForestClassifier(n_estimato rs=100,max_depth=10) iris = datasets.load_iris() X, y = iris.data[:­1], iris.target[:­1] clf.fit(X, y) print clf.predict(iris.data[­1]) Output: 1.845 Output: 2
  • 31. Decision Tree: Iris Dataset *To know more about rpart package in R visit: http://cran.r-project.org/web/packages/rpart/ ** To know more about sklearn desicion tree visit : http://scikit- learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html R(Using rpart* package) Python(Using sklearn** library) library(rpart) data(iris) sub <­ c(1:149) fit <­ rpart(Species ~ ., data = iris,  subset = sub) fit predict(fit, iris[­sub,], type = "class") from sklearn.datasets import load_iris from sklearn.tree import  DecisionTreeClassifier clf =  DecisionTreeClassifier(random_state=0) iris = datasets.load_iris() X, y = iris.data[:­1], iris.target[:­1] clf.fit(X, y) print clf.predict(iris.data[­1]) Output: Virginica Output: 2, corresponds to virginica
  • 32. Gaussian Naive Bayes: Iris Dataset *To know more about e1071 package in R visit: http://cran.r-project.org/web/packages/e1071/ ** To know more about sklearn Naive Bayes visit : http://scikit- learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html R(Using e1071* package) Python(Using sklearn** library) library(e1071) data(iris) trainset <­iris[1:149,] testset <­iris[150,] classifier<­naiveBayes(trainset[,1:4],  trainset[,5])  predict(classifier, testset[,­5]) from sklearn.datasets import load_iris from sklearn.naive_bayes import GaussianNB clf = GaussianNB() iris = datasets.load_iris() X, y = iris.data[:­1], iris.target[:­1] clf.fit(X, y) print clf.predict(iris.data[­1]) Output: Virginica Output: 2, corresponds to virginica
  • 33. K Nearest Neighbours: Iris Dataset *To know more about kknn package in R visit: ** To know more about sklearn k nearest neighbours visit : http://scikit- learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html R(Using kknn* package) Python(Using sklearn** library) library(kknn) data(iris) trainset <­iris[1:149,] testset <­iris[150,] iris.kknn <­ kknn(Species~.,  trainset,testset, distance = 1,                 kernel = "triangular") summary(iris.kknn) fit <­ fitted(iris.kknn) fit from sklearn.datasets import load_iris from sklearn.neighbors import  KNeighborsClassifier knn = KNeighborsClassifier() iris = datasets.load_iris() X, y = iris.data[:­1], iris.target[:­1] knn.fit(X,y)  print knn.predict(iris.data[­1]) Output: Virginica Output: 2, corresponds to virginica
  • 34. Thank You For feedback please let us know at ohri2007@gmail.com