In this tutorial, we learn to access a MySQL database from R using the RMySQL package. The tutorial covers everything from creating tables and appending data to removing tables from the database.
2. Course Material R2 Academy
All the material related to this course is available on our website
Slides can be viewed at SlideShare
Scripts can be downloaded from GitHub
Videos can be viewed on our YouTube Channel
www.rsquaredacademy.com 2
3. Table Of Contents
→ Objectives
→ Introduction
→ Installing RMySQL
→ RMySQL Commands
→ Connecting to MySQL
→ Database Info
→ Listing Tables
→ Creating Tables
→ Import data into R data frame
→ Export data from R
4. Objectives
→ Install & load RMySQL package
→ Connect to a MySQL Database from R
→ Display database information
→ List tables in the database
→ Create new table
→ Import data into R for analysis
→ Export data from R
→ Remove tables & disconnect
5. Introduction
In the real world, data is often stored in relational databases such as MySQL, and an analyst is required to extract the
data in order to perform any type of analysis. If you are using R for statistical analysis and a relational database for
storing the data, you need to interact with the database in order to access the relevant data sets.
One way to accomplish the above task is to export the data from the database in some file format and import the
same into R. Similarly, if you have some data as a data frame in R and want to store it in a database, you will need to
export the data from R and import it into the database. This method can be very cumbersome and frustrating.
The RMySQL package was created to help R users easily access a MySQL database from R. In order to take
advantage of the features of the package, you need the following:
• Access to a MySQL database
• Knowledge of basic SQL commands
• A recent version of R (3.2.3 at the time of writing)
• RStudio (Version 0.99.491) (Optional)
• RMySQL Package (Version 0.10.8)
6. RMySQL Package
The RMySQL package allows you to access MySQL from R. It was created by Jeffrey Horner and is now maintained by
Jeroen Ooms and Hadley Wickham. At the time of writing, the latest release of the package is version 0.10.8. You can
install and load the package using the following commands:
# install the package
install.packages("RMySQL")
# load the package
library(RMySQL)
7. Connect To Database
We can establish a connection to a MySQL database using the dbConnect() function. In order to connect to the
database, we need to specify the following:
• MySQL Connection
• Database name
• Username
• Password
• Host Details
Below is an example:
# create a MySQL connection object
con <- dbConnect(MySQL(),
user = 'root',
password = 'password',
host = 'localhost',
dbname = 'world')
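The example above hard-codes the user name and password in the script. Below is a minimal sketch of one alternative, assuming the credentials are stored in environment variables. The variable names (MYSQL_USER, MYSQL_PASSWORD, MYSQL_HOST, MYSQL_DBNAME) and both helper functions are hypothetical and not part of RMySQL:

```r
# Hypothetical helper: collect connection settings from environment variables.
# The variable names are an assumption made for this sketch.
mysql_args_from_env <- function(prefix = "MYSQL_") {
  keys <- c(user = "USER", password = "PASSWORD", host = "HOST", dbname = "DBNAME")
  values <- Sys.getenv(paste0(prefix, keys), unset = NA)
  missing <- is.na(values)
  if (any(missing)) {
    stop("Missing environment variables: ",
         paste(names(values)[missing], collapse = ", "))
  }
  # Return a list named after the dbConnect() arguments used in this tutorial
  stats::setNames(as.list(unname(values)), names(keys))
}

# Hypothetical helper: open a connection using the settings above.
# Requires the DBI and RMySQL packages and a reachable MySQL server.
connect_from_env <- function() {
  do.call(DBI::dbConnect, c(list(RMySQL::MySQL()), mysql_args_from_env()))
}
```

With the four variables set (for example in .Renviron), con <- connect_from_env() would stand in for the dbConnect() call shown above.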
8. Connection Summary
We can get a summary or metadata of the connection using the summary() function. We need to specify the name of
the MySQL connection object for which we are seeking metadata.
Below is an example:
# connect to MySQL
con <- dbConnect(MySQL(),
user = 'root',
password = 'password',
host = 'localhost',
dbname = 'world')
> summary(con)
<MySQLConnection:0,0>
User: root
Host: localhost
Dbname: world
Connection type: localhost via TCP/IP
Results:
9. Database Info
The dbGetInfo() function can be used to access information about the database to which we have established
a connection. Among other things, it returns information about the host, the server and the connection type.
> dbGetInfo(con)
$host
[1] "localhost"
$user
[1] "root"
$dbname
[1] "world"
$conType
[1] "localhost via TCP/IP"
$serverVersion
[1] "5.7.9-log"
$protocolVersion
[1] 10
$threadId
[1] 7
$rsId
list()
10. List Tables
Once we have successfully established a connection to a MySQL database, we can use the dbListTables() function
to access the list of tables that are present in that particular database. We need to specify the name of the MySQL
connection object for which we are seeking the list of tables.
Below is an example:
# list of tables in the database
> dbListTables(con)
[1] "city" "country" "countrylanguage"
[4] "mtcars"
As you can see, there are four tables in the database to which we established the connection through the RMySQL
package. In the function, we have not specified the database name but the name of the MySQL connection object
we created when we connected to the database.
11. List Fields
To get a list of fields or columns in a particular table in the database, we can use the dbListFields() function. We
need to specify the name of the MySQL connection object as well as the table name. If the table exists in the
database, the names of the fields will be returned.
Below is an example:
# list of fields in table city
> dbListFields(con, "city")
[1] "ID" "Name" "CountryCode" "District"
[5] "Population"
The name of the table must be enclosed in single/double quotes, and the names of the fields are returned as a
character vector.
12. Testing Data Types
To find the SQL data type that corresponds to an R object, we can use the dbDataType() function.
Below is an example:
> # data type
> dbDataType(RMySQL::MySQL(), "a")
[1] "text"
> dbDataType(RMySQL::MySQL(), 1:5)
[1] "bigint"
> dbDataType(RMySQL::MySQL(), 1.5)
[1] "double"
We need to specify the driver as well as the R object whose SQL data type we want to determine.
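The same idea can be applied to a whole data frame before exporting it. Below is a minimal sketch that calls dbDataType() once per column; the helper name column_sql_types is hypothetical:

```r
# Hypothetical helper: SQL data type for every column of a data frame.
# drv is a driver object such as RMySQL::MySQL(); requires the DBI package.
column_sql_types <- function(drv, df) {
  vapply(df, function(column) DBI::dbDataType(drv, column), character(1))
}
```

For example, column_sql_types(RMySQL::MySQL(), data.frame(x = 1:10, y = letters[1:10])) would return a named character vector with one SQL type per column.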
13. Querying Data
There are three different methods of querying data from a database:
• Import the complete table using dbReadTable()
• Send query and retrieve results using dbGetQuery()
• Submit query using dbSendQuery() and fetch results using dbFetch()
Let us explore each of the above methods one by one.
14. Import Table
The dbReadTable() function can be used to extract an entire table from a MySQL database. Use this method only if
the table is not very big, since the whole table is loaded into R. We need to specify the name of the MySQL
connection object and the table. The name of the table must be enclosed in single/double quotes.
In the example below, we read the entire table named "trial" from the database.
> dbReadTable(con, "trial")
x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
7 7 g
8 8 h
9 9 i
10 10 j
15. Import Rows
The dbGetQuery() function can be used to extract specific rows from a table. We can use this method when we
want to import rows that meet certain conditions from a big table stored in the database. We need to specify the
name of the MySQL connection object and the query. The query must be enclosed in single/double quotes.
In the example below, we read the first 5 rows from the table named trial.
> dbGetQuery(con, "SELECT * FROM trial LIMIT 5;")
x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
The dbGetQuery() function sends the query and fetches the results from the table in the database in a single step.
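When the condition depends on a value held in an R variable, the value has to be quoted before it is placed in the query. Below is a minimal sketch using sqlInterpolate() from the DBI package, on which RMySQL builds. The helper name cities_in_country is hypothetical, and the column names are taken from the city table listed earlier:

```r
# Hypothetical helper: rows of the city table for one country code.
# sqlInterpolate() quotes the value safely instead of pasting it into the SQL.
cities_in_country <- function(con, country_code) {
  sql <- DBI::sqlInterpolate(
    con,
    "SELECT Name, Population FROM city WHERE CountryCode = ?code",
    code = country_code
  )
  DBI::dbGetQuery(con, sql)
}
```

With an open connection, cities_in_country(con, "IND") would return the matching rows as a data frame.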
16. Import Data in Batches
We can import data in batches as well. To achieve this, we will use two different functions:
• dbSendQuery()
• dbFetch()
dbSendQuery() will submit the query but will not extract any data. To fetch data from the database, we will use
the dbFetch() function which will fetch data from the query that was executed by dbSendQuery(). As you can
see, this method works in two stages. Let us look at an example to get a better understanding:
> # pull data in batches
> query <- dbSendQuery(con, "SELECT * FROM trial;")
> data <- dbFetch(query, n = 5)
We store the result of the dbSendQuery() function in an object named 'query'. The MySQL connection object and the
SQL query are the inputs to this function. Next, we fetch the data using the dbFetch() function. The inputs for this
function are the result of the dbSendQuery() function and the number of rows to be fetched. The rows fetched
are stored in a new object named 'data'.
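The example above fetches only the first batch. Below is a minimal sketch of a loop that keeps fetching until the result set is exhausted, using dbHasCompleted() from DBI to detect the end. The helper name fetch_in_batches and its on_batch callback are hypothetical:

```r
# Hypothetical helper: process a query result batch by batch.
# on_batch is called with each data frame; the total row count is returned.
fetch_in_batches <- function(con, sql, batch_size = 5, on_batch = print) {
  res <- DBI::dbSendQuery(con, sql)
  # Free the result set even if on_batch() fails
  on.exit(DBI::dbClearResult(res), add = TRUE)
  total <- 0
  while (!DBI::dbHasCompleted(res)) {
    batch <- DBI::dbFetch(res, n = batch_size)
    if (nrow(batch) == 0) break
    on_batch(batch)
    total <- total + nrow(batch)
  }
  invisible(total)
}
```

With an open connection, fetch_in_batches(con, "SELECT * FROM trial;", batch_size = 5) would print the table five rows at a time.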
17. Query Information
The dbGetInfo() function returns information about a query that has been submitted for execution using
dbSendQuery(). Below is an example:
> res <- dbSendQuery(con, "SELECT * FROM trial;")
> dbGetInfo(res)
$statement
[1] "SELECT * FROM trial;"
$isSelect
[1] 1
$rowsAffected
[1] -1
$rowCount
[1] 0
$completed
[1] 0
$fieldDescription
$fieldDescription[[1]]
NULL
18. Query & Rows Info
The dbGetStatement() function returns the query that has been submitted for execution using dbSendQuery().
Below is an example:
> res <- dbSendQuery(con, "SELECT * FROM trial;")
> dbGetStatement(res)
[1] "SELECT * FROM trial;"
The dbGetRowCount() function returns the number of rows fetched from the database by the dbFetch()
function. Below is an example:
> res <- dbSendQuery(con, "SELECT * FROM trial;")
> data <- dbFetch(res, n = 5)
> dbGetRowCount(res)
[1] 5
The dbGetRowsAffected() function returns the number of rows affected by the query that has been submitted
for execution using the dbSendQuery() function. Below is an example:
> dbGetRowsAffected(res)
[1] -1
19. Column Info
The dbColumnInfo() function returns information about the columns of the table for which a query has been
submitted using dbSendQuery(). Below is an example:
> res <- dbSendQuery(con, "SELECT * FROM trial;")
> dbColumnInfo(res)
name Sclass type length
1 row_names character BLOB/TEXT 196605
2 x double BIGINT 20
3 y character BLOB/TEXT 196605
The dbClearResult() function frees all the resources associated with the result set of the dbSendQuery()
function. Below is an example:
> res <- dbSendQuery(con, "SELECT * FROM trial;")
> dbClearResult(res)
[1] TRUE
20. Export/Write Table
The dbWriteTable() function is used to export data from R to a database. It can be used for the following:
• Create new table
• Overwrite existing table
• Append data to table
In the first example, we will create a dummy data set and export it to the database. We will specify the following
within the dbWriteTable() function:
1. Name of the MySQL connection object
2. Name of the table to be created in the database
3. Name of the data frame to be exported
21. Export/Write Table
We will create the table trial that we have so far used in all the previous examples:
# list of tables in the database
> dbListTables(con)
[1] "city" "country" "countrylanguage"
[4] "mtcars"
# create dummy data set
> x <- 1:10
> y <- letters[1:10]
> trial <- data.frame(x, y, stringsAsFactors = FALSE)
# create table in the database
> dbWriteTable(con, "trial", trial)
[1] TRUE
# updated list of tables in the database
> dbListTables(con)
[1] "city" "country" "countrylanguage"
[4] "mtcars" "trial"
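The dbColumnInfo() output shown earlier lists a row_names column in trial, which suggests that the row names of the data frame were written as an extra column. Below is a minimal sketch that turns this off through the row.names argument of dbWriteTable(); the helper name write_table_plain is hypothetical:

```r
# Hypothetical helper: export a data frame without its row names.
# Further arguments such as overwrite or append are passed through.
write_table_plain <- function(con, table, df, ...) {
  DBI::dbWriteTable(con, table, df, row.names = FALSE, ...)
}
```

With an open connection, write_table_plain(con, "trial", trial) would create the table with only the columns of the data frame.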
22. Overwrite Table
We can overwrite the data in a table by using the overwrite option and setting it to TRUE. Let us overwrite the
table we created in the previous example:
# list of tables in the database
> dbListTables(con)
[1] "city" "country" "countrylanguage"
[4] "mtcars" "trial"
# create dummy data set
> x <- sample(100, 10)
> y <- letters[11:20]
> trial2 <- data.frame(x, y, stringsAsFactors = FALSE)
# overwrite table in the database
> dbWriteTable(con, "trial", trial2, overwrite = TRUE)
[1] TRUE
23. Append Data
We can append data to an existing table by using the append option and setting it to TRUE. Let us append data to the
table we created in the previous example:
# list of tables in the database
> dbListTables(con)
[1] "city" "country" "countrylanguage"
[4] "mtcars" "trial"
# create dummy data set
> x <- sample(100, 10)
> y <- letters[5:14]
> trial3 <- data.frame(x, y, stringsAsFactors = FALSE)
# append data to the table in the database
> dbWriteTable(con, "trial", trial3, append = TRUE)
[1] TRUE
24. Remove Table
The dbRemoveTable() function can be used to remove tables from the database. We need to specify the name of
the MySQL connection object and the table to be removed. The name of the table must be enclosed in single/double
quotes. Below is an example:
# list of tables in the database
> dbListTables(con)
[1] "city" "country" "countrylanguage"
[4] "mtcars" "trial"
# remove table trial
> dbRemoveTable(con, "trial")
[1] TRUE
# updated list of tables in the database
> dbListTables(con)
[1] "city" "country" "countrylanguage"
[4] "mtcars"
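If the table might not be present, the removal can be guarded with dbExistsTable() from DBI. Below is a minimal sketch; the helper name drop_table_if_exists is hypothetical:

```r
# Hypothetical helper: remove a table only when it is present.
# Returns FALSE invisibly when there is nothing to remove.
drop_table_if_exists <- function(con, table) {
  if (!DBI::dbExistsTable(con, table)) {
    return(invisible(FALSE))
  }
  DBI::dbRemoveTable(con, table)
}
```

With an open connection, drop_table_if_exists(con, "trial") can be run repeatedly, whether or not the table is still there.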
25. Disconnect
It is very important to close the connection to the database once we are done. The dbDisconnect() function can be
used to disconnect from the database. We need to specify the name of the MySQL connection object. Below is an example:
# create a MySQL connection object
con <- dbConnect(MySQL(),
user = 'root',
password = 'password',
host = 'localhost',
dbname = 'world')
# disconnect from the database
> dbDisconnect(con)
[1] TRUE
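To make sure the connection is closed even when an error occurs in between, the disconnect call can be registered with on.exit(). Below is a minimal sketch; the helper name with_connection and its arguments are hypothetical:

```r
# Hypothetical helper: open a connection, run some work, always disconnect.
# connect is a function that returns a connection; work receives that connection.
with_connection <- function(connect, work, disconnect = DBI::dbDisconnect) {
  con <- connect()
  # on.exit() runs when the function ends, whether normally or through an error
  on.exit(disconnect(con), add = TRUE)
  work(con)
}
```

For example, with_connection(function() dbConnect(MySQL(), user = 'root', password = 'password', host = 'localhost', dbname = 'world'), function(con) dbListTables(con)) would list the tables and then disconnect.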