How to Connect to Microsoft SQL Server Using R
Refresher on SQL, Databases, Servers, and Ports
A database is the basic electronic information storage unit. It is a collection of information organized to provide efficient retrieval. The collected information could be in any number of formats (electronic, printed, graphic, audio, statistical, combinations). There are physical (paper/print) and electronic databases.
A database could be as simple as an alphabetical arrangement of names in an address book or as complex as a database that provides information in a combination of formats.
A server is something that handles requsts over a network. It listens for requests on a port. A database server handles query requsts into a database. Th database server retrieves the requests an assembles the result in a way specified in the query.
SQL stands for structured query language. It is just a query language, not a database. It does not describe a database. It can describe a server because it can be the language in which the server reads its queries.
# install.packages("odbc")
library(DBI)
What does the DBI
package do?
DBI
separates the connectivity to the DBMS into a “front-end” and a “back-end”. Applications use only the exposed front-end API. The back-end facilities that communicate with specific DBMSs (SQLite, MySQL, PostgreSQL, MonetDB, etc.) are provided by drivers (other packages) that get invoked automatically.
In our case, the driver we are using is odbc
. We need the odbc
package to access our that we’ve been managing using Microsoft SQL Server Management Studio. Note that we are not explicitly invoking the odbc
package ie we are not running library("odbc")
.
What is the odbc
package?
The goal of the odbc
package is to provide a DBI-compliant interface to Open Database Connectivity (ODBC) drivers. This allows for an efficient, easy to setup connection to any database with ODBC drivers available, including SQL Server, Oracle, MySQL, PostgreSQL, SQLite and others. The implementation builds on the nanodbc C++ library.
Definition Open Database Connectivity (ODBC) Drivers
The Microsoft Open Database Connectivity (ODBC) interface is a C programming language interface that makes it possible for applications to access data from a variety of database management systems (DBMSs).
# how to connect to a local MS SQL Express instance:
con <- DBI::dbConnect(odbc::odbc(),
Driver = "SQL Server",
Server = "localhost\\SQLEXPRESS",
Database = "AdventureWorks2016",
Trusted_Connection = "True")
How are we using odbc
without running library("odbc")
? What is odbc::odbc()
?
DBI
provides a front-end interface that we will use and does all the back-end stuff for us invisibly; hence, we do not need to run library("odbc")
.
However, dbConnect(driver, ...)
does requre a driver. It is more efficient and cleaner to just grab the driver, odbc()
object from the odbc
package. ::
is an R unary operator that allows us to access variabls in a namespace.
# Create a RSQLite driver with a string
d <- dbDriver("SQLite")
# BETTER: access the object directly
RSQLite::SQLite()
dbConnect(driver, ...)
con <- DBI::dbConnect(odbc::odbc(),
Driver = "[your driver's name]",
Server = "[your server's path]",
Database = "[your database's name]",
UID = rstudioapi::askForPassword("Database user"),
PWD = rstudioapi::askForPassword("Database password"),
Port = 1433)
dbListTables(con)[0:10]
‘AWBuildVersion’, ‘DatabaseLog’, ‘ErrorLog’, ‘Department’, ‘Employee’, ‘EmployeeDepartmentHistory’, ‘EmployeePayHistory’, ‘JobCandidate’, ‘Shift’, ‘Address’ |
dbSendQuery(conn, statement, ...)
, dbFetch(res, n)
, dbClearResult(rs)
, and dbGetQuery
The function dbSendQuery
only submits and synchronously executes the SQL query to the database engine. It does not extract any records — for that you need to use the function dbFetch
, and then you must call dbClearResult
when you finish fetching the records you need.
For interactive use, you should almost always prefer dbGetQuery
. It sends the query, fetches the result, and clears the result in one swoop.
stmt = "SELECT SCHEMA_NAME(schema_id) As SchemaName, name As TableName FROM sys.tables;"
rs <- dbSendQuery(con, stmt)
dbFetch(rs, n = 10)
SchemaName | TableName |
---|---|
Sales | SalesTaxRate |
Sales | PersonCreditCard |
Person | PersonPhone |
Sales | SalesTerritory |
Person | PhoneNumberType |
Production | Product |
Sales | SalesTerritoryHistory |
Production | ScrapReason |
HumanResources | Shift |
Production | ProductCategory |
Note that we get an error. This is because we did not clear result set rs
dbListFields(con, 'Product')
Error in connection_sql_columns(conn@ptr, table_name = name): nanodbc/nanodbc.cpp:4332: HY000: [Microsoft][ODBC SQL Server Driver]Connection is busy with results for another hstmt
Traceback:
1. dbListFields(con, "Product")
2. dbListFields(con, "Product")
3. connection_sql_columns(conn@ptr, table_name = name)
dbClearResult(rs)
Now dbListFields(con, 'Product')
works.
dbListFields(con, 'Product')
‘ProductID’, ‘Name’, ‘ProductNumber’, ‘MakeFlag’, ‘FinishedGoodsFlag’, ‘Color’, ‘SafetyStockLevel’, ‘ReorderPoint’, ‘StandardCost’, ‘ListPrice’, ‘Size’, ‘SizeUnitMeasureCode’, ‘WeightUnitMeasureCode’, ‘Weight’, ‘DaysToManufacture’, ‘ProductLine’, ‘Class’, ‘Style’, ‘ProductSubcategoryID’, ‘ProductModelID’, ‘SellStartDate’, ‘SellEndDate’, ‘DiscontinuedDate’, ‘rowguid’, ‘ModifiedDate’ |
df = dbGetQuery(con, "SELECT * FROM Production.Product WHERE SellStartDate = '2008-04-30'")
head(df)
ProductID | Name | ProductNumber | MakeFlag | FinishedGoodsFlag | Color | SafetyStockLevel | ReorderPoint | StandardCost | ListPrice | … | ProductLine | Class | Style | ProductSubcategoryID | ProductModelID | SellStartDate | SellEndDate | DiscontinuedDate | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Adjustable Race | AR-5381 | FALSE | FALSE | NA | 1000 | 750 | 0 | 0 | … | NA | NA | NA | NA | NA | 2008-04-30 | NA | NA | 694215B7-08F7-4C0D-ACB1-D734BA44C0C8 | 2014-02-08 10:01:36 |
2 | Bearing Ball | BA-8327 | FALSE | FALSE | NA | 1000 | 750 | 0 | 0 | … | NA | NA | NA | NA | NA | 2008-04-30 | NA | NA | 58AE3C20-4F3A-4749-A7D4-D568806CC537 | 2014-02-08 10:01:36 |
3 | BB Ball Bearing | BE-2349 | TRUE | FALSE | NA | 800 | 600 | 0 | 0 | … | NA | NA | NA | NA | NA | 2008-04-30 | NA | NA | 9C21AED2-5BFA-4F18-BCB8-F11638DC2E4E | 2014-02-08 10:01:36 |
4 | Headset Ball Bearings | BE-2908 | FALSE | FALSE | NA | 800 | 600 | 0 | 0 | … | NA | NA | NA | NA | NA | 2008-04-30 | NA | NA | ECFED6CB-51FF-49B5-B06C-7D8AC834DB8B | 2014-02-08 10:01:36 |
316 | Blade | BL-2036 | TRUE | FALSE | NA | 800 | 600 | 0 | 0 | … | NA | NA | NA | NA | NA | 2008-04-30 | NA | NA | E73E9750-603B-4131-89F5-3DD15ED5FF80 | 2014-02-08 10:01:36 |
317 | LL Crankarm | CA-5965 | FALSE | FALSE | Black | 500 | 375 | 0 | 0 | … | NA | L | NA | NA | NA | 2008-04-30 | NA | NA | 3C9D10B7-A6B2-4774-9963-C19DCEE72FEA | 2014-02-08 10:01:36 |
Note that dbReadTable(con, ‘Product’, schema = ‘Production’) does not work. We won’t go into details as to why. Suffice it to say that there is a bug in DBI
and we are not the first to get this error.
head(dbReadTable(con, SQL('Production.Product')))
ProductID | Name | ProductNumber | MakeFlag | FinishedGoodsFlag | Color | SafetyStockLevel | ReorderPoint | StandardCost | ListPrice | … | ProductLine | Class | Style | ProductSubcategoryID | ProductModelID | SellStartDate | SellEndDate | DiscontinuedDate | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Adjustable Race | AR-5381 | FALSE | FALSE | NA | 1000 | 750 | 0 | 0 | … | NA | NA | NA | NA | NA | 2008-04-30 | NA | NA | 694215B7-08F7-4C0D-ACB1-D734BA44C0C8 | 2014-02-08 10:01:36 |
2 | Bearing Ball | BA-8327 | FALSE | FALSE | NA | 1000 | 750 | 0 | 0 | … | NA | NA | NA | NA | NA | 2008-04-30 | NA | NA | 58AE3C20-4F3A-4749-A7D4-D568806CC537 | 2014-02-08 10:01:36 |
3 | BB Ball Bearing | BE-2349 | TRUE | FALSE | NA | 800 | 600 | 0 | 0 | … | NA | NA | NA | NA | NA | 2008-04-30 | NA | NA | 9C21AED2-5BFA-4F18-BCB8-F11638DC2E4E | 2014-02-08 10:01:36 |
4 | Headset Ball Bearings | BE-2908 | FALSE | FALSE | NA | 800 | 600 | 0 | 0 | … | NA | NA | NA | NA | NA | 2008-04-30 | NA | NA | ECFED6CB-51FF-49B5-B06C-7D8AC834DB8B | 2014-02-08 10:01:36 |
316 | Blade | BL-2036 | TRUE | FALSE | NA | 800 | 600 | 0 | 0 | … | NA | NA | NA | NA | NA | 2008-04-30 | NA | NA | E73E9750-603B-4131-89F5-3DD15ED5FF80 | 2014-02-08 10:01:36 |
317 | LL Crankarm | CA-5965 | FALSE | FALSE | Black | 500 | 375 | 0 | 0 | … | NA | L | NA | NA | NA | 2008-04-30 | NA | NA | 3C9D10B7-A6B2-4774-9963-C19DCEE72FEA | 2014-02-08 10:01:36 |
Writing To Database
head(ToothGrowth)
len | supp | dose |
---|---|---|
4.2 | VC | 0.5 |
11.5 | VC | 0.5 |
7.3 | VC | 0.5 |
5.8 | VC | 0.5 |
6.4 | VC | 0.5 |
10.0 | VC | 0.5 |
dbWriteTable(con, "ToothGrowth", ToothGrowth)
dbListTables(con)[3:5]
‘ErrorLog’, ‘mtcars’, ‘ToothGrowth’ |
dbListFields(con, "ToothGrowth")
‘len’, ‘supp’, ‘dose’ |
head(dbReadTable(con, "ToothGrowth"))
len | supp | dose |
---|---|---|
4.2 | VC | 0.5 |
11.5 | VC | 0.5 |
7.3 | VC | 0.5 |
5.8 | VC | 0.5 |
6.4 | VC | 0.5 |
10.0 | VC | 0.5 |
dbDisconnect(con)
dbDisconnect(conn, ...)
closes the connection, discards all pending work, and frees resources (e.g., memory, sockets).