How to Connect to Microsoft SQL Server Using R

Refresher on SQL, Databases, Servers, and Ports

A database is the basic electronic information storage unit. It is a collection of information organized to provide efficient retrieval. The collected information could be in any number of formats (electronic, printed, graphic, audio, statistical, combinations). There are physical (paper/print) and electronic databases.

A database could be as simple as an alphabetical arrangement of names in an address book or as complex as a database that provides information in a combination of formats.

A server is something that handles requsts over a network. It listens for requests on a port. A database server handles query requsts into a database. Th database server retrieves the requests an assembles the result in a way specified in the query.

SQL stands for structured query language. It is just a query language, not a database. It does not describe a database. It can describe a server because it can be the language in which the server reads its queries.

# install.packages("odbc")
library(DBI)

What does the DBI package do?

DBI separates the connectivity to the DBMS into a “front-end” and a “back-end”. Applications use only the exposed front-end API. The back-end facilities that communicate with specific DBMSs (SQLite, MySQL, PostgreSQL, MonetDB, etc.) are provided by drivers (other packages) that get invoked automatically.

In our case, the driver we are using is odbc. We need the odbc package to access our that we’ve been managing using Microsoft SQL Server Management Studio. Note that we are not explicitly invoking the odbc package ie we are not running library("odbc").

What is the odbc package?

The goal of the odbc package is to provide a DBI-compliant interface to Open Database Connectivity (ODBC) drivers. This allows for an efficient, easy to setup connection to any database with ODBC drivers available, including SQL Server, Oracle, MySQL, PostgreSQL, SQLite and others. The implementation builds on the nanodbc C++ library.

Definition Open Database Connectivity (ODBC) Drivers

The Microsoft Open Database Connectivity (ODBC) interface is a C programming language interface that makes it possible for applications to access data from a variety of database management systems (DBMSs).

# how to connect to a local MS SQL Express instance:
con <- DBI::dbConnect(odbc::odbc(), 
                      Driver = "SQL Server", 
                      Server = "localhost\\SQLEXPRESS", 
                      Database = "AdventureWorks2016", 
                      Trusted_Connection = "True")

How are we using odbc without running library("odbc")? What is odbc::odbc()?

DBI provides a front-end interface that we will use and does all the back-end stuff for us invisibly; hence, we do not need to run library("odbc").

However, dbConnect(driver, ...) does requre a driver. It is more efficient and cleaner to just grab the driver, odbc() object from the odbc package. :: is an R unary operator that allows us to access variabls in a namespace.

# Create a RSQLite driver with a string
d <- dbDriver("SQLite")

# BETTER: access the object directly
RSQLite::SQLite()

dbConnect(driver, ...)

con <- DBI::dbConnect(odbc::odbc(),
                      Driver   = "[your driver's name]",
                      Server   = "[your server's path]",
                      Database = "[your database's name]",
                      UID      = rstudioapi::askForPassword("Database user"),
                      PWD      = rstudioapi::askForPassword("Database password"),
                      Port     = 1433)
dbListTables(con)[0:10]
‘AWBuildVersion’, ‘DatabaseLog’, ‘ErrorLog’, ‘Department’, ‘Employee’, ‘EmployeeDepartmentHistory’, ‘EmployeePayHistory’, ‘JobCandidate’, ‘Shift’, ‘Address’

dbSendQuery(conn, statement, ...), dbFetch(res, n), dbClearResult(rs), and dbGetQuery

The function dbSendQuery only submits and synchronously executes the SQL query to the database engine. It does not extract any records — for that you need to use the function dbFetch, and then you must call dbClearResult when you finish fetching the records you need.

For interactive use, you should almost always prefer dbGetQuery. It sends the query, fetches the result, and clears the result in one swoop.

stmt = "SELECT SCHEMA_NAME(schema_id) As SchemaName, name As TableName FROM sys.tables;"
rs <- dbSendQuery(con, stmt)
dbFetch(rs, n = 10)
SchemaName TableName
Sales SalesTaxRate
Sales PersonCreditCard
Person PersonPhone
Sales SalesTerritory
Person PhoneNumberType
Production Product
Sales SalesTerritoryHistory
Production ScrapReason
HumanResources Shift
Production ProductCategory

Note that we get an error. This is because we did not clear result set rs

dbListFields(con, 'Product')
Error in connection_sql_columns(conn@ptr, table_name = name): nanodbc/nanodbc.cpp:4332: HY000: [Microsoft][ODBC SQL Server Driver]Connection is busy with results for another hstmt 
Traceback:


1. dbListFields(con, "Product")

2. dbListFields(con, "Product")

3. connection_sql_columns(conn@ptr, table_name = name)
dbClearResult(rs)

Now dbListFields(con, 'Product') works.

dbListFields(con, 'Product')
‘ProductID’, ‘Name’, ‘ProductNumber’, ‘MakeFlag’, ‘FinishedGoodsFlag’, ‘Color’, ‘SafetyStockLevel’, ‘ReorderPoint’, ‘StandardCost’, ‘ListPrice’, ‘Size’, ‘SizeUnitMeasureCode’, ‘WeightUnitMeasureCode’, ‘Weight’, ‘DaysToManufacture’, ‘ProductLine’, ‘Class’, ‘Style’, ‘ProductSubcategoryID’, ‘ProductModelID’, ‘SellStartDate’, ‘SellEndDate’, ‘DiscontinuedDate’, ‘rowguid’, ‘ModifiedDate’
df = dbGetQuery(con, "SELECT * FROM Production.Product WHERE SellStartDate = '2008-04-30'")
head(df)
ProductID Name ProductNumber MakeFlag FinishedGoodsFlag Color SafetyStockLevel ReorderPoint StandardCost ListPrice ProductLine Class Style ProductSubcategoryID ProductModelID SellStartDate SellEndDate DiscontinuedDate rowguid ModifiedDate
1 Adjustable Race AR-5381 FALSE FALSE NA 1000 750 0 0 NA NA NA NA NA 2008-04-30 NA NA 694215B7-08F7-4C0D-ACB1-D734BA44C0C8 2014-02-08 10:01:36
2 Bearing Ball BA-8327 FALSE FALSE NA 1000 750 0 0 NA NA NA NA NA 2008-04-30 NA NA 58AE3C20-4F3A-4749-A7D4-D568806CC537 2014-02-08 10:01:36
3 BB Ball Bearing BE-2349 TRUE FALSE NA 800 600 0 0 NA NA NA NA NA 2008-04-30 NA NA 9C21AED2-5BFA-4F18-BCB8-F11638DC2E4E 2014-02-08 10:01:36
4 Headset Ball Bearings BE-2908 FALSE FALSE NA 800 600 0 0 NA NA NA NA NA 2008-04-30 NA NA ECFED6CB-51FF-49B5-B06C-7D8AC834DB8B 2014-02-08 10:01:36
316 Blade BL-2036 TRUE FALSE NA 800 600 0 0 NA NA NA NA NA 2008-04-30 NA NA E73E9750-603B-4131-89F5-3DD15ED5FF80 2014-02-08 10:01:36
317 LL Crankarm CA-5965 FALSE FALSE Black 500 375 0 0 NA L NA NA NA 2008-04-30 NA NA 3C9D10B7-A6B2-4774-9963-C19DCEE72FEA 2014-02-08 10:01:36

Note that dbReadTable(con, ‘Product’, schema = ‘Production’) does not work. We won’t go into details as to why. Suffice it to say that there is a bug in DBI and we are not the first to get this error.

head(dbReadTable(con, SQL('Production.Product')))
ProductID Name ProductNumber MakeFlag FinishedGoodsFlag Color SafetyStockLevel ReorderPoint StandardCost ListPrice ProductLine Class Style ProductSubcategoryID ProductModelID SellStartDate SellEndDate DiscontinuedDate rowguid ModifiedDate
1 Adjustable Race AR-5381 FALSE FALSE NA 1000 750 0 0 NA NA NA NA NA 2008-04-30 NA NA 694215B7-08F7-4C0D-ACB1-D734BA44C0C8 2014-02-08 10:01:36
2 Bearing Ball BA-8327 FALSE FALSE NA 1000 750 0 0 NA NA NA NA NA 2008-04-30 NA NA 58AE3C20-4F3A-4749-A7D4-D568806CC537 2014-02-08 10:01:36
3 BB Ball Bearing BE-2349 TRUE FALSE NA 800 600 0 0 NA NA NA NA NA 2008-04-30 NA NA 9C21AED2-5BFA-4F18-BCB8-F11638DC2E4E 2014-02-08 10:01:36
4 Headset Ball Bearings BE-2908 FALSE FALSE NA 800 600 0 0 NA NA NA NA NA 2008-04-30 NA NA ECFED6CB-51FF-49B5-B06C-7D8AC834DB8B 2014-02-08 10:01:36
316 Blade BL-2036 TRUE FALSE NA 800 600 0 0 NA NA NA NA NA 2008-04-30 NA NA E73E9750-603B-4131-89F5-3DD15ED5FF80 2014-02-08 10:01:36
317 LL Crankarm CA-5965 FALSE FALSE Black 500 375 0 0 NA L NA NA NA 2008-04-30 NA NA 3C9D10B7-A6B2-4774-9963-C19DCEE72FEA 2014-02-08 10:01:36

Writing To Database

head(ToothGrowth)
len supp dose
4.2 VC 0.5
11.5 VC 0.5
7.3 VC 0.5
5.8 VC 0.5
6.4 VC 0.5
10.0 VC 0.5
dbWriteTable(con, "ToothGrowth", ToothGrowth)
dbListTables(con)[3:5]
‘ErrorLog’, ‘mtcars’, ‘ToothGrowth’
dbListFields(con, "ToothGrowth")
‘len’, ‘supp’, ‘dose’
head(dbReadTable(con, "ToothGrowth"))
len supp dose
4.2 VC 0.5
11.5 VC 0.5
7.3 VC 0.5
5.8 VC 0.5
6.4 VC 0.5
10.0 VC 0.5
dbDisconnect(con)

dbDisconnect(conn, ...) closes the connection, discards all pending work, and frees resources (e.g., memory, sockets).