SQL Server 2016 has introduced a very important service for data science with SQL Server and this was the integration of the R language and R scripts into the SQL Server. With the release of SQL Server 2016, this feature was still named R Services, as the R Server was also available as a standalone product or add-on to other programs or frameworks. With SQL Server 2017, Microsoft has added the support for the Python language too, so you can choose from both major languages used in data science.
Neither of the languages are installed by default, so you need to include any of them during the installation, where you can choose the proper environment as a part of Machine Learning Services, which is the new name used since the 2017 release:
Installing languages R and Python
Machine Learning Services can be used to perform different work with the data stored or incoming to the database than we can regularly see on DBMS systems. As a common machine learning approach, with SQL Server, Machine Learning Services can be used to predict properties of new data by learning from a sample of data stored in the database. With Microsoft SQL Server 2016, the only way to run the R script inside of the database was to use a new stored procedure, sp_execute_external_script, which used, as one of the parameters, the R script that you would like to execute. With SQL Server 2017, you can also use a new function called PREDICT, which can be used for native scoring on the data, assuming that you have built and trained the models for machine learning before using that function.
Even if you install Machine Learning Services, running external scripts in R or Python is disabled by default. If you would like to enable the feature, you can use the following T-SQL code to make the configuration change on your instance:
sp_configure 'external scripts enabled',1 GO RECONFIGURE GO
When you use Python, you can use either Visual Studio, Visual Studio Code, Management Studio, or any other application used for Python development. However, you always need to consider which version of Python is being used with Machine Learning Services, as Python 2.x and 3.x differ a little. Microsoft is using Continuum Analytics Anaconda distribution of Python in Machine Learning Services; specifically, the version of Python 3.5.2 is being used at the time of writing this book.
To check which version of Python you are running, you can use following T-SQL code:
Important things to notice are the language parameter, which accepts two languages (either R or Python), marked with the N sign for unicode, and the script variable that contains the script itself, which is to be executed. If R is your language of choice, you can use a wide variety of tools available for development, if you keep in mind that version of R 3.3.3 is used with SQL Server. Using external tools for development can be more flexible and comfortable, compared with SQL Server Management Studio, but keep in mind that if you use any package that is not part of the SQL Server deployment, you'll need to do extra steps to install the package on the SQL Server, either for Python or for R.
As for Python, Microsoft has chosen the Anaconda distribution. The story with R language is different. R language was developed back in the 1980s, where it became a significant language used by data science and the academics community from the 1990s. When R become very popular and widely used, Microsoft joined the R consortium to help develop the R language and get better support for it. In 2015, Microsoft acquired a company named Revolution Analytics and rebranded many of Revolution Analytics's products. These included:
Microsoft R Server
Microsoft R Open
Data Science VM (available in Azure)
As a part of the effort with R language, Microsoft has created its own package repository called MRAN, available at https://mran.microsoft.com/download
Microsoft Open R is based on the R language release, but includes numerous improvements for performance, platforms, and scalability, while keeping full compatibility with packages available on the CRAN repository, the official R package repository for the R project, which is available at https://cran.r-project.org/.
While Microsoft R Open is free and is available for download for numerous platforms, Microsoft has also rebranded Revolution R Enterprise into two different products. One stream went into the SQL Server as R Services or Machine Learning Services in 2017 release and the other is available as a standalone installation for different operating systems and frameworks, such as the following: