This post is authored by Nagesh Pabbisetty, Partner Director of Program Management at Microsoft
Expert data scientists are adopting Advanced Analytics (AA) and Machine Learning (ML) at a rapid pace. That pace can increase significantly when three conditions hold: enterprise-grade AA and ML are available within the environments where customers’ data lives; infusing intelligence into mission-critical applications becomes much easier; and enterprises can turn to a single vendor that brings the world of AA and ML together and supports it with the SLAs they have come to expect. At Microsoft, our mission has been to make this vision of ambient intelligence a reality for our customers. We took the first step with Microsoft R Server 9.0, and this follow-on release, Microsoft R Server 9.1, includes significant innovations such as:
- New machine learning enhancements and inclusion of pre-trained cognitive models such as sentiment analysis & image featurizers
- SQL Server Machine Learning Services with integrated Python in Preview
- Enterprise grade operationalization with real-time scoring and dynamic scaling of VMs
- Deep customer & ISV partnerships to deliver the right solutions to customers
- A panoply of sources to help you get started with ease
You can immediately download Microsoft R Server 9.1 from MSDN and Visual Studio Dev Essentials. It comes packed with tons of value built on top of the latest open source R engine that makes R enterprise-class. Also check out R Client for Windows and R Client for Linux.
State of the Art Machine Learning
Bring Machine Learning to where your data is
With the Microsoft R Server 9.0 release, we provided Machine Learning algorithms battle-tested by Microsoft as the MicrosoftML package, available as part of SQL Server R Services and Microsoft R Server 9.0 on Windows. We have now made these MicrosoftML algorithms portable and distributed, so they run on Linux, Windows, and the most popular distributions of Hadoop (Cloudera, Hortonworks, MapR), in addition to SQL Server 2016:
- Fast linear with L1 and L2 regularization
- Fast boosted decision tree
- Fast random forest
- Logistic regression, with support for L1 and L2 regularization
- GPU-accelerated Deep Neural Networks (DNNs) with convolutions
- Binary classification using a One-Class Support Vector Machine
This blog demonstrates the use of MicrosoftML algorithms on Hadoop and Spark.
Pre-trained Cognitive Models
We make it easy for enterprises to infuse intelligence into their Line of Business (LOB) applications. Conventional methods require significant investments of time and effort to hand-craft Machine Learning models from scratch. Harnessing decades of work on cognitive computing in the context of Bing, Office 365 and Xbox, we are delivering the first installment of pre-trained cognitive models that accelerate time to value. Further, these models can be re-trained with your data and optimized for your business.
We now offer a Sentiment Analysis pre-trained cognitive model, with which you can assess the sentiment of an English sentence or paragraph in just a few lines of code. With the Image Featurizer pre-trained cognitive model, you can derive up to 5,000 features from a given image and use them to compare the similarity of two images. This blog shows you how to benefit from the power of image featurizers, and more details on Sentiment Analysis are covered in this blog.
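To make the image-similarity idea concrete, here is a minimal sketch in plain Python. It does not call the Image Featurizer; the short vectors below are hypothetical stand-ins for its output, and cosine similarity is only one common way to compare feature vectors, chosen here as an assumption rather than as the method the product uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written vectors standing in for featurizer output.
features_cat_1 = [0.9, 0.1, 0.4, 0.0]
features_cat_2 = [0.8, 0.2, 0.5, 0.1]
features_truck = [0.0, 0.9, 0.1, 0.7]

print(cosine_similarity(features_cat_1, features_cat_2))  # similar images score near 1
print(cosine_similarity(features_cat_1, features_truck))  # dissimilar images score near 0
```

With real featurizer output the vectors would have thousands of entries, but the comparison step would look the same.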
Combining the best of Microsoft Innovation and Open Source
We are delivering on the promise of embracing the best of open source and pairing it with the best of Microsoft innovation. With this release, within the same R script, you can mix and match functions from the RevoScaleR and MicrosoftML packages with popular open source packages like sparklyr and, through it, H2O. Refer to this blog for examples of how to get the best of both worlds!
Optimized Algorithm for Pleasingly Parallel
One of the most popular advanced analytics use cases is Pleasingly Parallel, where you run massively parallel computations on partitions that are grouped by one or more attributes. These embarrassingly parallel use cases are common across industries:
- Life sciences simulations to identify the best drug for a given situation
- Portfolio analysis to identify the right investment for each portfolio
- Utilities to forecast energy consumption for each cohort
- Shipping to forecast demand for various container types
We have generalized the pattern and provided a highly performant, simple, and flexible rxExecBy() function within RevoScaleR to address these use cases. Furthermore, this function is portable across all platforms that support Microsoft R Server: Windows, Linux, Hadoop, and SQL Server. More details on how to choose the best algorithm for Pleasingly Parallel use cases are available here.
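The Pleasingly Parallel pattern itself is easy to sketch outside of R Server. The Python example below is an illustration of the pattern only, not the RevoScaleR API: `run_by_key`, `mean_consumption`, and the sample readings are all hypothetical names and data, and a local thread pool stands in for the distributed workers a real platform would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_by_key(rows, key, func, max_workers=4):
    """Group rows by `key`, then apply `func` to each group independently."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    keys = sorted(partitions)
    # No partition depends on another, so the calls can run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(func, (partitions[k] for k in keys))
        return dict(zip(keys, results))


def mean_consumption(partition):
    """Toy per-partition computation: average energy use for one cohort."""
    values = [row["kwh"] for row in partition]
    return sum(values) / len(values)


# Hypothetical sample data: energy readings tagged with a cohort.
readings = [
    {"cohort": "residential", "kwh": 10.0},
    {"cohort": "residential", "kwh": 14.0},
    {"cohort": "industrial", "kwh": 120.0},
    {"cohort": "industrial", "kwh": 80.0},
]

print(run_by_key(readings, "cohort", mean_consumption))
```

The per-partition function could just as well fit a forecasting model per cohort; what matters is that each partition is processed without reference to the others.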
This release also includes support for the Optimized Row Columnar (ORC) file format, which provides a highly efficient way to store Hive data, and rxMerge(), a distributed merge for the Spark compute context.
Enterprise-Grade Operationalization
We recognize that easy, secure, and high-performance operationalization is essential for Tier-1 enterprises to derive maximum value from their analytics investments at scale. The Microsoft R Server 9.1 release continues to strengthen the power of operationalization. See this blog for more details.
- Real-time web services: realize a 10X to 100X boost in scoring performance, with scoring speeds under 10 ms. Currently available on the Windows platform; other platforms will be supported soon.
- Role-Based Access Control: enables admins to control who can publish, update, delete, or consume web services
- Asynchronous batch processing: speed up scoring for web services with large input data sets and long-running jobs
- Asynchronous remote execution: run scripts in background mode on a remote server, without having to wait for the job to complete
- Dynamic scaling of operationalization grid with Azure VMs: easily spin up a set of R Server VMs in Azure, configure them as a grid for operationalization, and scale it up and down based on CPU / Memory usage
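As a rough illustration of scaling a grid up and down on CPU and memory usage, here is a small Python sketch of a threshold policy. Everything in it is an assumption made for illustration: the function name, the thresholds, and the node limits are hypothetical and do not describe how R Server or Azure actually makes scaling decisions.

```python
def desired_node_count(current_nodes, cpu_pct, mem_pct,
                       scale_up_at=75.0, scale_down_at=25.0,
                       min_nodes=1, max_nodes=10):
    """Return the node count a simple threshold policy would ask for."""
    load = max(cpu_pct, mem_pct)  # react to whichever resource is busier
    if load > scale_up_at:
        target = current_nodes + 1   # add one node under heavy load
    elif load < scale_down_at:
        target = current_nodes - 1   # remove one node when mostly idle
    else:
        target = current_nodes       # stay put inside the comfort band
    # Never go below the minimum or above the maximum grid size.
    return max(min_nodes, min(max_nodes, target))


print(desired_node_count(3, cpu_pct=90.0, mem_pct=40.0))
print(desired_node_count(3, cpu_pct=10.0, mem_pct=15.0))
```

Changing one node at a time and keeping a gap between the two thresholds helps such a policy avoid flapping between sizes.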
SQL Server R Services
The innovations in Microsoft R Server 9.1 are available to SQL Server 2016 customers; an easy upgrade of R Services in SQL Server 2016, as described in this doc and in this blog post, is all you need. The machine learning and pleasingly parallel enhancements listed above are fully supported on SQL Server as well. SQL Server is the first database in the world that has in-database Machine Learning!
With R Services in SQL Server 2016, we set the industry benchmark for high-throughput scoring at 1 million predictions per second. Now, we have improved single-row scoring performance significantly, by up to two orders of magnitude over earlier versions. Real-time scoring is supported on models trained using both RevoScaleR and MicrosoftML algorithms and transforms. With this release, SQL Server understands these models natively and scores inputs without the need for the R interpreter and its associated overhead, delivering significantly better performance.
Flexible R package management
In the 9.0.1 release of Microsoft R Server, we added functionality to the RevoScaleR package that enables users to install, uninstall, and manage packages on SQL Server without requiring administrative access to the SQL Server machine. Data scientists and other non-admin users can install packages in a specific database, with user or group scope. In this release, we have added the rxSyncPackages API to ensure that user-installed packages are not lost if the SQL Server node goes down or if the database is migrated. The list of packages and their permissions is maintained in a server table, and this API ensures that the required packages are installed on the file system.
SQL Server Machine Learning Services – Python Preview
SQL Server 2016 brought you in-database analytics with SQL Server R Services. With CTP 1 of SQL Server 2017, MicrosoftML provided in-database Machine Learning. CTP 2.0 of SQL Server 2017 brings you SQL Server Machine Learning Services, which embraces both R and Python. Data scientists can now choose from a huge collection of analytics and machine learning algorithms across the R and Python communities, execute them in-database, and get their job done much more effectively. CTP 2.0 enables collaboration between traditional data scientists with strong R backgrounds and computer scientists with strong Python backgrounds to deliver the best business ROI.
Additionally, the real-time scoring and flexible package management functionality listed above for SQL Server R Services is also available in the CTP 2.0 release as part of Machine Learning Services.
Customer & ISV Partnerships
Engaging with Customers
“Working with Microsoft R Server for our data science needs at eToro has been a key factor in our success. The tools are appropriate for all levels of data scientist skills from beginners to seasoned professionals. Using Microsoft R Server, we were able to quickly run large scale statistical simulations in a distributed manner that ensured the robustness of our machine learning models. This partnership was instrumental in meeting our business goals and we look forward to using the continuing innovation coming out on Microsoft R Server!” — Moti Goldklang, Director of Trading Systems, eToro.
We are committed to finding more ways for our customers to connect with us, to understand how to get the most out of their investments and provide feedback to influence product direction. We offer a variety of ways customers can engage closely with Microsoft and provide product feedback.
User Voice: As a customer-focused team, we want to hear your feedback. To help steer our product capabilities, we are launching User Voice for Microsoft R today. You can take part in discussions and vote on features that you’d like to see us enable. We are listening!
Authoring Tools from Microsoft and Partners
I am happy to announce that we have a number of options to help you develop Microsoft R based applications, both from Microsoft and from our partners. R Tools for Visual Studio (RTVS) is now Generally Available and brings support for Microsoft R into Visual Studio. We also have Python Tools for Visual Studio (PTVS) for your Python development. In addition, we have worked with MicroStrategy, Alteryx, and KNIME, and have augmented the open source Rattle package, to give you more choices.
Microsoft has been contributing to the R community to ensure that it has an open source WYSIWYG tool for big data analytics. We have enhanced the popular Rattle package to support Microsoft R Server capabilities. You can download the latest version and stay abreast of developments here.
With Alteryx Designer 11.0, a self-service analytics workflow tool from Alteryx, business analysts and data scientists can work with Microsoft R Server and SQL Server R Services. In the words of Neil Ryan, Product Manager at Alteryx, “At Alteryx we’re acutely aware of the challenge of getting faster insights from very large datasets. When it comes to computation-intensive machine learning, it’s even more important to leverage existing hardware resources and keep the processing as close as possible to where the data lives. That’s why we’re excited about our partnership with Microsoft. By leveraging Microsoft R Server and SQL Server’s in database analytics, our customers are scaling their analytics to the size of their data through a consistent, code-free, drag and drop interface for both data preparation and modeling within SQL Server.” More details are in this blog post.
Microsoft and KNIME have partnered to bring Microsoft R capabilities to the KNIME platform. “KNIME has added the option to reach out to Microsoft R from KNIME Analytics Platform to make a scalable and enterprise ready R integration part of any KNIME workflow,” says Michael Berthold, CEO of KNIME. Here is an example of how this works and you can see it in action here.
MicroStrategy has made Microsoft R runtime accessible from MicroStrategy Desktop. “MicroStrategy is embracing Microsoft R in our analytics platform tools to bring the power of advanced analytics and machine learning to our customers. We just announced this at MicroStrategy World and you can read more about this here,” says Sandipto Banerjee, VP Data Group & Advanced Technologies, MicroStrategy.
Getting Started
The best place to get started is our comprehensive documentation site, which introduces concepts, platforms, features, code samples, and how-tos. Our vibrant blogs include the R Server Blog, launched earlier this year and covering all things R; the R Tiger Team blog, which offers deep technical insights on Microsoft R Server; and the Revolutions R Blog, which highlights both Microsoft R and open-source R innovations. Together, these blogs provide a plethora of articles, tips, and tricks for novices and experts alike. I welcome you to check these out and leave us your comments.
Check out the free Data Science with Microsoft SQL Server 2016 eBook that covers what’s new, installing & configuring R Services, and how to develop full applications through walkthroughs.
Want to get certified and show your mastery in data science? We have you covered with several courses at Microsoft LearnAnalytics and Microsoft Academy! We also have several training partners that can help you train your teams on advanced analytics and machine learning!
Check out the R Solution Templates that will walk you through how to develop a solution using Microsoft R Server, from beginning to end. In addition, with the click of a button, you can deploy these templates to an Azure VM and see the entire application in action. You can follow the links to GitHub and use the code as a starting point for your own solution, and accelerate time to value!
In our last release, we provided a solution template for Campaign Optimization using SQL Server R Services. Now, we have added a solution template for the Azure HDInsight platform on the Spark compute context. In the words of Anindya Palit, EVP Affine Analytics, “Partnering with Microsoft allowed Affine’s extensive analytics experience in marketing to be transformed into a solution for optimizing lead generation through Campaign Optimization. We were able to quickly ramp up and build the solution utilizing the power of R Services within SQL Server.”
Hospital Length Of Stay (LOS) is the latest solution template built on SQL Server R Services. Dr. Greg Mckelvey, Head of Clinical Insights, KenSci, says “The Hospital ‘Length of Stay Prediction’ solution shows how you can build a potentially life-saving machine learning solution by leveraging the power of R within SQL Server. By predicting how long an admitted patient is likely to stay at the hospital based on clinical history, labs and vital, the solution enables doctors and nurses to better manage patient flow and coordinate post-discharge patient care.”
Microsoft R Server 9.1 will be released as Azure VMs in Azure Marketplace, Data Science VMs, and on Azure HDInsight. VMs were previously available on CentOS 7.2 and Ubuntu 16.04; we have now added support for RHEL 7.2 and made all VMs available in China.
Microsoft R Client
With our current release, we are delivering Microsoft R Client on the Linux platform for the first time, in addition to Windows. R Client is available on all four popular flavors of Linux – RHEL, CentOS, Ubuntu, and SuSE. Please check out R Client for Windows and R Client for Linux.
In Summary
I am proud of how we are making R enterprise-grade through the Microsoft R portfolio of products and services, building on top of open source R in fully compatible ways. Adopting advanced analytics and machine learning requires a holistic approach that spans technology, people, and processes; we continue to deliver more handholding to ensure that enterprise users are set up for success! With the 9.1 release, you can run in-database analytics and machine learning on a variety of platforms, develop powerful analytics models that leverage both open source and Microsoft innovation, deploy them at scale, and easily integrate them into line-of-business systems to maximize ROI. We invite you to get started with Microsoft R Server 9.1.