Visual Studio Code: Develop PySpark jobs for SQL Server 2019 Big Data Clusters

Today we're announcing support in Visual Studio Code for PySpark development and query submission against SQL Server 2019 Big Data Clusters. It provides capabilities complementary to Azure Data Studio, enabling data engineers to author and productionize PySpark jobs after data scientists' data exploration and experimentation. The Apache Spark and Hive extension for Visual Studio Code offers a cross-platform, enhanced, lightweight Python editing experience. It covers scenarios around Python authoring, debugging, Jupyter Notebook integration, and notebook-like interactive queries.

With the Visual Studio Code extension, you get native Python programming experiences such as linting, debugging support, and language services. You can run the current line, run selected lines of code, or run all of your PY file. You can import and export a .ipynb notebook and perform notebook-like queries, including Run Cell, Run Above, and Run Below. You can also enjoy a notebook-like interactive experience that combines your source code and markdown comments with the running results and output. In the interactive results window, you can remove unneeded sections, enter comments, or type additional code. Moreover, you can visualize your results graphically through matplotlib, just as in a Jupyter Notebook. The integration with SQL Server 2019 Big Data Clusters empowers you to quickly submit a PySpark batch job to the cluster and monitor job progress.
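
To illustrate this flow, here is a minimal sketch of the kind of PY file you might run cell by cell; the dataset path, column names, and cell contents are hypothetical, and it assumes the cluster connection provides a Spark environment:

```python
# A hypothetical PySpark script: run a single cell, selected lines, or the
# whole file interactively. Cells are delimited with "# %%" markers.
from pyspark.sql import SparkSession
import matplotlib.pyplot as plt

# %% Create (or reuse) a Spark session
spark = SparkSession.builder.appName("InteractiveDemo").getOrCreate()

# %% Load a sample dataset (path and schema are illustrative)
df = spark.read.csv("/data/sales.csv", header=True, inferSchema=True)
df.printSchema()

# %% Aggregate and pull a small result set back to the driver as pandas
totals = df.groupBy("region").sum("amount").toPandas()

# %% Visualize the result with matplotlib, as you would in a Jupyter Notebook
totals.plot.bar(x="region", y="sum(amount)")
plt.show()
```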

Highlights of key features

  • Link to SQL Server 2019 Big Data Clusters: The extension enables you to connect and submit PySpark jobs to SQL Server 2019 Big Data Clusters.
  • Python editing: Develop PySpark applications with native Python authoring support, such as IntelliSense, auto format, and error checking.
  • Jupyter Notebook integration: Import and export .ipynb files.
  • PySpark interactive: Run selected lines of code, execute notebook-like PySpark cells, and create interactive visualizations.
  • PySpark batch: Submit PySpark applications to SQL Server 2019 Big Data Clusters; a minimal example follows this list.
  • PySpark monitoring: Integrate with the Apache Spark history server to view job history, debug, and diagnose Spark jobs.
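
As a concrete example of batch submission, the sketch below shows a self-contained PySpark application of the kind you could submit to the cluster from the extension; the file name, input path, and word-count logic are hypothetical, not part of the product:

```python
# wordcount.py -- a hypothetical, self-contained PySpark batch application
# of the kind you could submit to a SQL Server 2019 Big Data Cluster.
from operator import add
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("WordCount").getOrCreate()

    # Read text from storage on the cluster (path is illustrative)
    lines = spark.read.text("/tmp/input.txt").rdd.map(lambda row: row[0])

    # Classic word count: tokenize, map to (word, 1), reduce by key
    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(add))

    # Collect the small result to the driver and print it to the job output
    for word, count in counts.collect():
        print(f"{word}: {count}")

    spark.stop()
```

Once submitted, a job like this can be tracked through the Apache Spark history server integration described above.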

How to install or update

First, install Visual Studio Code and download Mono 4.2.x for Linux or Mac. Then get the latest Apache Spark and Hive tools by going to the Visual Studio Code extension repository or the Visual Studio Code Marketplace and searching for Spark.

For more information about Apache Spark and Hive tools for Visual Studio Code, see the extension's page in the Visual Studio Code Marketplace.

If you have questions, feedback, comments, or bug reports, please use the comments below or send a note to hdivstool@microsoft.com.

