SAS Code Migration into Open-Source Language by Using Azure OpenAI

In the world of programming, the ability to convert code from one language to another is a valuable skill. This skill becomes even more crucial if you are aiming to modernize your legacy and migrate from one programming language to another. With the advent of Azure OpenAI GPT-4, the possibilities for code conversion have reached new heights. In this blog post, we will explore the incredible capabilities of Azure OpenAI GPT-4 to effortlessly convert code from SAS to Python:

  • Understanding SAS and Python: Before diving into the capabilities of Azure OpenAI GPT-4, let's briefly touch upon the languages involved. SAS (Statistical Analysis System) is a widely used programming language for data management, statistical analysis, reporting and even machine learning. On the other hand, Python is a versatile and powerful language with a vast ecosystem of libraries for data analysis, machine learning, and more.
  • Challenges in Code Conversion: Manually converting code from SAS to Python can be a time-consuming and error-prone task. It requires a deep understanding of both languages and their syntax, which can be a barrier for individuals who are proficient in only one of them. Additionally, the differences in programming paradigms and data handling can further complicate the conversion process.
  • Leveraging Azure OpenAI GPT-4 Model: Azure OpenAI GPT-4 comes to the rescue with its advanced natural language processing capabilities and deep understanding of programming languages. It can seamlessly convert SAS code snippets to Python, reducing the manual effort and allowing users to focus on their other tasks rather than spending time on code translation.

       For example; let's start with SAS Data Step which names the data set, defines the variables and enters the data. In SAS, if the variable is string, it ends with $ sign, whereas numerical variables ends without $.

System Message: You are an assistant that converts SAS code into Python.

GPT-4 Model Parameters: Default values

User Message: data work.experience;

input employee $ experience;

datalines;

Ali 10

Elif 8

Muhammet 12

Magma 7

Kuyruk 4

Nazli 15

;

run;

GPT-4 Completion:

import pandas as pd
# Create a dictionary for the data
data = {
'employee': [‘Ali', ‘Elif', ‘Muhammet', ‘Magma', ‘Kuyruk', ‘Nazli'],
‘experience': [10, 8, 12, 7, 4, 15]
}
# Convert the dictionary to a pandas DataFrame
work_experience = pd.DataFrame(data)

yelizkilinc_1-1693123951883.png

 Testing the Python Code generated by GPT-4 using Azure Machine Learning Notebooks:

yelizkilinc_2-1693124119772.png

Evaluation: Without applying any prompt engineering techniques, just stating in the system message that ” You are an assistant that converts SAS Code to Python“, GPT-4 model effectively transformed the provided code into Python.

  • Syntax Adaptation: Azure OpenAI GPT-4 goes beyond a simple text replacement approach. It understands the semantics and structure of SAS code, allowing it to generate Python code that preserves the functionality and logic of the original SAS script. This ensures that the converted code works as expected and eliminates the need for extensive debugging.

For example; macro variables hold a value and can be repeatedly utilized within a SAS program through a specific syntax, the ampersand (&) sign.

System Message: You are an assistant that converts SAS code to Python.

GPT-4 Model Parameters: Default values

User Message:

/* Assign some macro variables */

%let var1 = test;

%let id = 39;

/* Print to log */

%put var1 = &var1.;

%put id = &id.;

/* Use the macro variables in a data step to create a 1 record table */

data test;

       m&id. = 333;

       v = “&var1.”;

run;

GPT-4 Completion:

# Assign some variables
var1 = ‘test'
id = 39
# Print to console
print(f'var1 = {var1}')
print(f'id = {id}')
# Use the variables to create a pandas DataFrame
import pandas as pd
data = {
f'm{id}': [333],
‘v': [var1]
}
test = pd.DataFrame(data)

yelizkilinc_0-1693125749818.png

 Testing the Python Code generated by GPT-4 using Azure Machine Learning Notebooks:

yelizkilinc_1-1693125868391.png

Evaluation: Without requiring any extra prompts, GPT-4 managed to convert SAS code containing macro variables with different syntax into accurate Python while also providing a clear explanation of the code and the difference between SAS code and python for this specific code snippet.

  • Handling SAS-specific Features: SAS has its own set of functions, procedures, and data handling techniques.GPT-4 can handle them smoothly without advanced prompt engineering.

For example; HP4SCORE Procedure is utilized for scoring the binary model file in SAS.

System Message: You are an AI assistant that converts SAS code to Python.

GPT-4 Model Parameters: Default values

User Message:

proc hp4score data=TestScore;

score file=”testModel.sav” out=Scoreout;

run;

proc print data=Scoreout;

run;

GPT-4 Completion:

import pandas as pd
import pickle
# Load the data
TestScore = pd.read_csv(‘TestScore.csv') # assuming your data is in CSV format
# Load the model
with open(‘testModel.pkl', ‘rb') as file:
model = pickle.load(file)
# Score the model
Scoreout = model.predict(TestScore)
# Print the scored data
print(Scoreout)

yelizkilinc_0-1693127680801.png

Evaluation:

In GPT 4 completion, you can observe that it begins by explaining the HP4SCORE procedure, and then compares its capabilities with those of Python capabilities. Also, it suggests suitable python libraries. That means, GPT-4 automatically applies Chain of Thoughts (CoT) technique without requiring any prompt engineering. Once it thinks and decides what to do by CoT, it generates the equivalent Python code for the provided SAS code. After reviewing the generated response, you have the option to modify your initial prompt or apply few-shot learning technique or use CoT technique with reasoning to control GPT-4 as an iterative process. In this example, I want GPT-4 to use “predict_proba” function instead of “predict” so I'll add my request into prompt like below:

System Message: You are an AI assistant that converts SAS code to Python. During the code translation, use predict_proba function instead of predict function.

For the same user message, GPT-4 generated the below code by just adding one sentence in natural language and you managed to control it.

yelizkilinc_0-1693202474479.png

  • Limitations and : While Azure OpenAI GPT-4 is a powerful tool for code conversion, it is essential to understand its limitations. Therefore, expert review might be necessary for complex SAS programs. 

Conclusion: Azure OpenAI GPT-4 is revolutionizing the way code conversion is approached as happening to converting SAS code to Python. Its advanced natural language processing capabilities and code conversion skills make it a game-changer for programmers. By automating the code conversion process, Azure OpenAI GPT-4 liberates professionals from the burden of manual translation. You can build a web app and build your own modernization tool for any code translation use cases, not only for SAS! Embrace the power of Azure OpenAI GPT-4 and unlock a world of possibilities for seamless code conversion!

 

This article was originally published by Microsoft's Azure AI Services Blog. You can find the original article here.