How to create a Responsible AI dashboard to debug AI models (Part 3)

In the last tutorial, we trained a model to predict hospital readmission for diabetes patients. We will now analyze that model and identify its issues with Azure Machine Learning's Responsible AI dashboard. In this tutorial, we'll learn how to create a Responsible AI (RAI) dashboard with its Python SDK. We will show you how to use tools such as Model Overview, Error Analysis, Data Explorer, Fairness Assessment, Model Interpretability, Causal Analysis and Counterfactuals to discover and solve issues with the model or data. Furthermore, we'll show how data scientists and stakeholders can better communicate a model's performance and behavior by using the RAI scorecard, a generated PDF report that summarizes the insights gained from assessing and mitigating issues with the model.

dashboard-functions.png

Background

Just as with software application development, models need to be debugged for errors and inaccuracies. The RAI dashboard is built on the latest open-source tools developed by leading academic institutions and organizations, including Microsoft: ErrorAnalysis, InterpretML, Fairlearn, DiCE, and EconML. These tools have been instrumental in helping data scientists and AI developers understand model behavior and discover and mitigate undesirable issues in AI models. They assist the debugging process by analyzing whether and why a model has made a mistake, whether and why the model is unfair to some groups of people compared to others, and which data features contribute to the overall model error rates and predictions. They also let you explore alternative outcomes of a model through counterfactuals and assess “What if?” scenarios in which some features of the datapoints are changed.

We will see how this essential information can be extracted from using the interactive and easy to use RAI dashboard to help data scientists and AI developers analyze and evaluate their models in a centralized user interface in the Azure ML studio.

Prerequisites

This is Part 3 of a tutorial series. You'll need to complete the prior tutorial(s) below:

RAI Components

After completing the last tutorial, you should have the trained model registered as a component stored in Azure Machine Learning studio. We can now configure the RAI components and pipeline with the registered model and the stored dataset.  The first thing we need is to get all the RAI components that we need to analyze and debug our Diabetes Hospital Readmission model. Note: These RAI insights components were already available in the Azure ML workspace in Part 1 of this tutorial series.

When you call the ml_client_registry.components.get function, verify that the name matches the registered component name. Since the RAI Dashboard will need our trained model, we'll also be getting the model artifacts from the latest registered model in the prior tutorial. Another required component is the RAI insights constructor (rai_constructor_component in the code below), because it provides the details about your model that the other RAI insight components need to analyze the model and data. You can pick and choose the rest of the RAI insight components based on your use case, but we are going to choose all the RAI insights to get a holistic evaluation of our model.

label = "latest"
rai_constructor_component = ml_client_registry.components.get(
    name="microsoft_azureml_rai_tabular_insight_constructor", label=label
)
# We get latest version and use the same version for all components
version = rai_constructor_component.version
rai_counterfactual_component = ml_client_registry.components.get(
    name="microsoft_azureml_rai_tabular_counterfactual", version=version
)
rai_causal_component = ml_client_registry.components.get(
    name="microsoft_azureml_rai_tabular_causal", version=version
)
rai_explanation_component = ml_client_registry.components.get(
    name="microsoft_azureml_rai_tabular_explanation", version=version
)
rai_erroranalysis_component = ml_client_registry.components.get(
    name="microsoft_azureml_rai_tabular_erroranalysis", version=version
)
rai_gather_component = ml_client_registry.components.get(
    name="microsoft_azureml_rai_tabular_insight_gather", version=version
)
rai_scorecard_component = ml_client_registry.components.get(
    name="microsoft_azureml_rai_tabular_score_card", version=version
)

RAI Scorecard config

The RAI scorecard is a useful report for providing transparency about the model and data quality records. It is generated as a multi-section PDF file covering model performance, error profile, explanations, and fairness, as well as some data statistics. This is useful not just for data scientists and business decision makers, but for compliance auditors as well. To use a scorecard, you will need to configure it based on your use case. For our Diabetes Hospital Readmission model, we specify the ModelName, set the ModelType to classification, and provide a ModelSummary of what our model does.

Next, under the Metrics section, list the metrics you would like the scorecard to report for classification, such as accuracy, precision, recall or F1 score. For our Diabetes Hospital Readmission model, we are mainly interested in the accuracy score, and we also report precision. You can also specify a threshold on a selected performance metric; the scorecard shows a warning sign if your model doesn't satisfy the threshold. For this example, we set a threshold of “equal to or greater than 0.80” on the model accuracy. We also want to see the top 10 feature columns that influence the model's predictions. NOTE: You will use different metrics for a regression use case (e.g., mse, mae, r2).

For the Data Explorer field, we want a deep-dive data analysis of the feature columns that we think will be the key determining factors of a diabetic patient returning to a hospital within 30 days of being discharged. We assume that the “Age”, “Prior_inpatient” and “Time_in_hospital” columns drive our model's prediction. Finally, we want to see how fair our model's predictions are and whether there are any performance or prediction outcome disparities among certain groups. To do this, we need to list the feature columns in our dataset that we consider sensitive features. In our case, we pick “Age”, “Race” and “Gender”. We also need to specify our fairness metric of choice. Here we specify the ratio disparity in accuracy, which is calculated as: (accuracy of the subgroup with the minimum accuracy) / (accuracy of the subgroup with the maximum accuracy).
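To make the ratio disparity concrete, here is a minimal sketch that computes the accuracy of each subgroup and divides the smallest by the largest. The helper name accuracy_ratio and the sample labels, predictions and groups are made up for illustration; this is not the scorecard component's own implementation.

```python
from collections import defaultdict


def accuracy_ratio(y_true, y_pred, groups):
    """Return (min accuracy / max accuracy, per-subgroup accuracies)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    accuracies = {g: correct[g] / total[g] for g in total}
    best = max(accuracies.values())
    if best == 0:
        raise ValueError("every subgroup has zero accuracy; the ratio is undefined")
    return min(accuracies.values()) / best, accuracies


# Hypothetical labels and predictions for two gender subgroups.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]

ratio, per_group = accuracy_ratio(y_true, y_pred, gender)
print(per_group)  # accuracy of each subgroup
print(ratio)      # values near 1.0 mean similar accuracy across subgroups
```

A ratio close to 1.0 indicates that the model is about equally accurate for every subgroup, while a small ratio points to a group the model serves noticeably worse.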

import json
score_card_config_dict = {
    "Model": {
        "ModelName": "Diabetes hospital readmission",
        "ModelType": "Classification",
        "ModelSummary": "This model predicts whether a diabetic patient will be readmitted back to hospital within 30 days"
    },
    "Metrics" :{
        "accuracy_score": {
            "threshold": ">=0.80"
        },
        "precision_score": {}
    },
    "FeatureImportance": {
        "top_n": 10
    },
    "DataExplorer": {
        "features": [
        "time_in_hospital",
        "prior_inpatient",
        "age"
        ]
    },
    "Fairness": {
        "metric": ["accuracy_score"],
        "sensitive_features": ["age", "race", "gender"],
        "fairness_evaluation_kind": "ratio"
    }
}
score_card_config_filename = "rai_hospital_readmission_score_card_config.json"
with open(score_card_config_filename, 'w') as f:
    json.dump(score_card_config_dict, f)

See tips on how to configure your RAI scorecard. NOTE: some of the configuration for a classification vs regression problem are different.

After you've configured the information in the scorecard, save the configuration as a JSON file in your local file directory, as the code above does.
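As an illustration of what a threshold string such as ">=0.80" means, the sketch below parses the string and compares it with a metric value. The meets_threshold helper, the set of supported comparison symbols and the observed metric values are assumptions made for this example; the scorecard component performs its own evaluation.

```python
import operator
import re

# Comparison symbols assumed for illustration; the scorecard component may accept others.
_COMPARATORS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def meets_threshold(threshold, value):
    """Return True if value satisfies a threshold string such as '>=0.80'."""
    match = re.fullmatch(r"\s*(>=|<=|>|<)\s*([0-9]*\.?[0-9]+)\s*", threshold)
    if match is None:
        raise ValueError(f"unrecognized threshold: {threshold!r}")
    symbol, limit = match.groups()
    return _COMPARATORS[symbol](value, float(limit))


# Same "Metrics" section as the scorecard config in this tutorial.
metrics_config = {"accuracy_score": {"threshold": ">=0.80"}, "precision_score": {}}
# Hypothetical metric values, not results from the tutorial's model.
observed = {"accuracy_score": 0.83, "precision_score": 0.71}

for name, settings in metrics_config.items():
    threshold = settings.get("threshold")
    if threshold is None:
        print(f"{name}: {observed[name]} (no threshold set)")
    elif meets_threshold(threshold, observed[name]):
        print(f"{name}: {observed[name]} satisfies {threshold}")
    else:
        print(f"{name}: {observed[name]} violates {threshold} (warning)")
```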

RAI Pipeline

When you have specified the RAI Insight components you need and configured your scorecard, it is time to define an Azure Machine Learning pipeline and configure each RAI component. For the list of settings needed to configure each of your RAI components, refer to component parameters.

  1. Pipeline Inputs

The scorecard path needs to be an input parameter into our RAI pipeline, so we'll use the Input class to create an input object for the pipeline. The path field is a pointer to the local or cloud location of the file.

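from azure.ai.ml import Input

# Pipeline input that points at the scorecard config JSON file saved earlier
score_card_config_path = Input(
    type="uri_file",
    path=score_card_config_filename,
    mode="download"
)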
  1. Define Pipeline

To define the pipeline for the RAI Dashboard, declare the experiment name, description and the name of the compute target that will be running the pipeline job. We use the dsl.pipeline decorator above the pipeline function to specify these fields.

Next, define the pipeline function and the input parameters it takes. For our Diabetes Hospital Readmission use case, the inputs for the function are the target column name, the training dataset, the testing dataset and the path of the scorecard config file that was stored locally (see the scorecard input object above).

from azure.ai.ml import dsl
from azure.ai.ml.constants import AssetTypes  # used by the constructor step below

@dsl.pipeline(
        compute=compute_name,
        description="RAI computation on hospital readmit classification data",
        experiment_name=f"RAI_hospital_Classification_RAIInsights_Computation_{model_name_suffix}",
    )
def rai_classification_pipeline(
        target_column_name,
        training_data,
        testing_data,
        score_card_config_path,
    ):
    
  1. Create the dashboard constructor

The RAI constructor component is what initializes the global data needed for the different components for the RAI dashboard.

  • It needs the “title” for the dashboard.
  • Our Diabetes Hospital Readmission is a classification use-case, so we'll set the “task_type” to classification.
  • The “model_info” is set to the model output path we get from the registered_model component.
  • The “model_input” is set to the MLFlow model input.
  • The “train_dataset” is set to the training dataset location registered to Azure ML studio.
  • The “test_dataset” is set to the testing dataset location registered to Azure ML studio.
  • The “target_column_name” is set to the target column our model is trying to predict.
  • The “classes” can be set to the label values for the column our model is trying to predict (e.g., 'not readmitted' vs 'readmitted'). It is commented out in the code below.
  • The “categorical_column_names” is set to all the columns in our dataset that have non-numeric values.
        # Initiate the RAIInsights
        create_rai_job = rai_constructor_component(
            title="RAI Dashboard",
            task_type="classification",
            model_info=expected_model_id,
            model_input=Input(type=AssetTypes.MLFLOW_MODEL, path=azureml_model_id),
            train_dataset=training_data,
            test_dataset=testing_data,
            target_column_name=target_column_name,
            #classes=json.dumps(['not readmitted', 'readmitted']),
            categorical_column_names=json.dumps(categorical),
        )
        create_rai_job.set_limits(timeout=120)
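The constructor call above passes a categorical list that this article does not define. As a rough sketch, such a list could be derived by collecting every column that holds a non-numeric value. The helper non_numeric_columns, the sample rows and the readmit_status target column name are hypothetical, and leaving the target column out of the list is an assumption of this example.

```python
import json
from numbers import Number


def non_numeric_columns(rows, exclude=()):
    """Return names of columns that hold at least one non-numeric, non-missing value."""
    found = []
    for row in rows:
        for column, value in row.items():
            if column in exclude or column in found or value is None:
                continue
            # bool is a Number subclass, but a True/False column is treated as categorical here.
            if isinstance(value, bool) or not isinstance(value, Number):
                found.append(column)
    return found


# Hypothetical rows; the real dataset has many more columns.
sample_rows = [
    {"age": "30-60 years", "time_in_hospital": 3, "gender": "Female", "readmit_status": "readmitted"},
    {"age": "Over 60 years", "time_in_hospital": 7, "gender": "Male", "readmit_status": "not readmitted"},
]

categorical = non_numeric_columns(sample_rows, exclude=("readmit_status",))
print(json.dumps(categorical))  # JSON string in the form passed to categorical_column_names
```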
  1. Explanation component

The explanation component is responsible for the dashboard providing a better understanding of the model's behavior. The dashboard provides both Global and Local explanations. For example:

  • Global Explanation: What are the top features driving the overall model prediction whether patients will be Readmitted or Not Readmitted back to a hospital within 30 days? 
  • Local Explanation: Why was patient X readmitted to the hospital in less than 30 days?

The configuration for the component is a comment pertaining to your use case. Then set the “rai_insights_dashboard” to be the output insights generated from the RAI pipeline job for Explanations.

        # Add an explanation
        explain_job = rai_explanation_component(
            comment="Explanation for hospital readmitted less than 30 days classification",
            rai_insights_dashboard=create_rai_job.outputs.rai_insights_dashboard,
        )
        explain_job.set_limits(timeout=120)
  1. Error Analysis

The Error Analysis component is responsible for the dashboard providing an error distribution of the feature groups contributing to the error rate of the model. For instance, it shows which features are negatively affecting the model's prediction performance and may need attention. It can further showcase the blind spots of your model in terms of performance. Errors are often not distributed evenly across different data subgroups, and Error Analysis helps you identify data cohorts with higher error rates. For the scope of this tutorial, we will just set the “rai_insights_dashboard” to be the output insights generated from the RAI pipeline job for the overall and feature error rates.

        # Add error analysis
        erroranalysis_job = rai_erroranalysis_component(
            rai_insights_dashboard=create_rai_job.outputs.rai_insights_dashboard,
        )
        erroranalysis_job.set_limits(timeout=120)
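To illustrate the idea that errors are not spread evenly across cohorts, here is a small sketch that groups records by one feature and computes the error rate of each group. The function error_rate_by_cohort and the sample records are invented for this example; the Error Analysis component uses its own, richer methods to find cohorts.

```python
from collections import defaultdict


def error_rate_by_cohort(records, feature):
    """Return {cohort value: fraction of records the model got wrong}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        cohort = record[feature]
        totals[cohort] += 1
        errors[cohort] += int(record["label"] != record["prediction"])
    return {cohort: errors[cohort] / totals[cohort] for cohort in totals}


# Hypothetical records, bucketed by whether the patient had prior inpatient visits.
records = [
    {"prior_inpatient": "0", "label": 0, "prediction": 0},
    {"prior_inpatient": "0", "label": 0, "prediction": 0},
    {"prior_inpatient": "0", "label": 1, "prediction": 0},
    {"prior_inpatient": "0", "label": 0, "prediction": 0},
    {"prior_inpatient": "1+", "label": 1, "prediction": 0},
    {"prior_inpatient": "1+", "label": 1, "prediction": 1},
    {"prior_inpatient": "1+", "label": 0, "prediction": 1},
    {"prior_inpatient": "1+", "label": 1, "prediction": 0},
]

rates = error_rate_by_cohort(records, "prior_inpatient")
overall = sum(r["label"] != r["prediction"] for r in records) / len(records)
print(f"overall error rate: {overall}")
for cohort, rate in rates.items():
    print(f"prior_inpatient={cohort}: {rate}")
```

In this made-up data the overall error rate hides the fact that one cohort fails three times as often as the other, which is the kind of gap the dashboard helps you find.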
  1. Counterfactual/What-If

The Counterfactual component consists of two functionalities for better model behavior hypothesis:

  • Counterfactual Analysis: Generating a set of examples with minimal changes to a given point such that they change the model's prediction (i.e. showing the closest datapoints with opposite model predictions). For example, if the model predicts that a patient is highly likely to be readmitted to a hospital within 30 days, the counterfactuals show decision-makers which features, if changed, would make the model predict that the patient is not readmitted. The “total_cfs” sets how many counterfactual points to generate for each datapoint in the test data. The “desired_class” indicates the counter outcome of a prediction.
  • What-if Analysis: Enabling interactive and custom what-if perturbations for individual data points to understand how the model reacts to feature changes. For example, what if the amount of time in the hospital for a patient was increased, and the number of procedures increased? Is the patient likely to not be readmitted within 30 days?
        # Add counterfactual analysis
        counterfactual_job = rai_counterfactual_component(
            rai_insights_dashboard=create_rai_job.outputs.rai_insights_dashboard,
            total_cfs=10,
            desired_class='opposite',
        )
        counterfactual_job.set_limits(timeout=600)
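The sketch below shows the counterfactual idea on a toy scale: it tries alternative values for one feature at a time and keeps the changes that flip the prediction, smallest change first. The toy_model scoring rule, the feature ranges and the patient values are all hypothetical, and this brute-force search is a stand-in for illustration, not the method the counterfactual component uses.

```python
def toy_model(point):
    """Hypothetical stand-in classifier: 1 = readmitted, 0 = not readmitted."""
    score = point["time_in_hospital"] + 3 * point["prior_inpatient"]
    return int(score >= 10)


def single_feature_counterfactuals(model, point, feature_ranges, total_cfs):
    """Return up to total_cfs one-feature changes that flip the prediction, smallest first."""
    original = model(point)
    candidates = []
    for feature, values in feature_ranges.items():
        for value in values:
            changed = {**point, feature: value}
            if model(changed) != original:
                candidates.append((abs(value - point[feature]), feature, value))
    candidates.sort()
    return [(feature, value) for _, feature, value in candidates[:total_cfs]]


patient = {"time_in_hospital": 7, "prior_inpatient": 2}
ranges = {"time_in_hospital": range(1, 15), "prior_inpatient": range(0, 6)}

print("original prediction:", toy_model(patient))
cfs = single_feature_counterfactuals(toy_model, patient, ranges, total_cfs=3)
for feature, value in cfs:
    print(f"set {feature} from {patient[feature]} to {value} to flip the prediction")
```

The total_cfs argument here plays the same role as in the component call above: it caps how many counterfactual points are returned for a datapoint.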
  1. Package RAI insights components

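Once all the RAI components are configured with the parameters needed for the use case, add each of them to the list of insights to include on the RAI dashboard. The gather component then uploads the dashboard and UX settings for the RAI Dashboard. Note that the code below also gathers causal_job.outputs.causal, so a causal_job must be created from rai_causal_component, in the same way as the other insight jobs, before this step.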

        # Combine everything
        rai_gather_job = rai_gather_component(
            constructor=create_rai_job.outputs.rai_insights_dashboard,
            insight_1=explain_job.outputs.explanation,
            insight_2=causal_job.outputs.causal,
            insight_3=counterfactual_job.outputs.counterfactual,
            insight_4=erroranalysis_job.outputs.error_analysis,
        )
        rai_gather_job.set_limits(timeout=120)

  1. Scorecard component

The scorecard needs to be formatted and customized into a PDF report so that stakeholders or decision makers can read it. To make the report contain the RAI insights and layout, we need to add the definitions to the scorecard component. The “pdf_generation_config” is set to the path of the scorecard config JSON file that contains the information to include in the report. The “dashboard” is set to the dashboard's output analysis.

        rai_scorecard_job = rai_scorecard_component(
            dashboard=rai_gather_job.outputs.dashboard,
            pdf_generation_config=score_card_config_path
        )
  1. Pipeline outputs

When the pipeline job has completed running, it will return the dashboard, UX config, and the Scorecard to be displayed.

        return {
            "dashboard": rai_gather_job.outputs.dashboard,
            "ux_json": rai_gather_job.outputs.ux_json,
            "scorecard": rai_scorecard_job.outputs.scorecard
        }

Run RAI Dashboard pipeline job

After the pipeline is defined, we'll initialize and run it. To configure the pipeline, we pass our Diabetes Hospital Readmission input parameters to the pipeline function. We use the Output class to specify the path to the pipeline outputs. Finally, we use the submit_and_wait function to run the pipeline and register it to the Azure ML workspace.

import uuid
from azure.ai.ml import Output
# Pipeline to construct the RAI Insights
insights_pipeline_job = rai_classification_pipeline(
    target_column_name=target_column,
    training_data=hospital_train_parquet,
    testing_data=hospital_test_parquet,
    score_card_config_path=score_card_config_path,
)
# Workaround to enable the download
rand_path = str(uuid.uuid4())
insights_pipeline_job.outputs.dashboard = Output(
    path=f"azureml://datastores/workspaceblobstore/paths/{rand_path}/dashboard/",
    mode="upload",
    type="uri_folder",
)
insights_pipeline_job.outputs.ux_json = Output(
    path=f"azureml://datastores/workspaceblobstore/paths/{rand_path}/ux_json/",
    mode="upload",
    type="uri_folder",
)
insights_pipeline_job.outputs.scorecard = Output(
    path=f"azureml://datastores/workspaceblobstore/paths/{rand_path}/scorecard/",
    mode="upload",
    type="uri_folder",
)
# submit pipeline
insights_job = submit_and_wait(ml_client, insights_pipeline_job)

To monitor the progress of the pipeline job, click on the Jobs icon from the Azure ML studio. By clicking on the pipeline job, you can get the status.

azureml_jobs_page.png

To visualize the individual progression of each of the components in the pipeline, click on the pipeline name. This gives you a better view of which components are completed, pending, or failed.

rai_dashboard_pipeline.png

View the RAI Dashboard

After the RAI dashboard pipeline job has successfully completed, click on the “Models” tab of the Azure ML studio to find your registered model. Then, select the name of the model you generated from the train model tutorial.

model-list.png

From the Model details page, click on the “Responsible AI” tab. Then select the name of the dashboard.

model-details.png

(Optional) From the python code you can also create a link in the format below using your model id, subscription id, workspace name and resource group to view the dashboard.

sub_id = ml_client._operation_scope.subscription_id
rg_name = ml_client._operation_scope.resource_group_name
ws_name = ml_client.workspace_name
expected_uri = f"https://ml.azure.com/model/{expected_model_id}/model_analysis?wsid=/subscriptions/{sub_id}/resourcegroups/{rg_name}/workspaces/{ws_name}"
print(f"Please visit {expected_uri} to see your analysis")

Terrific…you now have an RAI dashboard.

rai-dashboard.gif

Useful Tip: Refer to the UI overview to fully understand how to use all the settings and controls on the RAI dashboard.

Now you can start identifying model errors using Error Analysis on the RAI dashboard!

Stay tuned for Part 4 of this tutorial series…

 

This article was originally published by Microsoft's AI - Machine Learning Blog. You can find the original article here.