Mastering AKS Troubleshooting #3: Kernel view and AKS Observability

Introduction

This blog post concludes the three-part series that addresses common networking problems that may occur while working with Azure Kubernetes Service (AKS). Although AKS is a managed container orchestration service, issues can still arise, requiring troubleshooting.

The earlier blog post covered endpoint connectivity issues across virtual networks and port configuration problems with services and their associated pods. This article focuses on solving issues using Linux toolsets to get a kernel view of the Kubernetes layout, and on using Container Insights to view logging and diagnostics to take remedial actions.

Prerequisites

Before setting up AKS, ensure that you have the necessary Azure account and subscription permissions, as well as PowerShell installed on your system. Follow setup and scenario instructions found in this Github link. It's important to be familiar with inbound and outbound networking scenarios in AKS. The environment shown in the figure uses a custom VNet with an NSG attached to its subnet. Additionally, AKS uses this custom subnet and creates its own NSG attached to the Nodepool's Network Interface.

varghesejoji_0-1683354527983.png

Scenario 5: Using Linux toolset to analyze failed application

Objective: Within the cluster, there are two applications running: one is functioning correctly, while the other is experiencing issues, causing 'curl' to fail with a timeout. In this lab we will use tools available on the Linux node hosting the application to diagnose the problem with the faulty application.

Step 1: Set up the environment.

kubectl create ns student
kubectl config set-context --current --namespace=student
# Verify current namespace
kubectl config view --minify --output 'jsonpath={..namespace}'
  1. Enable Cloud Shell within the Azure Portal. Select the Bash option, set up storage, and allow setup to complete.
varghesejoji_1-1683354527984.png
From AKS blade > Overview > Connect, run the ‘az account..' and ‘az aks get-credentials..' commands in the Cloud Shell.
Verify kubectl commands work.
  • Download kubectl-node-shell using the steps below. When executed, it creates an nsenter pod, which has the elevated privileges needed to run iptables. This level of access is not available with debug pods used to connect to nodes.
curl -LO https://github.com/kvaps/kubectl-node-shell/raw/master/kubectl-node_shell
chmod +x ./kubectl-node_shell
./kubectl-node_shell <node-name>
  • Clone the solutions Github link and deploy the PS scripts from the Lab5 folder.
cd Lab5; ./working.ps1

The scripts, shown below, set up each deployment with 3 Pod replicas and a service of type LoadBalancer running on port 4000. The two applications are identical except for the image: one deployment works and the other does not.

Working

apiVersion: v1
kind: Service
metadata:
  name: working-app-clusterip
spec:
  type: LoadBalancer
  ports:
  - port: 4000
    protocol: TCP
    targetPort: 4000
  selector:
    app: working-app

apiVersion: apps/v1
kind: Deployment
metadata:
  name: working-app-deployment
  labels:
    app: working-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: working-app
  template:
    metadata:
      labels:
        app: working-app
    spec:
      containers:
      - name: working-app
        image: jvargh/nodejs-app:working
        ports:
        - containerPort: 4000

Faulty

apiVersion: v1
kind: Service
metadata:
  name: faulty-app-clusterip
spec:
  type: LoadBalancer
  ports:
  - port: 4000
    protocol: TCP
    targetPort: 4000
  selector:
    app: faulty-app

apiVersion: apps/v1
kind: Deployment
metadata:
  name: faulty-app-deployment
  labels:
    app: faulty-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: faulty-app
  template:
    metadata:
      labels:
        app: faulty-app
    spec:
      containers:
      - name: faulty-app
        image: jvargh/nodejs-app:faulty
        ports:
        - containerPort: 4000
  • Make a note of the Cluster IPs and External IPs associated with the Faulty and Working apps.

varghesejoji_2-1683354527985.png

varghesejoji_3-1683354527985.png

  • Create a test pod for validating access from within the cluster.
kubectl run test-pod --image=nginx --port=80 --restart=Never
  • Allow Inbound access through the Custom NSG
$custom_aks_nsg="custom_aks_nsg" # <- verify
$nsg_list=az network nsg list `
--query "[?contains(name,'$custom_aks_nsg')].{ ResourceGroup:resourceGroup}" --output json
# Extract NSG Resource Group
$resource_group=$(echo $nsg_list | jq -r '.[].ResourceGroup')
echo $nsg_list, $resource_group
az network nsg rule create --name AllowHTTPInbound `
--resource-group $resource_group --nsg-name $custom_aks_nsg `
--destination-port-range * --destination-address-prefix * `
--source-address-prefixes Internet --protocol tcp `
--priority 100 --access allow
  • Validation Test
# Test internal access within cluster
kubectl exec -it test-pod -- curl working-app-clusterip:4000 # works
kubectl exec -it test-pod -- curl faulty-app-clusterip:4000 # fails with Connection refused
# Test external access to cluster (use the External IPs noted earlier)
curl <working-app-external-ip>:4000 # works
curl <faulty-app-external-ip>:4000 # fails with `Unable to connect to the remote server`
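The two failure modes above carry different signals: "Connection refused" means the packet reached the host but nothing accepted it (an RST came back), while a timeout usually means traffic was silently dropped in transit (e.g., by an NSG). A minimal Python sketch that distinguishes the two cases (the host/port values you pass in are your own; this is illustrative, not part of the lab scripts):

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt: 'open', 'refused', or 'filtered'."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"        # something is listening on host:port
    except ConnectionRefusedError:
        return "refused"     # reached the host, but nothing bound to the port
    except socket.timeout:
        return "filtered"    # dropped in transit (NSG/firewall) -> timeout
    finally:
        s.close()
```

For example, `probe("<faulty-app-external-ip>", 4000)` returning "filtered" points at the NSG, while "refused" points at the app not listening.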

Step 2: Walk through the Kubernetes view

Use this step to confirm that, from the Internet through to the Pod, the Kubernetes setup is configured correctly.

varghesejoji_4-1683354527993.png

  1. The curl request hits the Public IP assigned to the service. The Service IP gets added as a Front End IP rule to the existing AKS Load Balancer.
  2. The Service ties into the endpoints by forwarding requests to the pods, and in turn to the application container.

varghesejoji_5-1683354527995.png

Step 3: Verify Loadbalancer Insights and Metrics

From the Load Balancer blade, go to AKS LB > Insights. Ensure the Load Balancer is functional and capturing metrics. From below you can see there's an issue with the backend pool.

varghesejoji_6-1683354527999.png

From the Detailed metrics > 'Frontend and Backend Availability' section, you should see the Failing app's FE IP is Red for Availability while the Working app's FE IP is Green. Change 'Time Range' to 5m.

varghesejoji_7-1683354528002.png

Step 4: Perform network trace to the Faulty app

Use this step to confirm whether the Faulty app is even listening. We should see the Working app responding while the Faulty app does not.

# IP addresses listed below apply to this example and are for reference only. Replace with your own.
test-pod-ip =       #10.244.0.61
working-app-svc =   #10.0.81.248
faulty-app-svc =    #10.0.189.236
working-pod-IPs =   #10.244.0.54, 55, 56
faulty-pod-IPs =    #10.244.0.57, 58, 59
  1. Get the test-pod IP and the destination service IP, and run tcpdump on the Node hosting the Pod.
  2. Trace the Working app first:
kubectl exec -it test-pod -- curl <working-app-svc>:4000
  3. From Cloud Shell in the Azure Portal, run the below command. Get the node name from 'kubectl get pods -o wide'.
kubectl-node_shell <node-name>
  4. Set up a trace between the test-pod and the Pod network.
tcpdump -en -i any src <test-pod-ip> and dst net 10.244.0.0/16
  5. From another terminal, execute curl to the Faulty and Working app's services.
kubectl exec -it test-pod -- curl <faulty-app-svc>:4000
kubectl exec -it test-pod -- curl <working-app-svc>:4000

Working App

varghesejoji_8-1683354528006.png

Faulty App

varghesejoji_9-1683354528008.png

From the trace above, there's only a response from the Working App pod; there is no response from the Faulty App pod.

Step 5: Collect tcpdump trace for Wireshark view

This section captures the trace to a file and copies it from the nsenter pod to the local desktop, where Wireshark will visualize the trace. You will need two consoles.

  1. From Cloud Shell run kubectl-node_shell <node-name>, then run the below commands.
cd /tmp
tcpdump -nn -s0 -vvv -i any -w capture.cap
where,
     -nn: display IP addresses and port numbers in numeric form
     -s0: set snaplen=0, i.e., capture the entire packet
     -vvv: maximum verbosity
  2. From the 2nd console, run the below to view the HTML output.
kubectl exec -it test-pod -- curl <service-ip>:4000
  3. On Cloud Shell, break the tcpdump (CTRL+C); capture.cap should be written to /tmp.
  4. From the 2nd console, use the below command to download capture.cap. Use 'kubectl get pods' to get the nsenter pod name.
kubectl cp <nsenter-pod>:/tmp/capture.cap capture.cap

# Wireshark will need to be installed for the next step. Check this link for the Windows install.
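The capture.cap that tcpdump's -w flag writes is in libpcap format, which is why Wireshark opens it directly: the file begins with a fixed 24-byte global header carrying a magic number, version, snaplen, and link type. A short Python sketch of that header (a simplified model for illustration; note that on Linux, tcpdump -i any actually records link type 113, LINUX_SLL, rather than Ethernet's 1):

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # standard libpcap magic number

def pcap_global_header(snaplen: int = 0x40000, linktype: int = 1) -> bytes:
    """Build a libpcap global header (linktype 1 = Ethernet)."""
    # fields: magic, major=2, minor=4, thiszone=0, sigfigs=0, snaplen, network
    return struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype)

def parse_pcap_header(blob: bytes) -> dict:
    """Parse the first 24 bytes of a little-endian pcap file."""
    magic, major, minor, _tz, _sig, snaplen, network = struct.unpack(
        "<IHHiIII", blob[:24])
    return {"magic": magic, "version": (major, minor),
            "snaplen": snaplen, "linktype": network}
```

Parsing the first 24 bytes of your downloaded capture.cap this way is a quick sanity check that the kubectl cp transfer didn't corrupt the file.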

  5. Open capture.cap in Wireshark. Use the below filter to refine the view.
ip.addr == <test-pod-ip> # optionally add: and ip.addr == <service-ip>
  6. Use Analyze > Follow > HTTP Stream to view the HTTP flow, as seen below.

varghesejoji_10-1683354528011.png

varghesejoji_11-1683354528012.png

  7. For long-running traces that need to be saved to a storage account, use the utility below. The Helm install creates a storage account, and a daemon set creates tcpdump Pods on all nodes that continuously write captures to the storage account.

https://github.com/amjadaljunaidi/tcpdump/blob/main/README.md

Uninstall the Helm chart to stop tracing; the capture will be left intact in the storage account.

  8. To focus on just one node rather than all nodes as above, use Lab5 > tcpdump-pod.yaml. Change the node name and use the below command. The storage account's file share should contain the tcpdump contents.
kubectl apply -f tcpdump-pod.yaml

View from Storage

varghesejoji_12-1683354528018.png

View from Pod

varghesejoji_13-1683354528020.png

On completion, delete using "kubectl delete -f tcpdump-pod.yaml". Delete the storage account to delete the file share.

Step 6: Walk through the Linux Kernel view

Use this step to confirm that everything is configured correctly at the Linux kernel level, allowing packets to flow. Also note this is not a cluster-wide networking issue, since the Working-App Pods can be called from the Internet.

  1. Run the below commands. kubectl-node_shell provides elevated privileges on the Node.
kubectl get pods -o wide # gives the node name to use below
kubectl-node_shell <node-name>
# iptables has a chain structure, managed by the kube-proxy pods in the cluster.
  2. View the faulty app's iptables NAT table and show the KUBE-SERVICES chain, using the below command to display the Service's Internal and External IPs.
iptables -t nat -nL KUBE-SERVICES | grep faulty-app
  3. Walk down the chain using the below command, which gives the Endpoints for the Service along with each Endpoint's selection probability. Running it again on an Endpoint chain gives the Pod IP associated with that Endpoint.
iptables -t nat -nL <chain-name>

varghesejoji_14-1683354528027.png

In the KUBE-SERVICES chain, if src=0.0.0.0/0 (ANY) and Protocol=TCP, the packet is forwarded from KUBE-SERVICES to the KUBE-SVC chain as its next hop. KUBE-SVC represents the Service's IP.

In the KUBE-SVC chain, if src=0.0.0.0/0 (ANY) and Protocol=TCP, the packet is forwarded from KUBE-SVC to a KUBE-SEP chain as its next hop. KUBE-SEP represents the Endpoint. Notice there are 3 rules for the 3 Pods: Pod1 gets 1/3 of the traffic, Pod2 gets 1/2 of the remainder, and the rest goes to Pod3. Because endpoint selection is statistical, based on probability, this could affect latency, especially with multi-zone balancing where Pods are distributed across zones and extra hops add latency.

In the KUBE-SEP chain, if the destination matches the Endpoint's IP, the incoming packet is directed (DNAT) to the designated Pod.
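The per-rule probabilities kube-proxy programs with iptables' statistic module (1/3, then 1/2 of what remains, then everything left) work out to a uniform split. A short sketch verifying the arithmetic (a simplified model of rule evaluation, not kube-proxy's actual code):

```python
def effective_shares(rule_probs):
    """Given per-rule 'statistic mode random' match probabilities, where
    rules are evaluated in order and the last rule catches everything
    remaining, return each endpoint's effective share of traffic."""
    shares, remaining = [], 1.0
    for p in rule_probs:
        shares.append(remaining * p)   # traffic this rule actually matches
        remaining *= (1.0 - p)         # traffic that falls through to later rules
    return shares

# kube-proxy's rules for 3 endpoints: match 1/3, then 1/2, last takes the rest
shares = effective_shares([1/3, 1/2, 1.0])  # each endpoint ends up with 1/3
```

The same pattern generalizes: for N endpoints, rule i matches with probability 1/(N-i), which is why every Pod receives an equal share.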

  4. Validate the route associated with the Pod network and its eth0 interface. This route should map to the AKS route table for Kubenet.

varghesejoji_15-1683354528028.png

This should match the route in the route table for the Kubenet networking associated with the AKS cluster.

varghesejoji_16-1683354528029.png

By using the 'crictl ps' command, you can enumerate the running containers on a Kubernetes node and interact with the container runtime interface (CRI) to manage containers. The below lists the containers labelled faulty.

crictl ps | grep faulty

varghesejoji_17-1683354528031.png

Map this to ‘kubectl get pods -o wide | grep faulty' to get a match on Pod names.

# Use 'kubectl get pods' to get one of the faulty pod names to grep for.
# Use the obtained Container ID of one of the faulty app containers to return its Process ID.
crictl inspect --output go-template --template '{{.info.pid}}' <container-id>

varghesejoji_18-1683354528032.png

  5. Use the Process ID to enter the Pod's network namespace using the nsenter command. This allows us to execute commands inside the Pod's namespace. In this case, the command 'ip address show' displays the Pod IP. Running 'kubectl get pods' confirms from the IP that we're on the right pod.

varghesejoji_19-1683354528034.png

Step 7: Confirm if App is listening

This step uses the lsof (List Open Files) utility with the following parameters:

  • The -i parameter displays information about network connections.
  • The -P parameter prevents the conversion of port numbers to port names. When used with -i, it displays the port number instead of the name.
  • The -n parameter prevents the conversion of network addresses to hostnames. When used with -i, it displays the IP address instead of the hostname.
Command to use: nsenter -t <pid> -n lsof -i -P -n

From below, the working container is listening on ANY IP, i.e., *:4000.

The faulty container is tied to the local loopback, 127.0.0.1, instead of ANY as above.

varghesejoji_20-1683354528039.png
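The addresses lsof reports come from the kernel's socket tables. On Linux these are also visible in /proc/net/tcp, where the local address appears as little-endian hex IP and big-endian hex port. A short sketch decoding such a field (the sample strings below are illustrative, matching the lab's port 4000):

```python
import socket
import struct

def decode_proc_tcp_addr(field: str) -> tuple:
    """Decode a /proc/net/tcp 'local_address' field like '0100007F:0FA0'.
    The IP is 4 bytes of little-endian hex; the port is plain hex."""
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)

# '0100007F:0FA0' -> ('127.0.0.1', 4000): the faulty app's loopback-only bind
# '00000000:0FA0' -> ('0.0.0.0', 4000):   the working app's listen-on-ANY bind
faulty_bind = decode_proc_tcp_addr("0100007F:0FA0")
```

Decoding the raw table this way is handy inside minimal containers that ship without lsof or netstat.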

Step 8: Fixing the issue

The issue was in the Dockerfile: the working app was set to bind to 0.0.0.0 (the default/ANY address), but the faulty app was set to bind to the fixed loopback address 127.0.0.1, as seen below.

Working

varghesejoji_21-1683354528043.png

Faulty

varghesejoji_22-1683354528045.png
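The bind-address difference can be reproduced with plain sockets: a server bound to 127.0.0.1 only accepts loopback traffic and is invisible on the pod IP that kube-proxy DNATs to, while 0.0.0.0 listens on every interface. A minimal sketch of both patterns (illustrative; the lab's actual fix is in the Node.js app's Dockerfile):

```python
import socket

def make_listener(bind_addr: str) -> socket.socket:
    """Bind a TCP listener; port 0 lets the OS pick a free port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((bind_addr, 0))
    s.listen(1)
    return s

# Faulty pattern: reachable only via loopback, unreachable via the pod IP
loop_only = make_listener("127.0.0.1")
# Working pattern: reachable on every interface, including the pod IP
any_addr = make_listener("0.0.0.0")
```

In a container, the in-pod curl via the service still fails for the loopback bind because the DNAT'd destination is the pod IP, not 127.0.0.1, which is exactly what the tcpdump and lsof output above showed.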

Step 9: Challenge                                                                                                           

From the docker-app folder, fix the Dockerfile for the Faulty app, create a new image, and create a new Pod using this image to check if it resolves the issue.

Step 10: Cleanup                                                                         

k delete ns student
az network nsg rule delete --name AllowHTTPInbound `
--resource-group $resource_group --nsg-name $custom_aks_nsg

Scenario 6: Enable AKS Monitoring and Logging

Objective: Enable Container Insights to provide container performance and health monitoring. Also enable Container Diagnostics to collect container logs and metrics and make them available for analysis and troubleshooting.

Step 1: Set up the environment.

  1. Set up AKS as outlined in this script. Clone the solutions Github link and cd to the Lab6 folder.
  2. Create and switch to the newly created namespace.
kubectl create ns student
kubectl config set-context --current --namespace=student
# Verify current namespace
kubectl config view --minify --output 'jsonpath={..namespace}'
  3. Confirm Container Insights has been set up. This was done during AKS cluster creation in the Lab setup section. From the AKS blade in the portal > Monitor > Insights, confirm metrics collection.

Step 2: Deploy and Monitor apps that spike CPU/Memory utilization

  1. Assuming namespace 'student' still exists, run working.ps1, shown below, to turn on CPU and memory load.
$kubectl_apply = @"
# deployment to generate high cpu
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openssl-loop
  namespace: student
spec:
  replicas: 3
  selector:
    matchLabels:
      app: openssl-loop
  template:
    metadata:
      labels:
        app: openssl-loop
    spec:
      containers:
      - args:
        - |
          while true; do
            openssl speed >/dev/null;
          done
        command:
        - /bin/bash
        - -c
        image: polinux/stress
        name: openssl-loop
---
# deployment to generate high memory
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-memory
  namespace: student
spec:
  replicas: 3
  selector:
    matchLabels:
      app: stress-memory
  template:
    metadata:
      labels:
        app: stress-memory
    spec:
      containers:
      - image: polinux/stress
        name: stress-memory-container
        resources:
          requests:
            memory: 50Mi
          limits:
            memory: 50Mi
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
"@
$kubectl_apply | kubectl apply -f -

'kubectl get pods' should show the stress-memory pods in 'CrashLoopBackOff' and the openssl-loop pods in 'Pending'.

  2. From the Insights tab, validate the CPU/Memory consumption.

varghesejoji_23-1683354528048.png

From Nodes tab, see if the top consuming Pods match those deployed.

varghesejoji_24-1683354528051.png

Step 3: View container logs and generate an alert resulting in email

  1. From Logs, search for and select KubePodInventory, and run the below query to get the Pod results.

varghesejoji_25-1683354528053.png

KubePodInventory
| where TimeGenerated > ago(2h)
| where ContainerStatusReason == "CrashLoopBackOff"
| where Namespace == "student"
| project TimeGenerated, Name, ContainerStatus, ContainerStatusReason

varghesejoji_26-1683354528055.png
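The query above is just a filter-and-project over KubePodInventory rows. To make its logic explicit, the same operation can be mirrored in Python (the sample records below are made up for illustration):

```python
# Hypothetical KubePodInventory rows, mirroring the KQL query's columns
rows = [
    {"TimeGenerated": "2023-05-06T08:00:00Z", "Name": "stress-memory-7d9-abc",
     "ContainerStatus": "waiting", "ContainerStatusReason": "CrashLoopBackOff",
     "Namespace": "student"},
    {"TimeGenerated": "2023-05-06T08:01:00Z", "Name": "openssl-loop-5f4-xyz",
     "ContainerStatus": "running", "ContainerStatusReason": "",
     "Namespace": "student"},
]

def crashloop_pods(records, namespace="student"):
    """Equivalent of: where ContainerStatusReason == 'CrashLoopBackOff'
    | where Namespace == namespace
    | project TimeGenerated, Name, ContainerStatus, ContainerStatusReason."""
    return [
        {k: r[k] for k in ("TimeGenerated", "Name", "ContainerStatus",
                           "ContainerStatusReason")}
        for r in records
        if r["ContainerStatusReason"] == "CrashLoopBackOff"
        and r["Namespace"] == namespace
    ]

matches = crashloop_pods(rows)  # only the CrashLoopBackOff pod survives
```

Each `where` clause maps to a filter condition and `project` maps to selecting keys, which is a useful mental model when building more complex KQL.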

  2. Create an alert as highlighted above.
  • Set the threshold to 0.
  • In 'Actions', create an Action group with an Email ID if one doesn't exist.
  • Set the Alert rule name and create the Alert.
  3. Confirm email receipt on the next occurrence of the threshold.

Step 4: Search Diagnostics logs

  1. Ensure the AzureDiagnostics table appears in Logs. If available, run the below commands to create and delete objects. This should generate additional log data.

varghesejoji_27-1683354528058.png

k create ns test-diag
k create deploy deploy-diag-alert --image busybox -n test-diag
k delete deploy deploy-diag-alert -n test-diag

The Queries section should lead to the Query finder to locate AzureDiagnostics logs, if the table exists.

varghesejoji_28-1683354528060.png

  2. Run the below queries to view the log data. Log details are found in log_s. Using parse_json() you can drill down to display the content of embedded fields, objects, or arrays.
AzureDiagnostics
| where Category contains "kube-audit"
| extend log=parse_json(log_s)
| extend verb=log.verb
| extend resource=log.objectRef.resource
| extend ns=log.objectRef.namespace
| extend name=log.objectRef.name
| where resource == "pods"
| where ns=="test-diag"
| project TimeGenerated, verb, resource, name, log_s

varghesejoji_29-1683354528064.png

To get a graphical view, run the below. This renders a line chart of all the pods created in ns 'test-diag'.

AzureDiagnostics
| where Category contains "kube-audit"
| extend log=parse_json(log_s)
| extend verb=log.verb
| extend resource=log.objectRef.resource
| extend name=log.objectRef.name
| extend ns=log.objectRef.namespace
| where resource == "pods"
| where verb=="create"
| where ns=="test-diag"
| summarize count() by bin(TimeGenerated, 1m), tostring(name), tostring(verb)
| render timechart

Step 5: Challenge                                                              

Repeat labs 1 to 5 and use the Logs section above to query and analyze the logs.

Step 6: Final cleanup

az group delete -n <resource-group> -y

Conclusion

This post illustrated the use of Linux tools to get a kernel-level view of Kubernetes processes and get to the root cause of the faulty application. We also saw the use of Container Insights and diagnostics to analyze and troubleshoot using logs and metrics. Finally, we hope that this three-part series has been helpful in your troubleshooting journey with AKS, and that the techniques and tools discussed will aid you in resolving issues efficiently and effectively.

Disclaimer

The sample scripts are not supported by any Microsoft standard support program or service. The sample scripts are provided AS IS without a warranty of any kind. Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.

 

This article was originally published by Microsoft's Entra (Azure AD) Blog. You can find the original article here.