I’ve been grappling with vSphere with Tanzu for the past week or so, and while I haven’t completely nailed how things operate end to end in this new world, I’ve been able to get beyond the common “kubectl get nodes” stage, which is where plenty of people kicking the tyres get to… and then stop. There also isn’t a lot of straightforward content out there on how to quickly fix issues that come up after deploying a Tanzu Kubernetes Grid instance into a fresh namespace. There are Kubernetes security and storage constructs to deal with, and for those new to or unfamiliar with Kubernetes… it can drive you mad!
The Problem:
One of the first issues I came across was when trying to deploy pods or solutions outside of the default namespace created as part of the TKG installation. There is a whole heap to read up on here in terms of how Tanzu Kubernetes Grid works with internal namespaces and how Pod Security Policies are leveraged alongside the Cluster Roles and Role Bindings associated with the default Pod Security Policy. Bindings can be set at the cluster default level using the pre-built PSPs, or they can be applied at the namespace level using existing or custom PSPs.
About Kubernetes Pod Security Policies
Kubernetes pod security policies (PSPs) are cluster-level resources that control the security of pods. Using PSPs gives you control over the types of pods that can be deployed and the types of accounts that can deploy them. A PodSecurityPolicy resource defines a set of conditions that a pod must satisfy to be deployable. If the conditions are not met, the pod cannot be deployed. A single PodSecurityPolicy must validate a pod in its entirety. A pod cannot have some of its rules in one policy and some in another.
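If you want to see which policies already exist in a Tanzu Kubernetes cluster before creating anything new, kubectl can list and describe them. This is just a quick inspection sketch, assuming you are already logged in to the TKG cluster context; on a default Tanzu Kubernetes cluster you should see the built-in vmware-system-* policies, including the vmware-system-privileged policy mentioned further down.

# List the Pod Security Policies defined in the cluster
kubectl get psp

# Show what the built-in privileged policy allows
kubectl describe psp vmware-system-privileged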
The Error:
When looking to deploy solutions or individual pods into a namespace, either directly from the kubectl command line or using Helm charts, objects were getting created, but the pods sat in a pending state, and looking into the events of the namespace I was seeing entries like the ones below.
LAST SEEN   TYPE      REASON         OBJECT                                              MESSAGE
Warning     FailedCreate   job/kubeapps-internal-apprepository-jobs-cleanup   Error creating: pods "kubeapps-internal-apprepository-jobs-cleanup-" is forbidden: unable to validate against any pod security policy: []
Warning     FailedCreate   job/kubeapps-internal-apprepository-jobs-cleanup   Error creating: pods "kubeapps-internal-apprepository-jobs-cleanup-" is forbidden: unable to validate against any pod security policy: []
Basically, the user has no rights in the cluster for privileged deployments. There are a couple of ways to fix this. You can deploy into the default namespace, but if you want to keep deployments separated using namespaces you need to configure and attach some policies. There is clearly a reason for these PSPs to be in place inside the Kubernetes cluster within a Tanzu deployment: the authenticated user gets edit or view rights controlled at the vSphere namespace level, but within the TKG cluster there is also a set of policies that needs to be satisfied before you can actually deploy workloads.
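To confirm that this really is a PSP/RBAC problem rather than something else, kubectl can check whether the service account creating the pods is allowed to use any policy. This is a diagnostic sketch assuming the kubeapps namespace and its default service account used in the examples below:

# Check whether the default service account in the kubeapps namespace
# may use the built-in privileged policy (expect "no" before the fix)
kubectl auth can-i use podsecuritypolicy/vmware-system-privileged \
  --as=system:serviceaccount:kubeapps:default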
The Fix:
The fix below is basically like configuring ALLOW ANY ANY on a firewall (though any damage possible is self-contained inside the specific TKG deployment), and for those looking to dive a little deeper into Kubernetes deployments, it is the quick way to create a new PSP and get on with the business of deploying containerised applications.
Note: I am documenting this fix as a workaround for testing purposes. This article can help you configure things in a tighter manner, the VMware Tanzu Kubernetes way. Alternatively, you can apply a ClusterRoleBinding that grants “vmware-system-privileged” to the logged-in user, as sketched below.
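As a rough sketch of that alternative, the binding can be created imperatively. The ClusterRole name psp:vmware-system-privileged and the binding name below are assumptions based on a default TKG cluster, so check what actually exists in your cluster first:

# See which PSP-related ClusterRoles ship with the TKG cluster
kubectl get clusterroles | grep psp

# Bind all authenticated users to the built-in privileged policy
# (binding name is arbitrary; ClusterRole name assumes the TKG default)
kubectl create clusterrolebinding default-tkg-admin-privileged-binding \
  --clusterrole=psp:vmware-system-privileged \
  --group=system:authenticated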
Create and apply a new Pod Security Policy (PSP)
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: kubeapps-psp
spec:
  privileged: true
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
    - '*'
Create and apply a new Cluster Role tied to the new PSP
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kubeapps-clusterrole
rules:
  - apiGroups:
      - policy
    resources:
      - podsecuritypolicies
    verbs:
      - use
    resourceNames:
      - kubeapps-psp
Bind the Cluster Role to a Service Account associated to a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubeapps-clusterrole
  namespace: kubeapps
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeapps-clusterrole
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts
  - kind: ServiceAccount # Omit apiGroup
    name: default
    namespace: kubeapps
Use kubectl to apply the new roles and bindings.
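Assuming the three manifests above were saved as kubeapps-psp.yaml, kubeapps-clusterrole.yaml and kubeapps-rolebinding.yaml (the file names are just examples), applying and verifying them looks something like this:

# Apply the PSP, ClusterRole and RoleBinding created above
kubectl apply -f kubeapps-psp.yaml
kubectl apply -f kubeapps-clusterrole.yaml
kubectl apply -f kubeapps-rolebinding.yaml

# Confirm the objects exist
kubectl get psp kubeapps-psp
kubectl get clusterrole kubeapps-clusterrole
kubectl get rolebinding kubeapps-clusterrole -n kubeapps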
In my example above where I was watching the events, as soon as I applied the new PSP, Cluster Role and RoleBinding, the pending/failed jobs were able to complete.
[root@ANSIBLE-01 bin]# kubectl get events -n kubeapps -w
LAST SEEN   TYPE     REASON             OBJECT                                                    MESSAGE
28s         Normal   Scheduled          pod/kubeapps-internal-apprepository-jobs-cleanup-n4dbz    Successfully assigned kubeapps/kubeapps-internal-apprepository-jobs-cleanup-n4dbz to tkg-cluster-001-workers-qcqqz-86b999cc5c-q7gz2
28s         Normal   Pulling            pod/kubeapps-internal-apprepository-jobs-cleanup-n4dbz    Pulling image "docker.io/bitnami/kubectl:1.18.9-debian-10-r5"
15s         Normal   Pulled             pod/kubeapps-internal-apprepository-jobs-cleanup-n4dbz    Successfully pulled image "docker.io/bitnami/kubectl:1.18.9-debian-10-r5"
14s         Normal   Created            pod/kubeapps-internal-apprepository-jobs-cleanup-n4dbz    Created container kubectl
14s         Normal   Started            pod/kubeapps-internal-apprepository-jobs-cleanup-n4dbz    Started container kubectl
28s         Normal   SuccessfulCreate   job/kubeapps-internal-apprepository-jobs-cleanup          Created pod: kubeapps-internal-apprepository-jobs-cleanup-n4dbz
05s         Normal   Completed          job/kubeapps-internal-apprepository-jobs-cleanup          Job completed
From that point forward, any new deployments either direct or via Helm had no issues.
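If you want a quick sanity check that new pods now validate against the policy, a throwaway deployment works well. The psp-test name and nginx image below are just examples:

# Create a throwaway deployment in the kubeapps namespace
kubectl create deployment psp-test --image=nginx -n kubeapps
kubectl get pods -n kubeapps -l app=psp-test

# Clean up once the pod shows as Running
kubectl delete deployment psp-test -n kubeapps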
References:
https://cloud.google.com/kubernetes-engine/docs/how-to/pod-security-policies
https://core.vmware.com/resource/vsphere-tanzu-quick-start-guide