I’ve been grappling with vSphere with Tanzu for the past week or so, and while I haven’t completely nailed how things operate end to end in this new world, I’ve been able to get beyond the common “kubectl get nodes” stage, which is where plenty of people kicking the tyres get to… and then stop. There also isn’t a lot of straightforward content out there on how to quickly fix issues that come up after deploying a Tanzu Kubernetes Grid instance into a fresh namespace. There are Kubernetes security and storage constructs to deal with, and for those new to or unfamiliar with Kubernetes… it can drive you mad!
The Problem:
One of the first issues I came across was when trying to deploy pods or solutions outside of the default namespace created as part of the TKG installation. There is a whole heap to read up on here in terms of how Tanzu Kubernetes Grid works with internal namespaces and how Pod Security Policies are leveraged alongside the Cluster Roles and Role Bindings associated with the default Pod Security Policy. Bindings can be set at the cluster default level using the pre-built PSPs, or they can be applied at the namespace level using existing or custom PSPs.
About Kubernetes Pod Security Policies
Kubernetes pod security policies (PSPs) are cluster-level resources that control the security of pods. Using PSPs gives you control over the types of pods that can be deployed and the types of accounts that can deploy them. A PodSecurityPolicy resource defines a set of conditions that a pod must satisfy to be deployable. If the conditions are not met, the pod cannot be deployed. A single PodSecurityPolicy must validate a pod in its entirety. A pod cannot have some of its rules in one policy and some in another.
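If you want to see which policies already exist in a Tanzu Kubernetes cluster before creating anything new, kubectl can list and describe them. This is just a quick inspection sketch, assuming you are already logged in to the TKG cluster context; on a default Tanzu Kubernetes cluster you should see the built-in vmware-system-* policies, including the vmware-system-privileged policy mentioned further down.

# List the Pod Security Policies defined in the cluster
kubectl get psp

# Show what the built-in privileged policy allows
kubectl describe psp vmware-system-privileged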
The Error:
When looking to deploy solutions or individual pods into a namespace, either directly from the kubectl command line or using Helm charts, objects were getting created, but the pods sat in a pending state, and looking into the events of the namespace I was seeing entries like the ones below.
LAST SEEN   TYPE      REASON         OBJECT                                              MESSAGE
Warning     FailedCreate   job/kubeapps-internal-apprepository-jobs-cleanup   Error creating: pods "kubeapps-internal-apprepository-jobs-cleanup-" is forbidden: unable to validate against any pod security policy: []
Warning     FailedCreate   job/kubeapps-internal-apprepository-jobs-cleanup   Error creating: pods "kubeapps-internal-apprepository-jobs-cleanup-" is forbidden: unable to validate against any pod security policy: []
Basically, the user has no rights in the cluster for privileged deployments. There are a couple of ways to fix this. You can deploy into the default namespace, but if you want to keep deployments separated using namespaces you need to configure and attach some policies. There is clearly a reason for these PSPs to be in place inside the Kubernetes cluster within a Tanzu deployment: the authenticated user gets edit or view rights controlled at the vSphere namespace level, but within the TKG cluster there is also a set of policies that needs to be satisfied before you can actually deploy workloads.
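To confirm that this really is a PSP/RBAC problem rather than something else, kubectl can check whether the service account creating the pods is allowed to use any policy. This is a diagnostic sketch assuming the kubeapps namespace and its default service account used in the examples below:

# Check whether the default service account in the kubeapps namespace
# may use the built-in privileged policy (expect "no" before the fix)
kubectl auth can-i use podsecuritypolicy/vmware-system-privileged \
  --as=system:serviceaccount:kubeapps:default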
The Fix:
The fix below is basically like configuring ALLOW ANY ANY on a firewall (though any damage possible is self-contained inside the specific TKG deployment), and for those looking to dive a little deeper into Kubernetes deployments, it is the quick way to create a new PSP and get on with the business of deploying containerised applications.
Note: I am documenting this fix as a workaround for testing purposes. This article can help you configure things in a tighter manner, the VMware Tanzu Kubernetes way. Alternatively, you can apply a ClusterRoleBinding that grants “vmware-system-privileged” to the logged-in user, as sketched below.
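As a rough sketch of that alternative, the binding can be created imperatively. The ClusterRole name psp:vmware-system-privileged and the binding name below are assumptions based on a default TKG cluster, so check what actually exists in your cluster first:

# See which PSP-related ClusterRoles ship with the TKG cluster
kubectl get clusterroles | grep psp

# Bind all authenticated users to the built-in privileged policy
# (binding name is arbitrary; ClusterRole name assumes the TKG default)
kubectl create clusterrolebinding default-tkg-admin-privileged-binding \
  --clusterrole=psp:vmware-system-privileged \
  --group=system:authenticated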
Create and apply a new Pod Security Policy (PSP)
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: kubeapps-psp
spec:
  privileged: true
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
    - '*'
Create and apply a new Cluster Role tied to the new PSP
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kubeapps-clusterrole
rules:
  - apiGroups:
      - policy
    resources:
      - podsecuritypolicies
    verbs:
      - use
    resourceNames:
      - kubeapps-psp
Bind the Cluster Role to a Service Account associated to a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubeapps-clusterrole
  namespace: kubeapps
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeapps-clusterrole
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts
  - kind: ServiceAccount # Omit apiGroup
    name: default
    namespace: kubeapps
Use kubectl to apply the new roles and bindings.
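Assuming the three manifests above were saved as kubeapps-psp.yaml, kubeapps-clusterrole.yaml and kubeapps-rolebinding.yaml (the file names are just examples), applying and verifying them looks something like this:

# Apply the PSP, ClusterRole and RoleBinding created above
kubectl apply -f kubeapps-psp.yaml
kubectl apply -f kubeapps-clusterrole.yaml
kubectl apply -f kubeapps-rolebinding.yaml

# Confirm the objects exist
kubectl get psp kubeapps-psp
kubectl get clusterrole kubeapps-clusterrole
kubectl get rolebinding kubeapps-clusterrole -n kubeapps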
In my example above where I was watching the events, as soon as I applied the new PSP, Cluster Role and RoleBinding, the pending/failed jobs were able to complete.
[root@ANSIBLE-01 bin]# kubectl get events -n kubeapps -w
LAST SEEN   TYPE     REASON             OBJECT                                                    MESSAGE
28s         Normal   Scheduled          pod/kubeapps-internal-apprepository-jobs-cleanup-n4dbz    Successfully assigned kubeapps/kubeapps-internal-apprepository-jobs-cleanup-n4dbz to tkg-cluster-001-workers-qcqqz-86b999cc5c-q7gz2
28s         Normal   Pulling            pod/kubeapps-internal-apprepository-jobs-cleanup-n4dbz    Pulling image "docker.io/bitnami/kubectl:1.18.9-debian-10-r5"
15s         Normal   Pulled             pod/kubeapps-internal-apprepository-jobs-cleanup-n4dbz    Successfully pulled image "docker.io/bitnami/kubectl:1.18.9-debian-10-r5"
14s         Normal   Created            pod/kubeapps-internal-apprepository-jobs-cleanup-n4dbz    Created container kubectl
14s         Normal   Started            pod/kubeapps-internal-apprepository-jobs-cleanup-n4dbz    Started container kubectl
28s         Normal   SuccessfulCreate   job/kubeapps-internal-apprepository-jobs-cleanup          Created pod: kubeapps-internal-apprepository-jobs-cleanup-n4dbz
05s         Normal   Completed          job/kubeapps-internal-apprepository-jobs-cleanup          Job completed
From that point forward, any new deployments either direct or via Helm had no issues.
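If you want a quick sanity check that new pods now validate against the policy, a throwaway deployment works well. The psp-test name and nginx image below are just examples:

# Create a throwaway deployment in the kubeapps namespace
kubectl create deployment psp-test --image=nginx -n kubeapps
kubectl get pods -n kubeapps -l app=psp-test

# Clean up once the pod shows as Running
kubectl delete deployment psp-test -n kubeapps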
References:
https://cloud.google.com/kubernetes-engine/docs/how-to/pod-security-policies
https://core.vmware.com/resource/vsphere-tanzu-quick-start-guide