Since version 8.0, the vRealize Automation appliance runs on Photon OS. It includes a native Kubernetes cluster to host containerized services. The vRA services run as Docker containers in Kubernetes pods, and each pod hosts one or more containers. Examples of vRA containerized services are:
- RabbitMQ is the industry-standard message bus used in vRealize Automation.
- The vRA database is a PostgreSQL database which runs as a pod and uses a Persistent Volume (PV) to store data.
- The vRealize Orchestrator service runs as a pod hosting two key containers: the control center (which manages vRO operations and plugins) and the vco-server (the orchestration engine).
Another basic Kubernetes concept is the namespace. Namespaces are a way to divide Kubernetes cluster resources between multiple users.
As you can see, vRA has many components, any of which can fail, so some Kubernetes knowledge is required to troubleshoot these scenarios.
vRA Namespaces
vRA uses three main namespaces:
- prelude: The main namespace with all system pods
- kube-system: The namespace for internal Kubernetes functions, where Docker, networking, and load balancing are created and managed
- ingress: A special namespace which manages external communication
The kubectl get pods -n <namespace> command lists all pods in a namespace, e.g. to get all pods of the prelude namespace:
root@vra1a [ ~ ]# kubectl get pods -n prelude
NAME READY STATUS RESTARTS AGE
abx-service-app-895c89777-dcvwg 1/1 Running 0 24m
adapter-host-service-app-5fd86c88d7-s9k9w 1/1 Running 0 24m
approval-service-7759f64fcf-9kl5k 1/1 Running 0 24m
assessment-service-app-74d896c769-ncbfz 1/1 Running 0 24m
automation-ui-app-6784ffbdc7-hwvmk 1/1 Running 0 16m
blueprints-ui-app-bf49bcb7f-8l6c4 1/1 Running 0 16m
catalog-service-app-6b89646d49-8k8hw 1/1 Running 0 24m
catalog-ui-app-6574cb6bdb-9ztgb 1/1 Running 0 16m
cgs-service-64cf784b6f-klkck 1/1 Running 0 24m
cgs-ui-app-74545b89cc-mvngr 1/1 Running 0 16m
cmx-service-app-588b6674d8-ncn9r 1/1 Running 0 24m
codestream-app-687dd4d45f-spfhc 1/1 Running 0 24m
deployment-ui-app-5c588f44fb-m5ltn 1/1 Running 0 16m
docker-registry-6f9756957c-78xvh 1/1 Running 0 27m
ebs-app-6c8f4c9f96-rctmk 1/1 Running 0 24m
extensibility-ui-app-64c4dd59d7-m9xk7 1/1 Running 0 16m
form-service-app-58d6f47d79-ndcgj 1/1 Running 0 24m
hcmp-service-app-8699b77dc-59pd9 1/1 Running 0 24m
identity-service-app-5cfcdd9d7-7mvlj 1/1 Running 0 26m
identity-ui-app-77fdccfff-6mn8s 1/1 Running 0 16m
landing-ui-app-7545b6d668-x65v2 1/1 Running 0 16m
migration-service-app-6d5968f4-62btj 1/1 Running 0 24m
migration-ui-app-697b5d9c57-9rnl2 1/1 Running 0 16m
nginx-httpd-app-646d64c4d9-w25qv 1/1 Running 0 16m
no-license-app-d99748dcb-jw4rp 1/1 Running 0 28m
orchestration-ui-app-8546bd87f-gw68v 1/1 Running 0 16m
pipeline-ui-app-7c99f45659-p2dmd 1/1 Running 0 16m
postgres-0 1/1 Running 0 28m
project-service-app-86f7947dd6-flcgm 1/1 Running 0 24m
provisioning-service-app-66bd4848c-4wrwc 1/1 Running 0 24m
provisioning-ui-app-795f959bf7-bdk8k 1/1 Running 0 16m
proxy-service-c564f6b6f-bvvlb 1/1 Running 0 27m
quickstart-ui-app-5d7976599d-pk6gt 1/1 Running 0 16m
rabbitmq-ha-0 1/1 Running 0 28m
relocation-service-app-776cc8b6fd-n8k4v 1/1 Running 0 24m
relocation-ui-app-c976d8577-fnhn8 1/1 Running 0 16m
tango-blueprint-service-app-858865cdd6-mmzfm 1/1 Running 0 24m
tango-vro-gateway-app-6668bc7bd4-ppf7n 1/1 Running 0 24m
tenant-management-ui-app-9c78b96f-brdfd 1/1 Running 0 16m
terraform-service-app-7df9dfd8bc-2l25d 2/2 Running 0 24m
user-profile-service-app-55bf5444b6-2sbdj 1/1 Running 0 24m
vco-app-5c6fff49d-vw2bn 3/3 Running 0 24m
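When troubleshooting, the full listing is often too long to scan by eye. A quick way to surface only unhealthy pods is to filter by pod phase (a minimal sketch using a standard kubectl field selector; healthy service pods report the Running phase):
root@vra1a [ ~ ]# kubectl get pods -n prelude --field-selector=status.phase!=Running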
For a complete list of namespaces, run the kubectl get namespaces command:
root@vra1a [ ~ ]# kubectl get namespaces
NAME STATUS AGE
default Active 124d
ingress Active 30m
kube-node-lease Active 124d
kube-public Active 124d
kube-system Active 124d
openfaas Active 30m
openfaas-fn Active 30m
openfaas-ip Active 30m
prelude Active 30m
To view all pods in all namespaces, we execute the command kubectl get pods -A:
root@vra1a [ ~ ]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
ingress ingress-ctl-traefik-7bb7769db4-8lhbv 1/1 Running 0 28m
kube-system command-executor-9pvl7 1/1 Running 0 124d
kube-system coredns-85snm 1/1 Running 0 124d
kube-system etcd-vra1a.lab.local 1/1 Running 0 124d
kube-system health-reporting-app-rvg9s 1/1 Running 0 124d
kube-system kube-apiserver-vra1a.lab.local 1/1 Running 0 124d
kube-system kube-controller-manager-vra1a.lab.local 1/1 Running 0 124d
kube-system kube-flannel-ds-7lj88 1/1 Running 0 124d
kube-system kube-node-monitor-6tb6m 1/1 Running 0 124d
kube-system kube-proxy-86d7d 1/1 Running 0 124d
kube-system kube-scheduler-vra1a.lab.local 1/1 Running 0 124d
kube-system kubelet-rubber-stamp-mgnkl 1/1 Running 0 124d
kube-system metrics-server-6qdgc 1/1 Running 0 124d
kube-system network-health-monitor-xx2p8 1/1 Running 0 124d
kube-system predictable-pod-scheduler-8nzhl 1/1 Running 0 124d
kube-system prelude-network-monitor-cron-1661168340-2bwfh 0/1 Completed 0 4m27s
kube-system prelude-network-monitor-cron-1661168520-xckpd 0/1 Completed 0 86s
kube-system state-enforcement-cron-1661168280-24rpm 0/1 Completed 0 5m27s
kube-system state-enforcement-cron-1661168400-42qz9 0/1 Completed 0 3m26s
kube-system state-enforcement-cron-1661168520-97wqm 0/1 Completed 0 86s
kube-system update-etc-hosts-jzm7w 1/1 Running 0 124d
openfaas gateway-b65b4c449-62j2x 2/2 Running 0 28m
prelude abx-service-app-895c89777-dcvwg 1/1 Running 0 25m
prelude adapter-host-service-app-5fd86c88d7-s9k9w 1/1 Running 0 25m
prelude approval-service-7759f64fcf-9kl5k 1/1 Running 0 25m
prelude assessment-service-app-74d896c769-ncbfz 1/1 Running 0 25m
prelude automation-ui-app-6784ffbdc7-hwvmk 1/1 Running 0 17m
prelude blueprints-ui-app-bf49bcb7f-8l6c4 1/1 Running 0 17m
prelude catalog-service-app-6b89646d49-8k8hw 1/1 Running 0 25m
prelude catalog-ui-app-6574cb6bdb-9ztgb 1/1 Running 0 17m
prelude cgs-service-64cf784b6f-klkck 1/1 Running 0 25m
prelude cgs-ui-app-74545b89cc-mvngr 1/1 Running 0 17m
prelude cmx-service-app-588b6674d8-ncn9r 1/1 Running 0 25m
prelude codestream-app-687dd4d45f-spfhc 1/1 Running 0 25m
prelude deployment-ui-app-5c588f44fb-m5ltn 1/1 Running 0 17m
prelude docker-registry-6f9756957c-78xvh 1/1 Running 0 28m
prelude ebs-app-6c8f4c9f96-rctmk 1/1 Running 0 25m
prelude extensibility-ui-app-64c4dd59d7-m9xk7 1/1 Running 0 17m
prelude form-service-app-58d6f47d79-ndcgj 1/1 Running 0 25m
prelude hcmp-service-app-8699b77dc-59pd9 1/1 Running 0 25m
prelude identity-service-app-5cfcdd9d7-7mvlj 1/1 Running 0 27m
prelude identity-ui-app-77fdccfff-6mn8s 1/1 Running 0 17m
prelude landing-ui-app-7545b6d668-x65v2 1/1 Running 0 17m
prelude migration-service-app-6d5968f4-62btj 1/1 Running 0 25m
prelude migration-ui-app-697b5d9c57-9rnl2 1/1 Running 0 17m
prelude nginx-httpd-app-646d64c4d9-w25qv 1/1 Running 0 17m
prelude no-license-app-d99748dcb-jw4rp 1/1 Running 0 29m
prelude orchestration-ui-app-8546bd87f-gw68v 1/1 Running 0 17m
prelude pipeline-ui-app-7c99f45659-p2dmd 1/1 Running 0 17m
prelude postgres-0 1/1 Running 0 29m
prelude project-service-app-86f7947dd6-flcgm 1/1 Running 0 25m
prelude provisioning-service-app-66bd4848c-4wrwc 1/1 Running 0 25m
prelude provisioning-ui-app-795f959bf7-bdk8k 1/1 Running 0 17m
prelude proxy-service-c564f6b6f-bvvlb 1/1 Running 0 28m
prelude quickstart-ui-app-5d7976599d-pk6gt 1/1 Running 0 17m
prelude rabbitmq-ha-0 1/1 Running 0 29m
prelude relocation-service-app-776cc8b6fd-n8k4v 1/1 Running 0 25m
prelude relocation-ui-app-c976d8577-fnhn8 1/1 Running 0 17m
prelude tango-blueprint-service-app-858865cdd6-mmzfm 1/1 Running 0 25m
prelude tango-vro-gateway-app-6668bc7bd4-ppf7n 1/1 Running 0 25m
prelude tenant-management-ui-app-9c78b96f-brdfd 1/1 Running 0 17m
prelude terraform-service-app-7df9dfd8bc-2l25d 2/2 Running 0 25m
prelude user-profile-service-app-55bf5444b6-2sbdj 1/1 Running 0 25m
prelude vco-app-5c6fff49d-vw2bn 3/3 Running 0 25m
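In a clustered deployment it can also be useful to see which appliance node each pod is scheduled on. The standard -o wide output option adds the node and pod IP columns to the same listing:
root@vra1a [ ~ ]# kubectl get pods -A -o wide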
To view networking components, run the kubectl get services -A command:
root@vra1a [ ~ ]# kubectl get services -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.244.4.1 <none> 443/TCP 124d
ingress ingress-ctl-traefik NodePort 10.244.6.123 <none> 80:80/TCP,443:443/TCP 29m
ingress ingress-ctl-traefik-dashboard ClusterIP 10.244.7.210 <none> 80/TCP 29m
kube-system etcd-service ClusterIP 10.244.6.8 <none> 2379/TCP 124d
kube-system health-reporting-service NodePort 10.244.6.250 <none> 80:8008/TCP 124d
kube-system kube-dns ClusterIP 10.244.4.10 <none> 53/UDP,53/TCP,9153/TCP 124d
kube-system metrics-server ClusterIP 10.244.4.54 <none> 443/TCP 124d
kube-system network-health-monitor ClusterIP None <none> <none> 124d
openfaas gateway ClusterIP 10.244.6.98 <none> 8080/TCP 29m
prelude abx-service ClusterIP 10.244.6.13 <none> 80/TCP 27m
prelude adapter-host-service ClusterIP 10.244.5.65 <none> 8283/TCP 27m
prelude aggregator-service ClusterIP 10.244.7.134 <none> 8082/TCP 27m
prelude approval-service ClusterIP 10.244.6.147 <none> 8082/TCP 27m
prelude assessment-service ClusterIP 10.244.4.70 <none> 8080/TCP 27m
prelude automation-ui ClusterIP 10.244.6.71 <none> 80/TCP 19m
prelude blueprints-ui ClusterIP 10.244.7.141 <none> 80/TCP 19m
prelude catalog-service ClusterIP 10.244.7.143 <none> 8000/TCP,47500/TCP 27m
prelude catalog-ui ClusterIP 10.244.7.64 <none> 80/TCP 19m
prelude cgs-service ClusterIP 10.244.5.69 <none> 8090/TCP 27m
prelude cgs-ui ClusterIP 10.244.5.241 <none> 80/TCP 19m
prelude cmx-service ClusterIP 10.244.6.239 <none> 8292/TCP 27m
prelude codestream ClusterIP 10.244.6.148 <none> 8000/TCP 27m
prelude deployment-limit-service ClusterIP 10.244.5.207 <none> 8082/TCP 27m
prelude deployment-ui ClusterIP 10.244.5.120 <none> 80/TCP 19m
prelude docker-registry ClusterIP 10.244.5.169 <none> 5000/TCP 29m
prelude ebs-service ClusterIP 10.244.6.146 <none> 4242/TCP,31090/TCP 27m
prelude extensibility-ui ClusterIP 10.244.6.5 <none> 80/TCP 19m
prelude form-service ClusterIP 10.244.4.226 <none> 8383/TCP 27m
prelude hcmp-service ClusterIP 10.244.4.231 <none> 8100/TCP 27m
prelude identity-service ClusterIP 10.244.4.71 <none> 8000/TCP 28m
prelude identity-ui ClusterIP 10.244.5.240 <none> 80/TCP 19m
prelude landing-ui ClusterIP 10.244.7.23 <none> 80/TCP 19m
prelude migration-service ClusterIP 10.244.5.90 <none> 8080/TCP 27m
prelude migration-ui ClusterIP 10.244.6.140 <none> 80/TCP 19m
prelude nginx-httpd ClusterIP 10.244.4.116 <none> 80/TCP 19m
prelude no-license-service ClusterIP 10.244.5.64 <none> 80/TCP 31m
prelude orchestration-ui ClusterIP 10.244.6.78 <none> 80/TCP 19m
prelude pgpool ClusterIP 10.244.6.3 <none> 5432/TCP 31m
prelude pipeline-ui ClusterIP 10.244.4.125 <none> 80/TCP 19m
prelude postgres ClusterIP None <none> <none> 31m
prelude project-service ClusterIP 10.244.7.128 <none> 8080/TCP 27m
prelude provisioning-service ClusterIP 10.244.4.62 <none> 8282/TCP,8484/TCP 27m
prelude provisioning-service-no-local ClusterIP 10.244.4.239 <none> 8282/TCP,8484/TCP 27m
prelude provisioning-ui ClusterIP 10.244.7.33 <none> 80/TCP 19m
prelude proxy-service ClusterIP 10.244.6.7 <none> 3128/TCP 29m
prelude quickstart-ui ClusterIP 10.244.7.213 <none> 80/TCP 19m
prelude rabbitmq-ha ClusterIP 10.244.4.101 <none> 5672/TCP,4369/TCP 31m
prelude rabbitmq-ha-discovery ClusterIP None <none> 15672/TCP,5672/TCP,4369/TCP 31m
prelude rabbitmq-ha-http ClusterIP 10.244.7.236 <none> 15672/TCP 31m
prelude relocation-service ClusterIP 10.244.6.73 <none> 8980/TCP 27m
prelude relocation-ui ClusterIP 10.244.5.8 <none> 80/TCP 19m
prelude tango-blueprint ClusterIP 10.244.7.12 <none> 8080/TCP 27m
prelude tango-vro-gateway ClusterIP 10.244.5.99 <none> 8080/TCP 27m
prelude tapestry-service ClusterIP 10.244.7.89 <none> 8000/TCP 27m
prelude tenant-management-ui ClusterIP 10.244.5.253 <none> 80/TCP 19m
prelude terraform-service ClusterIP 10.244.5.127 <none> 8686/TCP 27m
prelude user-profile-service ClusterIP 10.244.4.44 <none> 8080/TCP 27m
prelude vco-controlcenter-service ClusterIP 10.244.4.115 <none> 8282/TCP 27m
prelude vco-polyglot-debugging-service NodePort 10.244.7.246 <none> 18281:18281/TCP,18282:18282/TCP,18283:18283/TCP,18284:18284/TCP,18285:18285/TCP 27m
prelude vco-service ClusterIP 10.244.5.41 <none> 8280/TCP
We can view a pod in detail by running kubectl describe pods -n <namespace> <pod-name>, e.g.:
root@vra1a [ ~ ]# kubectl describe pods -n prelude postgres-0
Name: postgres-0
Namespace: prelude
Priority: 0
Node: vra1a.lab.local/172.16.11.234
Start Time: Sun, 17 Jul 2022 11:13:54 +0000
Labels: app=postgres
controller-revision-hash=postgres-6f48d75786
name=postgres
statefulset.kubernetes.io/pod-name=postgres-0
Annotations: <none>
Status: Running
IP: 10.244.0.144
IPs:
IP: 10.244.0.144
Controlled By: StatefulSet/postgres
Init Containers:
init:
Container ID: docker://a9b9213891aa5774d01718849bae5bbbe37993ce4a2b1cd19868367aafa34c88
Image: db-image_private:latest
Image ID: docker://sha256:b0b692edf3ad8dc87d681184062a2f61556ebcdb7f799e4fcb539a27f5a21d58
Port: <none>
Host Port: <none>
Command:
/bin/bash
Args:
-c
/scripts/postgres_init.sh
State: Terminated
Reason: Completed
Exit Code: 0
...
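Besides describing a pod, we can read its container logs directly with kubectl logs. For single-container pods the pod name is enough; for multi-container pods such as vco-app, the -c flag selects a container (replace <container-name> with one of the container names listed by kubectl describe):
root@vra1a [ ~ ]# kubectl logs -n prelude postgres-0
root@vra1a [ ~ ]# kubectl logs -n prelude vco-app-5c6fff49d-vw2bn -c <container-name>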
vRA Service Status
We can use the vracli command to monitor and troubleshoot the vRA appliance. The vracli tool can be used for tasks such as initial configuration, cluster management, database operations, certificate handling, and log bundle generation.
To get an overview of the whole vRA cluster and database infrastructure:
root@vra1a [ ~ ]# vracli status
{
"hostNodes": [
{
"hostname": "vra1a.lab.local",
"info": {
"architecture": "amd64",
"boot_id": "15105383-943b-409d-91ff-c710ad29c828",
"container_runtime_version": "docker://19.3.13",
"kernel_version": "4.19.219-1.ph3",
"kube_proxy_version": "v1.20.11-dirty",
"kubelet_version": "v1.20.11-dirty",
"machine_id": "92f8571f665b4aacafc0f0e8f063f1b8",
"operating_system": "linux",
"os_image": "VMware Photon OS/Linux",
"system_uuid": "42024bc6-dc65-60c3-c121-3fb765c764f9"
},
"status": [
{
"type": "NetworkUnavailable",
"status": "False"
},
{
"type": "MemoryPressure",
"status": "False"
},
{
"type": "DiskPressure",
"status": "False"
},
{
"type": "PIDPressure",
"status": "False"
},
{
"type": "Ready",
"status": "True"
}
]
}
],
"databaseNodes": [
{
"DEBUG: connecting to: \"user=repmgr-db passfile=/run/repmgr-db.cred connect_timeout=10 dbname=repmgr-db host=postgres-0.postgres.prelude.svc.cluster.local keepalives=1 fallback_application_name=repmgr options=-csearch_path=\"": "",
"Node name": "postgres-0.postgres.prelude.svc.cluster.local",
"Node ID": "100",
"PostgreSQL version": "10.18",
"Total data size": "659 MB",
"Conninfo": "host=postgres-0.postgres.prelude.svc.cluster.local dbname=repmgr-db user=repmgr-db passfile=/run/repmgr-db.cred connect_timeout=10 keepalives=1",
"Role": "primary",
"WAL archiving": "enabled",
"Archive command": "/bin/true",
"WALs pending archiving": "0",
"Replication connections": "0 (of maximal 10)",
"Replication slots": "0 physical (of maximal 10; 0 missing)",
"Upstream node": "(none)",
"Replication lag": "n/a",
"Last received LSN": "(none)",
"Last replayed LSN": "(none)",
"max_wal_senders": "10",
"occupied_wal_senders": "0",
"max_replication_slots": "10",
"active_replication_slots": "0",
"inactive_replication_slots": "0",
"missing_replication_slots": "0",
"vra_status": "up",
"vra_error": null,
"vra_exitcode": 0
}
]
}
To confirm that the deployment is successful:
root@vra1a [ ~ ]# vracli status deploy
Deployment complete
To create a log bundle in the current directory:
root@vra1a [ ~ ]# vracli log-bundle
2022-07-17 14:08:25,623 [INFO] Use temporary directory /home/root/.log-bundle-20220717T140825e1iuv_pc
2022-07-17 14:08:25,624 [INFO] Generation of log bundle log-bundle-20220717T140825.tar started
2022-07-17 14:08:25,654 [INFO] Getting endpoints for log collection
2022-07-17 14:08:25,684 [INFO] Log collection endpoints are [command-executor-9pvl7]
2022-07-17 14:08:25,685 [INFO] Building bundle contents for local node
2022-07-17 14:08:31,142 [INFO] Collecting node logs from local node...
2022-07-17 14:08:31,168 [INFO] Collecting an environment file...
2022-07-17 14:08:39,340 [INFO] Assembling log bundle...
2022-07-17 14:08:44,538 [INFO] Assembly of log-bundle-20220717T140825.tar completed successfully.
2022-07-17 14:08:44,539 [INFO] assembly script stdout> Concatenating bundle part local-node-logs.tar...
2022-07-17 14:08:44,539 [INFO] assembly script stdout> Appending extra file environment...
2022-07-17 14:08:44,539 [INFO] assembly script stdout> Appending extra file log-bundle-20220717T140825.log...
2022-07-17 14:08:44,539 [INFO] Generation of log bundle /home/root/log-bundle-20220717T140825.tar completed successfully
The log bundle is a timestamped tar file. The bundle name matches the pattern log-bundle-<date>T<time>.tar, in the above example log-bundle-20220717T140825.tar. Typically the log bundle contains logs from all nodes in the environment (at a minimum, the logs from the local node).
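The bundle is a plain tar archive, so it can be unpacked with standard tools. Extraction creates one directory per node, as the listing below shows:
root@vra1a [ ~ ]# tar -xf log-bundle-20220717T140825.tar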
root@vra1a [ ~ ]# ls -la vra1a.lab.local/
total 32
drwx------ 8 root root 4096 Jul 17 14:55 .
drwx------ 6 root root 4096 Jul 17 14:55 ..
drwx------ 3 root root 4096 Jul 17 14:55 data
drwx------ 9 root root 4096 Jul 17 14:55 etc
drwx------ 3 root root 4096 Jul 17 14:55 opt
drwx------ 2 root root 4096 Jul 17 14:55 proc
drwx------ 9 root root 4096 Jul 17 14:55 services-logs
drwx------ 4 root root 4096 Jul 17 14:55 var
The log bundle consists of the following content:
- Environment file: The environment file contains the output of various Kubernetes maintenance commands. It supplies information about current resource usage per nodes and per pods. It also contains cluster information and description of all available Kubernetes entities.
- Host logs and configuration: The configuration of each host (for example its /etc directory) and the host-specific logs (for example journald).
- Services logs: Logs for Kubernetes services are located in the following folder structure:
<hostname>/services-logs/<namespace>/<app-name>/console-logs/<container-name>.log
<hostname>/services-logs/<namespace>/<app-name>/file-logs/<container-name>.log
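For example, to read the console log of the identity service collected from the node above, we could open a path like the following (the app and container directory names here are illustrative; check the actual names inside your bundle):
root@vra1a [ ~ ]# less vra1a.lab.local/services-logs/prelude/identity-service-app/console-logs/identity-service-app.log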
Important log files
Apart from the service log files, the table below lists some important log files.
Log File | Purpose
--- | ---
/var/log/deploy.log | Logs for the deployment of the virtual appliance and the deploy services
/services-logs/kube-system/etcd/console-logs/etcd.log | Access log for the Kubernetes etcd database
/services-logs/ingress/traefik/console-logs/ingress-ctl-traefik.log | Log of the Traefik reverse proxy, which captures the communication to and from the system
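For example, to follow the appliance deployment log live while services are being deployed:
root@vra1a [ ~ ]# tail -f /var/log/deploy.log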
Rebuild vRA services
The /opt/scripts/deploy.sh script can be used to force all the services to rebuild. It retrieves all images from the Helm registry and redeploys the vRA services based on the configuration files. All information in the PostgreSQL database remains intact, and the service pods are reconfigured.
To preserve data integrity, we should first shut down the vRA services using the following commands:
/opt/scripts/svc-stop.sh
sleep 120
/opt/scripts/deploy.sh --onlyClean
To restore all the services, we should run the following command:
/opt/scripts/deploy.sh
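After the redeployment finishes, the commands shown earlier can confirm that everything came back up: vracli status deploy should again report "Deployment complete", and all pods in the prelude namespace should return to the Running state:
root@vra1a [ ~ ]# vracli status deploy
root@vra1a [ ~ ]# kubectl get pods -n prelude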