ElasticSearch status in red after full restart vIDM cluster

After completely shutting down a vIDM cluster and starting it up again, I noticed that the ElasticSearch status on all three nodes was in red. You can see this in the System Diagnosis Dashboard.

Initially I thought this was just a matter of time before everything was properly synced. But unfortunately the status remained red.

In the VMware documentation you can find information on how to troubleshoot ElasticSearch. You can find it here.

To check the health state via command line, open an SSH session to one of the vIDM nodes and enter this command:

curl 'http://localhost:9200/_cluster/health?pretty'

which gives the following output:

curl 'http://localhost:9200/_cluster/health?pretty'
{
"cluster_name" : "horizon",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 28,
"active_shards" : 56,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 14,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 80.0
}
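If you want to act on this health check from a script, the status field can be pulled out of the JSON. Here is a minimal sketch of my own (not part of vIDM or the VMware docs), assuming the pretty-printed output shown above:

```shell
# cluster_status: my own hypothetical helper. Reads the pretty-printed
# health JSON on stdin and prints only the value of the "status" field
# (green, yellow or red).
cluster_status() {
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# usage: curl -s 'http://localhost:9200/_cluster/health?pretty' | cluster_status
```

This makes it easy to poll until the cluster recovers, for example in a while loop after a restart.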

A restart of the ElasticSearch service did not change anything, and no errors could be found in the logs.

Looking more closely at the output of the health check command, however, I could see that there were unassigned shards:

"unassigned_shards" : 14,

To understand the meaning of shards, please see: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/_basic_concepts.html

To get more detailed info on the shards, execute this command on one of the vIDM nodes:

curl 'localhost:9200/_cat/shards?v'

From the output of this command I could see that the shards created after the startup of the cluster were assigned successfully, and that the unassigned shards all dated from the moment the cluster was shut down.

This indicated that the ElasticSearch cluster was running without any errors, but there were some older entries stuck in unassigned mode.

The table below shows the output of the unassigned shards:

curl 'localhost:9200/_cat/shards?v' | grep UNASS

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 5822 100 5822 0 0 844k 0 --:--:-- --:--:-- --:--:-- 947k
searchentities 0 p UNASSIGNED
searchentities 0 r UNASSIGNED

(There were 14 unassigned shards, but for clarity I removed the others.)

To resolve this, you have to manually reallocate the shards.

Use the following command for this:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
"commands": [{
"allocate": {
"index": "one_of_the_indexes",
"shard": shard_number,
"node": "one_of_nodes",
"allow_primary": 1
}
}]
}'

To find the name of the nodes, execute this command:

curl ‘localhost:9200/_cat/nodes?v’

The names of the nodes are shown in the last column:

In my case the command to reallocate the shards looks like this:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands":[{"allocate":{"index": "searchentities","shard": 0,"node": "Ringleader","allow_primary": 1}}]}'

Listing the unassigned shards again, you can now see that "searchentities" has disappeared from the unassigned list.

curl 'localhost:9200/_cat/shards?v' | grep UNASS

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 5822 100 5822 0 0 844k 0 --:--:-- --:--:-- --:--:-- 947k


and the searchentities shards have been added to the assigned list:

curl 'localhost:9200/_cat/shards?v' | grep "searchentities"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2240 100 2240 0 0 286k 0 --:--:-- --:--:-- --:--:-- 364k
searchentities 0 p STARTED
searchentities 0 r STARTED

If you have multiple unassigned shards, run the reallocation command for each of them.
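With 14 unassigned shards, typing the reroute command by hand gets tedious. The helper below is my own sketch (not from the VMware docs): it parses the `_cat/shards` output and prints one reroute body per unassigned shard, targeting the node name you pass in (here "Ringleader" from the example above; substitute one of your own node names).

```shell
# reroute_bodies: hypothetical helper. Reads '_cat/shards' output on stdin
# (columns: index shard prirep state ...) and prints one reroute JSON body
# per UNASSIGNED shard, targeting the node given as the first argument.
reroute_bodies() {
  awk -v node="$1" '$4 == "UNASSIGNED" { printf "{\"commands\":[{\"allocate\":{\"index\":\"%s\",\"shard\":%s,\"node\":\"%s\",\"allow_primary\":1}}]}\n", $1, $2, node }'
}

# usage:
#   curl -s 'localhost:9200/_cat/shards' | reroute_bodies Ringleader |
#   while read -r body; do
#     curl -XPOST 'localhost:9200/_cluster/reroute' -d "$body"
#   done
```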

When all shards have been reallocated, the ElasticSearch cluster becomes green again 🙂

4096 bit certificates with Identity Manager

Recently I struggled with applying certificates to an Identity Manager 3.3 appliance. I ran into a few issues.

The customer supplied me with a PFX file. This file contains the certificate chain and the private key.

I extracted the certificate and the private key from the PFX file (see the commands below on how to do this). When trying to apply the certificate chain and the private key to the vIDM 3.3 appliance, I received an error: "Private key length is invalid".

After some googling I found this article: https://kb.vmware.com/s/article/56960

The KB states that 4096-bit certificates are not supported on vIDM 3.2 and higher due to FIPS regulations. vIDM 3.2 and later ships with FIPS enabled, and this cannot be disabled.

There are two workarounds:

  1. Use a 2048-bit certificate.
  2. Install vIDM 3.1, upgrade to 3.3 and apply the 4096-bit certificate. Upgrading will not enable FIPS mode.

For me the only option was number 2.

I removed the 3.3 vIDM instance, deployed vIDM 3.1 and upgraded it to 3.3. After this, I tried again to apply the 4096-bit certificate.

Unfortunately I got another error: "The format of the private key is invalid".

To resolve this, follow the steps in this nice blogpost from @thepeb: https://blogs.vmware.com/horizontech/2018/08/vmware-identity-manager-and-certificates.html

In short, the following commands should be executed using OpenSSL to generate the right certificate and private key pem files:

  1. Extract the certificate from the pfx file:
    • openssl pkcs12 -in mycaservercert.pfx -nokeys -out mycaservercert.pem
  2. Extract the private key from the pfx file:
    • openssl pkcs12 -in mycaservercert.pfx -nodes -nocerts -out mycaservercertkey.pem
  3. Convert the private key from PKCS #8 to PKCS #1:
    • openssl rsa -in mycaservercertkey.pem -check -out mycaservercertkeyrsa.pem

Step 3 is the one that did the trick on vIDM 3.1. This version only supports PKCS #1 private keys, hence the error I got: my private key was still in PKCS #8 format. From version 3.2 on, PKCS #8 is also supported.
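You can tell the two formats apart before uploading by looking at the PEM header: a PKCS #8 key starts with "BEGIN PRIVATE KEY", while a PKCS #1 (RSA) key starts with "BEGIN RSA PRIVATE KEY". The check below is my own quick sketch based on those standard headers, not something from the KB:

```shell
# key_format: my own hypothetical helper. Prints which format the PEM
# private key file in $1 is in, based on its header line.
key_format() {
  if grep -q 'BEGIN RSA PRIVATE KEY' "$1"; then
    echo 'PKCS#1'
  elif grep -q 'BEGIN PRIVATE KEY' "$1"; then
    echo 'PKCS#8'
  else
    echo 'unknown'
  fi
}

# usage: key_format mycaservercertkeyrsa.pem
```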

IDM Connector authentication issue

Today I encountered an authentication issue on an IDM connector, which is used for verification. Two verification adapters are configured on it: PasswordIdpAdapter and RadiusAuthAdapter.

I wanted to change a setting in the RadiusAuthAdapter.

(The screenshots are in Dutch, because Identity Manager takes over the regional settings of your browser/endpoint. Currently there is no way to change the language.)

When you click on RadiusAuthAdapter, you are redirected to the configuration page of the verification adapter on the connector appliance. There you get an authentication prompt.

When entering the correct password (yes, I am sure it was the correct one 🙂), I always got the message "Your username or password is incorrect":

I am not sure what the problem is, but it might be related to the redirection of the web page.

However, there is an easy way to get around this.

Open a new tab in the browser and connect to the configuration page: https://<FQDN of the connector>:8443. Click on Connector-Services Manager. You are prompted for the admin password. Enter it and click on logon.

Logon is now successful:

Now go back to the browser tab with the verification adapters. Click again on the verification adapter you wanted to configure. In my case this was the RadiusAuthAdapter.

You can see that it now opens without prompting for a password, and you are able to configure it.

Identity Manager Cluster

As a follow-up to my previous post (see here), I want to focus on how to create an Identity Manager cluster.

This is my setup:

  • 1 Identity Manager (idm01) in DMZ, already behind a load balancer.
  • FQDN is changed from idm01.domain.com to portal.domain.com.
  • Connectors in LAN are set up and configured for AD/Radius authentication and Horizon integration.

As you can see from the image above, everything is setup, except for the Identity Manager cluster. Identity Manager 2 and 3 are not in place yet.

To finalise the highly available setup, the Identity Manager cluster in DMZ must be created. VMware recommends a 3-node cluster, because Elasticsearch has a known limitation with 2-node clusters. For more info, see here.

To create the cluster, follow these steps:

  1. Create DNS A-record and PTR (reverse lookup) for idm02.
  2. Create DNS A-record and PTR (reverse lookup) for idm03.
  3. Shut down idm01.
  4. Shut down both connectors.
  5. Snapshot idm01 (to be able to revert to the current situation in case anything goes wrong).
  6. Backup the SQL database (or shut down and snapshot SQL).
  7. Clone idm01 to idm02.
  8. Clone idm01 to idm03.
  9. Start idm01.
  10. Start connector1.
  11. Start connector2.
  12. Wait until idm01 and connectors are fully booted and operational.
  13. Change the IP address and hostname/FQDN of idm02 in the vApp properties of the cloned appliance and power on the VM.
  14. Change the IP address and hostname/FQDN of idm03 in the vApp properties of the cloned appliance and power on the VM.
  15. Check the Elasticsearch cluster by executing this command on the IDM appliances: curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'.
  16. Verify AD and Horizon synchronization (in my case an extra reboot of the connector appliances was needed).
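The health check in step 15 can also be scripted, so you can verify each of the three appliances with a single command. The helper below is my own sketch (not from the VMware docs); it reads the pretty-printed health JSON on stdin and succeeds only when the cluster reports green with all three nodes present:

```shell
# healthy_cluster: my own hypothetical check. Reads the pretty-printed
# cluster health JSON on stdin; returns success only when the status is
# "green" and number_of_nodes is 3.
healthy_cluster() {
  json=$(cat)
  echo "$json" | grep -q '"status"[ ]*:[ ]*"green"' &&
  echo "$json" | grep -q '"number_of_nodes"[ ]*:[ ]*3'
}

# usage on each appliance:
#   curl -s 'http://localhost:9200/_cluster/health?pretty=true' | healthy_cluster && echo OK
```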

In case anything goes wrong and you have to revert:

Shut down idm02 and idm03.
Revert the snapshot on idm01.

IDENTITY MANAGER ARCHITECTURE

On-premises VMware Identity Manager highly available architecture in a single datacenter

When designing Horizon Apps and VDI environments, VMware Identity Manager increasingly becomes an essential part of the solution. It acts as a central portal providing single sign-on access for users to their desktops and applications. Depending on location or permissions, authorisation might be more or less restrictive.

In this blogpost I will describe the architecture of VMware Identity Manager as part of a Horizon environment with redundant components in a single datacenter. I decided to write an article about this because I was somewhat confused by the existing documentation and it was difficult to find best practices for this setup. Special thanks go to Peter Bjork (@thepeb), VMware Principal System Engineer and VMware Identity Manager and Unified Access Gateway Specialist, for providing me with the right information and reviewing this document.

The most common use case I come across is this one:

  • Internal users working on thin clients need access to Horizon virtual desktops and applications.
  • Internal users with laptops or workstations want to access their virtual desktops and applications through the Identity Manager user portal.
  • External access to desktops and applications, secured with MFA, should be provided via the same Identity Manager portal.
  • Users connecting from the corporate network authenticate using Active Directory username and password.

Next to these business needs, another important requirement is that SPOFs within a single datacenter should be eliminated.

High Level architecture

To meet these requirements, the following configuration is needed:

  • Two load balanced internal connection servers (1 and 2) with SAML authentication allowed.
  • Two load balanced external connection servers (3 and 4) with SAML authentication required and Workspace One Mode enabled.
  • Two load balanced Unified Access Gateways in DMZ.
  • Three load balanced Identity Manager appliances in DMZ with two connectors in LAN.
  • Two IDM connectors to sync AD users/groups, authenticate users against AD and connect with the Radius server for MFA authentication.
  • Internal DNS A-record vdi.corp.local matching the load balancer's VIP of the internal connection servers.
  • Internal DNS A-record vdiuag.corp.local matching the load balancer's VIP of the external connection servers.
  • Internal DNS A-record portal.corp.com (split DNS required) matching the load balancer's VIP of the IDM appliances in DMZ. Both A and PTR records are required.
  • Public DNS A-record uag.corp.com for the Unified Access Gateways (UAG) matching the load balancer's VIP of the UAGs.
  • Public DNS A-record portal.corp.com for external access to the IDM portal.

The two internal connection servers will service requests coming from thin clients and from users working on laptops or workstations on the corporate network. SAML authentication (between IDM and the connection servers) will be configured and set to allowed (not required). The reason for not requiring SAML is that thin clients will access the connection servers directly, bypassing IDM. They will authenticate directly to the connection server with their AD username and password. Users working on a laptop or workstation, however, will first browse to the IDM portal and start their desktop or application from there. Authentication between IDM and Horizon is SAML.
Both connection servers 1 and 2 will be load balanced. Thin clients will be configured with the load balanced URL (vdi.corp.local).

The IDM portal will be set up in DMZ. User sessions from the external network as well as from the internal network will all pass through this IDM portal. For HA reasons three appliances are needed.
To set up the IDM cluster, the database must be SQL. The internal Postgres database is not supported in this scenario. To avoid a SPOF, the SQL database should be hosted on a SQL Always On cluster.

Two IDM connectors will be installed in the trusted network. These connectors handle AD authentication requests, sync AD users and groups, provide access to the Horizon environment and sync Horizon pools and assignments. Only outbound connections over TCP port 443 will occur between these connectors in the trusted network and the IDM appliances in DMZ. A load balancer in front of the two IDM connectors is not required, unless you are planning on doing Kerberos authentication (which is out of scope here). Also, make sure each IDM node is accessible by both connectors.
On both connectors the PasswordIdpAdapter and RadiusAdapter must be enabled and configured.
In IDM, create an AD Integrated Windows Authentication directory. The connectors bind to AD using this directory. To configure outbound-only mode, associate the connectors with the built-in identity provider.

REMARK: only one connector can do AD sync. In case this connector is not available, the other connector must be selected manually. Authentication will be done by both connectors.

In IDM two network zones will be configured: an internal one and an external one. The connection server URL matching the internal zone will be set to vdi.corp.local.
The URL matching the external network zone will be uag.corp.com. This DNS record should be publicly available.

Two Unified Access Gateways will be set up in DMZ behind the load balancer. These appliances provide external access to the Horizon desktops and applications. IDM will redirect requests coming from the external network to the load balancer's VIP of the UAGs. Authentication will be handled by IDM.
Configure the following URLs on the UAGs:
Connection server URL = vdiuag.corp.local.
Tunnel URL = uag.corp.com.
Blast External URL = uag.corp.com
PCOIP External URL = <public ip>:4172

To force and redirect all external user requests to the IDM portal, you must set SAML authentication to Required on the two external connection servers (3 and 4) and enable Workspace One Mode. On the UAG appliances, set the Horizon URL to vdiuag.corp.local or the load balancer's VIP of the external connection servers. As a result, users trying to access the UAG servers (uag.corp.com) directly from the Horizon client will be redirected to the IDM portal, honouring all authentication requirements such as MFA for external users.

For detailed info on how to configure the different components, see the following links to the VMware documentation:

IDM installation:

https://docs.vmware.com/en/VMware-Identity-Manager/3.2/vidm-install/GUID-A29C51E5-6FF5-4F7F-8FC2-1A0F687F6DC5.html

IDM connector configuration:

https://docs.vmware.com/en/VMware-Identity-Manager/3.2/vidm-dir-integration/GUID-F0D9C0D6-E9D9-42E4-B7F9-E44E727BB07A.html

IDM connector high availability configuration:

https://docs.vmware.com/en/VMware-Identity-Manager/3.2/com.vmware.aw-enterpriseSystemsConn/GUID-32E67D59-B73C-41EF-8683-F250CB15EDE4.html

Configure IDM connector in outbound mode only:

https://docs.vmware.com/en/VMware-Identity-Manager/3.2/com.vmware.aw-enterpriseSystemsConn/GUID-C97A4D37-8F1F-4B24-9A97-1A25A0033999.html

Configure multiple client access URLs:

https://docs.vmware.com/en/VMware-Identity-Manager/3.2/com.vmware.wsp-resource/GUID-32752D1B-3937-490D-8136-D5EA664F1F8E.html