Office Web Apps Server farm updates – “Unhealthy” all over again

The other night I had to do some patching on our Office Web Apps Server (OWAS) 2013 farm. Even though it was only Windows Updates (including some security patches for OWAS), I still followed the official update method, in which the node being updated is removed from the farm.

Reviewing the update notes, I realized that either the procedure had changed recently or I had gotten it wrong all along. I had always thought of it as removing a node from the farm (and from the load balancer), applying the updates and then adding it back in. Actually, it is quite the opposite: the first node you update becomes the first node of a new farm. As you update the others, you join them to that new farm, and the load balancer decides which farm handles the actual traffic. The process looks like this:

  1. Decommission the first node you want to update by running the PowerShell cmdlet
    Remove-OfficeWebAppsMachine
    Run this on the node itself. At the same time, decommission the node on your load balancer.
  2. Run the updates on the node. Reboot when prompted.
  3. Once the node is up and running again, run this cmdlet to make it the first node of your new farm:
    New-OfficeWebAppsFarm -InternalURL https://webapps.contoso.com -ExternalURL https://webapps.contoso.com -CertificateName "YourCertFriendlyName"
    It does not matter if you reuse the URLs of your old farm. They only tell the new farm which URL it accepts traffic for; the new farm has no knowledge of your old farm or its nodes.
  4. Continue with the next node in your farm by repeating steps 1 and 2. When that node has been updated, add it to your new farm with the cmdlet
    New-OfficeWebAppsMachine -MachineToJoin "name-of-first-updated-node"
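If you prefer to script the per-node steps, here is a minimal sketch that wraps the three cmdlets above in one function. The function name `Update-OwasNode`, its phase names and its default values are hypothetical placeholders, not part of OWAS; it assumes the `OfficeWebApps` PowerShell module is installed on the node. Patching, rebooting and the load balancer changes are deliberately left manual.

```powershell
# Hypothetical helper: run ON the node being updated. The phase names, the
# default URL, certificate friendly name and node name are placeholders.
function Update-OwasNode {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('Remove', 'NewFarm', 'Join')]
        [string] $Phase,
        [string] $FarmUrl = 'https://webapps.contoso.com',
        [string] $CertificateName = 'YourCertFriendlyName',
        [string] $FirstUpdatedNode = 'name-of-first-updated-node'
    )
    Import-Module OfficeWebApps

    switch ($Phase) {
        # Step 1: leave the old farm (decommission it on the load balancer too).
        'Remove' { Remove-OfficeWebAppsMachine }
        # Step 3, first updated node only: create the new farm.
        'NewFarm' {
            $farm = @{
                InternalURL     = $FarmUrl
                ExternalURL     = $FarmUrl
                CertificateName = $CertificateName
            }
            New-OfficeWebAppsFarm @farm
        }
        # Step 4, every later node: join the new farm.
        'Join' { New-OfficeWebAppsMachine -MachineToJoin $FirstUpdatedNode }
    }
}
```

On the first node you would run `Update-OwasNode -Phase Remove`, patch and reboot, then `Update-OwasNode -Phase NewFarm`; on each later node, `Remove`, patch and reboot, then `Join`.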

Depending on the number of nodes in your old farm, at some point you will want to send traffic to the new, updated farm. This is where the load balancer holds the key: once you have updated enough nodes to handle the traffic, add the updated node(s) to your load-balanced service and remove the old one(s).

That aside, during my patching session I realized that our current farm certificate would soon expire, so I simply enrolled for a new web certificate from our domain CA. This is where I made my first mistake. When requesting an OWAS farm certificate, remember to put the farm FQDN both in the certificate subject name and in the Subject Alternative Name (SAN) field. Also remember to add the FQDN of every node (e.g. webapp01.contoso.local) to the SAN field. Without these, the farm will still work and serve the traffic sent to it, because that traffic is always addressed to “webapps.contoso.com”. The farm health watchdog, however, will keep reporting the machine as “Unhealthy”. The most likely reason shows up in Event Viewer as “Could not establish trust relationship for the SSL/TLS secure channel”. This basically means that the node’s own name is missing from the certificate SAN, so the server cannot even establish an SSL connection with itself, which is exactly what the health watchdog tries to do.
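Before deploying a certificate, it may be worth checking its SAN list against every name the farm needs. The sketch below is one way to do that; both function names are hypothetical, and it assumes the `DnsNameList` property that Windows PowerShell’s certificate provider adds to certificates in the `Cert:` drive.

```powershell
# Hypothetical helper: returns each required DNS name that is missing from the
# certificate's SAN list. -notcontains compares case-insensitively, like DNS.
function Get-MissingSanName {
    param(
        [Parameter(Mandatory)] [string[]] $Required,
        [string[]] $Present = @()
    )
    $Required | Where-Object { $Present -notcontains $_ }
}

# Hypothetical helper: lists the DNS names of a certificate in the local
# machine store, looked up by its friendly name (Windows only).
function Get-CertificateSanName {
    param([Parameter(Mandatory)] [string] $FriendlyName)
    Get-ChildItem -Path Cert:\LocalMachine\My |
        Where-Object { $_.FriendlyName -eq $FriendlyName } |
        ForEach-Object { $_.DnsNameList } |
        ForEach-Object { $_.Unicode }
}
```

For example, `Get-MissingSanName -Required 'webapps.contoso.com','webapp01.contoso.local','webapp02.contoso.local' -Present (Get-CertificateSanName -FriendlyName 'NewCertFriendlyName')` should print nothing if the certificate covers the farm and both (hypothetical) nodes.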

To remediate the missing SAN entries, simply enroll for another certificate that contains both the farm FQDN and the FQDNs of all farm nodes in the SAN field. There was still another lesson to learn, though: after I had acquired the new certificate and deployed it to all nodes in the farm, I ran the cmdlet
Set-OfficeWebAppsFarm -CertificateName "NewCertFriendlyName"

For the change to take effect, all nodes in the farm have to be rebooted. Even then, it seems that only the master node actually uses the new certificate and reports a “Healthy” status. Get-OfficeWebAppsFarm will report the new certificate as being in use, but try this simple test: in your browser, navigate to https://nodeFQDN/hosting/discovery/ for each node and check whether the certificate presented is trusted. If it is not, the node is most likely still using the old certificate, the one without all the SAN entries.
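The same browser test can be sketched in PowerShell so that all nodes are checked in one go. `Test-OwasNodeCertificate` is a hypothetical name, and the sketch assumes it runs on a machine that trusts your domain CA. Note that a failed request is not proof of a certificate problem on its own (DNS or HTTP errors also land in the `catch`), so look for the trust-relationship message in the `Error` column.

```powershell
# Hypothetical helper: requests /hosting/discovery/ on each node and records
# whether the request succeeded, plus the error message when it did not.
function Test-OwasNodeCertificate {
    param([Parameter(Mandatory)] [string[]] $NodeFqdn)
    foreach ($node in $NodeFqdn) {
        $url = "https://$node/hosting/discovery/"
        try {
            # Certificate validation happens during the TLS handshake; an
            # untrusted or name-mismatched certificate makes this throw.
            Invoke-WebRequest -Uri $url -UseBasicParsing -ErrorAction Stop | Out-Null
            [pscustomobject]@{ Node = $node; Succeeded = $true; Error = $null }
        }
        catch {
            [pscustomobject]@{ Node = $node; Succeeded = $false; Error = $_.Exception.Message }
        }
    }
}
```

Usage, with hypothetical node names: `Test-OwasNodeCertificate -NodeFqdn 'webapp01.contoso.local','webapp02.contoso.local' | Format-Table -AutoSize`.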

To fix a node that still presents the old certificate, remove it from the farm with the cmdlet Remove-OfficeWebAppsMachine (this applies to every node except the master/first node, which will likely already work). Then add it back in with the same cmdlet as before:
New-OfficeWebAppsMachine -MachineToJoin "name-of-first-updated-node"

Remember to give the node some time to report its new status, as the status is only updated during the scheduled watchdog runs, roughly every 3-4 minutes.
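Putting the fix and the wait together, a rough sketch could look like this. The function name and polling defaults are hypothetical, and it assumes that `Get-OfficeWebAppsMachine`, run on the node, returns an object with a `HealthStatus` property; verify that on your own farm before relying on it.

```powershell
# Hypothetical helper: run ON a non-master node. It leaves the farm, rejoins
# via the master node and then polls the local health status.
function Reset-OwasNodeMembership {
    param(
        [Parameter(Mandatory)] [string] $MasterNode,
        [int] $PollSeconds = 60,   # watchdog runs roughly every 3-4 minutes
        [int] $MaxPolls = 10
    )
    Import-Module OfficeWebApps
    Remove-OfficeWebAppsMachine
    New-OfficeWebAppsMachine -MachineToJoin $MasterNode

    for ($i = 1; $i -le $MaxPolls; $i++) {
        Start-Sleep -Seconds $PollSeconds
        # Assumption: HealthStatus reflects the latest watchdog run.
        $status = (Get-OfficeWebAppsMachine).HealthStatus
        Write-Host "Poll ${i}: HealthStatus = $status"
        if ("$status" -eq 'Healthy') { return $true }
    }
    return $false
}
```

With the defaults it gives the node about ten minutes, a few watchdog cycles, before returning `$false`.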

