Office Web Apps Server farm updates – “Unhealthy” all over again

The other night I had to do some patching on our Office Web Apps Server (OWAS) 2013 farm. Even though it was only Windows Updates (including some security patches for OWAS), I still followed the official update method, in which the node to be updated is removed from the farm.

Reviewing the update notes, I realized that either the procedure had changed recently or I had got it all wrong before. I had always thought of it as removing a node from your farm (and from the load balancer), applying the updates and then adding it back in. Actually, it is quite the opposite: the first node to be updated becomes the first node of your new farm. Then, as you update the others, you join them to the new farm, using the load balancer to decide which farm handles the actual traffic. The process is as follows:

  1. Decommission the first node you want to update, using the PowerShell cmdlet
    Remove-OfficeWebAppsMachine
    This is run on the node itself. At the same time, decommission the node on your load balancer.
  2. Run updates on your node. Reboot when requested to do so.
  3. Once the node is up and running again, run this cmdlet to make it the first node of your new farm:
    New-OfficeWebAppsFarm -InternalURL https://webapps.contoso.com -ExternalURL https://webapps.contoso.com -CertificateName "YourCertFriendlyName"
    It does not matter if you reuse the URLs of your old farm. This information only tells the new farm which URL it will accept traffic for; the new farm has no knowledge of your old farm or its nodes.
  4. Continue with the next node in your farm by repeating steps 1 and 2. When the node has been updated, add it to your new farm using the cmdlet
    New-OfficeWebAppsMachine -MachineToJoin "name-of-first-updated-node"

Depending on the number of nodes in your old farm, at some point you will want to send traffic to the new, updated farm. This is where the load balancer holds the key: once you have updated enough nodes to handle the traffic, add the updated node(s) to your load-balanced service and remove the old one(s).
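The rolling procedure above can be sketched as a small Python helper that prints the order of operations for a given set of nodes. This is a minimal planning sketch, not automation. The node names, the certificate friendly name and the `min_nodes_for_traffic` cutover threshold are hypothetical. The load balancer steps are placeholders, because they depend entirely on your load balancer.

```python
from typing import List


def rolling_update_plan(nodes: List[str], farm_url: str, cert_name: str,
                        min_nodes_for_traffic: int) -> List[str]:
    """Return the ordered steps for a rolling OWAS farm update (planning only).

    nodes: node names in update order; the first becomes the new farm's first node.
    min_nodes_for_traffic: hypothetical threshold of updated nodes needed
    before the load balancer is switched over to the new farm.
    """
    if not nodes:
        raise ValueError("need at least one node")
    if not 1 <= min_nodes_for_traffic <= len(nodes):
        raise ValueError("threshold must be between 1 and len(nodes)")

    first = nodes[0]
    steps: List[str] = []
    for i, node in enumerate(nodes):
        if i < min_nodes_for_traffic:
            # The old farm still carries the traffic: take this node out of its pool first.
            steps.append(f"[LB] remove {node} from the old farm's pool")
        # Steps 1-2: leave the old farm, then patch and reboot.
        steps.append(f"[{node}] Remove-OfficeWebAppsMachine")
        steps.append(f"[{node}] install updates, reboot when prompted")
        if i == 0:
            # Step 3: the first updated node creates the new farm.
            steps.append(
                f"[{node}] New-OfficeWebAppsFarm -InternalURL {farm_url} "
                f"-ExternalURL {farm_url} -CertificateName \"{cert_name}\""
            )
        else:
            # Step 4: every later node joins the new farm through the first one.
            steps.append(f"[{node}] New-OfficeWebAppsMachine -MachineToJoin \"{first}\"")
        if i + 1 == min_nodes_for_traffic:
            # Enough updated nodes: point the load balancer at the new farm.
            steps.append("[LB] switch traffic to new farm: " + ", ".join(nodes[: i + 1]))
        elif i + 1 > min_nodes_for_traffic:
            # After cutover, each newly updated node is simply added to the new pool.
            steps.append(f"[LB] add {node} to the new farm's pool")
    return steps


if __name__ == "__main__":
    for step in rolling_update_plan(
        ["webapp01.contoso.local", "webapp02.contoso.local", "webapp03.contoso.local"],
        "https://webapps.contoso.com", "YourCertFriendlyName", min_nodes_for_traffic=2,
    ):
        print(step)
```

With a threshold of 2, the plan moves traffic to the new farm right after the second node joins it. The remaining old node then gets updated without any load balancer drain, because it no longer serves traffic.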

That aside: during my patching session I realized that the current farm certificate would soon expire, so I simply enrolled for a new web certificate from our domain CA. This is where I made my first mistake. When requesting an OWAS farm certificate, remember to include the farm FQDN both in the certificate subject name and in the subject alternative name (SAN) field. Also remember to add the FQDNs of all the nodes (e.g. webapp01.contoso.local) to the SAN field. Without them, the farm will still work and serve client traffic, because that traffic is always addressed to “webapps.contoso.com”. Nevertheless, the farm health watchdog service will keep complaining that the machine is “Unhealthy”. The underlying cause will most likely show up in Event Viewer as “Could not establish trust relationship for the SSL/TLS secure channel”. This means that the node's own name is missing from the certificate SAN. As a result, the server cannot even establish an SSL connection to itself, and that is exactly what the health watchdog tries to do.
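The SAN requirement above can be checked mechanically. The following minimal Python sketch assumes the certificate details come in the dict layout returned by `ssl.SSLSocket.getpeercert()`, where `subjectAltName` is a tuple of `(type, value)` pairs. The node names beyond webapp01 are hypothetical. Only exact, case-insensitive DNS matches are checked, so wildcard entries are not handled.

```python
from typing import Dict, Iterable, List


def missing_san_names(peercert: Dict, required: Iterable[str]) -> List[str]:
    """Return the required DNS names that are absent from the certificate's SAN.

    `peercert` follows the ssl.SSLSocket.getpeercert() layout:
    {'subjectAltName': (('DNS', 'name1'), ('DNS', 'name2'), ...), ...}
    Exact matches only, ignoring case; wildcards are not expanded.
    """
    san_dns = {
        value.lower()
        for kind, value in peercert.get("subjectAltName", ())
        if kind == "DNS"
    }
    return [name for name in required if name.lower() not in san_dns]
```

For a farm, the required list would be the farm FQDN plus every node FQDN, e.g. `["webapps.contoso.com", "webapp01.contoso.local", "webapp02.contoso.local"]`. Any name the function returns would need to be added before the certificate is deployed.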

To remediate this, simply enroll for another certificate that contains both the farm FQDN and the FQDNs of all farm nodes in the subject alternative name field. And then there was yet another lesson to learn. After I had acquired the new certificate and deployed it to all nodes in the farm, I just ran the cmdlet
Set-OfficeWebAppsFarm -CertificateName "NewCertFriendlyName"

After this, all nodes in the farm have to be rebooted for the change to take effect. However, it still seemed that only the master node actually used the new certificate and reported a “Healthy” status. The cmdlet Get-OfficeWebAppsFarm will report the new certificate as being in use, but try this simple test anyway. In your browser, navigate to https://nodeFQDN/hosting/discovery/ for each node and check whether the certificate presented is trusted. If it is not, the node is most likely still using the old certificate, which lacks the required SAN entries.
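The browser test above can also be scripted. The following Python sketch opens a TLS connection to each node using that node's own FQDN as the expected hostname. The connection succeeds only if the certificate chains to a trusted CA and contains the node name. It assumes the script runs on a machine whose trust store includes your domain CA (for example a domain-joined Windows host) and that the nodes listen on port 443. The node names are placeholders.

```python
import socket
import ssl
from typing import Dict, Iterable


def discovery_url(node_fqdn: str) -> str:
    """Discovery URL to open in a browser for the manual test."""
    return f"https://{node_fqdn}/hosting/discovery/"


def node_cert_trusted(node_fqdn: str, port: int = 443, timeout: float = 5.0) -> bool:
    """True if the node's certificate chains to a trusted CA and matches its FQDN."""
    # The default context verifies both the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((node_fqdn, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=node_fqdn):
                return True
    except OSError:
        # ssl.SSLCertVerificationError (untrusted chain or name mismatch) is an
        # OSError subclass, as are DNS failures, refused connections and timeouts.
        return False


def check_nodes(nodes: Iterable[str]) -> Dict[str, bool]:
    """Map each node FQDN to whether its TLS handshake verified."""
    return {node: node_cert_trusted(node) for node in nodes}
```

For example, `check_nodes(["webapp01.contoso.local", "webapp02.contoso.local"])` would return `False` for any node still presenting the old certificate. The check does not tell a connection failure apart from a certificate problem, so a `False` result is worth confirming in the browser at `discovery_url(node)`.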

To fix this, remove each node (except the master/first node, which will likely work) from the farm using the cmdlet Remove-OfficeWebAppsMachine. Then add it back in with the same cmdlet as before:
New-OfficeWebAppsMachine -MachineToJoin "name-of-first-updated-node"

Remember to give the node some time to report its new status. The status is only updated during the scheduled watchdog runs, roughly every 3-4 minutes.
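Because the status only changes on watchdog runs, a check script should poll with a generous timeout rather than query once. The following Python sketch polls a status callable until it returns "Healthy" or the timeout expires. The `get_status` callable is hypothetical; it could, for instance, wrap Get-OfficeWebAppsMachine on the node and read its (assumed) HealthStatus value. The 10-minute default timeout is an assumption chosen to cover a few watchdog cycles.

```python
import time
from typing import Callable


def wait_for_healthy(get_status: Callable[[], str],
                     timeout_s: float = 600.0,
                     poll_s: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll get_status() until it returns 'Healthy' or timeout_s elapses.

    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "Healthy":
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; one final check happens afterwards.
        sleep(min(poll_s, remaining))
```

Polling every minute is a design choice: it is well below the watchdog interval, so a status change is noticed within a minute of the next watchdog run.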

 

 

Persistent Group Chat appears broken after update

After updating the Skype for Business Persistent Group Chat server, I noticed that the Skype client gave me an error message:

Broken pchat

Old messages would not load, and trying to enter new ones would simply time out.

Suspecting a back-end connection error (the server not being able to retrieve messages from the database), I ran the cmdlet Test-CsDatabase -DatabaseType PersistentChat, but it completed successfully.

Looking into Event Viewer on the Persistent Chat server showed no errors.

A quick search online gave few results, but this blog post described something similar. Going with its suggestion that the problem might simply be client related, I restarted the client, and voilà, it started working again! The client had been running during the server update and probably just needed a kick in the “behind”.

 

Speeding up Invoke-CsComputerFailover

As of Skype for Business Server 2015, there is a new cmdlet for the update process if you are running a pool of three or more servers: Invoke-CsComputerFailover, which changes the procedure compared to previous versions. This cmdlet makes sure all services are drained before they are stopped, much like the older Stop-CsWindowsService -Graceful. But where the Stop cmdlet would time out if a service would not drain (say, an ongoing conference was hosted on the server), the new cmdlet will keep waiting forever (or at least for an hour, by default).

But what if you are on the clock, with a maintenance window closing in?


Latest Skype for Business iOS client is not “foreign friendly” [LOCID]

I just got a new iPhone 6s, and upon activating it I noticed the (Skype for) “Business” app being updated (v6.0.1447, released October 22nd).

When I logged in to Skype on the phone, I noticed that the UI looked quite “funny”.

LOCID tag

At first I thought that “LOCID” referred to my account being “local”, i.e. on-premises (not Office 365). But even after logging in, several status tags showed text like “LOC ID unavailable” or “LOC ID missed call”. I realized that these had to be string IDs from the app code showing through untranslated, because the app did not recognize my locale (language) setting. So I tried changing the language setting to English.

Language preference

And voilà! It seems “Norwegian Bokmål” isn’t Microsoft’s preferred language after all, even though it is listed as a supported language in the App Store.

Correct language

I don’t know whether other non-English languages are affected by this, and I expect the bug to be fixed shortly. A funny one, though!

Upgrading to Skype for Business – some experiences

This weekend I finally got to upgrade our Lync 2013 servers to Skype for Business. The delay was intentional, as we wanted to wait for at least the first cumulative update. We rely heavily on Lync/Skype for Business in our daily operations, both for telephony and collaboration (1,362 A/V conferences over the last week and more than 103,113 participant conference minutes in our 250-person company), so any service disruption is unwelcome. We run an Enterprise pool with three Front End servers, and moving from Lync 2013 to Skype for Business requires an in-place upgrade. This meant quite some downtime as well as the added complexity of an Enterprise deployment.

During the upgrade I ran into several things that others might benefit from, so I thought I’d share some thoughts here.

Lync server KB3080353 breaks your mobile and web app clients

Keeping your servers up to date is essential, and that applies not only to the application server components but to the OS and everything else as well. The other day I installed a round of Windows Updates that also included a Lync Server security update. Shortly afterwards, users reported that they could no longer use the mobile client, and later I also got reports of the Web App not working.

Using Chromecast in your Cisco enterprise WLAN environment

Chromecast is an interesting product even for businesses. Its screen casting function in particular (although still listed as experimental) can be used to present your PC on an external monitor, reducing the need for the right cables or adapters.

In an enterprise environment, however, it can prove difficult to connect a device like the Chromecast to WiFi networks that require features like dot1x (802.1X) authentication. This was the case in my company, and searching for answers led me to a deployment guide from Cisco, the vendor of our wireless solution. Although relevant, the guide did not completely solve my issue, so if you are struggling with the same problem, keep reading.
