Office Web Apps server farm updates – “Unhealthy” over again

The other night I had to do some patching on our Office Web Apps Server (OWAS) 2013 farm. Even though it was only Windows Updates, with some security patches for OWAS as well, I still went for the official update method where the node to be updated is removed from the farm.

Reviewing the update notes I realized that it either must have changed recently or that I have got it all wrong before. I have always thought of it as removing a node from your farm (as well as the load balancer), applying the updates and then adding it back in. Actually, it is quite the opposite: The first node to be updated will be the first node of your new farm. Then, as you update the others, you will join them to your new farm – using the load balancer to decide what farm will take care of the actual traffic. The actual process is like this:

  1. Decommission the first node you want to update, using PowerShell cmdlet
    Remove-OfficeWebAppsMachine
    This is run on the actual node. On the same time you will decommission this node on your load balancer.
  2. Run updates on your node. Reboot when requested to do so.
  3. As the node is up and running again, run the cmdlet to make it the first node of your new farm:
    New-OfficeWebAppsFarm -InternalURL https://webapps.contoso.com -ExternalURL https://webapps.contoso.com -CertificateName “YourCertFriendlyName”
    It will not matter if you reuse the URL’s for your old farm. This info is only used to let your new farm know what URL it will be accepting traffic for, and it will have no knowledge of your old farm/nodes.
  4. Continue upgrading the next node in your farm by running step 1 and 2. After you have finished upgrading the node, add it to your new farm using the cmdlet
    New-OfficeWebAppsMachine -MachineToJoin “name-of-first-updated-node”

Depending on the number of nodes in your old farm; at some point you will want to send traffic to the new and updated farm. This is where the load balancer holds the key; when you have upgraded a “sufficient” number of nodes to handle the traffic, this is when you add the updated node(s) to your load balanced service, and removing the other(s).

That aside; during my patching session I realized that my current farm certificate would soon expire – so I would simply enroll for a new web certificate from my domain CA. This is where I made my first mistake; when requesting an OWAS farm certificate, remember to get your farm FQDN both in the certificate subject name as well as in the certicate subject alternate names (SAN) field. Also; remember to add all of the nodes’ FQDN’s (e.g. webapp01.contoso.local etc.) to the SAN fields. Without these, the farm will still work and serve traffic destined to the farm – because it will always be destined for  “webapps.contoso.com”. Nevertheless; the farm health watchdog service will keep complaining that the machine is “Unhealthy”. The most likely reason for this will show up in Event Viewer as “Could not establish trust relationship for the SSL/TLS secure channel“. What this basically means is that name of the actual node is missing from the certificate SAN and as such the server cannot even create an SSL connection with itself, which is what the health watchdogs are actually doing.

To remediate this, simply enroll for another certificate that contains both the farm FQDN as well as all farm nodes FQDN in the subject alternate names fields. And then, there’s still another lesson learned: After I had aquired the new certificate and deployed it to all nodes in the farm I just ran the cmdlet
Set-OfficeWebAppsFarm -CertificateName “NewCertFriendlyName”

After this, all nodes in the farm have to be rebooted to take effect. But, it still seems that only the master node will effectively make use of the new certificate – reporting a “Healthy” status. Even though the cmdlet Get-OfficeWebAppsFarm will report the new certificate being used, try this simple test; using your browser, navigate to https://nodeFQDN/hosting/discovery/ for each node and see whether the certificate being presented is actually trusted – if it is not, then it is most likely still using the old certificate not containing all the SAN fields.

To fix this all you have to do is removing each node (beside the master/first node, which will likely work) from the farm using cmdlet Remove-OfficeWebAppsMachine. Then add it back in once more using the previous cmdlet:
New-OfficeWebAppsMachine -MachineToJoin “name-of-first-updated-node”

Remember to give the node some time to report it’s new status, as it is only being updated during the scheduled watchdog runs every 3-4 minutes or so.

 

 

Advertisements

Lync server KB3080353 breaks your mobile and web app clients

Keeping your servers up to date is essential, and not only the application server parts but the OS and others as well. The other day I went with a Windows Update that also included a Lync Server security update. After a short while I would get feedback from users no longer being able to use the mobile client, and later I also got reports on the Web App not working. Continue reading

Using Lync phones with voice VLAN and dot1x

In a recent project I have been working on voice VLAN implementation and 802.1x (or dot1x) authentication in our Cisco switching infrastructure. There was little to nothing on the subject to be found online, so I thought I would share my experiences.

Continue reading

Topology publishing fails because Trusted Server and FQDN already exists for a different TLS target

When upgrading our Lync infrastructure from 2010 to 2013 I encountered some errors upon the first time I would publish the Lync Server 2013 Enterprise pool, consisting of three Front End servers and a fresh SQL server instance.

Diving into the resulting log file can quickly lead you to think that almost everything failed, as every parent category of the action point that actually went wrong will also be labeled “Completed with errors” or “Failed”. Therefore it is important that you (for your own mental well-being) filter out those things and drill down to the action point that is causing the problem, often with the “Execution result” column simply indicating “Error”.

Publishing error log

Continue reading

Monitoring your Lync Peak Call Capacity

Both SIP and ISDN trunk providers will often bill you based on the number of simultaneous calls/channels they provide (as well as minute charges). As a result you may end up scaling your capacity above what your real needs might be, just to be on the safe side.

With Lync there is no available tool to monitor this, neither real-time nor historically. There used to be a cool script available by Tom Pacyk to do this, but the times this need has arisen over the last year I have only been faced by an error message stating that the resource is not available.

The call performance counters reside on the Mediation server and are by no means a secret, but I haven’t seen anyone else (beside Tom) provide something to output them.

That aside; I decided to make my own PowerShell script to this.
It is still maturing, but for now it will give you

  • Console output with the current total of inbound, outbound and concurrent calls (sampled every 15 seconds)
  • CSV file output with hourly peak and average (per 15 seconds) statistics on the same counters

I will continue to develop it into a more complete solution, as I see fit. If you have suggestions or comments on the topic they are more than welcome! Although, I have to admit that my PowerShell skills are not unlimited, I promise to give it my best effort!

DISCLAIMER: As I just began working for a new employer where I have not yet got the chance to upgrade our Lync platform to server 2013, I can only vouch for it working on Lync Server 2010 – but I cannot see any reason why it should not run on the 2013 version (the counters would be identical, I think). In any case it will have to be run on the server hosting the Mediation role.

Download the latest version of the script from here.

Cheers!


Release history:

April 15 2014 – v0.5 – first basic version, dumping hourly statistics to CSV file (max/avg in/out/concurrent calls)
April 16 2014 – v0.8 – added console output with current counters, added keyboard input to exit script

Lync Server Management Shell on Windows Server 2012R2

I recently installed Lync Server 2013 on a Windows Server 2012R2 for the first time.

Although everything worked just as before I noticed that Lync Server Management Shell would not start – it would just hang with a blank window without the prompt. No error or indication as to what might be the problem.

To get around it I simply launched the “regular” PowerShell and ran a import-module lync cmdlet. After this “Lync PowerShell” can be used just like before.

Lync 2013 dual homed collocated Mediation server – the solution

This blog post is all about how to go about setting up a collocated Lync Server Mediation server with separate NIC’s for Primary (or Lync if you will) and PSTN traffic. I wrote it due to the fact that I find this setup poorly documented, and hopefully others will escape the pitfalls that I encountered by reading it.

If you stumbled upon this post directly you might also find the previous one describing the problem in more detail interesting. If not, or if you are more into just fixing problems, then please keep reading.

Continue reading