Recently I had to deal with sign-in issues that a few external SFB desktop clients experienced intermittently. This is always a thankless situation and costs a lot of time to dig into, especially since the Skype for Business sign-in process is very complex.

Internally, the affected clients can sign in without any issues.

Here you can see the external sign-in attempt on the affected client.



Because of this error message, my first thought was a network connectivity issue. So the first step was to test whether DNS was working and returned the correct IPs for the web services FQDN and the Edge Access Server FQDN.

Everything was fine and I got the correct IPs. Next we can test whether a TCP connection can be established to the web services, which listen on TCP port 443, and to the Edge Access Server, also on TCP port 443, which carries the SIP signalling our external desktop clients use for authentication. The Edge Access Server also had TCP port 5061 published for federation.


Trying to access the Edge Access Server on TCP port 443, which the desktop client uses for sign-in.


TCP port 5061 on the Edge Access Server is typically used for federation and is not relevant for our sign-in process.


Trying to access the web services of the internal frontend pool on port 443. lyncdiscover.<your-sip-domain>.tld points to the web services IP, or rather to your reverse proxy, which publishes the web services URLs.


If the TCP three-way handshake to both services can be established with telnet, we can presume that network connectivity is not our problem.
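If telnet is not installed on the client, the same handshake test can be scripted. The following is only a minimal Python sketch of that idea; the function name tcp_port_open and the host names in the comment are placeholders I made up for illustration, not part of any Skype for Business tooling.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        # create_connection resolves the name and performs the three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Example call with placeholder FQDNs (replace with your own names):
#   tcp_port_open("sip.domain.tld", 443)
#   tcp_port_open("lyncdiscover.domain.tld", 443)
```

Like telnet, this only proves that the port is reachable. It says nothing about TLS or authentication.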

So now the next step is to inspect the local log files from the desktop client.

https://docs.microsoft.com/en-us/skype-sdk/websdk/docs/troubleshooting/gatheringlogs/logs-desktopclient

There are two types of logs available from the desktop client:

  • .UccApilog files contain general client usage information
  • .etl files contain media-specific log information

For any bugs related to Audio/Video, please attach both log types if possible. For bugs not related to Audio/Video, the .UccApilog files should be sufficient.

The .UccApilog files will have names that look like this:

Lync-UccApi-[[n]].UccApilog where [[n]] should be replaced by a number 0-2.

The .etl media log files will have names that look like this:

Lync-16.0.6965.5305-Office-x86ship-U.etl


On a Windows computer, the logs for a Skype for Business desktop client will be located in the following directory:

%LOCALAPPDATA%\Microsoft\Office\16.0\Lync\Tracing

The .UccApilog files contain general client usage information and are our first choice to dig into. You can inspect them with any text editor, but to focus on the relevant SIP messages it is better to use Snooper. You can download it from the internet, or you will find it on your SFB Frontend Server under C:\Program Files\Skype for Business Server 201x\ClsAgent

Opening the file shows us that all our REGISTER messages get a 401 Unauthorized response back.


Depending on which of the three available authentication methods (NTLM, Kerberos or TLS-DSK) is used, it is quite normal to see some 401 Unauthorized messages here first before seeing a SIP/2.0 200 OK.

The client first tries to connect without an authentication header in order to get the available authentication methods in the response. The whole sign-in process, and why it is quite normal to see some 401 Unauthorized messages, is described on the sites I link to below.

But of course it is not OK to see only 401 responses to our REGISTER requests.
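To tell the normal "a few 401s, then 200 OK" pattern apart from the "only 401s" pattern without clicking through every message, the responses can be counted. This is a rough Python sketch that assumes the log text contains the raw SIP messages, with a status line such as "SIP/2.0 401 Unauthorized" followed by a "CSeq: n REGISTER" header. The function names are my own and the sample text in the usage note is invented.

```python
import re
from collections import Counter

STATUS_LINE = re.compile(r"^SIP/2\.0 (\d{3})\b")
CSEQ_LINE = re.compile(r"^CSeq:\s*\d+\s+(\w+)", re.IGNORECASE)


def register_status_counts(log_text: str) -> Counter:
    """Count SIP response codes that answer REGISTER requests."""
    counts = Counter()
    status = None  # status code of the response currently being read
    for raw in log_text.splitlines():
        line = raw.strip()
        match = STATUS_LINE.match(line)
        if match:
            status = match.group(1)
            continue
        match = CSEQ_LINE.match(line)
        if match and status is not None:
            if match.group(1).upper() == "REGISTER":
                counts[status] += 1
            status = None  # this response is accounted for
    return counts


def only_unauthorized(counts: Counter) -> bool:
    """True if there were REGISTER responses and every one of them was a 401."""
    return bool(counts) and set(counts) == {"401"}
```

Feeding it the content of a .UccApilog file, for example via open(path, errors="replace").read(), gives a quick first impression before opening Snooper.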



So now we can be sure that we have no network connectivity issue and that the problem is related to authentication. Unfortunately, this does not tell me what is going wrong under the hood.

To find out more about the authentication issue I had to dig into the logs on the frontend pool, i.e. the Registrar server.

https://de.wikipedia.org/wiki/Session_Initiation_Protocol

We can capture traffic from the Edge and frontend pool servers with the Centralized Logging Service, which was introduced in Lync 2013. So we do not have to capture the traffic on the Edge and frontend servers separately; instead we can start a capture for both directly on one frontend server with ClsLogger.exe, located under C:\Program Files\Skype for Business Server 201x\ClsAgent

As mentioned above, you can also find the Snooper.exe tool there.



After opening ClsLogger you can select a scenario on the first tab, Start-Stop Scenarios. Scenarios are filters that determine what is captured.

As we want to troubleshoot authentication issues, we select the Authentication scenario, which will show us the relevant logs including autodiscover. In the Topology section on the right we see all our SFB-related servers. We check the frontend server and the Edge server to gather logs from them.


Under Edit Scenario you can adjust the default settings for the selected scenario. You can choose the components that should be included in the logs and the level, which defines how much information is logged.

For troubleshooting I choose level All and the default components for the scenario. When you analyze the logs later in Snooper, you can also set filters there to limit the output.


To start capturing, go back to the Start-Stop Scenarios tab and click the Start Scenario button.

After starting, go to the affected client and try to sign in so that the capture logs this attempt. Wait until the error appears, then stop the scenario on the server.

Then we go back to ClsLogger and open the Search CLS Logs tab. Here we can create the actual log file that we will analyze with Snooper later.

For this we must provide the path to the event trace log (.etl) files.

At the moment the log files exist only in a compressed binary format under this path. To read and analyze these messages with Snooper or any other text editor, we first have to parse and format the .etl files. ClsLogger does this when you click the Search Logs button, as you will see below.



https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/trace-log

In my case, the path to these .etl log files, which is also the default, is:

C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Temp\Tracing

Below I will show you how to determine the path of the log files.

To limit the output, define the start and end time the log file should cover. You can set more filters, such as the SIP URI, to narrow it further.
After setting all parameters, click the Search Logs button. ClsLogger now gathers all selected information from the log file folder and puts it together in one ASCII file, which we can analyze with Snooper. ClsLogger saves this file as a .txt file in the same folder where the zipped binary .etl files are stored.


To determine the path of the .etl log files, as mentioned, execute the following command, which shows the whole CLS configuration of your deployment.

Get-CsClsConfiguration


Be aware that %TEMP%\Tracing does not point to the user temp folder located under C:\Users\<username>\AppData\Local\Temp

The logging service runs as Network Service, and therefore the real path of %TEMP%\Tracing is:

C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Temp\Tracing

Now we can open the .txt logfile from this folder with Snooper.exe.


You can see that my client requests a web ticket from the Web Ticket Service in a loop. After that you see a request to the autodiscover service to determine the internal and external addresses of the frontend pool where my user is homed. This request gets a 401 Unauthorized with the following error message in the HTTP header:

X-Ms-diagnostics: 28032;source="<FQDN Webservices>/";reason="The web ticket is invalid.";faultcode="wsse:InvalidSecurityToken"
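When you have to compare many of these headers across several traces, it helps to split them into their parts. Here is a small Python sketch that does this for the layout shown above (a numeric code followed by key="value" pairs separated by semicolons). The function name parse_ms_diagnostics is my own, and the parser only assumes this layout; it is not based on any official specification of the header.

```python
import re


def parse_ms_diagnostics(value: str) -> dict:
    """Split an X-Ms-diagnostics header value into its code and key/value fields."""
    # Drop the header name if the whole header line was passed in.
    if value.lower().startswith("x-ms-diagnostics:"):
        value = value.split(":", 1)[1]
    # Normalise typographic quotes that appear when a header is copied from a web page.
    value = value.replace("\u201c", '"').replace("\u201d", '"')
    code, _, rest = value.strip().partition(";")
    fields = {"code": int(code.strip())}
    for key, val in re.findall(r'(\w+)="([^"]*)"', rest):
        fields[key.lower()] = val
    return fields
```

For the header above this yields the code 28032 together with the source, reason and faultcode fields, which makes it easy to group repeated failures by code.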

This procedure repeats several times until in the end a 500 Internal Server Error occurs.



After some web research, most results pointed to a problem with the internal server certificate. As the problem only appeared on some clients from external, it had to be the certificate of the External Web Site on the frontend server and its binding on port 4443.



As described under https://gallery.technet.microsoft.com/Certificate-requirements-996da98f the subject name of the internal certificate has to be the FQDN of the frontend pool.

In my case the External Web Site had the public certificate assigned, whose subject name is that of the Edge Access Server, the sip.domain.com FQDN.
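The subject name comparison itself is simple enough to script. The sketch below is an illustration in Python, not an SFB tool: it works on a certificate in the dictionary form that Python's ssl.SSLSocket.getpeercert() returns for a validated certificate, and the function names and the FQDNs in the usage note are placeholders I chose.

```python
from typing import Optional


def subject_common_name(cert: dict) -> Optional[str]:
    """Return the commonName from a getpeercert()-style certificate dict."""
    # "subject" is a tuple of RDNs, each RDN a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def subject_matches_pool(cert: dict, pool_fqdn: str) -> bool:
    """True if the certificate's subject name equals the frontend pool FQDN."""
    common_name = subject_common_name(cert)
    return common_name is not None and common_name.lower() == pool_fqdn.lower()
```

With a certificate whose commonName is sip.domain.com and a pool called pool.domain.tld, subject_matches_pool returns False, which is exactly the misconfiguration described here.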

This happened by accident, as I had already had a similar problem a few years ago with Lync 2013 and therefore know damn well the strict requirements for Lync and Skype for Business certificates and the subject name problem.

At that time I intentionally set the public certificate on the internal frontend pool and was not aware of the subject name problem.

Back then this left the Lync 2013 frontend service stuck.

Keep in mind that Lync 2013 also does not like it if you accidentally place an intermediate certificate in the Trusted Root Certification Authorities store; this also ends in a stuck frontend service. I don’t know if this is still the case in Skype for Business Server 2019, but I suppose not, as the common subject name problem did not result in a stuck frontend service in that version either.

So you can see how strictly Lync/SFB handles certificates!

After requesting and assigning a new internal certificate with the Deployment Wizard, the problem was gone and all clients could sign in from external without problems.

But why did only some clients experience the sign-in problem?

Because of TLS-DSK authentication.

Now we come to the complex client sign-in process. I don’t want to describe the whole process here, as there are already many good resources on the web, such as:

https://docs.microsoft.com/de-de/archive/blogs/praj/skype-for-business-client-sign-in-call-flow-detailed

https://techcommunity.microsoft.com/t5/skype-for-business-blog/sfb-online-client-sign-in-and-authentication-deep-dive-part-1/ba-p/621243

https://channel9.msdn.com/Events/TechEd/NorthAmerica/2014/OFC-B412


Briefly, Skype for Business offers three different authentication packages that clients can use to authenticate against the frontend pool, i.e. the Registrar server.

  • NTLM
  • Kerberos
  • TLS-DSK (Transport Layer Security Derived Session Key)

By default, the Skype for Business client authenticates against the server using the TLS-DSK authentication package, which was introduced with Lync 2010.

The TLS-DSK authentication package uses an X.509 v3 self-signed client authentication certificate. The client itself generates the certificate and sends the public key to the Skype for Business Certificate Provisioning Service (web service). To do so, it must first authenticate itself to the web service with NTLM or Kerberos, and in response it gets the Skype for Business User Certificate.

This User Certificate will be stored in the user’s personal certificate store.



From now on, the client uses this User Certificate to authenticate against Skype for Business and therefore does not need the stored AD credentials.

By default this User Certificate is valid for 180 days, which means that during this time the client does not have to present its AD credentials, and therefore its password, to the server to authenticate itself.



You can change the default settings with the Set-CsWebServicesConfiguration cmdlet.
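As far as I know, the validity period is configured in hours (the DefaultValidityPeriodHours parameter), so the 180 days mentioned above correspond to 4320 hours. Please verify the parameter name against the cmdlet help in your version before changing anything. The arithmetic can be sketched in Python as follows; the function names and the example date are only for illustration.

```python
from datetime import datetime, timedelta


def validity_hours(days: int) -> int:
    """Convert a validity period in days to hours."""
    return days * 24


def certificate_expiry(issued_at: datetime, hours: int) -> datetime:
    """Return the point in time at which a certificate issued at issued_at expires."""
    return issued_at + timedelta(hours=hours)


# 180 days expressed in hours, and the resulting expiry for a made-up issue date.
hours = validity_hours(180)
expiry = certificate_expiry(datetime(2021, 1, 1), hours)
```

This also shows how long a once-issued User Certificate keeps working after the account's password has changed or the account has been disabled.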

In this case, if you want to deny a user access to the Skype for Business Server, it is not enough to disable the account in your Active Directory; the user is still able to authenticate with the User Certificate. Instead you must delete the account in your AD or remove the user certificate from the frontend pool. You can use the GUI or PowerShell for this.





Back to the reason why only some clients experienced the sign-in error: the sign-in process was stuck trying to connect to the Autodiscover web service with the following error message:

X-Ms-diagnostics: 28032;source="/";reason="The web ticket is invalid.";faultcode="wsse:InvalidSecurityToken"

So something breaks with the web ticket here because of the wrong certificate on the External Web Site of the frontend pool. As I said, I had accidentally chosen the public certificate, whose subject name is the FQDN of the Access Edge instead of the FQDN of the frontend pool.

All clients that had already stored the User Certificate use TLS-DSK for authentication. They do not need a web ticket to authenticate against the Autodiscover Service to determine the internal and external addresses of the frontend pool where their user is homed. They can use the User Certificate instead and therefore got no errors regarding an invalid web ticket.

All clients that had not yet stored the User Certificate in their local store first need a web ticket to authenticate against the Autodiscover Service and therefore get this error at sign-in from external. From internal they connect to the Internal Web Site on the frontend pool, on which the correct certificate was installed, and so get a valid web ticket.
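The explanation above boils down to a small decision table. The following Python sketch is only a simplified model of my reasoning, not the real client logic; the function and parameter names are invented for this illustration.

```python
def sign_in_succeeds(has_user_certificate: bool,
                     from_external: bool,
                     external_site_cert_ok: bool) -> bool:
    """Simplified model of which clients could sign in during the incident."""
    if has_user_certificate:
        # TLS-DSK with the stored User Certificate: no web ticket needed.
        return True
    if not from_external:
        # The Internal Web Site had the correct certificate assigned,
        # so the client gets a valid web ticket.
        return True
    # External client without a User Certificate: it needs a web ticket from
    # the External Web Site, which only works if that site's certificate is correct.
    return external_site_cert_ok
```

With external_site_cert_ok set to False, only the combination "no User Certificate and external" fails, which matches what I observed: some external clients were affected, internal clients never.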








Below are a few useful commands and settings …



Determine if Tracing is enabled on the SFB Server.


Get-CsClientPolicy | fl *tra*



Enable Logging at the Client




Skype for Business 2016 and 2019 Path to Logfiles at the Client:

C:\Users\<your_alias>\AppData\Local\Microsoft\Office\16.0\Lync\Tracing


Enable Event Logging in the registry



Determine if AlwaysOn is enabled



Certificate Provisioning Service URL

https://<FQDN Frontendpool>/CertProv/CertProvisioningService.svc


Web Ticket Service URL

https://<FQDN Frontendpool>:4443/WebTicket/WebTicketService.svc/cert,RequestVerb=POST


Autodiscover Service URLs

GET /Autodiscover/AutodiscoverService.svc/root?sipuri=<user>@domain.tld

GET /Autodiscover/AutodiscoverService.svc/root/user?originalDomain=domain.tld&sipuri=<user>@domain.tld

https://<FQDN Frontendpool>/Autodiscover/XFrame/XFrame.html

https://lyncdiscover.domain.tld



Determine Authentication Config