In Part 3 of this blog post series we saw how to enable replication for the on-premises (vSphere) virtual machines that we want to protect and replicate to Azure.

We also saw how to perform a failover from on-premises to Azure and how to re-protect the virtual machine back to on-premises after the failover.

In this part we will determine which traffic occurs between the ASR appliance and our protected virtual machines and therefore needs to be allowed. We will also look at some troubleshooting steps in case something doesn’t work as expected.


Internet Traffic from the ASR Appliance to Microsoft

The ASR appliance deployed in vSphere must be able to connect to the following URLs using outbound HTTPS (TCP 443) over the Internet.

You will find more about creating and using private endpoints for Site Recovery in the following article: https://learn.microsoft.com/en-us/azure/site-recovery/azure-to-azure-how-to-enable-replication-private-endpoints#creating-and-using-private-endpoints-for-site-recovery

You will find more about setting up an IPsec VPN tunnel between on-premises and Azure in my following posts.
https://blog.matrixpost.net/azure-ipsec-vpn-tunnel-onpremise/
https://blog.matrixpost.net/set-up-a-site-to-site-ipsec-route-based-vpn-tunnel-in-azure/


You can either allow all outbound HTTPS (TCP 443) traffic or restrict it to the URLs listed in the Microsoft article below.

ASR Appliance ==> Internet HTTPS TCP 443 (Microsoft URLs: https://learn.microsoft.com/en-us/azure/site-recovery/replication-appliance-support-matrix#allow-urls)
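To quickly verify from the appliance that outbound TCP 443 is open, a small script like the following can help. This is only a minimal sketch: the function names are my own, the two host names are just examples and should be replaced with the entries from Microsoft's URL list, and a successful TCP connect does not prove that a proxy or TLS inspection device lets the actual HTTPS traffic through.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def check_endpoints(hosts, port: int = 443) -> dict:
    """Map each host name to the result of the TCP reachability check."""
    return {host: can_connect(host, port) for host in hosts}


if __name__ == "__main__":
    # Example host names only (assumption); use the URLs from the Microsoft article.
    sample_hosts = ["login.microsoftonline.com", "management.azure.com"]
    for host, ok in check_endpoints(sample_hosts).items():
        state = "reachable" if ok else "blocked or unreachable"
        print(f"{host}:443 -> {state}")
```

If a host is reported as blocked, check the firewall or proxy in front of the appliance before looking any further.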




Determine Traffic used for Windows Virtual Machines

Below we will see what is happening under the hood when we enable replication in Azure for a virtual machine running in our on-premises vSphere environment.

The ASR appliance first has to copy the ASR Mobility Agent to the virtual machine we want to protect.

To do so, the ASR appliance first connects to that virtual machine on destination TCP port 135 (RPC Endpoint Mapper, also called RPC Port Mapper).

RPC plays a critical role in enabling replication, coordination, and management of workloads and data, ensuring that recovery processes can be executed effectively in a disaster recovery scenario.

The Mobility Service uses RPC to interact with the Process Server (running on the ASR appliance), which manages the replication flow, including compressing and encrypting the data before sending it to Azure.


10.0.0.45 is my ASR appliance and 10.0.0.77 is the virtual machine (Matrix-Web) that I want to protect and for which I triggered Enable replication in Azure.


Next the ASR appliance connects to the virtual machine we want to protect on destination TCP port 445 (SMB). Here the ASR appliance copies the installation files for the Mobility Service Agent to the virtual machine.


After the files for the Mobility Service Agent have been copied to the protected virtual machine, the ASR appliance establishes another connection to it, again on TCP port 135 (RPC Endpoint Mapper).

The RPC Endpoint Mapper listens on a well-known port (TCP 135) and acts like a “directory” that helps clients locate the network services (i.e., server processes) that provide specific RPC services.

When an RPC service starts on a server (in our case on the virtual machines we want to protect, where the Mobility Service Agent gets installed), it dynamically registers itself with the RPC Endpoint Mapper and informs it about the port (or endpoint) it is using.

This system allows RPC services to dynamically select the ports they use instead of being restricted to well-known static port numbers.
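The register-and-lookup idea can be illustrated with a small toy model. It is purely conceptual: real DCE/RPC identifies interfaces by UUID and uses a binary protocol, and the class and service names below are hypothetical. The port range is the default dynamic range of current Windows versions.

```python
import random

# Default dynamic port range on current Windows versions.
DYNAMIC_PORT_RANGE = range(49152, 65536)


class EndpointMapper:
    """Toy stand-in for the RPC Endpoint Mapper that listens on TCP 135."""

    def __init__(self):
        self._endpoints = {}  # service name -> dynamically chosen port

    def register(self, service: str) -> int:
        """A starting service picks a free dynamic port and registers it."""
        used = set(self._endpoints.values())
        port = random.choice([p for p in DYNAMIC_PORT_RANGE if p not in used])
        self._endpoints[service] = port
        return port

    def lookup(self, service: str):
        """A client asks on port 135 which port the service is using."""
        return self._endpoints.get(service)


if __name__ == "__main__":
    mapper = EndpointMapper()
    mapper.register("mobility-service")  # hypothetical service name
    print("client should connect to TCP port", mapper.lookup("mobility-service"))
```

The client only needs to know the well-known port 135 in advance; everything else is learned from the lookup, which is why a firewall has to allow the whole dynamic range.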


Here the ASR appliance uses remote procedure calls (RPC) to trigger the installation of the Mobility Service Agent and to manage it, for example to trigger replication on the protected virtual machines.

DCE/RPC (Distributed Computing Environment/Remote Procedure Call) is a communication protocol that allows software applications to call functions or procedures on remote systems as if they were local, without worrying about the details of the network communication.

It is widely used in distributed systems for enabling applications running on different machines to interact with each other.

DCE is a framework developed by the Open Software Foundation (OSF) in the late 1980s to provide tools and services for building distributed applications. It includes a set of technologies, such as distributed file systems, directory services, security services, and remote procedure calls (RPC).

RPC is a protocol that allows a program to execute a procedure (or function) on a different computer (remote system) as if it were executing locally. With RPC, the client sends a request to the remote server, which then processes it and returns the result. The complexity of managing network communication, such as data transmission, is hidden from the developer.

The key goal of DCE is to simplify the development of applications that operate across networked, distributed systems.


As mentioned, an RPC service registers itself with the RPC Endpoint Mapper and informs it about the dynamic port (or endpoint) it is using.

In the case of my protected virtual machine (Matrix-Web) this is TCP port 52582 as shown below. The ASR appliance first connects to the protected virtual machine on TCP port 135 (RPC Port Mapper) to ask which dynamic port the Mobility Service Agent is actually using, and the RPC Port Mapper replies that this is TCP port 52582.

Below you will see the RPC reply from the protected virtual machine (Matrix-Web) stating on which dynamic TCP port the Mobility Service Agent is listening, here TCP port 52582.


From now on the ASR appliance will connect to the Mobility Service Agent on the protected virtual machine on TCP port 52582.

Therefore the TCP ports 49152 – 65535 must also be allowed inbound on the protected virtual machines. These dynamic ports are required after the initial connection on port 135 to complete the installation process.


The next protocol which is used for the connection between our protected virtual machine and the ASR appliance is TCP 443 (HTTPS) as shown below.

Using TCP 443 (HTTPS) allows the Mobility Service installed on the protected VMs to send metadata, control messages, and status updates to the ASR appliance securely.

The same port is used for initial handshakes and setup between the VMs and the ASR components, where security and encryption are important.


Here the protected virtual machine (Matrix-Web) initiates a new HTTPS connection to the ASR appliance, which is listening on TCP port 443. You will see here the initial TLS handshake between both. Inbound TCP port 443 must therefore also be allowed on the ASR appliance.

The TLS handshake is the process by which the client and server establish a secure connection before any actual application data (like an HTTP request or response) is exchanged. The handshake consists of several messages, with ClientHello and ServerHello being the key messages that initiate the handshake.
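To check from a protected machine whether such a handshake with the appliance succeeds, the following sketch performs one and reports the negotiated protocol version and cipher. The function name is my own, and I am assuming here that the appliance presents a certificate the client does not trust by default, so certificate verification is switched off for this diagnostic only.

```python
import socket
import ssl


def probe_tls(host: str, port: int = 443, timeout: float = 3.0):
    """Perform a TLS handshake and return (protocol version, cipher name).

    Returns None if the TCP connection or the handshake fails.
    """
    context = ssl.create_default_context()
    # Assumption: the certificate is not trusted by the client, so
    # verification is disabled. Use this for diagnostics only.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket sends the ClientHello and waits for the ServerHello.
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version(), tls.cipher()[0]
    except OSError:
        # ssl.SSLError and socket timeouts are subclasses of OSError.
        return None


if __name__ == "__main__":
    # 10.0.0.45 is the ASR appliance in this lab; adjust it for your environment.
    for port in (443, 9443):
        print(port, probe_tls("10.0.0.45", port))
```

A result of None means the connection was blocked or the handshake failed, which usually points to a firewall between the protected machine and the appliance.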


The last remaining TCP port used between the protected virtual machines and the ASR appliance is TCP port 9443 (also HTTPS). The Mobility Service Agent on the protected virtual machines uses it to send the replicated data and status information to the process server (ASR appliance). Inbound TCP port 9443 must therefore also be allowed on the ASR appliance.


I captured all of the traffic above directly on the protected virtual machine during the initial Enable replication process triggered in Azure, shown below.



Determine Traffic by using Wireshark

Below we will see which traffic actually occurs for Azure Site Recovery (ASR) and needs to be allowed.


Between vCenter and ASR Appliance

ASR Appliance ==> vCenter inbound TCP 443



Windows VMs

ASR Appliance ==> Protected Server TCP 445

Protected Servers ==> ASR Appliance TCP 443, 9443

ASR Appliance ==> Protected Server TCP 135 (RPC Port Mapper), TCP 49152-65535 (dynamic RPC Ports)

Dynamic RPC Ports (49152–65535, TCP): Used as part of the RPC dynamic port range on the target Windows VM. These dynamic ports are required after the initial connection on port 135 to complete the installation process.


During synchronization the protected virtual machine, or rather its Mobility service agent, sends the replication data to TCP port 9443, on which the ASR appliance is listening.

10.0.0.72 is the protected VM and 10.0.0.45 is the ASR appliance.



Linux VMs

ASR Appliance ==> Protected Server TCP 22 (SSH), used for the push installation

Protected Server ==> ASR Appliance TCP 443, 9443
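The flows listed above can also be written down as data, which makes it easy to check a planned firewall rule set against them. This is just a sketch of the traffic I observed in this lab, not an official rule list, and the role names such as "appliance" or "windows-vm" are labels I made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str       # role that initiates the connection
    destination: str  # role that listens
    ports: range      # allowed destination TCP ports
    note: str


# TCP flows observed in this post (role names are illustrative labels).
RULES = [
    Rule("appliance", "vcenter", range(443, 444), "vCenter access over HTTPS"),
    Rule("appliance", "windows-vm", range(135, 136), "RPC Endpoint Mapper"),
    Rule("appliance", "windows-vm", range(445, 446), "SMB, copies the agent installer"),
    Rule("appliance", "windows-vm", range(49152, 65536), "dynamic RPC ports"),
    Rule("appliance", "linux-vm", range(22, 23), "SSH, push installation"),
    Rule("windows-vm", "appliance", range(443, 444), "HTTPS control traffic"),
    Rule("windows-vm", "appliance", range(9443, 9444), "replication data"),
    Rule("linux-vm", "appliance", range(443, 444), "HTTPS control traffic"),
    Rule("linux-vm", "appliance", range(9443, 9444), "replication data"),
]


def is_allowed(source: str, destination: str, port: int) -> bool:
    """Return True if a TCP flow matches one of the rules above."""
    return any(
        r.source == source and r.destination == destination and port in r.ports
        for r in RULES
    )


if __name__ == "__main__":
    # The dynamic RPC port seen on Matrix-Web falls into the allowed range.
    print(is_allowed("appliance", "windows-vm", 52582))
    # SMB is not part of the Linux push installation.
    print(is_allowed("appliance", "linux-vm", 445))
```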




Mobility service agent

When you set up disaster recovery for VMware virtual machines (VM) and physical servers using Azure Site Recovery, you install the Site Recovery Mobility service on each on-premises VMware VM and physical server. The Mobility service captures data writes on the machine and forwards them to the Site Recovery process server.

The Mobility service is installed by the Mobility service agent software that you can deploy using the following methods:

  • Push installation: When protection is enabled via the Azure portal, Site Recovery installs the Mobility service on the server.
  • Manual installation: You can install the Mobility service manually on each machine through the user interface (UI) or command prompt.
  • Automated deployment: You can automate the Mobility service installation with software deployment tools such as Configuration Manager.


https://learn.microsoft.com/en-us/azure/site-recovery/vmware-physical-mobility-service-overview



Windows

Below you can see the program folder in which the agent is installed on Windows.


You will also find the Mobility Service under the Windows services as usual.



Linux

In Linux the Mobility Agent is installed in /usr/local/ASR as shown below.




The ASR Mobility service agent in Linux is named vxagent.


You can determine the installed version in Linux by running the following command.

#  cat /usr/local/ASR/Vx/.vx_version





Troubleshooting


ASR Appliance not showing up Healthy in Azure

Check if the appliance can connect to Azure.

Check if all necessary services are running on the appliance. In my case some services were not started.


After starting them the ASR appliance showed up as healthy again in Azure.



Mobility Agent Installation failed on Virtual Machines

Disable the local firewall on the ASR appliance and the protected virtual machines, or allow only the necessary traffic determined above.




Enable protection failed error code EP0883

Enable protection failed error code EP0883. The service was unable to install mobility service on the source machine 10.0.0.77 as the install detected an older version of mobility service on the source machine.


Remove the ASR Mobility Service on the protected virtual machines.

This will remove the Azure Site Recovery VSS Provider service.


Removing the above Mobility Service will also automatically remove the Windows Azure VM Agent.

The following folder is left empty after the uninstallation; to finally remove it we need to delete it by hand.

A reboot is needed to complete the uninstallation, otherwise you will run into the following error again when trying to re-enable replication and re-deploy the Mobility service.





SUSE Enterprise Linux 15 SP6 not yet supported by ASR

Support matrix for disaster recovery of VMware VMs and physical servers to Azure
https://learn.microsoft.com/en-us/azure/site-recovery/vmware-physical-azure-support-matrix#for-linux

As per the support matrix document (https://learn.microsoft.com/en-us/azure/site-recovery/azure-to-azure-support-matrix#supported-suse-linux-enterprise-server-15-kernel-versions-for-azure-virtual-machines), all stock SUSE 12 SP1, SP2, SP3, SP4 and SP5 kernels are supported with Mobility service version 9.62. SUSE Enterprise Linux 15 SP6 is supported as an Azure Marketplace offering (https://azuremarketplace.microsoft.com/en-us/marketplace/apps/suse.sles-15-sp6-arm64?tab=overview) but is not yet supported with Azure Site Recovery. This has been confirmed by the product group: currently it is not supported and there is no definite timeline to share publicly.

Source: https://learn.microsoft.com/en-us/answers/questions/1820216/linux-suse-15-sp6-compatibility-with-asr-mobility




The count of attached disks for the virtual machine could not get reported correctly by Site Recovery services.

I ran into this error because an additional disk had been attached to the virtual machine in vSphere but had not yet been initialized within the virtual machine.




All attached disks must be mounted, formatted, and be assigned drive letters for the site recovery replication agent to discover them correctly.



So I first need to bring the disk online, initialize it, create a new volume and finally assign a drive letter.


After that we need to wait some time before we can restart the Enable replication process.


Now it works: the Mobility Service agent was installed successfully and replication was enabled successfully.





Modernized Experience

When enabling replication in Azure, you can see below that the modernized experience is selected by default.

Source: https://techcommunity.microsoft.com/t5/azure-compute-blog/general-availability-simplified-disaster-recovery-for-vmware/ba-p/3645694






Links

About Site Recovery
https://learn.microsoft.com/en-us/azure/site-recovery/site-recovery-overview

General availability: Simplified disaster recovery for VMware machines using Azure Site Recovery
https://techcommunity.microsoft.com/t5/azure-compute-blog/general-availability-simplified-disaster-recovery-for-vmware/ba-p/3645694

Prepare source machine for push installation of mobility agent
https://learn.microsoft.com/en-us/azure/site-recovery/vmware-azure-install-mobility-service

What is disaster recovery?
https://learn.microsoft.com/en-us/azure/reliability/disaster-recovery-overview