When running a Schedule or Real-Time Monitor, the following warning can be received:
"Root path cannot be accessed or does not exist. Usage has been discontinued."
This warning indicates that a communication issue occurred while reaching the path shown in the Log Viewer (Job Log for versions older than v7). A communication issue can have a number of causes, including:
- An issue with the physical connection between the machines. For example, an Internet outage.
- Slow response time from the destination machine. For example, if the machine is CPU or I/O bound at 100% for extended periods of time, it's possible for the machine to not respond to a SureSync request within the configured timeout.
- An extreme temporary spike in latency on the network.
- The Communications Agent port not being open. This can occur in new installations where the port hasn't been opened, or in existing environments where someone on the network team makes a firewall change. Try a Test Connection to the agent. If it does not succeed, check this KB for v9: https://support.softwarepursuits.com/test-connections-to-agent-button-in-suresync-9-fails or this KB for v8: https://support.softwarepursuits.com/test-connections-to-agent-button-in-suresync-8-fails
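As a rough complement to the Test Connection button, a plain TCP connect from the affected machine can show whether anything is reachable on the agent's port. The Python sketch below is not part of SureSync; the function name is made up for illustration, and the host and port you pass in are whatever is configured for your Communications Agent. A successful connect only shows that the port is open, not that the agent itself is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout_seconds: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `port_is_open("agent-machine.example.com", 12345)` would check a placeholder host and port; substitute your own values. A result of False from one machine but True from another points toward a firewall or routing change between them.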
SureSync has extensive recovery options available to recover dropped paths. SureSync itself cannot prevent communications issues between the machines involved in the synchronization/replication. However, it can recover so the work gets done.
By default, SureSync 7 (and newer) uses timeout values of 90 seconds or greater on almost all actions. In SureSync 8 (and newer), you can further increase timeouts by clicking on the Computer in question in the left-hand tree view of the SureSync Desktop and clicking on the Throttling tab. The timeout value on that tab can be set to a maximum of 300 seconds.
SureSync 6 and SureSync 5 have shorter timeouts ranging from 10 to 30 seconds. Because of their longer timeouts, newer versions are much more tolerant of slow networks or slow machines. Upgrading to the current release of SureSync is strongly recommended in all environments.
The behavior when this warning is encountered depends on whether you're running a Schedule or a Real-Time Monitor.
If a Schedule is running a Job that consists of only 2 paths, the Schedule will be terminated and rescheduled to run at the next scheduled start time. You must have at least 2 paths for synchronization/replication to occur.
If a Schedule is running a Job that has more than 2 paths, the Schedule will continue processing files to the other paths that are online. Once completed, the Schedule will be rescheduled to run at the next scheduled start time. When that next scheduled run of the Schedule executes, the path that was offline will be caught up to the others.
Corrective Actions for Schedules
On the Repeat tab of a Schedule, you can configure retries using the "Number of times to retry a failed job" option. Enter a number of retries and the time between retries. This tells the Schedule to retry execution the configured number of times before rescheduling to the next run time. If the connection issue is temporary, this can allow recovery to happen faster. The recommendation is to use a few retries spaced a short time apart. For example, 3 retries every 5 minutes.
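The retry behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch of "run once, then retry N times with a fixed wait", not SureSync's actual implementation; the function and parameter names are hypothetical, and `job` stands in for a Schedule run that returns True on success.

```python
import time
from typing import Callable


def run_with_retries(
    job: Callable[[], bool],
    retries: int = 3,
    wait_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run job once, then retry up to `retries` times, waiting between attempts.

    Returns True as soon as an attempt succeeds, or False when every attempt
    has failed and the caller should fall back to the next scheduled run.
    """
    attempts_allowed = 1 + retries  # the initial run plus the retries
    for attempt in range(attempts_allowed):
        if job():
            return True
        if attempt < attempts_allowed - 1:
            sleep(wait_seconds)  # no wait after the final failed attempt
    return False
```

With 3 retries spaced 5 minutes apart, an outage lasting up to roughly 15 minutes can be ridden out within the same run instead of waiting for the next scheduled start time.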
Real-Time Monitors are designed to be running 24/7. With this in mind, path recovery is much more in-depth with a Real-Time Monitor. A Real-Time Monitor will always continue synchronizing the remaining online paths in a particular Job (v7 and newer) or Relation (v6 and v5).
In v5 and v6, the default retry interval is 5 minutes with unlimited retries. This means that when a path drop occurs, every 5 minutes recovery of the path will be attempted. When the path is available again, it will be caught up to the other paths and further changes will be processed to all paths.
In v7 (and newer), the default retry interval is also 5 minutes with unlimited retries. However, the recovery process has been significantly enhanced: the Real-Time Monitor retries internally much sooner, so a temporarily offline path can come back much faster in newer versions. If a path is offline for the full 5 minutes, another retry will be issued.
Corrective Actions for Real-Time Monitors
No changes are needed for Real-Time Monitors. Out of the box, Real-Time Monitors are extremely durable. On the Options tab of a Real-Time Monitor, you can check the "Number of times to restart a failed Real-Time Monitor" option to ensure it's set to 0 (unlimited). Putting a value other than 0 in this field limits the monitor to the defined number of retries and can result in the Real-Time Monitor being placed on hold. You can also change the "Time between retries" option.
Additional Actions for All Types
Path drops happen in complex network environments and do not necessarily indicate a problem. If the path drops and successfully recovers, the software has done what it is supposed to do. You can determine if a path is recovered by looking in the Job Log Viewer (v6 and v5) or the Log Viewer (v7 and newer). If you see a "Monitoring of path successfully restored." message then the path was recovered.
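If you review many log entries at once, the check described above can be sketched in Python. The sketch assumes you have the log entries available as plain text lines, which is an assumption about your setup rather than a documented SureSync export format. It matches only the two message texts quoted in this article, and it does not tell paths apart, so treat it as a quick first look.

```python
from typing import Iterable

# Message texts quoted in this article; matched as substrings.
DROP_TEXT = "Root path cannot be accessed or does not exist"
RESTORE_TEXT = "Monitoring of path successfully restored"


def last_drop_recovered(log_lines: Iterable[str]) -> bool:
    """Return True if the most recent drop message (if any) is followed
    by a restore message. Returns True when no drop appears at all."""
    outstanding_drop = False
    for line in log_lines:
        if DROP_TEXT in line:
            outstanding_drop = True
        elif RESTORE_TEXT in line:
            outstanding_drop = False
    return not outstanding_drop
```

A result of False means the last drop in the lines you supplied has no restore message after it, so that path is worth a closer look.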
If you have chronic path drops, it's worth investigating the state of the network connections. It is also a good idea to use Task Manager to review the state of each machine. Are any machines CPU bound? Memory bound? Performing excessive I/O or paging operations?
Another potential issue is network optimization or packet shaping devices on the network. An example of such a device would be a Riverbed Steelhead WAN Optimizer, but there are countless others. If you're shaping traffic and the device you're using either drops SureSync packets or slows them down, you can unexpectedly create path drops. Often, path drops won't occur for long periods of time and then suddenly show up, coinciding with things like a firmware upgrade or configuration change to these types of devices. The same can be true for firmware updates, software updates or other changes to firewalls and other network devices.
In SureSync v6 and newer, the multi-threaded nature of the synchronization engine can sometimes cause path drops with slow machines. For example, assume you have 5 Real-Time Monitors running each with 4 active file copies. This could result in up to 20 active file copies for a machine. If the destination machine becomes I/O bound or is paging heavily it could cause a slow response time that results in a path drop.
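The arithmetic in this example is simple multiplication, and the short Python sketch below shows how the worst case shrinks as the per-monitor limit is reduced. The function name is made up for illustration, and treating the per-monitor copy count as something you can lower to 2 or 1 is an assumption used only to show the effect.

```python
def worst_case_copies(monitors: int, copies_per_monitor: int) -> int:
    """Upper bound on simultaneous file copies aimed at one machine."""
    return monitors * copies_per_monitor


# 5 Real-Time Monitors, as in the example above.
for copies in (4, 2, 1):
    print(f"{copies} per monitor -> up to {worst_case_copies(5, copies)} copies")
```

With 5 monitors, this gives upper bounds of 20, 10 and 5 simultaneous copies respectively.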
For v6 (and newer), it is recommended to set the Real-Time Monitors, Schedules or Relations to run single threaded to reduce load and see if that helps. This is done by changing the Normal priority (or whatever Priority they are using) in Tools | Options. On the Priorities tab there is an option "Maximum number of worker threads" that can be set to 1 to single thread the engine. The Real-Time Monitors, Schedules or Relations must be stopped and started again to pick up the new setting. If single threading resolves the issue, you can slowly increase the threads by setting it to 2 and then 3 until you see where the problem surfaces.
For v7 (and newer), extensive internal throttling has been added. SureSync monitors the busy % of each hard drive involved and, by default, slows down if the busy % exceeds 80%. With v7 (and newer), no changes should be needed, but it never hurts to try single threading as well.