Skipping Non Regular File Rsync Symlink

FreeFileSync is a cross-platform C++ application for Windows, macOS and Linux. Its UI makes it easy to specify entire subfolders to sync to the right or left, or to exclude temporarily, and it loads huge directories (100,000+ files) easily. Tip: to sync a tree of symlinks, set 'Symbolic Link Handling' to 'Follow' under the 'Compare' gear icon.

Rsync (remote synchronize) is a tool for synchronizing files with a remote machine. While synchronizing, it can preserve the original file permissions, times, hard and soft links, and other additional information. Its sync algorithm provides a quick way to synchronize files between clients and remote file servers.

Three basic behaviors are possible when rsync encounters a symbolic link in the source directory. By default, symbolic links are not transferred at all, and a 'skipping non-regular file' message is emitted for each symlink that exists. If --links is specified, then symlinks are recreated with the same target on the destination. If --copy-links is specified, the file each symlink points to is copied instead of the link.

On Wed, 2007-12-12 at 14:13 +0000, Chris G wrote: I was expecting that if I specified the --copy-unsafe-links option to rsync that I'd then get no warnings about 'skipping non-regular file 'bla/bla/bla' but it doesn't seem to work like that.

You have to additionally pass --links to make rsync copy the safe symlinks. This is mentioned in the 'SYMBOLIC LINKS' section of the man page.

As part of the 8.0 pre-release announcement, the OpenSSH project stated that they consider the scp protocol outdated, inflexible, and not readily fixed. They then go on to recommend the use of sftp or rsync for file transfer instead.

Many users grew up on the scp command, however, and so are not familiar with rsync. Additionally, rsync can do much more than just copy files, which can give a beginner the impression that it’s complicated and opaque, especially since the scp flags broadly map directly to the cp flags while the rsync flags do not.

This article will provide an introduction and transition guide for anyone familiar with scp. Let’s jump into the most common scenarios: Copying Files and Copying Directories.

Copying files

For copying a single file, the scp and rsync commands are effectively equivalent. Let’s say you need to ship foo.txt to your home directory on a server named server. Leaving the path after the colon empty means the remote home directory:

scp foo.txt server:

The equivalent rsync command requires only that you type rsync instead of scp:

rsync foo.txt server:

Copying directories

For copying directories, things diverge quite a bit, which probably explains why rsync is seen as more complex than scp. If you want to copy the directory bar to server, the corresponding scp command looks exactly like the cp command except for specifying ssh information:

scp -r bar server:

With rsync, there are more considerations, as it’s a more powerful tool. First, let’s look at the simplest form:

rsync -r bar server:

Looks simple, right? For the simple case of a directory that contains only directories and regular files, this will work. However, rsync cares a lot about sending files exactly as they are on the host system. Let’s create a slightly more complex, but not uncommon, example by adding a symbolic link, link.txt, that points to the regular file foo.txt.

We now have a directory tree in which bar contains both the regular file foo.txt and the symlink link.txt.

If we try the commands from above to copy bar, we’ll notice very different (and surprising) results. First, let’s give scp a go, using the same scp -r command as before.

If you ssh into your server and look at the directory tree of bar, you’ll notice an important and subtle difference from your host system.

Note that link.txt is no longer a symlink. It is now a full-blown copy of foo.txt. This might be surprising behavior if you’re used to cp. If you did try to copy the bar directory using cp -r, you would get a new directory with the exact symlinks that bar had. Now if we try the same rsync command from before, we’ll get a warning that it is skipping a non-regular file.

Rsync has warned us that it found a non-regular file and is skipping it. Because you didn’t tell it to copy symlinks, it’s ignoring them. Rsync has an extensive manual section titled “SYMBOLIC LINKS” that explains all of the possible behavior options available to you. For our example, we need to add the --links flag to the earlier command.

On the remote server we see that the symlink was copied over as a symlink. Note that this is different from how scp copied the symlink.

To save some typing and take advantage of more file-preserving options, use the --archive (-a for short) flag whenever copying a directory. The archive flag will do what most people expect, as it enables recursive copy, symlink copy, and many other options.

The rsync man page has in-depth explanations of what the archive flag enables if you’re curious.

Caveats

There is one caveat, however, to using rsync. It’s much easier to specify a non-standard ssh port with scp than with rsync. If server accepted SSH connections on port 8022, for instance, scp only needs its -P flag:

scp -P 8022 foo.txt server:

With rsync, you have to specify the “remote shell” command to use, which defaults to ssh. You do so using the -e flag:

rsync -e 'ssh -p 8022' foo.txt server:

Rsync does use your ssh config, however, so if you are connecting to this server frequently, you can add a Host entry for server that sets Port 8022 to your ~/.ssh/config file. Then you no longer need to specify the port for the rsync or ssh commands!

Alternatively, if every server you connect to runs on the same non-standard port, you can configure the RSYNC_RSH environment variable, for example by setting it to ssh -p 8022.

Why else should you switch to rsync?

Now that we’ve covered the everyday use cases and caveats for switching from scp to rsync, let’s take some time to explore why you probably want to use rsync on its own merits. Many people have made the switch to rsync long before now on these merits alone.

In-flight compression

If you have a slow or otherwise limited network connection between you and your server, rsync can spend more CPU cycles to save network bandwidth. It does this by compressing data before sending it. Compression can be enabled with the -z flag.

Delta transfers

Rsync also only copies a file if the target file differs from the source file. This works recursively through directories. For instance, if you took our final bar example above and re-ran that rsync command multiple times, it would do no work after the initial transfer. For this feature alone, using rsync is worth it even for local copies that you know you will repeat, such as backing up to a USB drive, as it can save a lot of time with large data sets.

Syncing

As the name implies, rsync can do more than just copy data. So far, we’ve only demonstrated how to copy files with rsync. If you instead want rsync to make the target directory look like your source directory, you can add the --delete flag to rsync. With --delete, rsync still copies files from the source directory that don’t exist on the target directory, and then also removes files on the target directory that do not exist in the source directory. The result is that the target directory is identical to the source directory. By contrast, scp will only ever add files to the target directory.
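
The effect of the delete flag can be pictured as simple set arithmetic over the two directory listings. The sketch below is only a model of that idea, not how rsync is implemented: the function name plan_sync and the file names are hypothetical, and files present on both sides are assumed to be identical already.

```python
def plan_sync(source_files, target_files, delete=False):
    """Return (to_copy, to_delete) as sorted lists of relative paths.

    Simplification: only names are compared; files that exist on both
    sides are assumed to be up to date.
    """
    source, target = set(source_files), set(target_files)
    to_copy = sorted(source - target)
    # Without delete, extra files on the target are left alone.
    to_delete = sorted(target - source) if delete else []
    return to_copy, to_delete


# Hypothetical listings of the source and target directories.
source = ["foo.txt", "link.txt"]
target = ["foo.txt", "stale.txt"]

print(plan_sync(source, target))               # copy only
print(plan_sync(source, target, delete=True))  # copy, then remove extras
```

In the second call stale.txt is scheduled for removal because it exists only on the target, which is what makes the target end up as a mirror of the source.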

Conclusion

For simple use cases, rsync is not significantly more complicated than the venerable scp tool. The only significant difference is the use of -a instead of -r for recursive copying of directories. However, as we saw, rsync’s -a flag behaves more like cp’s -r flag than scp’s -r flag does.

Hopefully, with these new commands, you can speed up your file transfer workflow!

Remote file copy - Synchronize file trees across local disks, directories or across a network.

rsync is a program that behaves in much the same way that rcp does, but has many more options and uses the rsync remote-update protocol to greatly speed up file transfers when the destination file already exists. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.

Rsync finds files that need to be transferred using a 'quick check' algorithm (by default) that looks for files that have changed in size or in last-modified time. Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file's data does not need to be updated.
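
As a rough illustration of the quick check described above, the following sketch compares only size and last-modified time. It is a simplified model under stated assumptions, not rsync's actual code: the function name needs_transfer is hypothetical, and timestamps are compared in whole seconds for simplicity.

```python
import os
import shutil
import tempfile


def needs_transfer(src, dst):
    """Simplified quick check: True if dst is missing or differs in size or mtime."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    # Whole-second comparison is an assumption of this sketch.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    dst = os.path.join(tmp, "dst.txt")
    with open(src, "w") as f:
        f.write("hello\n")
    print(needs_transfer(src, dst))  # destination does not exist yet
    shutil.copy2(src, dst)           # copy2 also copies the modification time
    print(needs_transfer(src, dst))  # same size and mtime
    with open(src, "a") as f:
        f.write("more\n")
    print(needs_transfer(src, dst))  # size has changed
```

Note that the file contents are never read here, which is what makes the check quick.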
Some of the additional features of rsync are:

  • Support for copying links, devices, owners, groups and permissions
  • Exclude and exclude-from options similar to GNU tar
  • A CVS exclude mode for ignoring the same files that CVS would ignore
  • Can use any transparent remote shell, including rsh or ssh
  • Does not require root privileges
  • Pipelining of file transfers to minimize latency costs
  • Support for anonymous or authenticated rsync servers (ideal for mirroring)

Usage

You use rsync in the same way you use rcp. You must specify a source and a destination, one of which can be remote.
Perhaps the best way to explain the syntax is some examples:

rsync -t *.c foo:src/
This would transfer all files matching the pattern *.c from the current directory to the directory src on the machine foo. If any of the files already exist on the remote system then the rsync remote-update protocol is used to update the file by sending only the differences. See the tech report for details.
rsync -avz foo:src/bar /data/tmp
This would recursively transfer all files from the directory src/bar on the machine foo into the /data/tmp/bar directory on the local machine.
The files are transferred in 'archive' mode, which ensures that symbolic links, devices, attributes, permissions, ownerships etc are preserved in the transfer.
Additionally, compression will be used to reduce the size of data portions of the transfer.
rsync -avz foo:src/bar/ /data/tmp
A trailing slash on the source changes this behavior to transfer all files from the directory src/bar on the machine foo directly into /data/tmp/ instead of into /data/tmp/bar.
A trailing / on a source name means 'copy the contents of this directory'. Without a trailing slash it means 'copy the directory'.
This difference becomes particularly important when using the --delete option.
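
The trailing-slash rule can be sketched as a small path calculation. This is only a model of the rule as stated above: the function name landing_path and the file name main.c are hypothetical, and the host prefix (foo:) is left out.

```python
import posixpath


def landing_path(source, dest_dir, relative_file):
    """Where a file found inside `source` ends up under `dest_dir`."""
    if source.endswith("/"):
        # Trailing slash: copy the contents, so the directory name is dropped.
        return posixpath.join(dest_dir, relative_file)
    # No trailing slash: copy the directory itself, keeping its last component.
    return posixpath.join(dest_dir, posixpath.basename(source), relative_file)


print(landing_path("src/bar", "/data/tmp", "main.c"))   # lands under /data/tmp/bar
print(landing_path("src/bar/", "/data/tmp", "main.c"))  # lands directly in /data/tmp
```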
You can also use rsync in local-only mode, where both the source and destination don't have a ':' in the name. In this case it behaves like an improved copy command.

rsync somehost.mydomain.com::

This would list all the anonymous rsync modules available on the host somehost.mydomain.com. (See the following section for more details.)

Connecting to an RSYNC Server

It is also possible to use rsync without using rsh or ssh as the transport. In this case you will connect to a remote rsync server running on TCP port 873.
You can establish the connection via a web proxy by setting the environment variable RSYNC_PROXY to a hostname:port pair pointing to your web proxy.
Note that your web proxy's configuration must allow proxying to port 873.
Using rsync in this way is the same as using it with rsh or ssh except that:

  • You use a double colon :: instead of a single colon to separate the hostname from the path.
  • The first word of the 'path' is actually a module name.
  • The remote daemon might print a message of the day when you connect.
  • If you specify no path name on the remote daemon then the list of accessible paths on the daemon will be shown.
  • If you specify no local destination then a listing of the specified files on the remote server is provided.
  • Do not specify the --rsh (-e) option.
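
The single-colon and double-colon forms listed above can be told apart with a few string operations. The sketch below is a deliberately simplified reading of those rules: the function name classify and the example host::pub/samba/ are hypothetical, and forms the text does not mention are not handled.

```python
def classify(spec):
    """Return (transport, host, module, path) for a source or destination string."""
    if "::" in spec:
        host, _, rest = spec.partition("::")
        # The first word of the 'path' is the module name.
        module, _, path = rest.partition("/")
        return ("daemon", host, module, path)
    if ":" in spec:
        host, _, path = spec.partition(":")
        return ("remote-shell", host, None, path)
    # No ':' in the name means local-only mode.
    return ("local", None, None, spec)


print(classify("somehost.mydomain.com::"))  # no module given: the daemon lists them
print(classify("host::pub/samba/"))         # hypothetical module 'pub'
print(classify("foo:src/bar"))
print(classify("/data/tmp"))
```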

Some paths on the remote server might require authentication. If so then you will receive a password prompt when you connect.
You can avoid the password prompt by setting the environment variable RSYNC_PASSWORD to the password you want to use or using the --password-file option.
This can be useful when scripting rsync.
WARNING: On some systems environment variables are visible to all users. On those systems using --password-file is recommended.

Running an RSYNC Server

An rsync server is configured using a config file which by default is called /etc/rsyncd.conf. Please see the rsyncd.conf(5) man page for more information.

EXAMPLES
To back up the home directory using a cron job:
rsync -Cavz . ss64:backup
Run the above over a PPP link to a duplicate directory on machine 'ss64'.
To synchronize samba source trees use the following Makefile targets:

get:
	rsync -avuzb --exclude '*~' samba:samba/ .
put:
	rsync -Cavuzb . samba:samba/
sync: get put

This allows me to sync with a CVS directory at the other end of the link. I then do cvs operations on the remote machine, which saves a lot of time as the remote cvs protocol isn't very efficient.
I mirror a directory between my 'old' and 'new' ftp sites with the command:

rsync -az -e ssh --delete ~ftp/pub/samba/ nimbus:'~ftp/pub/tridge/samba'

This is launched from cron every few hours.

OPTIONS SUMMARY

Here is a short summary of the options available in rsync.
Please refer to the FULL List of OPTIONS for a complete description.

Tips on how to use each of the options above can be found in the
FULL List of OPTIONS and Exit Values

EXCLUDE PATTERNS

The exclude and include patterns specified to rsync allow for flexible selection of which files to transfer and which files to skip.
rsync builds an ordered list of include/exclude options as specified on the command line. When a filename is encountered, rsync checks the name against each exclude/include pattern in turn. The first matching pattern is acted on.
If it is an exclude pattern, then that file is skipped.
If it is an include pattern then that filename is not skipped.
If no matching include/exclude pattern is found then the filename is not skipped.
Note that when used with -r (which is implied by -a), every subcomponent of every path is visited from top down, so include/exclude patterns get applied recursively to each subcomponent.
Note also that the --include and --exclude options take one pattern each.
To add multiple patterns use the --include-from and --exclude-from options or multiple --include and --exclude options.
The patterns can take several forms. The rules are:
  • If the pattern starts with a / then it is matched against the start of the filename, otherwise it is matched against the end of the filename. Thus '/foo' would match a file called 'foo' at the base of the tree. On the other hand, 'foo' would match any file called 'foo' anywhere in the tree because the algorithm is applied recursively from top down; it behaves as if each path component gets a turn at being the end of the file name.
  • If the pattern ends with a / then it will only match a directory, not a file, link or device.
  • If the pattern contains a wildcard character from the set *?[ then expression matching is applied using the shell filename matching rules. Otherwise a simple string match is used.
  • If the pattern includes a double asterisk '**' then all wildcards in the pattern will match slashes, otherwise they will stop at slashes.
  • If the pattern contains a / (not counting a trailing /) then it is matched against the full filename, including any leading directory. If the pattern doesn't contain a / then it is matched only against the final component of the filename. Again, remember that the algorithm is applied recursively so 'full filename' can actually be any portion of a path.
  • If the pattern starts with '+ ' (a plus followed by a space) then it is always considered an include pattern, even if specified as part of an exclude option. The '+ ' part is discarded before matching.
  • If the pattern starts with '- ' (a minus followed by a space) then it is always considered an exclude pattern, even if specified as part of an include option. The '- ' part is discarded before matching.
  • If the pattern is a single exclamation mark ! then the current include/exclude list is reset, removing all previously defined patterns.
The +/- rules are most useful in exclude lists, allowing you to have a single exclude list that contains both include and exclude options.
If you end an exclude list with --exclude '*', note that because the algorithm is applied recursively, it will stop at the parent directories and never see the files below them unless you explicitly include the parent directories of the files you want to include. To include all directories, use --include '*/' before the --exclude '*'.
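
The first-match rule and the top-down recursion described above can be modelled in a few lines. This is a simplified sketch, not rsync's matcher: the names is_excluded and is_transferred are hypothetical, each rule is written as a ('+' or '-', pattern) pair, '**' and the '!' reset are not supported, and every path passed in is assumed to name a file.

```python
from fnmatch import fnmatchcase


def _matches(path, pattern):
    anchored = pattern.startswith("/")
    pat_parts = pattern.strip("/").split("/")
    path_parts = path.split("/")
    if anchored:
        if len(path_parts) != len(pat_parts):
            return False
    else:
        if len(path_parts) < len(pat_parts):
            return False
        path_parts = path_parts[-len(pat_parts):]  # compare against the end
    # Matching per component keeps wildcards from crossing slashes.
    return all(fnmatchcase(p, q) for p, q in zip(path_parts, pat_parts))


def is_excluded(path, is_dir, rules):
    """Apply the ordered rules; the first matching pattern decides."""
    for action, pattern in rules:
        if pattern.endswith("/") and not is_dir:
            continue  # a trailing slash only matches directories
        if _matches(path, pattern):
            return action == "-"
    return False  # no match: not skipped


def is_transferred(path, rules):
    """Visit every parent directory from the top down, then the file itself."""
    parts = path.split("/")
    for i in range(1, len(parts) + 1):
        if is_excluded("/".join(parts[:i]), i < len(parts), rules):
            return False
    return True


rules = [("+", "*/"), ("+", "*.c"), ("-", "*")]
print(is_transferred("src/main.c", rules))      # directories and C files included
print(is_transferred("src/main.c", rules[1:]))  # 'src' itself hits the final '*'
```

The second call shows the pitfall from the text: without the '*/' include, the parent directory is excluded first and the file below it is never considered.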

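Some exclude/include examples:

  • --exclude '*.o' would exclude all filenames matching *.o
  • --exclude '/foo' would exclude a file in the base directory called foo
  • --exclude 'foo/' would exclude any directory called foo
  • --include '*/' --include '*.c' --exclude '*' would include all directories and C source files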

Batch Mode

The following call generates 4 files that encapsulate the information for synchronizing the contents of target_dir with the updates found in src_dir

Symbolic Links

Three basic behaviours are possible when rsync encounters a symbolic link in the source directory.
By default, symbolic links are not transferred at all. A message 'skipping non-regular file' is emitted for any symlinks that exist.
If --links is specified, then symlinks are recreated with the same target on the destination. Note that --archive implies --links.
If --copy-links is specified, then symlinks are 'collapsed' by copying their referent, rather than the symlink.
rsync also distinguishes 'safe' and 'unsafe' symbolic links. A symbolic link is considered unsafe if it is absolute (starts with /) or contains enough '..' components to point outside the directory being copied.
An example where this might be used is a web site mirror that wishes to ensure that the rsync module it copies does not include symbolic links to /etc/passwd in the public section of the site. Using --copy-unsafe-links will cause any such links to be copied as the file they point to on the destination.
Using --safe-links will cause unsafe links to be omitted altogether.
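
The three basic behaviours can be imitated with a small local copy routine. This is a sketch of the behaviours as described above, not of rsync itself: the function copy_tree and its mode names are hypothetical, only symlinks to files are handled, and the safe/unsafe distinction is left out.

```python
import os
import shutil
import tempfile


def copy_tree(src, dst, mode="skip"):
    """Copy src to dst; mode is 'skip', 'links' or 'copy-links'. Returns skipped links."""
    skipped = []
    for root, _dirs, files in os.walk(src):
        out_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(out_dir, exist_ok=True)
        for name in files:
            s, d = os.path.join(root, name), os.path.join(out_dir, name)
            if not os.path.islink(s):
                shutil.copy2(s, d)                 # regular file
            elif mode == "links":
                os.symlink(os.readlink(s), d)      # recreate with the same target
            elif mode == "copy-links":
                shutil.copy2(s, d)                 # follows the link, copies the referent
            else:
                skipped.append(os.path.relpath(s, src))  # default: not transferred
    return skipped


with tempfile.TemporaryDirectory() as tmp:
    bar = os.path.join(tmp, "bar")
    os.mkdir(bar)
    with open(os.path.join(bar, "foo.txt"), "w") as f:
        f.write("data\n")
    os.symlink("foo.txt", os.path.join(bar, "link.txt"))

    print(copy_tree(bar, os.path.join(tmp, "default")))         # link is reported as skipped
    copy_tree(bar, os.path.join(tmp, "links"), "links")
    print(os.path.islink(os.path.join(tmp, "links", "link.txt")))
    copy_tree(bar, os.path.join(tmp, "copied"), "copy-links")
    print(os.path.islink(os.path.join(tmp, "copied", "link.txt")))
```

The 'copy-links' result is a regular file holding the contents of the link's target, while 'links' keeps link.txt as a symlink.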

Diagnostics

rsync occasionally produces error messages that can seem a little cryptic.
The one that seems to cause the most confusion is 'protocol version mismatch - is your shell clean?'.
This message is usually caused by your startup scripts or remote shell facility producing unwanted garbage on the stream that rsync is using for its transport. The way to diagnose this problem is to run your remote shell (ssh here, or rsh if that is what you use) like this:

ssh remotehost /bin/true > out.dat

then look at out.dat. If everything is working correctly then out.dat should be a zero length file. If you are getting the above error from rsync then you will probably find that out.dat contains some text or data.
Look at the contents and try to work out what is producing it.
The most common cause is incorrectly configured shell startup scripts (such as .cshrc or .profile) that contain output statements for non-interactive logins.
If you are having trouble debugging include and exclude patterns, then try specifying the -vv option.
At this level of verbosity rsync will show why each individual file is included or
excluded.

Setup

See the file README for installation instructions.
Once installed you can use rsync to any machine that you can use rsh to.
rsync uses rsh for its communications, unless both the source and destination are local.
You can also specify an alternative to rsh, either by using the -e command line
option, or by setting the RSYNC_RSH environment variable.
One common substitute is to use ssh, which offers a high degree of security.
Note that rsync must be installed on both the source and destination machines.

Environment Variables

Files

“And yet I do observe that audiences which used to be deeply affected by the inspiring sternness of the music of Livius and Naevius, now leap up and twist their necks and turn their eyes in time with our modern tunes” ~ Cicero (De Legibus II.39 c. 50 BCE) on the evils of modern music.

Related linux commands:

Grsync - GUI for rsync (how to install).
rsyncd.conf(5)
rsnapshot - Save multiple backups with rsync.
rcp - Copy files between two machines.
cp - Copy one or more files to another location.
install - Copy files and set attributes.
dd - Data Duplicator - convert and copy a file.
remsync - Synchronize remote files via email.
Equivalent Windows command: ROBOCOPY - Robust File and Folder Copy.

Copyright © 1999-2020 SS64.com
Some rights reserved