Theoretical Part

This section lists many different things about Unison. It is a loose collection of things people should know before they move on to actually installing and setting up Unison for everyday use.

Peace of Mind

Unison provides us with two major benefits at once.
Unison, a free cross-platform file synchronization program, not only provides us with multiple backups of our files but, more importantly, grants us the freedom to use different computers with access to all of our files, thus liberating us from the confines of one particular machine. Unison allows us to access the same set of files from any computer (running Mac OS X, Windows XP, or UNIX/Linux variants) and keeps these files up-to-date by always maintaining the most recently-modified version of each file during synchronization. I personally use Unison to keep replicas of all of my personal files across two different computers: my workstation and my subnotebook. I also use another computer (a server) in the process (more on that later).
For example, if someone is in the office on a Linux machine and wants to work on a paper for a class, he can just open up the file and start typing. Before he leaves, he simply synchronizes that directory to his server located somewhere on the Internet. Then when he gets home, he runs Unison again to synchronize on his Mac and continues working on his paper. If he feels like watching TV later while continuing to work, he can simply switch to his Windows laptop. Then he can finish up final edits at the office the next morning on his Linux box. By the time he is done for the night, not only has he edited the same paper on three different computers without the hassle of emailing copies to himself, but he also has three identical copies of it, so that if any one of his computers blows up, he can still turn in his paper on time. Unison has given him the peace of mind that comes with having his files seamlessly backed up while he is working on them, and also the freedom to do his work wherever and whenever is convenient.

This section describes some of the benefits of using Unison and provides some tips on doing so.

The Benefits of Unison

This subsection lists some of the most obvious benefits that come with using Unison.

Liberation from a particular Computer or Operating System

This is perhaps the most practical and visible advantage of using Unison in our daily computing life. If we can have access to our files from any machine that we use (and assuming that we have programs on each machine that can utilize these files), then it really does not matter which one we use. Furthermore, if we put custom configuration files for our shell or applications in our Unison hierarchy and simply use symlinks to refer to these copies on every computer, then we can have a uniform working environment.
For example, one might use the Bash shell on every computer, whether it be Windows XP with Cygwin, Mac OS X, FreeBSD, or whatever Linux he has in front of him at the moment. He might then have a common Bash configuration file shared by all machines, and files particular to each machine. His command prompt looks the same on all machines, and he can use all of the same aliases and shell functions. When he finds a cool Bash function while browsing the Internet at work, he can simply add it to his common Bash config file, sync it, and when he gets home at night, he can access that same function on his home machine. This freedom allows us to transcend the incessant bickering over which operating system is better: we can use whatever OS has the programs we want for some particular application, or simply whatever OS is in front of us at the moment.

Live backups via file replication

Our personal data (documents, photographs, emails, etc.) is the most valuable component in our interaction with computers, because it can be irreplaceable if lost. Data backup is something that everybody should do, but unfortunately, few people do it on a regular basis. In contrast to traditional backup methods, the great benefit of using Unison to replicate our files across different computers is that our backups are alive. They are not sitting on some archive tape in the basement; they are on the hard drives of each and every computer we use.

Seamless control and Verification of Backups

By synchronizing our Unison file replicas, we are the ones who control our backups, so we can be confident that they are being performed correctly. We verify the integrity of our backups simply by switching computers and accessing the files during our normal course of work. What often happens is that we think our organization is properly backing up our files when in fact it is not.
We never consider backing up our own files because we know that our company takes care of that (better double check that!). If we lose a file, we do not sweat because we know that the sysadmins have a backup, but to our surprise, their backup was not done properly... that is when backup trauma strikes. With Unison, though, we control our own backups, and the more replicas we have, the less likely it is that we will lose our precious data.

Fast and non-traumatic Recovery from Hardware Failures

A hard drive crash or total computer meltdown is traumatic for most people. Why? Not because they need to pay a few hundred dollars for new hardware, but because they have just lost most or all of their precious data. If they are somewhat diligent about backups, they probably have some old backup CDs dating back a few months, but that is still a few months of lost work. With Unison, we back up basically as often as we use our computer, so at worst we will lose only the data that we have been working on for the past few hours. If one of our machines dies, then it is annoying to pay for new hardware and install our OS and software again (which is trivial if you have an OS with automated package management software such as Fink, RPM Manager, or APT (Advanced Packaging Tool) or even better something like FAI (Fully Automatic Installation)), but it is non-traumatic because we have not lost any data. If we replicated the configuration files for our favorite applications, then restoring their pre-crash state is as easy as re-installing and moving those files back to the correct places. Unison allows data to transcend hardware. After all, hardware is cheap and plentiful, but our data is irreplaceable and worth a lot.

Who Should Use Unison

I am not going to preach that everybody in the world should use Unison. I think that everybody should back up their data regularly, but Unison is overkill for simply backing up data.
However, those who use more than one computer on a regular basis can probably benefit from Unison. Here are some typical configurations for different types of users:

Casual Home User with no access to a Server

The typical home user who has a laptop and a desktop computer but no access to a file server probably uses a removable USB stick or hard drive to shuttle files back and forth between his computers. With Unison, he can still use that method of transferring data, except that he can be confident that all of his computers will always have up-to-date copies of files (as long as he remembers to synchronize, i.e. invoke unison). For example, he can do some work on his laptop, synchronize with the removable drive, move the drive to the desktop computer, and synchronize again before he starts working. Both computers (as well as the removable drive) then contain the most recent versions of all files, regardless of which computer he used to edit them.

University Student / On-line Storage Owner

A student at a modern university probably has a certain amount of storage space on the university servers as well as SSH (Secure Shell) remote login access, which is enough to run Unison. He should definitely take advantage of this space because it is probably well-maintained and regularly backed up, and it probably runs on high-end hardware... maybe even a SAN (Storage Area Network). After all, our tuition is helping to pay the salaries of the people who are in charge of protecting our data. We can synchronize our various machines against the school's servers and therefore have very well secured storage for a relatively low price.

Server Administrator

The ideal way to run Unison is to set up our own personal server with SSH login capabilities (this is possible with any flavor of UNIX or Linux, Mac OS X, and Windows XP with Cygwin). My suggestion is to dedicate one computer as our Unison server (especially easy using virtualization e.g. OpenVZ) which holds all of our relevant data, and to synchronize all of our other computers (workstation, subnotebook, etc.) to that server. I use that setup, also known as a star topology.

Security Concerns

When I first tell people about the benefits of keeping multiple replicas of their personal data on different machines, preferably at different physical locations, one recurring concern is security.
It is true that, the more places our data resides, the more vulnerable it is to third-party snoopers. However, if we are careful with choosing a strong passphrase for user accounts, if we use secure tools like SSH (preferably in a PKA (Public Key Authentication) setup), if we use block-layer encryption, and, if we store our data on reliable/trusted servers only, then everything is fine.
Well, since I am a bit paranoid, meticulous and... well, I know a trick or two ;-]... my bottom line is, I use encrypted connections in between my data sinks and sources, block-layer encryption, IDS (Intrusion Detection System), firewall, honey pot and a bunch of other cunning things to stay safe.

Invariants

Given the importance and delicacy of the job that Unison performs, it is important to understand both what a synchronizer does under normal conditions and what can happen under unusual conditions such as system crashes and communication failures. Unison is careful to protect both its internal state and the state of the replicas at every point in this process. Specifically, the following guarantees are enforced:
The upshot is that it is safe to interrupt Unison at any time, either manually or accidentally.
If an interruption happens while it is propagating updates, then there may be some paths for which an update has been propagated but which have not been marked as synchronized in Unison's archives. This is no problem since the next time Unison runs, it will detect changes to these paths in both replicas, notice that the contents are now equal, and mark the paths as successfully updated when it writes back its private state at the end of this run.

If Unison is interrupted, it may sometimes leave temporary working files, recognizable by their suffix, behind.

Unison is not bothered by clock skew between the different hosts on which it is running. It only performs comparisons between timestamps obtained from the same host, and the only assumption it makes about them is that the clock on each system always runs forward.

If Unison finds that its archive files have been deleted (or that the archive format has changed and they cannot be read, or that they do not exist because this is the first run of Unison on these particular roots), it takes a conservative approach: it behaves as though the replicas had both been completely empty at the point of the last synchronization. The effect of this is that, on the first run, files that exist in only one replica will be propagated to the other, while files that exist in both replicas but are unequal will be marked as conflicting.

Touching a file without changing its contents should never affect whether or not Unison does an update.
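The conservative first-run behaviour described above can be illustrated with a small sketch. It is not how Unison is implemented: each replica is modelled as a plain dictionary from path to a content fingerprint, and the names `Action` and `first_run_plan` are made up for this illustration.

```python
from enum import Enum


class Action(Enum):
    IN_SYNC = "in sync"
    COPY_TO_A = "propagate B -> A"
    COPY_TO_B = "propagate A -> B"
    CONFLICT = "conflict"


def first_run_plan(replica_a, replica_b):
    """Plan a synchronization for two replicas when no archive exists.

    Each replica is a dict mapping a path to a content fingerprint.
    This is a hypothetical representation chosen for the example;
    Unison's real archive format is different.
    """
    plan = {}
    for path in sorted(set(replica_a) | set(replica_b)):
        in_a, in_b = path in replica_a, path in replica_b
        if in_a and not in_b:
            # Exists only in A: looks new, so it is propagated to B.
            plan[path] = Action.COPY_TO_B
        elif in_b and not in_a:
            # Exists only in B: looks new, so it is propagated to A.
            plan[path] = Action.COPY_TO_A
        elif replica_a[path] == replica_b[path]:
            # Same contents on both sides: nothing to do.
            plan[path] = Action.IN_SYNC
        else:
            # Present on both sides but unequal: without an archive there is
            # no way to tell which side changed, so it is a conflict.
            plan[path] = Action.CONFLICT
    return plan


if __name__ == "__main__":
    a = {"paper.tex": "v2", "notes.txt": "n1", "todo": "t1"}
    b = {"paper.tex": "v1", "notes.txt": "n1", "pics/cat.jpg": "c1"}
    for path, action in first_run_plan(a, b).items():
        print(f"{path}: {action.value}")
```

Because the comparison is on contents rather than timestamps, a file that was merely touched ends up as "in sync" in this model, which matches the last point above.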
It is safe to brainwash Unison by deleting its archive files on both replicas. The next time it runs, it will assume that all the files it sees in the replicas are new.

It is safe to modify files while Unison is working. If Unison discovers that it has propagated an out-of-date change, or that the file it is updating has changed on the target replica, it will signal a failure for that file. In such a case, running Unison again will propagate the latest changes.

Changes to the ignore patterns from the user interface (e.g., using the [...]

Remote Usage

There are two basic choices for synchronizing data with some remote machine: via a remote shell such as SSH, or via a direct socket connection.
SSH is the standard and preferred method. Most folks, including me, have never tried the socket connection method.

How to Synchronize

There are four possible choices

I recommend using #4, which is also what I chose myself. What I find even more important, approach #4 keeps things small and simple, i.e. easy to maintain even after months or years of usage. Try that with symlinks (i.e. #3)... I have been there; it simply does not scale. How exactly my config for #4 looks can be seen in my config file further down.

Preferences respectively Switches

The Unison manual lists all possible switches. This subsection contains a subset of facilities/switches that I find worth mentioning here explicitly. I use pretty much all the facilities listed here.
What shall I synchronize?

Well, that is entirely up to each person. Here is what I do

,----[ head -n40 ~/.unison/common ]
| ## Paths (directories resp. files) to synchronize
| #directories
| path = home/sa/Desktop
| path = home/sa/Mail
| path = home/sa/News
| path = home/sa/misc
| path = home/sa/work/git
| path = home/sa/em
| path = home/sa/mm
| path = usr/local/sbin
| path = home/sa/.purple
| path = home/sa/.mozilla
| path = home/sa/.workrave
| path = home/sa/.sec
| path = home/sa/.local
| path = home/sa/.emacs.d
|
|
| #files
| path = home/sa/.bashrc
| path = home/sa/.bash_history
| path = home/sa/.bash_profile
| path = home/sa/.dingrc
| path = home/sa/.emacs
| path = home/sa/.emacs.elc
| path = home/sa/.emacs.desktop
| path = home/sa/.emacs.desktop.lock
| path = home/sa/.dired
| path = home/sa/.adobe
| path = home/sa/.unison/common
| path = usr/share/games/fortunes/mjg
| path = usr/share/games/fortunes/mjg.dat
| path = usr/share/games/fortunes/mjg.u8
|
|
| ## Data not to be synchronized
| ignore = Path home/sa/mm/audio/music
| ignore = Path home/sa/.adobe/Acrobat/8.0/Synchronizer
`----

As can be seen, I synchronize a bunch of directories (recursively) and a bunch of files. Last but not least, I also exclude some paths from synchronization, as the ignore lines at the end of the listing show.

Topology

This is much the same as what we already discussed before. How Unison can be used, and therefore how the synchronization process happens, is best described via well-known network topologies. Unison can do pretty much all of them. However, the majority of people use just two: Line and Star.
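To make the two topologies a bit more concrete, here is a minimal sketch that lists which pairs of replicas would synchronize with each other in each of them. The function names are made up for this illustration, and the node names (A, B, C, server, wks, sub) are merely labels in the spirit of this section.

```python
def line_pairs(nodes):
    """Pairs that synchronize in a line topology: each node with its neighbour."""
    return [(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]


def star_pairs(center, others):
    """Pairs that synchronize in a star topology: every node with the center only."""
    return [(center, node) for node in others]


if __name__ == "__main__":
    # Line: A <-> B and B <-> C, but never A <-> C directly.
    print(line_pairs(["A", "B", "C"]))
    # Star: the outer nodes only ever talk to the center.
    print(star_pairs("server", ["wks", "sub"]))
```

In both cases a change made on one outer node needs two synchronizations to reach the other outer node, because it has to pass through the node in the middle.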
Line

Let us assume one has

1  A -> B -> C or
2  B -> A -> C or
3  B <- C <- A
4  etc.

In words, in line 1, we make changes to a file called [...]

One can think about various other combinations of what can and cannot be done with a line topology. Bottom line is, if we want to have the same version of [...]

Star

Using Unison in a star topology requires at least three replicas (the reader should not confuse replica with computer). That is, for the minimum of three replicas the following would work
I love using Unison with a star topology. I have 3 computers at home that I use actively: one server (passive node; center node within the star), a workstation and a subnotebook (both active nodes). I never synchronize between the workstation and the subnotebook, because in a star topology no two active nodes may synchronize with each other directly. A synchronization can only happen through the passive node (the center of the star)!

The full power of using Unison with a star topology becomes obvious if we are on the go a lot and the passive node is accessible via a secure connection (e.g. SSH) over the Internet from any point in the world, at any time. This is exactly what I do. My server, acting as the passive node (center of the star), is located within a datacenter. Now, no matter where I am on this planet, as long as I have connectivity to the Internet, I can not only synchronize my data with the passive node but also get a backup of my precious data in one go. Back at home from a trip to Africa or wherever, I synchronize my workstation with my passive node located far away in a highly secured datacenter, and after some seconds or minutes, all machines/replicas (subnotebook, workstation and server, i.e. active nodes and passive node) have the exact same up-to-date versions of all my data.

Actually, my setup is a bit more complex, i.e. I have one server located in the datacenter and one at home. The two servers synchronize their replicas using a simple line topology. This is triggered automatically and requires no human interaction whatsoever. I use inotify, cron and incron to trigger the synchronization with Unison. Depending on where I am with each of my active nodes (wks or sub), they synchronize either with my server at home or with the one located in the datacenter.

Practical Part

This part takes into account all the aforementioned aspects of living a life with Unison.
After reading this part, everyone should be able to install, set up, configure and fine-tune Unison to fit his needs. However, it is not meant to be a comprehensive guide; it is merely a supplement to the official Unison manual.

Installation

One needs to install one or two packages:

1  sa@wks:~$ type dpl
2  dpl is aliased to `dpkg -l'
3  sa@wks:~$ dpl unison* | grep ^ii
4  ii  unison      2.27.57-1+b1  A file-synchronization tool for Unix and Win
5  ii  unison-gtk  2.27.57-1+b1  A file-synchronization tool for Unix and Win
6  sa@wks:~$

As can be seen in line 2, dpl is an alias for dpkg -l. Line 5 is nice to have, but one should never need it if he is comfortable with the CLI (Command Line Interface). Personally I am now going to remove [...]

With both packages installed, update-alternatives can be used to switch between the two:

sa@wks:~$ su
Password:
wks:/home/sa# update-alternatives --config unison

There are 2 alternatives which provide `unison'.

  Selection    Alternative
-----------------------------------------------
 +        1    /usr/bin/unison-latest-stable
*         2    /usr/bin/unison-latest-stable-gtk

Press enter to keep the default[*], or type selection number: 1
Using '/usr/bin/unison-latest-stable' to provide 'unison'.
wks:/home/sa# exit
exit
sa@wks:~$

In the example above I decided to go with the non-gtk version, as can be seen.

Preparatory Work

Now that we have installed Unison, there are a few things that I recommend doing before one hits the road to unlimited file synchronization.

Organizing our Files and overall File System Structure

We need to organize all of the files we want to synchronize in our replicas. Before we run Unison for the first time on our data, it is important that all of our files and folders are named and organized the way we want them to be. This is because Unison does not know when things are renamed. If for example [...]

I suggest that all of our files be organized in sub-directories under one main directory, which will be the root directory for our synchronization. Again, take a look at approach [...]

Bottom line here is, before we issue unison for the first time, all data planned to be synchronized in the future should be the same in all replicas. However, this is just a recommendation and not mandatory, since Unison can do it itself. It would just take longer, and by cleaning up a bit and reorganizing his data, one might actually get rid of some dust that has settled over the years. Remove crap, rename awkward stuff, consolidate stuff, remove duplicates (fdupes, fslint, etc.), etc. Clean up your file system(s), ladies and gents! ;-]

Determining our Roots

We now need to figure out which computers and hard drives we want to use to house the replicas of our files (these locations are called roots), and how they are going to communicate with one another (either locally or remotely). I recommend a star topology where one server (if possible) with a constant Internet connection is the central root, and all other computers synchronize with it remotely via SSH. This effectively turns the Unison peer-to-peer system into a client-server system. If someone does not have access to a server, then he might use a removable hard drive as his central root and move it to different computers whenever he wants to synchronize the files/replicas.

Configuring

By this point everyone should already have taken his time with the theoretical part of this section and the official Unison manual. I am not going to provide the reader with yet another version of the official manual. I am just showing my Unison config file(s) and explaining a bit what I did and why I did it.

Setting up our Unison profile

On the computer where we are invoking Unison, it looks for a profile located in the [...]
What does this mean? Well, since every profile file can be divided into local and common parts, we can split a profile file into two files — one containing the common parts for all nodes/replicas and one containing information specific to one particular node/replica. As a consequence, we can share the file containing the common parts among all involved nodes/replicas. This is what I do, sharing the common parts and keeping one node specific file per node/replica. Below follows the node specific part on my subnotebook's profile file.
The file containing the node-specific part is called default.prf

 1  sa@sub:~$ cat .unison/default.prf
 2  ## Unison preferences file
 3
 4  ## Roots of the synchronization
 5  root = /
 6  root = ssh://192.168.1.4:1235//home/sa/ur/0/
 7
 8  ## Include common settings for profiles no matter where they are
 9  ## invoked (client or server)
10  include common

Line 5 shows the file system root (i.e. /) [...]

Line 10 shows how we include the common part of the profile file. We keep the common parts for all nodes in a separate file e.g. common

11  sa@sub:~$ cat .unison/common
12  ## Paths (directories resp. files) to synchronize
13  #directories
14  path = home/sa/Desktop
15  path = home/sa/Mail
16  path = home/sa/News
17  path = home/sa/misc
18  path = home/sa/work/git
19  path = home/sa/em
20  path = home/sa/mm
21  path = usr/local/sbin
22  path = home/sa/.adobe
23  path = home/sa/.purple
24  path = home/sa/.mozilla
25  path = home/sa/.workrave
26  path = home/sa/.sec
27  path = home/sa/.local
28  path = home/sa/.emacs.d
29
30
31  #files
32  path = home/sa/.bashrc
33  path = home/sa/.bash_history
34  path = home/sa/.bash_profile
35  path = home/sa/.dingrc
36  path = home/sa/.emacs
37  path = home/sa/.emacs.elc
38  path = home/sa/.emacs.desktop
39  path = home/sa/.emacs.desktop.lock
40  path = home/sa/.dired
41  path = home/sa/.unison/common
42  path = usr/share/games/fortunes/mjg
43  path = usr/share/games/fortunes/mjg.dat
44  path = usr/share/games/fortunes/mjg.u8
45

Because I opted for approach [...]

46
47
48  ## Data not to be synchronized
49  ignore = Path home/sa/mm/audio/music
50  ignore = Path home/sa/.adobe/Acrobat/8.0/Synchronizer
51
52  ## Miscellaneous settings
53  rshargs = -C
54  auto =true
55  confirmbigdeletes = true
56  perms = -1
57  owner = true
58  group = true
59  times = true
60  #force = newer
61  sortbysize = true
62  sortnewfirst = true
63  maxthreads = 50
64  log = true
65  logfile = /home/sa/.unison/unison.log
66  sa@sub:~$

In lines 49 and 50 I specified two paths I want to ignore, i.e. although they are located within paths that are listed for synchronization (line 20 and line 22, respectively), I am excluding them from synchronization. Lines 53 to 65 contain various settings regarding the overall synchronization process. I already described their meaning above in this section.

Using Unison

Ok, now that we have made our initial copies, renames, deletes, etc.
and set up a basic profile which tells Unison which two locations (roots) to synchronize, we are ready to run Unison for the first time. We can invoke Unison by typing [...]

However, no files should be different during this initial run because we have just made a fresh identical copy across the two roots. After Unison finishes propagating all changes (not that there should be any on the initial run, since we did our preparatory work), those two roots have been initialized. When we run Unison again on those two roots, it should go much faster because the metadata has already been stored. We need to repeat this process with every pair of roots that we want to synchronize. If possible, I suggest adopting a star topology and synchronizing all roots against a central server/replica/root, which minimizes the number of pair-wise synchronizations required.

About speed... Well, I have often heard folks complain that Unison is slow... Sure it is if one is not using approach [...]

We have to remember to synchronize every time right after we log in to a machine and right before we log out. Unison is only effective if we use it! Maybe we want to trigger Unison some other way instead of invoking it manually every time?! I do so. What exactly I do, along with other more advanced aspects of using Unison, follows below.

Advanced Topics

What we did so far is already pretty sophisticated and covers a lot of use cases. However, there are a few more things that I consider useful...

Automating the Process

One thing we can do to make things even more comfortable and suitable for the forgetful mind is to automate the whole process, i.e. we take our Unison configuration and trigger the synchronization process depending on certain circumstances. Those circumstances can either be
Either way, what needs to be done is to figure out a way to [...]

Event Triggered

In case we wanted [...]
Out of the box, inotify does not watch subdirectories. However, there are tons of wrappers out there

wks:/home/sa# type afs; afs pynotify | grep so
afs is aliased to `apt-file search'
python-notify: /usr/lib/python-support/python-notify/python2.4/gtk-2.0/pynotify/_pynotify.so
python-notify: /usr/lib/python-support/python-notify/python2.5/gtk-2.0/pynotify/_pynotify.so
wks:/home/sa#

WRITEME

Time Triggered

WRITEME
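A time-triggered run could be sketched roughly as follows. This is only an illustration and not the setup described in this section: it assumes the profile is named default (as in the profile shown earlier) and that Unison's -batch switch is used so that no questions are asked. The function names `unison_command` and `run_periodically` are made up, and a cron job would be the more usual way to get the same effect.

```python
import subprocess
import time


def unison_command(profile="default"):
    """Build the command line for a non-interactive run of one profile."""
    # -batch tells Unison to ask no questions at all.
    return ["unison", profile, "-batch"]


def run_periodically(interval_seconds, rounds, run=subprocess.call, sleep=time.sleep):
    """Run Unison `rounds` times, pausing `interval_seconds` between runs.

    `run` and `sleep` can be replaced, so the loop can be tried out
    without Unison being installed. Returns the list of exit codes.
    """
    exit_codes = []
    for i in range(rounds):
        exit_codes.append(run(unison_command()))
        if i < rounds - 1:
            sleep(interval_seconds)
    return exit_codes


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    def fake_run(cmd):
        print(" ".join(cmd))
        return 0

    run_periodically(1, 2, run=fake_run, sleep=lambda seconds: None)
```

Keeping the command construction separate from the loop makes it easy to swap the loop for cron or incron later while reusing the same command line.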