A simple versioning system in Java

Why

A versioning system is software that stores copies of files, or complete directory hierarchies, and later recovers them as they were at various times in the past. To save disk space, the software typically stores a full copy of the most recent state, plus 'difference files' that tell it what changes to make to recover previous states.

The industrial-strength version of this is a 'Source Code Control System', which also allows many people to work on the same set of files without trampling on each other's work. Typically you can 'lock' a file or directory while you are working on it, or perhaps the system warns you if somebody has changed a file between the time you took a copy of it and the time you put back your modified version.

This is a necessity if you have a group of people working on, for example, a set of files that make up a large program. It is also very useful for a single programmer working alone, who just wants to be able to recover from changes that turn out to be bad ideas. This is especially the case with 'agile' methods where the programmer frequently presents successive versions of a program to the customer to learn from their feedback. If you are doing any sort of data analysis or experimental programming, such a facility is also very useful, as it ensures that you can always reproduce previous analyses or experiments exactly.

How

This system maintains a directory tree containing one directory for every file and directory to be saved in the system. Within these directories it keeps (human-readable) control files describing what files have appeared, disappeared, and been changed. For each file it keeps the current state of the file (or the last state of the file, if it has been deleted) and a record of changes to be made to recover the previous state of the file from there, and another record to go back one more step, and so on. Be warned that the files that describe what changes need to be made to recover previous versions are not human readable or human editable, and use all 8 bits of each byte: any form of character conversion will probably corrupt them.
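As a rough illustration of the "current state plus a chain of changes going backwards" idea, here is a minimal in-memory sketch. It is not the code or the file format that DirSync uses: the names ReverseDeltaStore, Delta, save and recover are hypothetical, and the delta is a deliberately crude one (keep the common prefix and suffix, store the differing middle of the older version).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: not DirSync's real classes or on-disk format. */
final class ReverseDeltaStore {

    /** Rebuilds an older version from the next newer one. */
    private static final class Delta {
        final int prefix;      // leading bytes shared with the newer version
        final int suffix;      // trailing bytes shared with the newer version
        final byte[] middle;   // bytes of the older version between the two

        Delta(int prefix, int suffix, byte[] middle) {
            this.prefix = prefix;
            this.suffix = suffix;
            this.middle = middle;
        }

        byte[] applyTo(byte[] newer) {
            byte[] older = new byte[prefix + middle.length + suffix];
            System.arraycopy(newer, 0, older, 0, prefix);
            System.arraycopy(middle, 0, older, prefix, middle.length);
            System.arraycopy(newer, newer.length - suffix, older,
                             prefix + middle.length, suffix);
            return older;
        }
    }

    private byte[] current;                               // newest version, kept in full
    private final List<Delta> deltas = new ArrayList<>(); // deltas.get(i): version i+2 -> version i+1

    /** Versions are numbered 1, 2, 3...; 0 means nothing has been saved yet. */
    int latestVersion() {
        return current == null ? 0 : deltas.size() + 1;
    }

    /** Stores a new version in full and demotes the previous one to a reverse delta. */
    void save(byte[] next) {
        if (current != null) {
            deltas.add(diff(next, current));
        }
        current = next.clone();
    }

    /** Starts from the newest version and steps back one delta at a time. */
    byte[] recover(int version) {
        if (version < 1 || version > latestVersion()) {
            throw new IllegalArgumentException("no such version: " + version);
        }
        byte[] state = current.clone();
        for (int v = latestVersion() - 1; v >= version; v--) {
            state = deltas.get(v - 1).applyTo(state);
        }
        return state;
    }

    /** Crude delta: common prefix, common suffix, and the older version's middle. */
    private static Delta diff(byte[] newer, byte[] older) {
        int max = Math.min(newer.length, older.length);
        int p = 0;
        while (p < max && newer[p] == older[p]) {
            p++;
        }
        int s = 0;
        while (s < max - p
                && newer[newer.length - 1 - s] == older[older.length - 1 - s]) {
            s++;
        }
        return new Delta(p, s, Arrays.copyOfRange(older, p, older.length - s));
    }
}
```

Keeping the newest version in full means the common case (fetching the current state) needs no delta work at all; only older versions pay for walking the chain.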

This system provides no way to 'lock' files or detect concurrent changes: it is intended for a single programmer working alone.

Most versioning systems use clever (and CPU-intensive) algorithms to work out the smallest possible set of changes required to change one version of a file into another. This program doesn't try to be that clever. It uses a simpler strategy (looking for chunks of text common to the old and new file versions) that should produce a correct set of changes, but probably won't produce the smallest such set. On the other hand, it doesn't require that the files in question contain text. Most of the clever systems require that their input consist of a sequence of lines, and work on lines as a whole, because the cost of their algorithms can go up as the square of the number of items considered, so you'd much rather deal in lines than in characters.
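To give a flavour of what "looking for common chunks" can mean, here is a small sketch of one such strategy. It is my own simplified illustration, not the algorithm in DirSync: the class ChunkDelta, the Op record of copy/insert instructions and the chunk size of 8 bytes are all assumptions. It indexes fixed-size chunks of one file, scans the other for matching chunks, and emits "copy from the base" or "insert these literal bytes" instructions. It works on raw bytes, so it does not care whether the input is text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a chunk-matching delta; not DirSync's algorithm. */
final class ChunkDelta {
    static final int CHUNK = 8;   // assumed chunk size, chosen only for illustration

    /** Either "copy length bytes from base at offset" (data == null) or "insert data". */
    static final class Op {
        final int offset;
        final int length;
        final byte[] data;

        Op(int offset, int length, byte[] data) {
            this.offset = offset;
            this.length = length;
            this.data = data;
        }
    }

    /** Produces instructions that rebuild target from base. */
    static List<Op> diff(byte[] base, byte[] target) {
        // Index every CHUNK-aligned chunk of the base by its exact contents.
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i + CHUNK <= base.length; i += CHUNK) {
            index.putIfAbsent(key(base, i), i);
        }
        List<Op> ops = new ArrayList<>();
        ByteArrayOutputStream literal = new ByteArrayOutputStream();
        int t = 0;
        while (t < target.length) {
            Integer b = (t + CHUNK <= target.length) ? index.get(key(target, t)) : null;
            if (b == null) {
                literal.write(target[t++]);   // no known chunk starts here
                continue;
            }
            // Found a common chunk: extend the match forward as far as it goes.
            int len = CHUNK;
            while (b + len < base.length && t + len < target.length
                    && base[b + len] == target[t + len]) {
                len++;
            }
            flush(literal, ops);
            ops.add(new Op(b, len, null));
            t += len;
        }
        flush(literal, ops);
        return ops;
    }

    /** Replays the instructions against base to rebuild the target. */
    static byte[] apply(byte[] base, List<Op> ops) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Op op : ops) {
            if (op.data != null) {
                out.write(op.data, 0, op.data.length);
            } else {
                out.write(base, op.offset, op.length);
            }
        }
        return out.toByteArray();
    }

    private static void flush(ByteArrayOutputStream literal, List<Op> ops) {
        if (literal.size() > 0) {
            ops.add(new Op(0, 0, literal.toByteArray()));
            literal.reset();
        }
    }

    // ISO-8859-1 maps each byte to exactly one char, so the key is the chunk itself.
    private static String key(byte[] a, int off) {
        return new String(a, off, CHUNK, StandardCharsets.ISO_8859_1);
    }
}
```

A scheme like this always yields a correct set of instructions, because anything it fails to match is simply stored as literal bytes, but it makes no attempt to find the smallest set.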

Where

The source in question is within the file mcdowella.zip (this is actually a jar file, but I've put it on this web site as a zip file to stop it being ascii-converted). The main Java class for this program within this is uk.co.demon.mcdowella.filesync.DirSync. If you run this as a program, you can create a directory to hold the state of some other file or directory hierarchy as follows:

java uk.co.demon.mcdowella.filesync.DirSync create f:\temp\duk

This directory must not already exist, and its name must start with a d (because the names of all of the directories created to represent directories in the hierarchy to be saved start with a d). In fact, its name should be a d followed by the name of the top-level directory in the hierarchy you intend to save. I will be saving the directory uk and its subdirectories, so I must call the top-level directory I am going to save to duk.

To save the current state of the directory hierarchy uk, run the following:

java uk.co.demon.mcdowella.filesync.DirSync syncDir uk f:\temp\duk

This is the easiest way to work: create a directory to save the state of a complete directory hierarchy, and then save the whole thing with one command every now and then. The program will not do much if a file hasn't changed, so you can afford to be lazy here and let the program work out what has changed and needs to be saved and what hasn't. If you look at the file f:\temp\duk\versions, you will see that it contains a list of version numbers, 1, 2, 3... and the date at which that version was created. If you look lower down in the hierarchy, you can see that each file in the source directory has its own directory, with versioning info and data held in that directory. You should see that more files are created in this directory only when the underlying file changes.

The easiest way to get a copy of a previous state is to extract the whole thing, quoting the version number you want. For example, this extracts the state of uk as of version 3:

java uk.co.demon.mcdowella.filesync.DirSync xd f:\temp\duk 3 f:\temp\uk_v3

Here f:\temp\duk is the top-level directory in the save location, so it extracts the whole thing. The same command can be used quoting lower level directories to extract only subdirectories, or you can use the 'xf' command to extract a single file (the extraction process doesn't need to know where the top level of the saved area is, just the top of whatever it is you want to extract). Similarly, you can use syncDir to synchronise only a subdirectory, or syncFile to synchronise only a file. For instance, I can synchronise only the directory filesync as follows:

java uk.co.demon.mcdowella.filesync.DirSync syncDir uk\co\demon\mcdowella\filesync f:\temp\duk\dco\ddemon\dmcdowella\dfilesync

As usual, I quote the directory I want to save, and the directory within the save location I want to save it to. Also, the names of corresponding directories in the save location consist of a d followed by the name of the directory being saved. (Files are saved in directories with the file name prefixed by f).
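If you need to work out the save-location path for a given source path, the naming rule above can be sketched in a few lines. The class SaveNames and the method toSaveLocation are hypothetical helpers written for this note; they are not part of DirSync.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper illustrating the d/f naming rule; not part of DirSync. */
final class SaveNames {

    /**
     * Maps a relative source path to its place under saveParent:
     * every directory name gets a "d" prefix, and the last element gets
     * an "f" prefix instead if it names a file.
     */
    static Path toSaveLocation(Path saveParent, Path source, boolean lastIsFile) {
        Path result = saveParent;
        int n = source.getNameCount();
        for (int i = 0; i < n; i++) {
            boolean isFile = lastIsFile && i == n - 1;
            result = result.resolve((isFile ? "f" : "d") + source.getName(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Path parent = Paths.get("temp");
        Path source = Paths.get("uk", "co", "demon", "mcdowella", "filesync");
        // The separator in the printed path depends on the platform.
        System.out.println(toSaveLocation(parent, source, false));
    }
}
```

With a save parent of f:\temp and the source directory uk\co\demon\mcdowella\filesync, this rule gives the same path as the syncDir example above.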

If there is an I/O error before the program has moved the new versions of its files into their old locations, it leaves the existing state unchanged, and you can clean the directory up as follows:

java uk.co.demon.mcdowella.filesync.DirSync clean f:\temp\duk\dco\ddemon\dmcdowella\dfilesync

However, if the IO error happens in the final stage, when all the real work has been done, and the modified files are being moved into place, you are likely to end up with some inconsistencies. The easiest way out of this is to restore the state of the versioning directory from a backup (see the following warning).

Warning

This program contains a number of internal checks, so I believe that it is more likely to collapse noticeably than to corrupt information silently, but I do not provide any warranty for it. It is not a substitute for a proper backup strategy. (If I had to give just one piece of advice about computers it would be this: make lots of backups, and check that you can retrieve from them).

If the program does collapse as the result of some sort of I/O error it is likely to have left around files starting with "pending", because it tries to do all its I/O to such files, renaming them when it has finished all its work without error. You may be able to recover the state before the failing operation was started by simply deleting all such files.
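The general "write to a pending file, then rename" pattern looks roughly like the sketch below. This is an illustration using the standard java.nio.file API, not an extract from DirSync: the class PendingFiles, its method names, and the exact way the "pending" prefix is attached to the file name are assumptions. The cleanup method only looks at one directory and does not descend into subdirectories.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of the write-then-rename pattern; not DirSync's code. */
final class PendingFiles {
    static final String PREFIX = "pending";   // assumed naming, for illustration

    /** Writes data beside the target under a pending name, then renames it into place. */
    static void writeViaPending(Path target, byte[] data) throws IOException {
        Path pending = target.resolveSibling(PREFIX + target.getFileName());
        Files.write(pending, data);            // a failure here leaves the target untouched
        Files.move(pending, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Deletes leftover pending files in one directory; returns how many were removed. */
    static int deletePending(Path dir) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, PREFIX + "*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    Files.delete(p);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

The point of the pattern is that all the slow, failure-prone work happens on files nobody else reads, so a crash part-way through leaves only stray pending files behind. It does not protect against a failure during the renames themselves, which is why a backup of the versioning directory is still the safest way out.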