Converting to the latest version

Suppose that you have some format that exists in multiple versions, with new versions appearing regularly. You want to convert everything you see to the most recent version. The following choices are standard, but can still be unsatisfactory.

Simple Choices

A Middle Way

If you view a system of converters as a network, it is obvious not only that intermediate choices not yet mentioned are possible, but also that lots of people have put a lot of work into this problem. The nodes of the network are the different versions of your format. The links of the network are the conversions you decide to implement directly. You are trying to minimise both the length of each path from one node to another and the total number of links in the network. Just to demonstrate that this approach can lead to new solutions, I will outline a compromise scheme very loosely inspired by what I can remember of some multi-processor routing networks.

In the previous cases you fix either the number of conversion steps required or the number of new converters per version required for all N, but whatever you vary grows at least as fast as N. With this middle way, both the number of conversion steps required and the number of different converters per version required increase, but only about as fast as log2 N. The message of this section is not that this particular scheme (dreamt up in fifteen minutes while trying to sleep in a plane) is the best option, but that there are potentially useful compromises here. In practice, I would expect the real network of conversions to depend a lot more on the particular changes made to the data format than on the simple structure used here.

Number the versions in order, starting at one. This means that every version number requires about log2 N bits. It also means that every version number, written out in binary, contains at least one '1'. In fact each version number consists of some arbitrary sequence of bits, followed by a '1', followed by some number of '0's.

You can think of a conversion as a way to change the binary representation of a version number. You need converters that correspond to simultaneously clearing the lowest '1' bit and incrementing the number formed by the sequence of bits above it. This amounts to clearing the 2n's bit by adding 2n to the version number. You also need converters that correspond to setting any one of a low order string of '0' bits.

This allows you to perform any conversion from an older to a more recent version. You compare the source and destination version numbers, looking for bit differences, and apply the following rules:

  1. Find the lowest order '1' bit in the source. Consider the number represented by the bit pattern with just this bit set. If you can add it to the source without making it larger than the destination, do so (clearing the '1' bit), and repeat this step.
  2. The bit pattern representing the source must now consist of a chunk of bits identical to the destination bit pattern, followed by zeros. Repeatedly set the highest missing source '1' bit.

These two rules correspond to two phases. The first moves in steps of at least one bit from right to left, and the second moves in steps of at least one bit from left to right. So you need no more than 2n different conversion steps to convert between two n-bit version numbers. That is, between any two versions in the range 1..2n-1.

There are at most k+1 ways to reach any version number with k low order zeros. If you got to it by setting a single bit, that bit must be the lowest order '1' bit, so you know exactly what the previous version was. If you got to it by clearing a low order '1' bit, it must have been where one of the k low order zeros now is. So in the worst case, when you introduce version 2k, You have to write only k converters. You can also see that each conversion step corresponds to adding on a number with only 1 '1' bit in it, and there are at most n of those, not all of which are applicable in every case.

Example

This table shows the required conversions for version numbers 1..31 (in binary), so here n=5, N=32. The argument above guarantees no more than 5 required conversions for any version, and no more than 10 steps for any path.
Version NumberMust convert from
00001(nothing)
0001000001
0001100010
0010000010, 00011
0010100100
0011000100, 00101
0011100110
0100000100, 00110, 00111
0100101000
0101001000, 01001
0101101010
0110001000, 01010, 01011
0110101100
0111001100, 01101
0111101110
1000001000, 01100, 01110, 01111
1000110000
1001010000, 10001
1001110010
1010010000, 10010, 10011
1010110100
1011010100, 10101
1011110110
1100010000, 10100, 10110, 10111
1100111000
1101011000, 11001
1101111010
1110011000, 11010, 11011
1110111100
1111011100, 11101
1111111110

A longest path here is 00001-00010-00100-01000-10000-11000-11100-11110-11111. This moves a single bit from one end to the other and back in 2(n-1) = 8 steps, so the worst case path length in general for this scheme is 2(n-1), not 2n, and this bound will always be achieved.