Subversion mtimes

Subversion has some very useful and interesting features for a source control system. One shortcoming it shares with a number of other CVS-like systems is that it doesn't version file modification times (mtimes). This has been the subject of many discussions on the web over the last 3 years or more, though surprisingly little seems to have come of them.

I've tried a few workarounds to this problem, but haven't yet come up with one that I find fully satisfactory. There are version control systems around, like Perforce, that do version file mtimes; I may end up switching to one of those systems.

By default, when retrieving a file from the repository, svn sets its mtime to the current time. svn can be configured, however, to use the file's commit time instead (the use-commit-times option in the client's runtime configuration). If files are committed shortly after modification, then this is a good approximation. The problem arises when importing old files, whose mtimes may predate their commit by years.

Oliver Betz came up with a useful Perl script which commits new files to a repository one at a time in chronological order, using svn propset on the svn:date revision property to set the commit time of each revision. This is a pretty elegant solution for a single, one-time import to a new repository, and I have a slightly modified version which seeks to minimize the number of commits necessary by committing files with the same mtime all together. The trouble with this whole approach, however, is that it breaks down when you want to add old files at a later point. Subversion permits out-of-order commit dates, but that breaks some functionality (looking up revisions by date, for instance) and makes it hard to find things.
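To make the batching idea concrete, here is a minimal Python sketch of the planning step only: group files by mtime, oldest first, and format each mtime the way svn:date values are written. It is not Betz's script or my modified version of it (both are Perl); the function names are made up for illustration, and the actual svn add, svn commit, and svn propset --revprop calls are left out.

```python
import os
from collections import defaultdict
from datetime import datetime, timezone


def batches_by_mtime(paths, get_mtime=os.path.getmtime):
    """Group paths by whole-second mtime and return the batches oldest first.

    Each batch is (mtime, [paths]); files sharing an mtime would go into
    one commit instead of one commit per file.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[int(get_mtime(path))].append(path)
    return [(mtime, sorted(groups[mtime])) for mtime in sorted(groups)]


def svn_date(mtime):
    """Format a Unix timestamp as a UTC svn:date value."""
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
```

Each batch would then be committed in turn, followed by a propset of svn:date on the new revision to the value from svn_date(). Changing a revision property only works if the repository's pre-revprop-change hook allows it.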

I had two ideas for handling later additions of old files. One possibility is to use Betz's script to import into a new repository, then dump it using svnadmin and merge it into the existing svn repository. Of course, svnadmin's load command simply appends the loaded revisions, which produces the same out-of-order commits problem. So I wrote my own script which merges multiple svn dump files. Assuming each input dump file is in correct chronological order, the output will also be in chronological order. The script also does some other things with svn dumps which fill in missing functionality. I also wrote a primitive shell script which uses grep, awk, sort, and dd to put the revisions in an svn dump file back in chronological order. I tested this briefly under Cygwin; it has some known holes and isn't really sufficient to do anything real.
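The heart of such a merge is an ordinary merge of sorted sequences. The Python sketch below is not my script; it assumes the dump files have already been parsed into lists of (svn:date, payload) pairs, each list sorted by date, and it only shows the interleaving and renumbering. Parsing the dump format and fixing up copy-from revision numbers inside the payloads are the hard parts, and they are not shown.

```python
import heapq


def merge_revisions(*streams):
    """Merge revision lists, each already sorted by date, into one list.

    A revision is a (svn_date, payload) tuple. svn:date values are
    fixed-width UTC timestamps, so comparing them as strings orders
    them chronologically. Revisions are renumbered from 1 in the output.
    """
    merged = heapq.merge(*streams, key=lambda rev: rev[0])
    return [(number, date, payload)
            for number, (date, payload) in enumerate(merged, start=1)]
```

If any input is not already in date order, the output won't be either, which is the same caveat that applies to the real merge.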

The problem with all of this is that importing files would require dumping svn repositories, merging the dump files, and reloading it all into a new repository. This could become a big pain quickly. It also artificially inflates the number of revisions, which in turn slows things down a bit. Another problem is that by generating revisions in the files' chronological order, one loses the chronology of the development, which is at least as important.

What Ryan Schmidt suggested on the svn users' mailing list was to write a wrapper for svn that saves each file's mtime in a versioned property at commit time and uses that property to restore the mtime when retrieving the files. At first, I thought this would be more difficult than the first method, but I threw together something functional and relatively simple, though with some obvious holes. Instead of calling svn directly, I invoke this Perl script, which I named svm. I also wrote a script for adding versioned mtime properties to the nodes of a particular revision in a dumpfile, and another one for setting the mtime property of files in a working copy (i.e. after an initial import).
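Stripped to its two core operations, the wrapper idea looks roughly like the Python sketch below. This is not svm itself (which is Perl). The property name "mtime" and both function names are assumptions chosen for illustration, and the run parameter exists only so the svn calls can be swapped out when trying the code without a working copy.

```python
import os
import subprocess

MTIME_PROP = "mtime"  # hypothetical property name, not necessarily what svm uses


def save_mtime(path, run=subprocess.check_call):
    """Before commit: store path's mtime in a versioned property."""
    value = str(int(os.path.getmtime(path)))
    run(["svn", "propset", MTIME_PROP, value, path])
    return value


def restore_mtime(path, run=subprocess.check_output):
    """After checkout or update: set path's mtime from the property."""
    value = run(["svn", "propget", MTIME_PROP, path]).decode().strip()
    if value:  # files without the property are left alone
        os.utime(path, (os.path.getatime(path), int(value)))
```

A wrapper would call save_mtime() on each modified file before handing off to svn commit, and restore_mtime() on each file touched by svn update or svn checkout. Note that every file costs one svn process in each direction.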

The problem with this approach is that it is very, very slow. Storage and retrieval of each file's mtime requires a separate call to svn propget or propset. On my run-of-the-mill 3GHz P4, this goes at a rate of only 4 or 5 files per second. Additionally, I had to call svn stat first, in order to get a list of which files were modified. With about 2300 nodes in the trunk, this takes much longer than svn commit. (Anyone know why?)

Perhaps it would be better to maintain all the file dates in a single separate file, or even in an unversioned revision property. This would improve the performance of my wrapper manyfold, since it would replace thousands of propget and propset calls with one read or write, but it would have to be thought out a bit.
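As a rough idea of what the separate-file variant could look like, here is a Python sketch that records every mtime under a directory in one JSON file and applies them again later. I haven't built this; the manifest name .mtimes.json, the JSON format, and the function names are all assumptions, and the questions that need thinking out (merge conflicts in the manifest, renamed or deleted files) are ignored.

```python
import json
import os


def write_manifest(root, manifest_name=".mtimes.json"):
    """Record the mtime of every file under root in a single JSON file."""
    mtimes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Subversion's administrative directories.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            if name == manifest_name:
                continue
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            mtimes[rel] = int(os.path.getmtime(full))
    with open(os.path.join(root, manifest_name), "w") as fh:
        json.dump(mtimes, fh, indent=1, sort_keys=True)
    return mtimes


def apply_manifest(root, manifest_name=".mtimes.json"):
    """Restore mtimes from the manifest; files no longer present are skipped."""
    with open(os.path.join(root, manifest_name)) as fh:
        mtimes = json.load(fh)
    for rel, mtime in mtimes.items():
        full = os.path.join(root, *rel.split("/"))
        if os.path.exists(full):
            os.utime(full, (mtime, mtime))
```

The manifest would be written just before a commit and committed along with everything else, then applied after each checkout or update. That is one file read or write per operation, with no per-file calls to svn at all.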

It seems to me that the only truly viable solution aside from using some other system would be to change svn itself to maintain the file mtime as an additional file property and to restore it where appropriate.