SVN-Git migration plan
Jamon Camisso
jamonation at gmail.com
Sat Jan 29 02:59:11 UTC 2011
On 1/25/2011 4:07 PM, Jamon Camisso wrote:
> For anyone who is interested, here is a list of all relevant directories
> from SVN, including those that were deleted at some point in the past.
> The plan is to map what Colin has outlined below to directories in this
> file, and then to convert each to a tag, branch, or master branch
> depending on where it needs to live in Git.
Responding to myself here, and would like to hear from people about the
following:
Justin and I have been working on importing SVN into Git this week, with
a fair amount of success. We managed to cut infusion down to about
22-24mb by removing extraneous psd files from the repository.
However, in shuffling repositories and branches around, we have
discovered that the tool being used, svn-all-fast-export[1][2], does not
incorporate SVN commits of empty directories into the Git repository.
This behaviour is by design - both Git and Mercurial explicitly do not
support tracking directories.
This feature (or bug depending on which side of the fence is most
attractive or comfortable) means that where historical changes to SVN
like the move from /utoronto/fluid to /fluid occurred, the particular
commit tracking that change is not present in Git.
One of the goals during this migration to Git is to preserve as much
history as possible in the various repositories that are being forked. This
attempt at maintaining the historical integrity of Fluid's source code
repositories will ensure that future members or external participants in
the Fluid community will have access to relevant information about the
historical development of various projects.
With all that in mind, Justin and I can think of a few options that are
or will be more or less palatable to those who have read this far:
Option 1) Stick with SVN. Unlikely. This choice would not be in keeping
with the distributed collaborative nature of Fluid. As such it would be
a very unsavory outcome.
Option 2) Use svn-all-fast-export as it currently runs, with the proviso
that any SVN commit of an empty directory or directories will be elided
from the history of the repository. This option is semi-palatable in
that the final repositories would look and behave exactly as if they
were created in Git in the first place.
Option 3) Convert repositories using svn-all-fast-export and run "git
commit --amend" on each commit in question. Said commits can be found
using the output of the svn-all-fast-export tool with full rule
debugging output enabled and piped to a log file or extracted directly
using grep:
grep -E "Exporting revision [0-9]{4,5}.*nothing to do" import.log
That output (of 4286 commits) could then be matched to specific commits
that solely affected A/D changes to directories in SVN. For example,
r4124-4126 is one such series of commits.
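As a rough sketch of how that matching could be started programmatically, the following Python collects the revisions the grep above would find and collapses them into consecutive runs. The log line shape ("Exporting revision N ... nothing to do" on a single line) is an assumption based on the grep pattern, and the function names are hypothetical, not part of svn-all-fast-export.

```python
import re

# Assumed log line shape, taken from the grep pattern above:
#   "Exporting revision 4124 ... nothing to do"
EMPTY_REVISION = re.compile(r"Exporting revision (\d+).*nothing to do")


def empty_revisions(log_text):
    """Return the sorted revision numbers whose export did nothing."""
    found = set()
    for line in log_text.splitlines():
        match = EMPTY_REVISION.search(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


def group_consecutive(revisions):
    """Collapse a sorted list of revisions into (first, last) runs."""
    runs = []
    for rev in revisions:
        if runs and rev == runs[-1][1] + 1:
            # Extend the current run by one revision.
            runs[-1] = (runs[-1][0], rev)
        else:
            # Start a new run at this revision.
            runs.append((rev, rev))
    return runs
```

Each run would still need to be checked against SVN by hand or by script to confirm it only adds or deletes directories; the grouping just narrows down where to look.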
Whereas each Git commit would initially look like the following:
commit ec2571d0833cbd72fa42d471ba2acdbe9ece71dd
Author: Joseph Scheuhammer <jscheuhammer at ocad.ca>
Date: Fri May 18 15:56:36 2007 +0000
Initial Fluid branch of Berkeley's Gallery Tool
svn path=/utoronto/fluid/gallery/; revision=4126
The affected commits can then be edited to look like this:
svn path=/utoronto/fluid/gallery/; revision=4124,4125,4126
Extra comment here pointing to Wiki, or SVN, or a file in Git
outlining changes to the repository
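The message edit shown above could be generated rather than typed by hand. This is a minimal Python sketch, assuming every imported commit message carries a single "revision=N" field as in the example; the function name and the note text are hypothetical. It only builds the new message text. Applying it to commits that are not at the tip of a branch would need some form of history rewriting, which is not shown here.

```python
import re

# Matches the "revision=4126" field written into each imported commit message.
REVISION_FIELD = re.compile(r"(revision=)(\d+)")


def fold_revisions(message, empty_revs, note):
    """Merge empty_revs into the message's revision= field and append note."""

    def merge(match):
        # Combine the existing revision with the empty ones, in order.
        revs = sorted(set(empty_revs) | {int(match.group(2))})
        return match.group(1) + ",".join(str(r) for r in revs)

    updated, count = REVISION_FIELD.subn(merge, message, count=1)
    if count == 0:
        raise ValueError("message has no 'revision=' field")
    return updated.rstrip("\n") + "\n" + note + "\n"
```

For the gallery commit above, calling fold_revisions with the empty revisions 4124 and 4125 would turn "revision=4126" into "revision=4124,4125,4126" and add the extra comment line after it.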
Option 4) Hack on svn-all-fast-export to make it do something with
directory modifications. This option would likely take a fair amount of
time and work to get it working just right, and is not in keeping with
the fundamental design of Git.
Option 5) Use a different tool altogether, like git-svn, or the original
svn2git tool. These tools are not nearly as sophisticated as
svn-all-fast-export in that they are a) incredibly slow and b) unable to
track changes to a file's location between directories, or historically
deleted directories, the same way that svn-all-fast-export does.
My first preference would be Option 3. However, successfully mapping
commits of empty directories to preceding commits depends on how much
information can be extracted and correlated programmatically. If there
is too much manual work required then my other preference would be Option 2.
Option 2 is viable and would be the faster of the two. This option
takes into account the fact that SVN will still be online. I would
imagine that anyone who is interested enough in who created an empty
directory would probably be willing to do the work of quickly running
svn log -r0001 on the repository and extracting the information that way.
The fact that not all information is being imported from SVN to Git
(Photoshop psd files for example) makes option 2 that much more
compelling in that it would take very little time to freeze SVN and just
do the conversion.
In the end options 2 and 3 both preserve information about empty
directories, albeit in two different locations. Whereas the former
retains an intact record in SVN, the latter entails taking small
liberties with the historical record in Git. However, in both cases, the
fact that committer X created directory Y will still be easily gleaned
from some easily found and well documented location for those who are
interested in such information.
tl;dr there is no easy way to import empty directories into Git. Option
2 is less disruptive and faster, while leaving information in multiple
locations. Option 3 will require some small amount of historical
revisionism, while retaining what history and files are deemed important
in one repository format.
Feedback is welcome at this point. I imagine Colin and Antranig will be
especially interested in sharing their thoughts.
Regards, Jamon
[1] http://packages.debian.org/testing/main/svn-all-fast-export
[2] svn-all-fast-export has been forked and named svn2git; the confusing
part is that there is a Ruby project with the same name that precedes
the fork.