distrusting git

Sat, 01 Oct 2011 07:06:11 +0000
git tech rant

tl;dr git destroyed my data; my team now has severe trust issues with git

We ask a lot from our source control systems. We want them to be flexible, fast, distributed, clever and even easy to use. But the number one thing we should demand from a source control system is that it doesn’t destroy our data. Most importantly, it should never lose anything that has been committed; just behind that, it really shouldn’t destroy data in our working directory.

When you find out that your source control system has lost your data you end up in a very bad place. Once your source control system destroys your data, you have an immediate and severe breakdown of trust between yourself and your tool. You revert to using cp -R to create backups before doing anything with the tool, just in case it destroys your data again.

Development was proceeding along at a cracking pace, and I was getting ready to commit a series of important changes. Before doing so, I wanted to merge in the recent changes from the remote master, so I did the familiar git pull. It complained about some files that would be overwritten by the merge, so I saved a backup of my changes, reverted my changes in those specific files, and proceeded. The merge went fine, and pulled in all the remote changes to the working directory. Getting ready for the commit, I did a git status and started to get a little concerned; one of the files I had been heavily editing didn’t show up in the status list. I catted the file to see what was going on; none of my changes were there. Now I was getting concerned. Maybe I was going slightly crazy after 3 days of straight hacking, but I was sure I had made changes to this file. I scrolled up the terminal backlog to the git status I had done before the pull. Sure enough, the file was marked as changed there, but not after the merge. I carefully read the full details of the merge; my file wasn’t listed as being touched there. Now I was really worried. git had just gone and destroyed the last 5 or 6 hours worth of work. Not happy!

Luckily, I was able to reconstruct most of the work from editor buffers that I still had open.

But, now I am worried. Why the fuck did git decide to destroy data in my working directory, without even telling me?! Did I do something wrong? Is this a case I should know about? I had to investigate.

So, I took a snapshot of my repository, rolled back to the revision before the merge, made some minor modifications to my file, then ran the merge again. And, again, git destroyed the change in the working directory. Now this isn’t normal behaviour; something is really fucked. The only thing slightly interesting about the file in question is that it had been recently renamed. Somehow this rename had confused the merge, and the merge was silently overwriting files.
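
For the record, the reproduction was roughly the sketch below; the snapshot directory, branch and file names are placeholders, not the real ones.

    cp -R project project-snapshot              # paranoid backup before touching anything
    cd project
    git reset --hard ORIG_HEAD                  # back to the commit we were on before the pull
    echo "/* canary */" >> lib/renamed-file.c   # small uncommitted edit to the recently-renamed file
    git merge origin/master                     # the file isn't mentioned in the merge output...
    git diff lib/renamed-file.c                 # ...yet the canary edit has silently vanished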

Now git has a few different merge strategies, so I tried out some different ones. This was a pretty simple merge with two heads, so the options were really recursive or resolve. git picks recursive by default, so I tried resolve instead. This worked fine. Surprisingly, this made me feel a little better; I wasn’t completely crazy, silently updating files in my working directory wasn’t intended behaviour, and there had to be something wrong in the recursive merge.
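
Concretely, that experiment was just the same reproduction with the strategy spelled out, something along these lines:

    git reset --hard ORIG_HEAD              # undo the merge (this also wipes the canary edit)
    echo "/* canary */" >> lib/renamed-file.c
    git merge -s recursive origin/master    # the default strategy: the canary is silently clobbered again
    git reset --hard ORIG_HEAD
    echo "/* canary */" >> lib/renamed-file.c
    git merge -s resolve origin/master      # the resolve strategy: the canary survives the merge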

So, I updated to the latest version in homebrew. Same problem.

Then it was time to start debugging git for real. So I downloaded the source (using git, of course) and started having a look through merge-recursive.c. It didn’t look too bad, but there was clearly going to be a lot to learn if I was going to debug this. Before I started littering the code with prints, I thought I’d better just check whether head had the same problem. Lo and behold, head worked! OK, cool, they fixed the bug. But that isn’t really a satisfying answer. Just for fun I checked out some random versions to try and narrow down when the bug was fixed. In doing so I found that it actually worked in some old versions, then didn’t work, and then finally worked again in the very latest. Here are my raw notes:

1.7.1               => good
1.7.2               => good
1.7.3               => good
1.7.4               => bad
1.7.5               => bad              
1.7.6.1 (installed) => bad
1.7.6.1 (checkout)  => bad
1.7.6.4             => bad
1.7.7-rc0           => fail
1.7.7-rc1           => pass
1.7.7-rc3           => pass

OK, this is getting more interesting. So somewhere between 1.7.3 and 1.7.4 this bug was introduced. I started using git bisect to narrow things down. I quickly got bored of manually doing git bisect good and git bisect bad; luckily I stumbled upon git bisect run, which automates the whole process. After about 20 minutes of compiling and testing it found the bad commit.
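
The script I handed to git bisect run was essentially the sketch below: build the checked-out git, replay the troublesome merge in a throw-away copy of my repository, and report the result via the exit code. The paths, the scratch-repository setup and the canary file are stand-ins for my real test case, not something runnable as-is.

    #!/bin/sh
    # test-merge.sh -- used as: git bisect start v1.7.4 v1.7.3 && git bisect run ./test-merge.sh
    GIT_BUILD=$(pwd)                                 # run from the top of the git source tree
    make -j4 >/dev/null || exit 125                  # exit 125 = "skip this commit" (it doesn't build)
    rm -rf /tmp/merge-test
    cp -R /tmp/merge-test.orig /tmp/merge-test       # pristine copy of the repo in its pre-merge state
    cd /tmp/merge-test || exit 1
    echo "/* canary */" >> lib/renamed-file.c        # uncommitted change that must survive the merge
    "$GIT_BUILD"/bin-wrappers/git merge origin/master >/dev/null 2>&1   # merge with the just-built git
    grep -q canary lib/renamed-file.c                # exit 0 (good) if the edit survived, non-zero (bad) if not

git bisect run treats exit status 0 as good, 125 as untestable and anything else as bad, which is why a failed build maps to 125.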

commit 882fd11aff6f0e8add77e75924678cce875a0eaf
Author: Elijah Newren 
Date:   Mon Sep 20 02:29:03 2010 -0600

    merge-recursive: Delay content merging for renames
    
    Move the handling of content merging for renames from process_renames() to
    process_df_entry().
    
    Signed-off-by: Elijah Newren 
    Signed-off-by: Junio C Hamano 

OK, lots of talk about merge-recursive and renames. That sounds like it makes sense; at least there is a specific bit of code that I can blame for my data destruction, maybe I don’t have to distrust the whole tool.

But to be really confident, I wanted to know that the fix wasn’t just something incidental, and that it was actually done to fix this problem. So I switched the return code in my test script and ran git bisect again, this time to find when the bug was fixed. Eventually it found this commit:

commit 5b448b8530308b1f5a7a721cb1bf0ba557b5c78d
Author: Elijah Newren 
Date:   Thu Aug 11 23:20:10 2011 -0600

    merge-recursive: When we detect we can skip an update, actually skip it
    
    In 882fd11 (merge-recursive: Delay content merging for renames 2010-09-20),
    there was code that checked for whether we could skip updating a file in
    the working directory, based on whether the merged version matched the
    current working copy.  Due to the desire to handle directory/file conflicts
    that were resolvable, that commit deferred content merging by first
    updating the index with the unmerged entries and then moving the actual
    merging (along with the skip-the-content-update check) to another function
    that ran later in the merge process.  As part of moving the content merging
    code, a bug was introduced such that although the message about skipping
    the update would be printed (whenever GIT_MERGE_VERBOSITY was sufficiently
    high), the file would be unconditionally updated in the working copy
    anyway.
    
    When we detect that the file does not need to be updated in the working
    copy, update the index appropriately and then return early before updating
    the working copy.
    
    Note that there was a similar change in b2c8c0a (merge-recursive: When we
    detect we can skip an update, actually skip it 2011-02-28), but it was
    reverted by 6db4105 (Revert "Merge branch 'en/merge-recursive'"
    2011-05-19) since it did not fix both of the relevant types of unnecessary
    update breakages and, worse, it made use of some band-aids that caused
    other problems.  The reason this change works is due to the changes earlier
    in this series to (a) record_df_conflict_files instead of just unlinking
    them early, (b) allowing make_room_for_path() to remove D/F entries,
    (c) the splitting of update_stages_and_entry() to have its functionality
    called at different points, and (d) making the pathnames of the files
    involved in the merge available to merge_content().
    
    Signed-off-by: Elijah Newren 
    Signed-off-by: Junio C Hamano 

OK, this is good. Looks like they fixed the bug, and it even references the bad commit that I had narrowed things down to.

So I’m a little bit dismayed that this bug existed for almost a full year before being fixed. I can’t be the only person to have been hit by this problem, can I? I looked at the release notes for v1.7.7. This is what they have to say about the issue:

 The recursive merge strategy implementation got a fairly large
 fix for many corner cases that may rarely happen in real world
 projects (it has been verified that none of the 16000+ merges in
 the Linux kernel history back to v2.6.12 is affected with the
 corner case bugs this update fixes).

OK, so the bug never triggered in 16,000+ Linux kernel merges. Strangely, that doesn’t actually make me feel any better.

So, I don’t think git sucks. All software has bugs, but bugs that destroy data are pretty devastating. It is a little hard to trust git merge operations now. I’ll probably try to make sure I never merge onto a dirty working directory (i.e. stash my changes first, since then they are at least backed up in the object database).
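
In practice that just means a small ritual like this before any merge or pull; a sketch of the habit, nothing more.

    git stash           # working-directory changes become real objects under .git, so they are recoverable
    git pull            # merge onto a clean tree
    git stash pop       # reapply the changes; a clash now shows up as a conflict rather than silent loss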

Of course, convincing my colleagues, who were also affected by this bug and didn’t really have any love for git in the first place, that git isn’t completely broken is going to be a tough sell.
