For example, it seems like most SCM people think that merging is about
getting the end result of two conflicting patches right.
In my opinion, that's the _least_ important part of a merge. Maybe the
kernel is very unusual in this, but basically true _conflicts_ are not
only rare, but they tend to be things you want a human to look at
regardless.
The important part of a merge is not how it handles conflicts (which need
to be verified by a human anyway if they are at all interesting), but that
it should meld the history together right so that you have a new solid
base for future merges.
In other words, the important part is the _trivial_ part: the naming of
the parents, and keeping track of their relationship. Not the clashes.
For example, CVS gets this part totally wrong. Sure, it can merge the
contents, but it totally ignores the important part, so once you've done a
merge, you're pretty much up shit creek wrt any subsequent merges in any
other direction. All the other CVS problems pale in comparison. Renames?
Just a detail.
And it looks like 99% of SCM people seem to think that the solution to
that is to be more clever about content merges. Which misses the point
entirely.
Don't get me wrong: content merges are nice, but they are _gravy_. They
are not important. You can do them manually if you have to. What's
important is that once you _have_ done them (manually or automatically),
the system had better be able to go on, knowing that they've been done.
I see that git has been updated to 'pass' the indent-block test, because it produces output the test accepts, even though the resulting indentation is not correct.
I have merge.tool set to bc3 (Beyond Compare 3), so I tried running 'git mergetool' for each of the failed cases. In the adjacent case, bc3 merged things correctly, and all I had to do was accept its merge. In the indent-block case, I just had to fix some of the spaces before accepting the merge. The only case where I had to do some real work was in dual-renames, but even then it was fairly trivial.
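A minimal sketch of that setup (note the config key is merge.tool; the bc3 tool name and demo repo are just illustrative):

```shell
# Point git at Beyond Compare 3 for conflict resolution, per-repo.
git init -q demo && cd demo
git config merge.tool bc3
git config merge.conflictstyle diff3   # optional: also show the common ancestor in conflict markers
# After a merge stops with conflicts:
#   git mergetool      # launches bc3 once per conflicted file
#   git commit         # record the resolution
```

With merge.conflictstyle set to diff3, the conflict markers include the base version, which makes the manual part of the resolution much easier to reason about.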
So, I agree with you. To me, it doesn't matter that git (or any other tool) sometimes gets content merging wrong. It _is_ gravy, and can be handled by other tools (bc3 in my case). What external tools can't do is manage your history.
I agree with Linus that creating a solid merge base is far more important than clever merging. But I think there is still room for improving on Git in this respect. A lot of what I'm about to say is inspired by this video from the Camp guys: http://projects.haskell.org/camp/unique
Git forces you to treat the history of a branch as a single linear sequence of commits. This is an unnecessary restriction if some of the changes in that sequence are totally independent of each other. For example, if two changes touch two completely different files and are unrelated, why should you be forced to sequence them in one order?
Here is a practical situation illustrating this limitation. Once in a while I'll want to patch a coworker's in-progress change into my working directory where I also have changes. Perhaps I want to build a binary with several experimental in-progress changes in it. Suppose my coworker's changes are totally independent of mine (say they touch completely different files).
I can do this in Git by applying his patch to my working directory (or doing a "git merge" with his branch). But now suppose I'm done with the experiment and want to back out my coworker's change, so my working directory is left with only my change. If I haven't made any more tweaks to my change in the meantime then I'm ok, I can just "git reset --hard HEAD^" to discard my coworker's change. But what if I made further changes to my change in the meantime? There's no easy way with Git to manipulate the two changes independently within the same branch, even though there are no actual dependencies between them.
Sure you could create a separate branch for the merged thing. Every time you want to change your part you switch back to your branch, make the change, then switch back to the merged branch and merge again. But who wants to be that disciplined? Who should have to be that disciplined when the computer could do the work of knowing that the two lines of change are independent of each other?
Git's ability to create stable and verifiable SHA1's is important, and I think that any future SCM will need to have this capability. But I don't think this implies that you have to treat the history in a strictly linear way. You could create SHA1 checkpoints when a particular person wants to publish and/or sign a tree and its contents, but still allow the individual commits to be treated in a more flexible way. The SHA1 checkpoints could be like barriers; each change is either part of the checkpoint or not, and the checkpoints could know their parent checkpoint(s) so that there is still a verifiable history available for auditing.
I hope an approach like this could make large projects like Linux more intuitive to follow. I always found it unfortunate that the graph of commits for any project with lots of merge activity is totally indecipherable. For example, here is a screenshot of Git's own Git repository: http://i.imgur.com/RyQm3.png If independent changes could be viewed independently, and if every merge didn't have to be an explicit commit, perhaps this could be easier to follow.
There are definitely lots of unanswered questions here and I don't claim to have all the answers. My point is just that I don't think Git is necessarily the last word in distributed version control.
I don't see why you think git's approach to history is flawed. If changes to separate areas are kept separate, how can anyone coordinate on a single set of changes? What if one patch doesn't affect an area, but depends on it staying the same? Sorry, I can't imagine why you would want a commit to contain anything other than the state of the entire repo.
That doesn't mean there aren't easy solutions to your problems, though.
For your first situation (test-merging in coworker commits), why are you doing it on your own dev branch? Say you're working on branch 'mine', just `git checkout -b 'mine-exp'` before merging things. Then you can continue to develop on 'mine' and/or mess with 'mine-exp'. Can't manipulate separate changes within the same branch? Create more branches!
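A sketch of that workflow in a throwaway repo (branch and file names hypothetical):

```shell
set -e
git init -q demo && cd demo
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m base
git checkout -qb coworker
echo theirs > theirs.txt && git add theirs.txt && git commit -qm "coworker change"
git checkout -q -
git checkout -qb mine
echo mine > mine.txt && git add mine.txt && git commit -qm "my change"
git checkout -qb mine-exp                 # throwaway experiment branch forked from 'mine'
git merge -q --no-edit coworker           # the experiment lives here, not on 'mine'
git checkout -q mine                      # 'mine' never saw the merge
echo more >> mine.txt && git commit -qam "more of my change"
git checkout -q mine-exp
git merge -q --no-edit mine               # refresh the experiment with the new work
```

At the end, 'mine-exp' has both lines of work while 'mine' stays clean; discarding the experiment is just deleting the branch.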
Want to look at the history of only certain paths? Just specify them at the end of your `git log` command (you can get something similar to gitk's output with `git log --oneline --graph --decorate`). If that's not enough for you, there's a whole section on 'History Simplification' in the git-log manpage. It's possible in gitk as well, under View>(New/Edit) view. The penultimate text area lets you narrow the history to commits that affect the specified files and directories.
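For example, in a toy repo (file names hypothetical):

```shell
set -e
git init -q demo && cd demo
git config user.email you@example.com && git config user.name you
echo a > a.txt && git add a.txt && git commit -qm "touch a"
echo b > b.txt && git add b.txt && git commit -qm "touch b"
git log --oneline -- b.txt                 # only the commit that touched b.txt
git log --oneline --graph --decorate       # gitk-style overview in the terminal
```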
Could all of this be documented better? Maybe, but in the first case, creating branches is the git philosophy (they're only 40 bytes each). And limiting the log output is something that svn had, so I figured git did as well.
> I don't see why you think git's approach to history is flawed.
I didn't use the word "flawed" nor do I believe it is flawed. If anything I would call it "incomplete," since what I am describing is essentially a superset of Git's existing model. I'm dreaming about whether the future could be better than what we have today.
> Say you're working on branch 'mine', just `git checkout -b 'mine-exp'` before merging things.
I explored this option in my comment. The problem is that it requires discipline that is not fundamentally necessary. It imposes branching/merging busywork that I believe could be avoided.
Suppose I'm right and some of this branching/merging busywork could be avoided. Wouldn't we be in a better place than we are today? Isn't it worth exploring this possibility?
Well, if you're asking the user to keep track of anything more than a commit's SHA1, on top of all the commands git already has, then that's asking a lot.
> it requires discipline that is not fundamentally necessary
If you don't proactively make branches, then you're like me. In that case, I make liberal use of git add -p, git stash, and git reset --hard (the latter only when everything is stashed or committed, to move branch pointers around). And then I always make a mental note that next time I'm going to make risky changes, I'll make the branch first (I don't get to use git often enough for my habits to change, though). In my case, it's usually because I start working on one thing and then, mid-course, decide to work on something else. Because I came from svn, I also forget how cheap commits are, and that it's possible to commit non-working code, which is definitely more desirable than creating commits that do too much.
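The "started one thing, switched mid-course" case is what git stash is for; a sketch in a throwaway repo (hypothetical names):

```shell
set -e
git init -q demo && cd demo
git config user.email you@example.com && git config user.name you
echo base > f.txt && git add f.txt && git commit -qm base
echo wip >> f.txt                 # uncommitted work in progress
git stash                         # park it; the working tree is clean again
git checkout -qb risky            # now it's safe to branch, reset, etc.
git stash pop                     # carry the parked work onto the new branch
```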
So, I disagree with both parts of your claim. It doesn't require significantly more discipline, provided you don't make the mistake of committing too much (e.g. always be sure to start working on an idea from a clean checkout). And the discipline required for the smoothest workflow _is_ fundamentally necessary regardless of what VCS you use, because no VCS can know what idea you're working on unless you tell it.
If you're often committing to the wrong place, I suggest adding the current branch name to your shell prompt, or getting familiar with `git cherry-pick`. If you can't keep track of what's been merged with what, gitk and various git-log options are your friend.
Do you at least agree that git has the history pruning options you were asking for?
> Suppose I'm right and some of this branching/merging busywork could be avoided. Wouldn't we be in a better place than we are today? Isn't it worth exploring this possibility?
First of all, my proposed solutions can be followed _now_, regardless of the pursuit of yours.
Second, what you're describing sounds to me exactly like submodules, which require their own discipline, and have their own set of problems (hence the recent inclusion of the git-subtrees project). And if you think that maybe git can automatically decide what files go in each submodule, then good luck, because I think you're at a point where nothing can convince you otherwise.
Edit: I take back all I said about your idea being unconditionally too complicated. It seems like it's already being done by the darcs/camp projects. I'll have to check those out eventually.
> First of all, my proposed solutions can be followed _now_, regardless of the pursuit of yours.
That may be, but if Linus had thought this way there would be no Git. We'd still be trying to shoehorn CVS into doing what we want. Personally I think it's extremely gratifying to help make the future happen.
> Second, what you're describing sounds to me exactly like submodules
What? Not at all. Take five minutes and watch the video I linked to.
> That may be, but if Linus had thought this way there would be no Git.
No, because CVS did not track the necessary information. Git does.
> What? Not at all. Take five minutes and watch the video I linked to.
An oversight on my part. I didn't even see the video link the first time around, but when I saw it in another HN post, I added the following to my post:
>> Edit: I take back all I said about your idea being unconditionally too complicated. It seems like it's already being done by the darcs/camp projects. I'll have to check those out eventually.
So, it still seems like it's more complicated than git for the average case (everyone syncing to a common state), but it's obviously not too much information for any one person to keep track of in all cases, since these projects exist.
> You can use "git rebase -i" to remove the unwanted commit from the history.
True, but now you're forcing the user to make a bunch of yes/no decisions about what should be kept and what should be discarded. If you make the wrong decision at any point you can lose your work! (I can't remember off the top of my head if Git keeps a tag for the pre-rebase state, but if so that's a lot of clutter that would accumulate over time). And the VCS still hasn't helped you determine what lines of development are independent of each other.
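For the record, git doesn't use tags for this: after a rebase the old tip is kept in ORIG_HEAD and in the reflog, so no clutter accumulates. A sketch of recovering a "lost" commit that way (hypothetical throwaway repo):

```shell
set -e
git init -q demo && cd demo
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m one
git commit -q --allow-empty -m two
git reset -q --hard HEAD^          # oops -- 'two' seems gone
git reflog                         # ...but every old HEAD position is still listed
git reset -q --hard 'HEAD@{1}'     # jump back to where HEAD was before the mistake
```

The larger point stands, though: the reflog can undo a bad rebase, but it can't tell you which commits were independent of each other in the first place.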
The workflow I am envisioning lets you visually see your working directory as a bunch of independent lines of development. Before you do anything, you can see that your coworker's change is independent of your own changes (because the VCS has analyzed the changes and knows that this is so), which gives you confidence that you can easily remove his change without affecting any of your changes. Then you have the option to simply remove his change, and it's gone.
This. In fact we have a policy of never using merge, but always using rebase (to the point where we think 'git pull --rebase' should be the default action of pulls).
+1, although we use "git fetch origin && git rebase -p origin/branchname" to avoid the nasty behaviour of 'git pull --rebase', where it rewrites all the commits of a merged branch onto the current branch instead of just redoing the merge commit. Looking back at an 18-month-old part of the history and seeing "feature branch x was merged here" is far more helpful than finding a bunch of duplicate commits, IMHO.
Yes, using rebase to avoid spurious merge commits is also a useful practice, but I was referring to "git rebase -i" (interactive), which can be used to remove commits, split or squash commits, edit commits, etc.
Sometimes you just don't have time to do clean commits (by committing only relevant changes with "git add -p"), or you fix something later which was logically part of a previous commit.
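git has direct support for that second case: commit the late fix with --fixup, then let an interactive rebase fold it in automatically. A sketch in a throwaway repo (GIT_SEQUENCE_EDITOR=true simply accepts the generated todo list):

```shell
set -e
git init -q demo && cd demo
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m base
echo one > f.txt && git add f.txt && git commit -qm "feature"
sha=$(git rev-parse HEAD)
echo fix >> f.txt
git commit -qa --fixup="$sha"            # message becomes "fixup! feature"
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash "$sha~1"
```

Afterwards the history shows a single "feature" commit containing both changes, with no separate fixup commit left behind.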
Being able to curate the commit history of your local clone, besides pure aesthetics, allows you to present a given contribution to other team members so that they can better understand what you wanted to do and perhaps review the commits.
2009: Git: Bram Cohen vs Linus Torvalds http://news.ycombinator.com/item?id=505876
which refers to
2007: A look back: Bram Cohen vs Linus Torvalds http://www.wincent.com/a/about/wincent/weblog/archives/2007/...
which refers to
2005: Re: Merge with git-pasky II. http://www.gelato.unsw.edu.au/archives/git/0504/2153.html
Where Linus says:
> For example, it seems like most SCM people think that merging is about getting the end result of two conflicting patches right.
> In my opinion, that's the _least_ important part of a merge. Maybe the kernel is very unusual in this, but basically true _conflicts_ are not only rare, but they tend to be things you want a human to look at regardless.
> The important part of a merge is not how it handles conflicts (which need to be verified by a human anyway if they are at all interesting), but that it should meld the history together right so that you have a new solid base for future merges.
> In other words, the important part is the _trivial_ part: the naming of the parents, and keeping track of their relationship. Not the clashes.
> For example, CVS gets this part totally wrong. Sure, it can merge the contents, but it totally ignores the important part, so once you've done a merge, you're pretty much up shit creek wrt any subsequent merges in any other direction. All the other CVS problems pale in comparison. Renames? Just a detail.
> And it looks like 99% of SCM people seem to think that the solution to that is to be more clever about content merges. Which misses the point entirely.
> Don't get me wrong: content merges are nice, but they are _gravy_. They are not important. You can do them manually if you have to. What's important is that once you _have_ done them (manually or automatically), the system had better be able to go on, knowing that they've been done.