Some of the more complicated merges, e.g. "adjacent lines" scare me. The comment...

qznc · on May 9, 2012

More than language-specific knowledge is necessary. Let's say we have two branches to merge:

branch A: rename foo() to bar() and adapt calls

branch B: add baz(), which calls foo()

We can assume that there is no merge conflict here, since A touches various lines within major code blocks and B adds some lines between two code blocks.

Now show me one merge tool that understands that the call to foo() within baz() must also be renamed to bar(). Most tools will probably just merge and produce a broken build.

jules · on May 9, 2012

With a structured code representation (i.e. ASTs rather than flat text) this falls out naturally. The calls to foo identify foo via foo's GUID, not via the string "foo". So if you rename foo to bar you don't need to rename any call sites, since the GUID is still the same. Similarly, merging the branch B just works: the call to foo there is still pointing to the right foo (now called bar) via its GUID. When displaying a call to a function, the IDE looks up the name associated with the GUID.
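A minimal sketch of the idea, in Python for illustration. The `Program` class, its methods, and the use of `uuid` for GUIDs are hypothetical stand-ins for whatever a real structured editor would use; the point is only that call sites store an ID while the name lives in one place.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Program:
    # GUID -> current display name of the function
    names: dict = field(default_factory=dict)
    # Call edges stored as (caller GUID, callee GUID), never as strings
    calls: list = field(default_factory=list)

    def define(self, name):
        guid = uuid.uuid4().hex
        self.names[guid] = name
        return guid

    def rename(self, guid, new_name):
        # A rename touches only the name table, not any call site
        self.names[guid] = new_name

    def add_call(self, caller, callee):
        self.calls.append((caller, callee))

    def render_calls(self):
        # What the IDE would display: look names up at render time
        return [f"{self.names[a]}() calls {self.names[b]}()" for a, b in self.calls]


prog = Program()
foo = prog.define("foo")
main = prog.define("main")
prog.add_call(main, foo)

# Branch A: rename foo to bar
prog.rename(foo, "bar")

# Branch B: add baz(), which calls the function it knew as foo
baz = prog.define("baz")
prog.add_call(baz, foo)

rendered = prog.render_calls()
print(rendered)  # ['main() calls bar()', 'baz() calls bar()']
```

Both changes apply without touching each other, and the new call in baz() shows up under the new name.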

InclinedPlane · on May 9, 2012

Imagine you have a modularized compiler that can round-trip between raw text-based source and parse trees, as well as final binaries with associated metadata attached. In that case it's not too far-fetched to imagine version control systems that merge at the level of parse trees, which would allow them to detect the conflicts you describe.

qznc · on May 9, 2012

Detection is possible. Just automatically try to build it after the merge. Auto-fixing seems impossible to me, though.
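As a rough illustration of "try to build it after the merge", here is a Python sketch that stands in for a compiler's undefined-symbol check. The helper `undefined_calls` and the `MERGED` source are made up for this example, and it only looks at plain function calls at module level, so treat it as a toy and not a real build step.

```python
import ast
import builtins


def undefined_calls(source):
    """Return names that are called but never defined in the given source."""
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    defined |= set(dir(builtins))
    called = {
        n.func.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
    return sorted(called - defined)


# Textually clean merge of branch A (foo renamed to bar) and branch B (baz added)
MERGED = '''
def bar():
    return 42

def main():
    return bar()

def baz():
    return foo() + 1
'''

missing = undefined_calls(MERGED)
print(missing)  # ['foo']
```

This detects the breakage but does nothing to fix it, which is the distinction I mean.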

rwmj · on May 9, 2012

... unless the reason you renamed 'foo' was so you could introduce another function called 'foo' which does foo properly/differently.

For a realistic example, suppose you decided that 'foo' should acquire a lock. So you rename all existing 'foo' to 'foo_nolock', and add a new wrapper 'foo' which takes the lock and calls 'foo_nolock'.

If your other branch called the original 'foo', it should probably now be calling 'foo_nolock', but instead it'll be calling the lock function after the merge, and your compile (or even tests) may not be able to find that error.
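Here is one way that could go wrong, sketched in Python. The scenario is an assumption on my part: baz() already holds the lock when it calls foo(). The wrapper uses a non-blocking acquire purely so the sketch reports the problem instead of hanging; a real wrapper would block.

```python
import threading

lock = threading.Lock()
events = []


def foo_nolock():
    # The original foo, renamed on the main branch
    events.append("work")


def foo():
    # New wrapper on the main branch. Non-blocking acquire is used here
    # only so the demo can report the problem instead of deadlocking.
    if not lock.acquire(blocking=False):
        events.append("would deadlock")
        return
    try:
        foo_nolock()
    finally:
        lock.release()


def baz():
    # From the other branch, written when foo took no lock.
    # The caller already holds the lock (an assumption of this sketch).
    with lock:
        foo()


baz()
print(events)  # ['would deadlock']
```

Every name resolves, so the merge is clean and the build passes, yet baz() no longer does what its author intended.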

sirclueless · on May 9, 2012

This is why the round trip between source-code and parse tree is so great. Say branch A adds a call to foo(), and branch B swaps out foo() for foo_nolock(). You can tell from the round trip on branch A that there was a new reference to foo(). Then in branch B you can tell that the implementation of foo() has changed.

I'm not sure how you would represent such a conflict. A valid way to resolve it would be to tell the DVCS, "You dummy, this isn't a conflict, the author of branch B obviously wanted to change foo() for every call-site, even those he didn't know about." The normal diff-file syntax of "this branch added these lines, that branch removed those lines" wouldn't work.

rwmj · on May 9, 2012

There is already a semantic format for patches: http://coccinelle.lip6.fr/sp.php

However, I don't think semantic parsing helps here. For example, suppose I'd told you (the feature branch developer) that I was going to change 'foo' so that it had locking semantics, and you had deliberately used 'foo' because of this. Now when we merge you definitely don't want your 'foo' to be changed to 'foo_nolock'. Alternatively, you can think of a case where I don't change all 'foo' to 'foo_nolock', so the VCS has no idea what the "rule" is.

sirclueless · on May 9, 2012

I don't think it is appropriate to change a reference, like you say. If two people modify the same code, then you should signify a conflict and have someone resolve the issue by hand. There's no machine on earth that can tell whether you meant to call foo() or foo_nolock(). The point is to prevent a false positive (the worst thing by far when merging). If you modify the foo() function and I add a new reference to it, current line-based merge strategies will silently resolve that because our edits appear to be far apart, even though they are semantically conflicting. With some semantic analysis you can determine that manual resolution is much better. The point is to throw a conflict, not change a reference silently.
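A rough sketch of what "throw a conflict" could look like, using Python's `ast` module. The function names and the three source strings are invented for illustration, and the check is deliberately crude: it flags any function whose definition changed on one side while the other side added a new direct call to it.

```python
import ast
from collections import Counter


def function_defs(source):
    """Map each function name to a position-independent dump of its AST."""
    return {
        node.name: ast.dump(node)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.FunctionDef)
    }


def call_counts(source):
    """Count direct calls by callee name (plain names only, no methods)."""
    return Counter(
        node.func.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    )


def one_way_conflicts(base, editor, caller):
    """Functions changed or removed in `editor` that `caller` newly calls."""
    base_defs, new_defs = function_defs(base), function_defs(editor)
    changed = {name for name, dump in base_defs.items() if new_defs.get(name) != dump}
    added_calls = call_counts(caller) - call_counts(base)
    return changed & set(added_calls)


def semantic_conflicts(base, ours, theirs):
    """Check both directions and return the conflicting names, sorted."""
    return sorted(
        one_way_conflicts(base, ours, theirs) | one_way_conflicts(base, theirs, ours)
    )


BASE = "def foo():\n    return 1\n"
OURS = (
    "def foo_nolock():\n    return 1\n\n"
    "def foo():\n    return foo_nolock()\n"
)
THEIRS = BASE + "\ndef baz():\n    return foo()\n"

conflicts = semantic_conflicts(BASE, OURS, THEIRS)
print(conflicts)  # ['foo']
```

A line-based merge of these two edits would go through silently. The sketch only reports that foo needs a human to look at it; it does not guess which function baz() should call.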

InclinedPlane · on May 9, 2012

The point of better merge tools shouldn't be to automate merging with 100% correctness; that's an impossible task. Instead, the point should be to have a high level of accuracy in doing safe merges and in alerting a human being to an unsafe merge that requires resolution.

JoeAltmaier · on May 9, 2012

It's true that most of the time, merging files by "shuffling the decks together" works pretty OK. Many small changes are independent, even most.

But sometimes they're not, and there is no way on earth the merge tool can distinguish. E.g. I fix a bug by incrementing a counter in the caller; you fix it by incrementing in the subroutine. Merge: now we have a new bug, the counter is incremented twice.
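To make the counter case concrete, here is a toy Python model. `make_program` and its two flags are hypothetical; each flag stands for one developer's fix being present in the code.

```python
def make_program(fix_in_caller, fix_in_subroutine):
    """Build a caller whose behavior depends on which fixes were applied."""

    def subroutine(count):
        if fix_in_subroutine:
            count += 1  # your fix
        return count

    def caller():
        count = 0
        if fix_in_caller:
            count += 1  # my fix
        return subroutine(count)

    return caller


base = make_program(False, False)()    # the original bug: counter stays at 0
mine = make_program(True, False)()     # fixed in the caller
yours = make_program(False, True)()    # fixed in the subroutine
merged = make_program(True, True)()    # both fixes land after a clean merge

print(base, mine, yours, merged)  # 0 1 1 2
```

Each branch is correct on its own, the edits are in different places, and the merged result is wrong.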

Another simple, irreconcilable issue is files that contain lists of things. I want to add two more items to the list; you add three. Merging, we have a 'conflict': we've both changed the end of the file. The merge here would have been trivial: add all the lines. But the merge tool cannot intuit from a text file which kind of change it was: a list edit or an algorithmic change.
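If the tool were told the file is a list, the merge really is trivial. A sketch in Python, where `merge_appended` is a hypothetical helper that assumes both sides only appended to the common base and refuses anything else:

```python
def merge_appended(base, ours, theirs):
    """Three-way merge for a list file where both sides only appended items."""
    if ours[: len(base)] != base or theirs[: len(base)] != base:
        raise ValueError("not a pure append; needs manual merge")
    merged = list(ours)
    for item in theirs[len(base):]:
        if item not in merged:  # skip items both sides added
            merged.append(item)
    return merged


base = ["alpha", "beta"]
ours = base + ["gamma", "delta"]              # I add two items
theirs = base + ["epsilon", "zeta", "eta"]    # you add three

merged_list = merge_appended(base, ours, theirs)
print(merged_list)
# ['alpha', 'beta', 'gamma', 'delta', 'epsilon', 'zeta', 'eta']
```

The hard part is not this function; it is knowing that the file is a list in the first place.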

A database instead of a text file can help, as long as the schema allows sufficiently complex description to help the merge tool, and that would require considerable foresight on the part of the database designer.

jbri · on May 9, 2012

What I'd like to see at some point is language-aware merge tools that can both correctly merge stuff like that, and flag conflicting edits even if they don't touch the same source lines.