By Daroc Alden
31 October 2025
The idea of automated syntax-aware merging in version-control systems dates back to 2005 or earlier, but early implementations were often language-specific and slow. Mergiraf is a merge-conflict resolver that uses a general algorithm plus a small amount of language-specific knowledge to resolve conflicts that Git’s default strategy cannot. Contributors have been working on the tool for less than a year, but it already supports 33 languages, including C, Python, Rust, and even SystemVerilog.
Mergiraf was started by Antonin Delpeuch, but many other contributors have since come forward to help, of whom Ada Alakbarova is the most prolific. The project is written in Rust and licensed under version 3 of the GPL.
The default Git merge algorithm (“ort”) is primarily line-based. It contains some tree-based logic for merging directories, but changes within a file are merged on a line-by-line basis. This can lead to situations where two logically separate changes that touch the same line cause a merge conflict.
Consider the following base version:
void callback(int status);
Suppose that one person changes the return type:
int callback(int status);
While someone else changes the argument type:
void callback(long status);
The default merge algorithm cannot handle this, because there are conflicting changes to the same line. Syntax-aware merging, however, operates on the syntactic elements of the language rather than on individual lines. So, for example, Mergiraf can resolve the above conflict like this:
int callback(long status);
From its point of view, the changes do not actually overlap, because the return type and the argument type are separate, non-overlapping elements of the syntax tree. This kind of syntax-aware merging has been around for many years, but the complexity of writing merge algorithms for syntax trees has kept it from being truly practical for widespread use. An implementation of the idea for Java, Spork, was released in 2023, showing that it was indeed possible. Mergiraf attempts to extend that Java-specific algorithm to programming (and configuration or markup) languages in general.
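The difference between the two views of the callback example can be sketched in a few lines of Python. This is only an illustration of the three-way rule applied per syntactic element, not Mergiraf’s implementation: the dictionaries stand in for a real parse, and the names merge_value(), merge_fields(), and render() are made up for the example.

```python
def merge_value(base, left, right):
    """Classic three-way rule applied to a single value."""
    if left == right:
        return left          # both sides agree (or neither changed)
    if left == base:
        return right         # only the right side changed
    if right == base:
        return left          # only the left side changed
    raise ValueError(f"conflict: {left!r} vs {right!r}")


def merge_fields(base, left, right):
    """Apply the three-way rule to each named element independently."""
    return {key: merge_value(base[key], left[key], right[key]) for key in base}


def render(decl):
    """Turn the toy representation back into a C declaration."""
    return f"{decl['return_type']} {decl['name']}({decl['param_type']} status);"


# Hypothetical, hand-written "parse" of the three declarations.
base = {"return_type": "void", "name": "callback", "param_type": "int"}
left = {"return_type": "int", "name": "callback", "param_type": "int"}
right = {"return_type": "void", "name": "callback", "param_type": "long"}

# Treating each declaration as one opaque line produces a conflict ...
try:
    merge_value(render(base), render(left), render(right))
    line_merge_conflicts = False
except ValueError:
    line_merge_conflicts = True

# ... while merging element by element does not.
merged = render(merge_fields(base, left, right))
print(line_merge_conflicts, merged)
```

The same rule is used in both cases; the only thing that changes is the granularity at which it is applied.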
Design
Mergiraf relies on the tree-sitter incremental-parsing library to turn source in different languages into a common kind of syntax tree, where each leaf corresponds to a specific token in the file and each internal node represents a language construct. Mergiraf itself needs relatively little knowledge of each language to work, however. Instead, it uses a language-agnostic tree-matching algorithm to guide conflict resolution, with a small amount of language-specific knowledge layered on top. This design is part of the reason the tool has been adapted to so many languages.
Mergiraf’s algorithm starts by performing a regular line-based merge; if that succeeds, as it often does, the program does not need to resort to the more expensive tree-based merging algorithm. Even when a line-based merge fails, it usually fails in only a few places. When parsing the different versions of the file being merged, Mergiraf can mark any part of the syntax tree that the line-based merge handled without conflicts as not needing changes, allowing it to focus on only the conflicting parts. That provides a decent speedup, especially for large files.
For the remaining parts, the tool uses the GumTree algorithm to find unambiguous matches among the remaining subtrees. Identifying matches is enough to produce a diff, but it does not provide enough information on its own to resolve any conflicts. Next, Mergiraf flattens the syntax trees into a list of facts about how the nodes in each tree relate to each other. These facts are tagged according to whether they come from the base, left, or right revision of the merge (i.e., the most recent common ancestor, the commit being merged into, and the commit being merged). A new syntax tree is then reconstructed from the merged list of facts. If a fact from the base revision conflicts with any other fact, it is discarded. If two facts from the left and right revisions disagree, this indicates a real conflict that Mergiraf cannot resolve.
The advantage of this approach is that it eliminates the kind of move/edit conflicts that plague other algorithms: if one revision edits the internals of some part of the program, and another revision moves that part of the program, those facts do not contradict each other. On the other hand, if both revisions edit exactly the same part of the program, that represents a real conflict that a human needs to look at.
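A heavily simplified sketch of the fact-based idea, in Python, may help make the move/edit case concrete. It assumes that matching has already been done (matched nodes share an identifier), records only two kinds of facts per node (its parent and its label), and ignores the ordering of siblings entirely; Mergiraf’s real representation is richer than this. The names flatten(), merge_facts(), and MISSING are invented for the example.

```python
MISSING = object()  # marks a fact that a revision does not contain


def flatten(tree, parent="<root>", facts=None):
    """Flatten a nested (node_id, label, children) tree into a dict of facts."""
    if facts is None:
        facts = {}
    node_id, label, children = tree
    facts[("parent", node_id)] = parent
    facts[("label", node_id)] = label
    for child in children:
        flatten(child, node_id, facts)
    return facts


def merge_facts(base, left, right):
    """Three-way merge of facts; returns (merged_facts, conflicting_keys)."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(left) | set(right)):
        b = base.get(key, MISSING)
        l = left.get(key, MISSING)
        r = right.get(key, MISSING)
        if l == r:
            value = l            # both revisions agree
        elif l == b:
            value = r            # base fact contradicted by the right: drop it
        elif r == b:
            value = l            # base fact contradicted by the left: drop it
        else:
            conflicts.append(key)  # left and right disagree: real conflict
            continue
        if value is not MISSING:
            merged[key] = value
    return merged, conflicts


base_tree = ("file", "file", [("f", "fn f", [("s1", "x = 1", [])]),
                              ("g", "fn g", [])])
# Left edits the statement; right moves it from f to g.
left_tree = ("file", "file", [("f", "fn f", [("s1", "x = 2", [])]),
                              ("g", "fn g", [])])
right_tree = ("file", "file", [("f", "fn f", []),
                               ("g", "fn g", [("s1", "x = 1", [])])])

merged, conflicts = merge_facts(
    flatten(base_tree), flatten(left_tree), flatten(right_tree))

# A second right-hand revision that edits the same statement differently.
other_tree = ("file", "file", [("f", "fn f", [("s1", "x = 3", [])]),
                               ("g", "fn g", [])])
_, edit_conflicts = merge_facts(
    flatten(base_tree), flatten(left_tree), flatten(other_tree))
```

In the first merge, the edit changes only the label fact for s1 and the move changes only its parent fact, so both survive and the conflict list is empty. In the second, both revisions change the same label fact, which is reported as a conflict.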
For some kinds of edits, however, Mergiraf can use language-specific knowledge to resolve such conflicts. For example, consider the following changes to a Rust structure:
// Base version
struct Foo {
    field1: Bar,
}
// Left revision
struct Foo {
    field1: Bar,
    new_field_left: Baz,
}
// Right revision
struct Foo {
    field1: Bar,
    new_field_right: Quux,
}
This is a merge conflict for a line-based algorithm, because it cannot tell in which order to add the new lines, and the order in which lines appear in a program usually matters. In Rust, however, the compiler is allowed to rearrange structure fields at will (as long as the structure is not marked with #[repr(C)] or one of the other repr settings, a case whose handling appears to be a known bug in the current version of Mergiraf). Therefore, this merge conflict can be resolved automatically by placing the new lines in either order; the behavior of the resulting merged program is identical in either case. This would not be the correct way to resolve the equivalent merge conflict in C, on the other hand, because the order of members in a C structure can affect the correctness of the program.
When the children of a syntactic element can be freely reordered without changing the meaning of the program, Mergiraf calls that element a “commutative parent”. One piece of language-specific information that Mergiraf needs is a list of which language constructs, if any, are commutative parents. A commutative parent is not a get-out-of-jail-free card for conflicts, however: if two revisions add fields with the same name but different types, for example, that is still a conflict. In such cases, Mergiraf uses an additional piece of language-specific information to place the conflicting lines next to each other, so that the resulting conflict markers pinpoint the problem as precisely as possible.
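As a rough illustration, merging the children of a commutative parent can be thought of as a three-way merge keyed by name rather than by position. The Python sketch below makes that assumption explicit; the function merge_commutative() and its (name, type) representation of fields are invented for this example and say nothing about how Mergiraf actually stores or orders children.

```python
def merge_commutative(base, left, right):
    """Merge lists of (name, type) children whose order does not matter.

    Returns (merged_children, conflicts); a type of None means "absent".
    """
    base_d, left_d, right_d = dict(base), dict(left), dict(right)
    # Every name seen in any revision, in first-seen order, without duplicates.
    names = list(dict.fromkeys(name for name, _ in base + left + right))
    merged, conflicts = [], []
    for name in names:
        b, l, r = base_d.get(name), left_d.get(name), right_d.get(name)
        if l == r:
            value = l            # same on both sides
        elif l == b:
            value = r            # only the right side changed this child
        elif r == b:
            value = l            # only the left side changed this child
        else:
            conflicts.append((name, l, r))  # e.g. same name, different types
            continue
        if value is not None:
            merged.append((name, value))
    return merged, conflicts


base = [("field1", "Bar")]
left = [("field1", "Bar"), ("new_field_left", "Baz")]
right = [("field1", "Bar"), ("new_field_right", "Quux")]
fields, conflicts = merge_commutative(base, left, right)

# Both sides add a field called "extra", but with different types.
clash_fields, clash = merge_commutative(
    base,
    [("field1", "Bar"), ("extra", "u32")],
    [("field1", "Bar"), ("extra", "u64")],
)
```

The first merge keeps all three fields with no conflicts; the second keeps only field1 and reports the two competing definitions of extra together, which is the information needed to emit a tightly scoped conflict marker.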
Using it
Mergiraf’s approach seemed promising when I encountered it, but I was curious about how much of a difference it would actually make in real-world use of Git. At the time of writing, there are 7,415 merge commits in the Linux kernel repository that result in conflicts when redone using the default merge algorithm. These are merges that had to be fixed up by hand, although the number is probably an underestimate of how many merge conflicts kernel developers have had to deal with. For example, it does not include merge conflicts that may have appeared during rebasing, because information about rebases is not recorded in the Git history for analysis.
After extracting a list of every merge conflict in the kernel’s Git history, I tried using Mergiraf to resolve them: 428 were resolved completely, while 6,987 still had conflicts. A large portion of those remaining merges were partially resolved, though. Should those results generalize, which I think is likely, adopting Mergiraf would reduce the number of merges that need manual resolution by a small amount (about 5.8% in this sample), which could still save valuable maintainer time.
The tool itself has two interfaces: one that can be run by hand on a file containing conflict markers (such as those produced by ort) to attempt to resolve the conflicts, and one that can be used automatically by Git. The first is invoked with “mergiraf solve”; the second is set up by registering a merge driver in the Git configuration:
[merge "mergiraf"]
    name = mergiraf
    driver = mergiraf merge --git %O %A %B -s %S -x %X -y %Y -p %P -l %L
When Mergiraf is invoked by Git, the user can review the conflicts that it encountered and how it resolved them by running “mergiraf review”. For those who don’t have any merge conflicts on hand, the project has an example repository containing different kinds of conflicts, to show how Mergiraf resolves them. The tool also works with Jujutsu, and possibly with other version-control systems, as long as they use the same merge-conflict syntax as Git.
Programmers have gotten by just fine without Mergiraf, so it is not necessarily something that everyone will want to add to their set of programming tools. But few people enjoy dealing with merge conflicts, and tools that can intelligently help resolve them (especially the ones that are obvious to a human, and therefore a waste of time to deal with) are an attractive prospect.