Modern C++ codebases, from browsers to GPU frameworks, rely heavily on templates, and that often means very large abstract syntax trees. Even small inefficiencies in Clang’s AST representation can add noticeable compile-time overhead.
This post describes a set of structural improvements I recently made to Clang’s AST that make type representations smaller, simpler, and faster to create – leading to measurable build-time benefits in real-world projects.
A few months ago, I landed a large patch in Clang that brought substantial compile-time improvements for heavily templated C++ code.
For example, in stdexec, the reference implementation of the std::execution feature set for C++26, the slowest test (test_on2.cpp) saw a 7% reduction in build time.
The Chromium build also showed a 5% improvement (source).
At a high level, the patch makes the Clang AST leaner: it reduces the memory footprint of type representations and lowers the cost of creating and uniquing them.
These improvements will ship with Clang 22, expected in the next few months.
How elaboration and qualified names work
Consider this simple snippet:
namespace NS {
struct A {};
}
using T = struct NS::A;
The type of T (struct NS::A) carries two pieces of information:
- It is elaborated: the struct keyword appears.
- It is qualified: NS:: serves as a nested-name-specifier.
This is what the AST dump looked like before this patch:
ElaboratedType 'struct NS::A' sugar
`-RecordType 'test::NS::A'
  `-CXXRecord 'A'
The RecordType is a direct reference to the previously declared struct A: a kind of canonical view of the type, stripped of syntactic details such as the struct keyword or the namespace qualifier.
Those syntactic details were stored separately, in an ElaboratedType node that wrapped the RecordType.
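To make the split between the canonical type and its syntactic details concrete, here is a small sketch in plain C++. The alias names are mine and purely illustrative; the point is only that every spelling below names the same type, so anything that differs between them has to live in sugar on top of the canonical node.

```cpp
#include <type_traits>

namespace NS {
struct A {};

// Inside NS no qualifier is needed.
using Elaborated = struct A; // elaborated, not qualified
using Plain = A;             // neither elaborated nor qualified
} // namespace NS

using Both = struct NS::A; // elaborated and qualified
using Qualified = NS::A;   // qualified, not elaborated

// All four spellings denote one and the same type.
static_assert(std::is_same<Both, Qualified>::value, "same type");
static_assert(std::is_same<Both, NS::Elaborated>::value, "same type");
static_assert(std::is_same<Both, NS::Plain>::value, "same type");
```

The compiler still has to remember which spelling was written, for diagnostics and tooling, which is exactly the information the sugar nodes carry.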
Interestingly, an ElaboratedType node existed even when no elaboration or qualification appeared in the source (example). This was needed to distinguish between explicitly unqualified types and types that lost their qualifiers through template substitution.
However, this design was expensive: every ElaboratedType node consumed 48 bytes, and creating one required an extra uniquing step, which is what keeps Clang’s type comparisons fast.
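For readers unfamiliar with uniquing, the following is a minimal sketch of the idea, not Clang’s actual code. All names (ToyDecl, ToyRecordType, ToyTypeContext) are hypothetical, and std::map merely stands in for whatever table a real compiler would use. Each node is created at most once per distinct key, so two types can afterwards be compared with a single pointer comparison; the price is a table lookup on every creation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these names come from Clang.
struct ToyDecl {
  std::string name;
};

struct ToyRecordType {
  const ToyDecl *decl;   // the declaration this type refers to
  bool hasStructKeyword; // syntactic detail: was `struct` written?
};

class ToyTypeContext {
  // The key holds everything that distinguishes one node from another.
  using Key = std::pair<const ToyDecl *, bool>;
  std::map<Key, std::unique_ptr<ToyRecordType>> table;

public:
  // Return the unique node for this key, creating it on first use.
  const ToyRecordType *getRecordType(const ToyDecl *decl, bool hasStructKeyword) {
    std::unique_ptr<ToyRecordType> &slot = table[Key(decl, hasStructKeyword)];
    if (!slot)
      slot = std::make_unique<ToyRecordType>(ToyRecordType{decl, hasStructKeyword});
    return slot.get();
  }

  std::size_t size() const { return table.size(); }
};
```

Asking the context twice for the same key returns the same pointer, which is what makes equality checks cheap. Every wrapper node layered on top of a type pays for one more such lookup.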
A more concise representation
The new approach removes ElaboratedType entirely. Instead, elaboration and qualifiers are now stored directly in RecordType.
The new AST dump for the same example looks like this:
RecordType 'struct NS::A' struct
|-NestedNameSpecifier Namespace 'NS'
`-CXXRecord 'A'
The struct elaboration now fits into previously unused bits of RecordType, while the qualifier is tail-allocated when present, which makes the nodes variably sized.
This change reduces the memory footprint and eliminates a level of indirection when traversing the AST.
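The combination of spare bits and tail allocation can be sketched roughly as follows. This is an illustration under my own assumptions, not the real RecordType layout: the class and field names are hypothetical, and the keyword encoding is invented for the example. The keyword lives in a two-bit field, and the qualifier is placed in memory directly after the node only when one exists.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

// Hypothetical qualifier payload; the real one is a NestedNameSpecifier.
struct ToyQualifier {
  const char *spelling; // e.g. "NS::"
};

class ToyRecordType {
  const char *declName;
  unsigned keyword : 2;      // invented encoding: 0 = none, 1 = struct, 2 = class, 3 = union
  unsigned hasQualifier : 1; // is a ToyQualifier stored right after this object?

  ToyRecordType(const char *name, unsigned kw, bool qualified)
      : declName(name), keyword(kw), hasQualifier(qualified) {}

  // The optional qualifier starts immediately after the node itself.
  ToyQualifier *tail() { return reinterpret_cast<ToyQualifier *>(this + 1); }
  const ToyQualifier *tail() const {
    return reinterpret_cast<const ToyQualifier *>(this + 1);
  }

public:
  // Nodes without a qualifier do not pay for one.
  static std::size_t allocSize(bool qualified) {
    return sizeof(ToyRecordType) + (qualified ? sizeof(ToyQualifier) : 0);
  }

  static ToyRecordType *create(const char *name, unsigned kw,
                               const ToyQualifier *qualifier) {
    assert(kw < 4 && "keyword must fit in two bits");
    void *memory = ::operator new(allocSize(qualifier != nullptr));
    ToyRecordType *type =
        new (memory) ToyRecordType(name, kw, qualifier != nullptr);
    if (qualifier)
      new (type->tail()) ToyQualifier(*qualifier);
    return type;
  }

  static void destroy(ToyRecordType *type) {
    type->~ToyRecordType();
    ::operator delete(type);
  }

  const char *name() const { return declName; }
  unsigned keywordKind() const { return keyword; }
  const ToyQualifier *qualifier() const {
    return hasQualifier ? tail() : nullptr;
  }
};

// The tail is only correctly aligned if the node is at least as strictly aligned.
static_assert(alignof(ToyRecordType) >= alignof(ToyQualifier), "tail alignment");
// destroy() never runs the tail's destructor, so it must be trivial.
static_assert(std::is_trivially_destructible<ToyQualifier>::value, "trivial tail");
```

A node created with a null qualifier occupies only sizeof(ToyRecordType) bytes, and reaching the qualifier of a qualified node is plain pointer arithmetic rather than a hop to a separate wrapper node.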
Representation of NestedNameSpecifier
NestedNameSpecifier is Clang’s internal representation for name qualifiers.
Before this patch, it was represented by a pointer (NestedNameSpecifier*) to a uniqued structure that could describe:
1. The global namespace (::)
2. A named namespace (including aliases)
3. A type
4. An identifier naming an unknown entity
5. A __super reference (Microsoft extension)
In every case except (1) and (5), that is, the global namespace and __super, NestedNameSpecifier also held a prefix: the qualifier to its left.
For example:
Namespace::Class::NestedClassTemplate::XX
This would be stored as a linked list:
[id: XX] -> [type: NestedClassTemplate] -> [type: Class] -> [namespace: Namespace]
Internally, that meant seven allocations totaling around 160 bytes:
- NestedNameSpecifier (identifier): 16 bytes
- NestedNameSpecifier (type): 16 bytes
- TemplateSpecializationType: 48 bytes
- QualifiedTemplateName: 16 bytes
- NestedNameSpecifier (type): 16 bytes
- RecordType: 32 bytes
- NestedNameSpecifier (namespace): 16 bytes
The real problem wasn’t just size; it was the uniquing cost: each candidate node had to be looked up in a hash table to find an already existing instance.
To make matters worse, ElaboratedType nodes would sometimes leak into these chains, which should never have happened and led to many long-standing bugs.
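As a rough mental model of the old prefix chain, consider the sketch below. The struct and its fields are hypothetical and much simpler than the real class (in particular, the type-carrying nodes here just hold a name). Each specifier points at the qualifier to its left, so recovering the full spelling means walking the list from the innermost name outward.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the pre-patch layout; names are illustrative only.
enum class ToyKind { Global, Namespace, Type, Identifier, Super };

struct ToyOldSpecifier {
  ToyKind kind;
  std::string name;
  const ToyOldSpecifier *prefix; // qualifier to the left, or nullptr
};

// Walk from the innermost component outward and rebuild the spelling.
inline std::string spell(const ToyOldSpecifier *specifier) {
  std::string out;
  for (; specifier; specifier = specifier->prefix)
    out = specifier->name + "::" + out;
  return out;
}

// Number of separately allocated links in the chain.
inline int chainLength(const ToyOldSpecifier *specifier) {
  int count = 0;
  for (; specifier; specifier = specifier->prefix)
    ++count;
  return count;
}
```

In this model, Namespace::Class::NestedClassTemplate::XX is a chain of four links, and in the real representation each link, plus the type nodes hanging off it, had to go through its own uniquing lookup.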
A new, smarter NestedNameSpecifier
After this patch, NestedNameSpecifier becomes a compact tagged pointer, only one machine word wide.
The pointer uses 8-byte alignment, which leaves three spare low bits. Two bits are used to discriminate the kind, and one remains available for arbitrary use.
When the pointer is non-null, the tag bits encode:
- A type
- A declaration (either a __super class or a namespace)
- A namespace prefixed by the global scope (::Namespace)
- A special object that associates a namespace with its prefix
When the pointer is null, the tag bits instead encode:
- An empty nested name (the terminator)
- The global name
- An invalid/tombstone entry (for hash tables)
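The tagged-pointer trick itself is easy to sketch. The class below is a hypothetical illustration, not the real NestedNameSpecifier: it packs a two-bit tag and one spare flag into the three low bits that an 8-byte-aligned pointer never uses, and it lets the tag keep its meaning even when the pointer part is null.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-word tagged pointer; names and layout are illustrative.
class ToyTaggedPtr {
  std::uintptr_t bits = 0;

  static constexpr std::uintptr_t TagMask = 0x3;  // two kind bits
  static constexpr std::uintptr_t SpareBit = 0x4; // third low bit, free for other uses
  static constexpr std::uintptr_t LowMask = 0x7;  // all bits freed by 8-byte alignment

public:
  ToyTaggedPtr() = default;

  ToyTaggedPtr(const void *pointer, unsigned tag, bool spare = false) {
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(pointer);
    assert((raw & LowMask) == 0 && "pointee must be 8-byte aligned");
    assert(tag <= TagMask && "tag must fit in two bits");
    bits = raw | tag | (spare ? SpareBit : 0);
  }

  // Mask the low bits away to recover the original pointer.
  const void *pointer() const {
    return reinterpret_cast<const void *>(bits & ~LowMask);
  }
  unsigned tag() const { return static_cast<unsigned>(bits & TagMask); }
  bool spare() const { return (bits & SpareBit) != 0; }

  // With a null pointer part, the tag can still select among special states.
  bool isNullPointer() const { return pointer() == nullptr; }

  friend bool operator==(ToyTaggedPtr a, ToyTaggedPtr b) { return a.bits == b.bits; }
};

static_assert(sizeof(ToyTaggedPtr) == sizeof(void *), "exactly one machine word");
```

Because the whole value is a single word, copying and comparing it costs the same as copying and comparing a raw pointer, and no separate allocation is needed for the wrapper.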
Other changes include:
- The “unknown identifier” case is now represented by a DependentNameType.
- Type prefixes are handled directly in the type hierarchy.
Revisiting the previous example, here is its AST dump after the patch:
DependentNameType 'Namespace::Class::NestedClassTemplate::XX' dependent
`-NestedNameSpecifier TemplateSpecializationType 'Namespace::Class::NestedClassTemplate' dependent
  `-name: 'Namespace::Class::NestedClassTemplate' qualified
    |-NestedNameSpecifier RecordType 'Namespace::Class'
    | |-NestedNameSpecifier Namespace 'Namespace'
    | `-CXXRecord 'Class'
    `-ClassTemplate NestedClassTemplate
This representation now needs only four allocations (156 bytes in total):
- DependentNameType: 48 bytes
- TemplateSpecializationType: 48 bytes
- QualifiedTemplateName: 16 bytes
- RecordType: 40 bytes
That is roughly half the number of nodes.
While DependentNameType is larger than the previous 16-byte “identifier” node, the extra space is not wasted: it holds cached answers to common questions such as “Does this type refer to a template parameter?” or “What is its canonical form?”
These caches make those operations much cheaper, further improving performance.
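The caching idea can be illustrated with a small sketch. The struct below is hypothetical and far simpler than a real type node: it computes two answers once, at construction time, so that later queries become plain field reads instead of walks over the prefix chain.

```cpp
#include <cassert>

// Hypothetical node that stores precomputed answers; names are illustrative.
struct ToyType {
  const ToyType *canonical; // cached: canonical form (itself if already canonical)
  bool dependent;           // cached: does this involve a template parameter?
  const ToyType *prefix;    // qualifier to the left, or nullptr

  ToyType(const ToyType *prefixType, bool namesTemplateParam,
          const ToyType *canonicalType = nullptr)
      : canonical(canonicalType ? canonicalType : this),
        // Dependence propagates from the prefix, so it is folded in once here.
        dependent(namesTemplateParam || (prefixType && prefixType->dependent)),
        prefix(prefixType) {}
};
```

A type built on a dependent prefix is marked dependent as soon as it is created, and asking for the canonical form never requires recomputation.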
Wrapping up
There’s a lot more to the patch besides what I’ve covered here, including:
- RecordType now points directly to the declaration found at creation time, enriching the AST without measurable overhead.
- RecordType nodes are now created lazily.
- The redesigned NestedNameSpecifier simplified several template instantiation transforms.
Each of these may warrant its own write-up, but even this high-level overview shows that careful structural changes to an AST can lead to tangible compile-time wins.
I hope you found this deep dive into the internals of Clang interesting – and that it gives a glimpse of the small, structural optimizations that lead to real performance improvements in large C++ builds.