Better C++ Syntax Highlighting - Part 5: Classes
Classes introduce significantly more complexity than enums, namespaces, or functions, but the same core principles apply. Consider the following example:
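A minimal sketch of such a class, assuming only what the later sections mention: x, y, and z fields, a length() member function, a constructor with an initializer list, and a static zero member defined out of line. The exact layout is an assumption made for illustration.

```cpp
#include <cmath>

// Hypothetical stand-in for the Vector3 example used throughout this post
struct Vector3 {
    static const Vector3 zero; // static member, defined out of line below

    // Constructor with a member initializer list
    Vector3(float x, float y, float z) : x(x), y(y), z(z) { }

    // x, y, and z here refer to the members, not to locals or parameters
    float length() const {
        return std::sqrt(x * x + y * y + z * z);
    }

    float x;
    float y;
    float z;
};

// Out-of-line definition, initialized from a temporary Vector3
const Vector3 Vector3::zero = Vector3(0.0f, 0.0f, 0.0f);
```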
And corresponding AST:
Classes involve multiple interconnected AST node types, each representing different aspects of the class definition and usage. The nodes we’ll encounter in this section include:
- CXXRecordDecl nodes for class, struct, and union declarations
- CXXConstructorDecl and CXXDestructorDecl nodes for constructors and destructors
- CXXCtorInitializer nodes for elements in constructor initializer lists
- CXXMethodDecl for class member function declarations
- FieldDecl nodes for class member variable declarations
- MemberExpr for expressions that reference member variables
- VarDecl for static member declarations and out-of-line definitions
- DeclRefExpr for references to static member functions
We’ll need to extend our visitor with several new visitor functions to handle these node types:
Class definitions
Declarations of classes, structs, and unions are represented by CXXRecordDecl nodes.
The implementation of this visitor follows the same pattern we’ve seen before:
Clang provides the isAnonymousStructOrUnion() check to help us exclude anonymous classes from being annotated.
This removes faulty [[class-name,]]struct { ... } annotations that would have otherwise been inserted.
With this visitor implemented, the tool properly annotates class, struct, and union declarations.
This inserts a class-name annotation for each type definition:
Member variable declarations
Member variable declarations like the x, y, and z fields of our Vector3 class are represented by FieldDecl nodes.
The implementation is similar to our previous visitors:
With this visitor in place, a member-variable annotation is inserted for each member variable declaration.
Member variable references
The x, y, and z references inside the length() function are captured as MemberExpr nodes.
Similar to member variable declarations, these identifiers benefit significantly from semantic highlighting as they’re often indistinguishable from local variables or function parameters without additional context.
The name of the member is retrieved from the underlying declaration using getMemberDecl().
The getMemberLoc() function returns the location of the member name, accounting for access operators like . and ->.
References to member variables are annotated with the member-variable annotation.
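To make the different access forms concrete, here is a small sketch. The Point type and the helper functions are hypothetical. Each use of x and y below is a member access, whether it goes through an implicit this, an explicit this->, a reference with ., or a pointer with ->. In every case the token to annotate is the member name that follows the operator, if there is one.

```cpp
// Hypothetical type used only to illustrate member access forms
struct Point {
    float x = 0.0f;
    float y = 0.0f;

    float sum() const {
        return x + this->y; // implicit this, then explicit this->
    }
};

// Access through a reference uses the . operator
float sum_via_reference(const Point& p) {
    return p.x + p.y;
}

// Access through a pointer uses the -> operator
float sum_via_pointer(const Point* p) {
    return p->x + p->y;
}
```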
Constructors and initializer lists
Not all references to member variables are handled by the VisitMemberExpr function - the x, y, and z references in the constructor initializer list remain unhandled.
This happens because initializer list entries are represented by CXXCtorInitializer nodes, and not MemberExpr.
However, CXXCtorInitializer isn’t a node that we can visit directly through the usual traversal of the AST.
Instead, we need to access initializers as children of the parent CXXConstructorDecl node, which represents a constructor definition.
The isImplicit() check is crucial here - compiler-generated constructors don’t exist in our source code, so attempting to annotate them will fail.
We also skip base class initializers (for now), since those require different handling that we’ll address when annotating references to types.
Individual initializer expressions can be iterated over using the inits() function:
The name of the member variable is retrieved from its declaration using getMember().
As before, initializers for member variables are annotated with member-variable.
This approach works well for typical class members, but fails to deal with anonymous structs or unions.
Consider an improvement made to the Vector3 class to allow for representing RGB colors:
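One way such a class could look is sketched below. The layout is an assumption: two anonymous structs share storage inside an anonymous union, so the same three floats can be read as x, y, z or as r, g, b. Anonymous structs are not part of ISO C++, but they are a widely supported compiler extension.

```cpp
// Hypothetical variant of Vector3 that can also be read as an RGB color
struct Vector3 {
    // x, y, and z are members of an anonymous struct inside an anonymous union
    Vector3(float x, float y, float z) : x(x), y(y), z(z) { }

    union {
        struct { float x, y, z; }; // anonymous struct (compiler extension)
        struct { float r, g, b; }; // same storage, different names
    };
};
```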
In this case, the members of the anonymous types are promoted into the enclosing Vector3 definition and can be accessed as if they were declared directly in Vector3.
However, getMember() returns null for member variables that originate as members of an anonymous type.
To get around this, we’ll introduce a new collect_members() function that collects the names of all available members of a class:
This function recurses over all nested type definitions, gathering both explicit members and those implicitly promoted through anonymous structs or unions.
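The recursion can be sketched independently of Clang with a toy model. The Record type below is hypothetical and is not a Clang class; it only stores field names and nested definitions. A real implementation would walk the declarations of a CXXRecordDecl instead.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for a record declaration (not a Clang type)
struct Record {
    std::vector<std::string> fields; // names of directly declared fields
    std::vector<Record> nested;      // nested type definitions, e.g. anonymous structs or unions
};

// Gathers the names of all fields reachable from `record`,
// recursing into every nested definition
void collect_members(const Record& record, std::unordered_set<std::string>& out) {
    for (const std::string& field : record.fields) {
        out.insert(field);
    }
    for (const Record& child : record.nested) {
        collect_members(child, out);
    }
}
```

Collecting into a set keeps the later lookup simple: an initializer name is treated as a member if it appears in the set.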
We’ll update VisitCXXConstructorDecl to use this when annotating member initializers:
The collect_members() function is called with the declaration of the enclosing class.
This is retrieved by walking up the declaration hierarchy of a given AST node, accessed through the node’s DeclContext chain.
The next parent is accessed through the getParent() function.
Initializers for members that originate from an anonymous context are identified with the isIndirectMemberInitializer() check.
Instead of getMember(), we’ll retrieve the name of the member through direct tokenization, taking advantage of the fact that the first token in the initializer’s source range will always be the name of the member being initialized.
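The idea of reading the member name from the first token can be sketched on a plain string. This is a simplified stand-in, not the tokenizer used by the tool: the leading_identifier() helper is hypothetical and assumes the initializer’s source text (for example "x(x)") has already been extracted.

```cpp
#include <cctype>
#include <string>

// Returns the leading identifier of `text`, e.g. "x" for "x(x)".
// Returns an empty string if `text` does not start with an identifier.
std::string leading_identifier(const std::string& text) {
    std::size_t end = 0;
    while (end < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[end]);
        // Letters and underscores are always valid; digits only after the first character
        bool valid = std::isalpha(c) || c == '_' || (end > 0 && std::isdigit(c));
        if (!valid) {
            break;
        }
        ++end;
    }
    return text.substr(0, end);
}
```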
Annotating the name of the constructor was already done in the FunctionDecl visitor, so we don’t need to do any additional processing here.
With this visitor implemented, both direct member initializations and promoted member initializations are properly annotated in constructor initializer lists:
Static class member declarations
Static class members present a unique challenge for syntax highlighting - they are declared within the class but often require separate definitions outside of it.
Both scenarios are captured by VarDecl nodes, but we need to distinguish them from regular variable declarations.
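A short sketch of the two scenarios, using a hypothetical Counter type: one static member is only declared inside the class and defined separately, while the other is declared and initialized in place.

```cpp
// Hypothetical type illustrating static data member declarations
struct Counter {
    static int instances;           // declaration only, defined out of line below
    static constexpr int limit = 8; // declared and initialized inside the class

    Counter() { ++instances; }
};

// Out-of-line definition of the static member
int Counter::instances = 0;
```

In both the in-class declaration and the out-of-line definition, the name to annotate is the member name (instances or limit), not the enclosing class name.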
Let’s augment our existing VisitVarDecl implementation from a previous post to handle static class members:
Static class members are annotated with member-variable, just like instance member variables.
The isStaticDataMember() check ensures that we only apply the annotation to static class member declarations.
With this visitor implemented, here is what our example now looks like:
Static class member references
Similar to enumeration values from an earlier post, references to static class members are captured by DeclRefExpr nodes.
Let’s augment the existing VisitDeclRefExpr visitor, first defined when we annotated enum constants, to also handle static class members:
As before, we retrieve information about the underlying declaration with getDecl().
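With a combination of the isCXXClassMember(), isCXXInstanceMember(), and isFunctionOrFunctionTemplate() checks, we can isolate references to static member variables: declarations that are class members, but are neither instance members nor functions or function templates.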
As before, these are annotated with the member-variable tag.
This logic needs to come after the check for enum constants to avoid annotating unscoped enum members as member variables.
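The ordering matters because the enumerators of an unscoped enum defined inside a class are themselves members of that class. The sketch below uses a hypothetical Shape type to show the two kinds of reference side by side: both are spelled as Shape::name, but only one of them is a member variable.

```cpp
// Hypothetical type with both an unscoped enum and a static data member
struct Shape {
    enum Kind { Circle, Square }; // enumerators are members of Shape
    static int count;             // static data member
};

int Shape::count = 0;

int describe() {
    Shape::count = 2;     // reference to a static data member
    return Shape::Square; // reference to an enum constant, not a member variable
}
```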
Temporary objects
The final node type we need to handle is CXXTemporaryObjectExpr, which represents the construction of temporary objects.
In the example we’ve been using throughout this post, this applies to the definition of the Vector3::zero static class member.
Generally speaking, these nodes appear in a variety of contexts, such as:
- Direct temporary construction
- Passing a temporary as a function argument
- Returning a temporary from a function
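The three contexts above can be sketched as follows. The Vector3 stand-in and the helper functions are hypothetical, and each construction uses three arguments, mirroring the kind of expression used to define Vector3::zero.

```cpp
// Hypothetical stand-in for the Vector3 class
struct Vector3 {
    Vector3(float x, float y, float z) : x(x), y(y), z(z) { }
    float x, y, z;
};

float sum(const Vector3& v) {
    return v.x + v.y + v.z;
}

Vector3 make_unit_x() {
    return Vector3(1.0f, 0.0f, 0.0f); // returning a temporary from a function
}

float demo() {
    Vector3 a = Vector3(1.0f, 2.0f, 3.0f);        // direct temporary construction
    float total = sum(Vector3(4.0f, 5.0f, 6.0f)); // passing a temporary as a function argument
    return total + sum(a) + sum(make_unit_x());
}
```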
Despite appearing as constructor calls, all of these are represented by CXXTemporaryObjectExpr nodes and do not generate CallExpr nodes as one might expect.
Without a dedicated visitor, references to these constructors would go unannotated.
The implementation of the VisitCXXTemporaryObjectExpr visitor is straightforward:
We retrieve the type name of the object being constructed from the CXXConstructorDecl associated with the expression.
As with other function calls, constructor calls are annotated using the function tag.
The isListInitialization() check ensures we skip brace-initialized constructors like Vector3 { }, as those should instead be annotated as types.
We’ll handle this in a later post.
With this visitor in place, temporary constructor calls are properly annotated:
Styling
The final step is to add definitions for the class-name and member-variable CSS styles:
Type aliases
Type aliases are loosely related to classes, so we’ll cover them in this section.
typedef declarations are represented by TypedefDecl nodes, while using declarations are represented by TypeAliasDecl nodes.
Functionally, both constructs serve the same purpose: defining an alias for an existing type.
For example, we can extend our earlier example so that users can work with the Vector3 struct through different type aliases:
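A minimal sketch of both alias forms. The alias names Color and Position are assumptions chosen for illustration, and Vector3 is reduced to a plain aggregate here.

```cpp
#include <type_traits>

// Hypothetical, simplified stand-in for Vector3
struct Vector3 {
    float x, y, z;
};

typedef Vector3 Color;    // typedef declaration
using Position = Vector3; // using (alias) declaration

// Both aliases name the same underlying type
static_assert(std::is_same<Color, Position>::value, "both aliases name Vector3");
```

In both declarations the name being introduced (Color or Position) is the token to annotate, not the aliased Vector3.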
Although typedef and using are represented by different AST nodes, both are annotated in the same way.
For this reason, only the implementation of VisitTypedefDecl is shown below:
Type aliases are annotated with the class-name tag.
The implementation of VisitTypeAliasDecl is identical and is omitted for brevity.
With both visitors implemented, typedef and using declarations are properly annotated:
We’ve added support for annotating class declarations, static and instance member variable declarations and references, constructor initializer lists, and type aliases. In the next post, Better C++ Syntax Highlighting - Part 6: Templates, we’ll take a closer look at annotating classes, functions, and parameters in template contexts. We will also revisit some of our existing visitors and add support for C++20 concepts.
Thanks for reading!