Better C++ Syntax Highlighting - Part 4: Functions

Updated Jul 14, 2025

Functions are next on our list. Their declarations, definitions, and calls appear throughout C++ code, and Clang provides a rich set of node types for processing them.

Consider the following example:

cpp
1
#include <cstring> // std::strcmp
2
#include <string> // std::string, std::string_literals
3
#include <chrono> // std::chrono::duration, std::chrono_literals
4
5
template <typename T>
6
bool equal(T a, T b) {
7
return a == b;
8
}
9
10
// Template specialization
11
template <>
12
bool equal(const char* a, const char* b) {
13
if (!a || !b) {
14
return false;
15
}
16
17
return std::strcmp(a, b) == 0;
18
}
19
20
namespace math {
21
22
struct Point {
23
static float distance(const Point& a, const Point& b);
24
25
float x;
26
float y;
27
};
28
29
// Operator overload
30
bool operator==(const Point& a, const Point& b);
31
32
}
33
34
int main() {
35
math::Point p1(1.2f, 3.4f);
36
math::Point p2(5.6f, 7.8f);
37
38
if (p1 != p2 && math::Point::distance(p1, p2) < 5.0f) {
39
// ...
40
}
41
42
int value = 42;
43
int* ptr = &value;
44
45
(*ptr)++;
46
*ptr += -4 * (1 + 3) / (9 - 5) % 2;
47
48
bool eq = equal("apple", "banana");
49
50
// Literal operators
51
using namespace std::string_literals;
52
std::string str = "Hello, world!"s;
53
54
using namespace std::chrono_literals;
55
std::chrono::duration timeout = 200ms;
56
57
// ...
58
}

The corresponding AST is lengthy and is not reproduced here.

The process of annotating functions and operators is a lot more involved than previous node types we’ve seen, so let’s establish some success criteria before getting started:

  1. Function names (regular and template) should be annotated with function. This includes the equal function template on line 6 and its specialization on line 12, the call to equal on line 48, the distance static member function declared on line 23 and called on line 38, and the main function definition on line 34.
  2. Unary operators (lines 13, 43, 45, and 46) should be annotated with unary-operator.
  3. Binary operators (lines 7, 17, and 38) should be annotated with binary-operator.
  4. Compound assignment operators like += on line 46 should also use binary-operator.
  5. Overloaded operator declarations and definitions, such as operator== on line 30, and uses of overloaded operators, such as != on line 38, should be annotated with function-operator.
  6. User-defined literal operators like the s operator on line 52 and the ms operator on line 55 should be annotated to match the type of literal they are applied to.
  7. Variable declarations using functional-style initialization (p1 and p2 on lines 35 and 36) should remain as plain tokens.

To annotate functions and operators, we’ll define visitor functions for eight new node types:

cpp
bool VisitFunctionDecl(clang::FunctionDecl* node);
bool VisitFunctionTemplateDecl(clang::FunctionTemplateDecl* node);
bool VisitUnaryOperator(clang::UnaryOperator* node);
bool VisitBinaryOperator(clang::BinaryOperator* node);
bool VisitCallExpr(clang::CallExpr* node);
bool VisitCXXOperatorCallExpr(clang::CXXOperatorCallExpr* node);
bool VisitCompoundAssignOperator(clang::CompoundAssignOperator* node);
bool VisitUserDefinedLiteral(clang::UserDefinedLiteral* node);

Function declarations

FunctionDecl nodes represent standard function declarations and definitions. We’ll annotate these with a function tag, with special handling for overloaded operators.

cpp
bool Visitor::VisitFunctionDecl(clang::FunctionDecl* node) {
    // Check to ensure this node originates from the file we are annotating
    // ...

    // Skip compiler-generated declarations
    if (node->isImplicit()) {
        return true;
    }

    std::string name = node->getNameAsString();
    clang::SourceLocation location = node->getLocation();
    unsigned line = source_manager.getSpellingLineNumber(location);
    unsigned column = source_manager.getSpellingColumnNumber(location);

    if (node->isOverloadedOperator()) {
        name = name.substr(8); // Skip 'operator' keyword
        m_annotator->insert_annotation("function-operator", line, column + 8, name.length());
    }
    else {
        m_annotator->insert_annotation("function", line, column, name.length());
    }

    return true;
}

The isImplicit() check prevents annotating compiler-generated placeholder declarations. For overloaded operators, isOverloadedOperator() is used to detect operator functions and skip the first 8 characters (operator) to annotate only the operator symbol. operator itself should instead be highlighted as a language keyword, which we’ll handle in a later post in this series.
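As a rough illustration of the offset arithmetic, the sketch below computes which part of a declaration's name gets annotated. The `NameSpan` struct and the `annotation_span` function are hypothetical stand-ins, not part of Clang or of the annotator used in this series, and the sketch assumes the operator symbol directly follows the `operator` keyword with no whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type: which tag to use and which part of the name to cover.
struct NameSpan {
    std::string tag;     // "function" or "function-operator"
    std::size_t offset;  // Characters to skip from the start of the name
    std::size_t length;  // Number of characters to annotate
};

// Mirrors the branch in VisitFunctionDecl: operator functions skip the
// 8-character 'operator' keyword, everything else is annotated in full.
NameSpan annotation_span(const std::string& name, bool is_overloaded_operator) {
    const std::size_t keyword_length = std::string("operator").length(); // 8
    if (is_overloaded_operator && name.length() > keyword_length) {
        return { "function-operator", keyword_length, name.length() - keyword_length };
    }
    return { "function", 0, name.length() };
}
```

For `operator==` this gives an offset of 8 and a length of 2, so only `==` is tagged. A declaration spelled with whitespace, such as `operator ==`, would not line up with a fixed offset in the source, which is one reason a token-based search may be worth considering.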

Note that CXXMethodDecl nodes (class member functions) are picked up by this visitor, since they derive from FunctionDecl. This includes static class functions, constructors, and destructors. Similarly, template function declarations are also visited because each FunctionTemplateDecl contains a child FunctionDecl node representing the actual function. This means we don’t actually need to set up dedicated visitors for these nodes (unless we want some specialized logic).

With this visitor implemented, function declarations and definitions are properly annotated:

text
template <typename T>
bool [[function,equal]](T a, T b) {
    // ...
}

// Template specialization
template <>
bool [[function,equal]](const char* a, const char* b) {
    // ...
}

namespace math {

    struct Point {
        static float [[function,distance]](const Point& a, const Point& b);

        float x;
        float y;
    };

    // Operator overloads
    bool operator[[function-operator,==]](const Point& a, const Point& b);
    bool operator[[function-operator,!=]](const Point& a, const Point& b);

}

int [[function,main]]() {
    // ...
}
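The `[[tag,text]]` markers in the output above wrap a range of a source line identified by a line, a column, and a length. How the annotator from this series stores and applies annotations is not shown in this post, so the following is only a minimal sketch under assumed names (`Annotation`, `apply_annotations`), using 1-based columns as Clang reports them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical annotation record; columns are 1-based.
struct Annotation {
    std::string tag;
    unsigned column;
    std::size_t length;
};

// Wraps each annotated range of `line` in [[tag,text]] markers.
std::string apply_annotations(std::string line, std::vector<Annotation> annotations) {
    // Process right to left so that inserting markers never shifts a column
    // that is still waiting to be processed.
    std::sort(annotations.begin(), annotations.end(),
              [](const Annotation& a, const Annotation& b) { return a.column > b.column; });

    for (const Annotation& annotation : annotations) {
        if (annotation.column == 0) {
            continue; // Columns are 1-based, so 0 is never valid
        }
        std::size_t start = annotation.column - 1;
        if (start + annotation.length > line.length()) {
            continue; // Ignore ranges that fall outside the line
        }
        std::string text = line.substr(start, annotation.length);
        line.replace(start, annotation.length, "[[" + annotation.tag + "," + text + "]]");
    }
    return line;
}
```

Applied to `bool equal(T a, T b) {` with a `function` annotation at column 6 and length 5, this produces `bool [[function,equal]](T a, T b) {`.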

Function calls

CallExpr nodes represent function calls. As before, we’ll annotate the function name of each call with the function tag.

cpp
bool Visitor::VisitCallExpr(clang::CallExpr* node) {
    // Check to ensure this node originates from the file we are annotating
    // ...

    // getCalleeDecl() can return null, and the callee is not always a function
    const clang::FunctionDecl* function = clang::dyn_cast_or_null<clang::FunctionDecl>(node->getCalleeDecl());
    if (!function) {
        return true;
    }

    std::string name = function->getNameAsString();

    // Annotate the first token in the call that matches the function name
    for (const Token& token : m_tokenizer->get_tokens(node->getSourceRange())) {
        if (token.spelling == name) {
            m_annotator->insert_annotation("function", token.line, token.column, name.length());
            break;
        }
    }

    return true;
}

We can retrieve the function name from the underlying declaration, which we get through getCalleeDecl(). Unlike other AST nodes, CallExpr does not provide direct access to the function name’s location in the source. The getBeginLoc() function returns the location of the fully-qualified function call, including any namespace and/or class qualifiers.

cpp
// This approach won't work for qualified calls like math::Point::distance()
clang::SourceLocation location = node->getBeginLoc(); // Points to 'math', not 'distance'

To work around this, we’ll tokenize the function call’s source range and annotate only the token matching the function’s name.

This approach elegantly handles arbitrarily qualified function calls.

text
// Template specialization
template <>
bool equal(const char* a, const char* b) {
    if (!a || !b) {
        return false;
    }

    return std::[[function,strcmp]](a, b) == 0;
}

int main() {
    if (p1 != p2 && math::Point::[[function,distance]](p1, p2) < 5.0f) {
        // ...
    }

    // ...

    bool eq = [[function,equal]]("apple", "banana");

    // ...
}
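To make the token-matching step concrete without pulling in Clang, here is a small self-contained sketch. The `Token` struct only loosely mirrors the one used in this post, and `tokenize` and `find_name_token` are hypothetical helpers written for this example; a real implementation would rely on Clang's lexer instead of this toy tokenizer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record, loosely modeled on the Token type used in this post.
struct Token {
    std::string spelling;
    unsigned column; // 1-based
};

// Toy tokenizer: identifiers and numbers become one token, '::' becomes one
// token, and every other non-space character is a token of its own.
std::vector<Token> tokenize(const std::string& source) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.length()) {
        unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        std::size_t start = i;
        if (std::isalnum(c) || c == '_') {
            while (i < source.length() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) || source[i] == '_')) {
                ++i;
            }
        }
        else if (source.compare(i, 2, "::") == 0) {
            i += 2;
        }
        else {
            ++i;
        }
        tokens.push_back({ source.substr(start, i - start), static_cast<unsigned>(start + 1) });
    }
    return tokens;
}

// Returns the first token spelled exactly like `name`, or nullptr if there is none.
const Token* find_name_token(const std::vector<Token>& tokens, const std::string& name) {
    for (const Token& token : tokens) {
        if (token.spelling == name) {
            return &token;
        }
    }
    return nullptr;
}
```

For `math::Point::distance(p1, p2)`, searching for `distance` skips the `math` and `Point` qualifiers and lands on column 14. Since the first matching token wins, a qualifier or argument that happens to share the function's name could in principle be matched instead, which is worth keeping in mind.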

Built-in operators

Unary, binary, and compound assignment operators are captured under UnaryOperator, BinaryOperator, and CompoundAssignOperator nodes respectively. All three follow the same implementation pattern, so we’ll focus on unary operators as an example.

cpp
bool Visitor::VisitUnaryOperator(clang::UnaryOperator* node) {
    // Check to ensure this node originates from the file we are annotating
    // ...

    clang::SourceLocation location = node->getOperatorLoc();

    std::string name = clang::UnaryOperator::getOpcodeStr(node->getOpcode()).str();
    unsigned line = source_manager.getSpellingLineNumber(location);
    unsigned column = source_manager.getSpellingColumnNumber(location);

    m_annotator->insert_annotation("unary-operator", line, column, name.length());
    return true;
}

Unlike other nodes, operator nodes provide direct access to the operator’s location through getOperatorLoc(). We retrieve the operator symbol using the static getOpcodeStr() function. The implementations of VisitBinaryOperator and VisitCompoundAssignOperator follow the same pattern, using their respective getOpcodeStr() functions. Unary operators are annotated with unary-operator, while binary and compound assignment operators are annotated with binary-operator.
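Since the three visitors differ only in the tag they emit, the shared part can be pictured as one small helper. This is a sketch under assumed names: `OperatorKind`, `OperatorAnnotation`, and `make_operator_annotation` are hypothetical and do not appear in Clang or in the annotator from this series.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical classification of the three built-in operator node kinds.
enum class OperatorKind { Unary, Binary, CompoundAssign };

// Hypothetical record describing one annotation to insert.
struct OperatorAnnotation {
    std::string tag;
    unsigned line;
    unsigned column;
    std::size_t length;
};

// Unary operators get their own tag; binary and compound assignment
// operators share the binary-operator tag.
OperatorAnnotation make_operator_annotation(OperatorKind kind, const std::string& spelling,
                                            unsigned line, unsigned column) {
    const char* tag = (kind == OperatorKind::Unary) ? "unary-operator" : "binary-operator";
    return { tag, line, column, spelling.length() };
}
```

The annotation length comes from the operator's spelling, so `++` and `+=` cover two characters while `!` and `*` cover one.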

Another type of built-in operator is the array subscript operator, represented by the ArraySubscriptExpr AST node. Handling this requires setting up a dedicated visitor, as these nodes are not visited by other operator visitors.

cpp
bool Visitor::VisitArraySubscriptExpr(clang::ArraySubscriptExpr* node) {
    // Check to ensure this node originates from the file we are annotating
    // ...

    for (const Token& token : m_tokenizer->get_tokens(node->getSourceRange())) {
        if (token.spelling == "[" || token.spelling == "]") {
            m_annotator->insert_annotation("operator", token.line, token.column, 1);
        }
    }

    return true;
}

Unlike most other operators, Clang does not provide a direct way of retrieving the locations of both the opening and closing brackets. ArraySubscriptExpr exposes getRBracketLoc() for the closing bracket, but there does not appear to be a counterpart for the opening bracket, and functions like getExprLoc() only return the location of the expression the operator is applied to, not the operator symbols themselves. To work around this, we simply tokenize the source range of the node and manually annotate both the [ and ] tokens as operators.
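The bracket search can be illustrated in isolation. The sketch below works on a plain string rather than on lexer tokens, so it has to skip double-quoted strings by hand, which a real token stream would handle for free. `bracket_columns` is a hypothetical helper written for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the 1-based columns of every '[' and ']' in `expression`,
// ignoring brackets that appear inside double-quoted string literals.
std::vector<unsigned> bracket_columns(const std::string& expression) {
    std::vector<unsigned> columns;
    bool in_string = false;
    for (std::size_t i = 0; i < expression.length(); ++i) {
        char c = expression[i];
        if (in_string && c == '\\') {
            ++i; // Skip the escaped character
            continue;
        }
        if (c == '"') {
            in_string = !in_string;
            continue;
        }
        if (!in_string && (c == '[' || c == ']')) {
            columns.push_back(static_cast<unsigned>(i + 1));
        }
    }
    return columns;
}
```

For a nested subscript like `values[indices[0]]`, the outer expression's range also covers the inner brackets, so all four positions are reported. A real implementation that visits both the outer and inner nodes may want to tolerate or de-duplicate repeated annotations.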

Overloaded operators

CXXOperatorCallExpr nodes represent calls to overloaded operators. The implementation largely follows the same structure as built-in operators:

cpp
bool Visitor::VisitCXXOperatorCallExpr(clang::CXXOperatorCallExpr* node) {
    // Check to ensure this node originates from the file we are annotating
    // ...

    std::string op = clang::getOperatorSpelling(node->getOperator());
    clang::SourceLocation location = node->getOperatorLoc();
    unsigned line = source_manager.getSpellingLineNumber(location);
    unsigned column = source_manager.getSpellingColumnNumber(location);

    if (op == "[]") {
        // Special handling for array subscript operator
        for (const Token& token : m_tokenizer->get_tokens(node->getSourceRange())) {
            if (token.spelling == "[" || token.spelling == "]") {
                m_annotator->insert_annotation("function-operator", token.line, token.column, 1);
            }
        }
    }
    else {
        m_annotator->insert_annotation("function-operator", line, column, op.length());
    }

    // getDirectCallee() can return null (for example, in dependent contexts)
    if (const clang::FunctionDecl* callee = node->getDirectCallee()) {
        visit_qualifiers(callee->getDeclContext(), clang::SourceRange(node->getBeginLoc(), node->getOperatorLoc()));
    }

    return true;
}

We use getOperatorSpelling() to retrieve the operator symbol and annotate it with function-operator to match our handling of overloaded operator declarations from earlier.

Overloaded array subscript operators are handled separately from other overloaded operators, as these require two annotations instead of one. Similar to what we did when annotating ArraySubscriptExpr nodes in the previous section, we tokenize the source range of the function call and manually annotate both the [ and ] tokens with the function-operator tag.

Note that overloaded operators in template contexts (particularly with fold expressions) can introduce challenges for annotation due to ambiguity around operator resolution. One possible solution is to iterate through the tokens of a template function definition and annotate those that match operator spellings. However, this is difficult to automate, as C++ provides a lot of flexibility when it comes to defining custom operator types. Because of this, I decided to leave annotating these as a manual step. I'm fine with this tradeoff, as I don’t use many fold expressions in my code.

text
template <typename T>
bool equal(T a, T b) {
return a [[binary-operator,==]] b;
}
// Template specialization
template <>
bool equal(const char* a, const char* b) {
if ([[unary-operator,!]]a [[binary-operator,||]] [[unary-operator,!]]b) {
return false;
}
return std::strcmp(a, b) [[binary-operator,==]] 0;
}
int main() {
// ...
if (p1 [[function-operator,!=]] p2 [[binary-operator,&&]] math::Point::[[function,distance]](p1, p2) [[binary-operator,<]] 5.0f) {
// ...
}
int value = 42;
int* ptr = [[unary-operator,&]]value;
([[unary-operator,*]]ptr)[[unary-operator,++]];
[[unary-operator,*]]ptr [[binary-operator,+=]] [[unary-operator,-]]4 [[binary-operator,*]] (1 [[binary-operator,+]] 3) [[binary-operator,/]] (9 [[binary-operator,-]] 5) [[binary-operator,%]] 2;
// ...
}

User-defined literal operators

UserDefinedLiteral nodes represent user-defined literal operators. We’ll annotate these to match the type of literal they’re applied to.

Unlike built-in operators, we need to retrieve the operator name from the function declaration:

cpp
const clang::FunctionDecl* function = node->getCalleeDecl()->getAsFunction();
std::string name = function->getNameAsString();
name = name.substr(10); // Skip 'operator""' prefix

We get the function declaration through getCalleeDecl() and strip the operator"" prefix to get the actual suffix used in the code. The annotation for the suffix should match the type of the literal it is applied to. Rather than relying on getLiteralOperatorKind() (which can be misleading for template-based operators), we inspect the token directly:

cpp
for (const Token& token : tokens) {
    // The suffix always comes last in the literal token, so search from the end
    std::size_t position = token.spelling.rfind(name);
    if (position != std::string::npos) {
        bool is_string_type = token.spelling.find('\'') != std::string::npos || token.spelling.find('"') != std::string::npos;
        const char* annotation = is_string_type ? "string" : "number";
        m_annotator->insert_annotation(annotation, token.line, token.column + position, name.length());
    }
}

We can do this because literal operators can only be applied to integer, floating-point, character, and string literals. If the token containing our operator suffix contains quotation marks, the suffix is annotated as a string. Otherwise, we know it belongs to a numeric literal and annotate it as a number.
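The classification rule can be tried out on its own. The sketch below is a variation rather than the exact logic used in this post: it anchors the suffix at the end of the token, and it decides between string and number from the first character, so that a digit separator such as the one in `1'000ms` is not mistaken for a character literal. `SuffixAnnotation` and `classify_literal_suffix` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct SuffixAnnotation {
    std::string tag;       // "string" or "number"
    std::size_t position;  // Offset of the suffix within the token
};

// Classifies the suffix of a user-defined literal token such as 200ms or "text"s.
// Returns false if the token does not end with `suffix`.
bool classify_literal_suffix(const std::string& token, const std::string& suffix,
                             SuffixAnnotation& result) {
    if (suffix.empty() || token.length() <= suffix.length() ||
        token.compare(token.length() - suffix.length(), suffix.length(), suffix) != 0) {
        return false;
    }

    // Numeric literals always start with a digit or a decimal point. Anything
    // else (a quote, or an encoding prefix like u8 or L) starts a character
    // or string literal.
    unsigned char first = static_cast<unsigned char>(token.front());
    bool is_number = std::isdigit(first) || first == '.';

    result.tag = is_number ? "number" : "string";
    result.position = token.length() - suffix.length();
    return true;
}
```

With this rule, `"Hello, world!"s` is tagged as a string with the suffix at offset 15, and `200ms` is tagged as a number with the suffix at offset 3.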

An alternative approach uses the getLiteralOperatorKind() function, which returns a category corresponding to the function signature of the operator according to the specification:

cpp
unsigned line = source_manager.getSpellingLineNumber(location);
unsigned column = source_manager.getSpellingColumnNumber(location);
const char* annotation;
switch (node->getLiteralOperatorKind()) {
case clang::UserDefinedLiteral::LOK_Raw: // C-style string
case clang::UserDefinedLiteral::LOK_Template: // Template parameter pack of characters (numeric literal operator template)
case clang::UserDefinedLiteral::LOK_String: // C-style string and length
case clang::UserDefinedLiteral::LOK_Character:
annotation = "string";
break;
case clang::UserDefinedLiteral::LOK_Integer:
case clang::UserDefinedLiteral::LOK_Floating:
annotation = "number";
break;
}
m_annotator->insert_annotation(annotation, line, column, name.length());

However, this approach has some unexpected drawbacks. For example, the standard library implementation I'm compiling against does not resolve 200ms to an overload that accepts an integer. Instead, the following overload is called, which accepts the digits of the number as a variadic list of characters:

cpp
// Literal suffix for durations of type `std::chrono::milliseconds`
template <char... _Digits>
constexpr std::chrono::milliseconds operator""ms() {
// ...
}

Why is this implemented in such a way? Well, I’m not sure. Using this approach categorizes 200ms as a string of characters, incorrectly marking the ms as a string instead of a number.

text
int main() {
    // ...

    // Literal operators
    using namespace std::string_literals;
    std::string str = "Hello, world!"[[string,s]];

    using namespace std::chrono_literals;
    std::chrono::duration timeout = 200[[number,ms]];

    // ...
}
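As an aside on the numeric literal operator template form mentioned earlier, the standalone example below shows how such an operator receives its digits. The `_count` suffix is made up for this illustration and only handles plain decimal digits. One plausible motivation for this form in a standard library is that the value can be inspected at compile time (for example, to reject literals that do not fit), though that is a guess rather than something this post establishes.

```cpp
#include <cassert>
#include <cstddef>

// Numeric literal operator template: the digits of the literal arrive as a
// pack of characters rather than as an integer argument.
template <char... Digits>
constexpr unsigned long long operator""_count() {
    const char digits[] = { Digits... };
    unsigned long long value = 0;
    for (std::size_t i = 0; i < sizeof...(Digits); ++i) {
        // Assumes decimal digits only (no hex prefix, no digit separators)
        value = value * 10 + static_cast<unsigned long long>(digits[i] - '0');
    }
    return value;
}

// 200_count is assembled from the characters '2', '0', '0' at compile time.
static_assert(200_count == 200, "digits are folded into the value at compile time");
```

Because the literal reaches the operator as characters, it is understandable that getLiteralOperatorKind() reports a template kind here even though the source token is plainly a number.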

We’ll also need some special handling for annotating function declarations of literal operators. Currently, our VisitFunctionDecl visitor incorrectly annotates the operator"" portion of the function name in addition to the operator itself.

text
template <char... _Digits>
constexpr std::chrono::milliseconds [[function,operator""ms]]() {
// ...
}

We’ll fix this by checking for and handling this case explicitly:

cpp
bool Visitor::VisitFunctionDecl(clang::FunctionDecl* node) {
    // ...

    if (node->isOverloadedOperator()) {
        // ...
    }
    else if (const clang::IdentifierInfo* identifier = node->getLiteralIdentifier()) {
        name = identifier->getName().str();
        for (auto it = m_tokenizer->at(node->getTypeSpecStartLoc()); it != m_tokenizer->end(); ++it) {
            const Token& token = *it;
            if (token.spelling == name) {
                // Name is its own token: operator"" ms
                m_annotator->insert_annotation("function", token.line, token.column, name.length());
                break;
            }
            else if (token.spelling == utils::format("\"\"{}", name)) {
                // Name is fused with the quotes: operator""ms
                m_annotator->insert_annotation("function", token.line, token.column + 2, name.length());
                break;
            }
        }
    }

    // ...
}

We use the getLiteralIdentifier() function to check if the function declaration refers to a literal operator. Using the returned IdentifierInfo, we can query the name of the operator with getName() and search for it in the tokens of the declaration.

Unfortunately, there is no direct way to retrieve the location of the operator name itself, so we resort to manually searching for a token that matches it using the tokenization approach. One small caveat is that literal operators are one of the few kinds of functions whose declared name may contain a space (operator"" ms). When the name is written without a space (for example, operator""ms), the quotes and the name are combined into a single token. This must be accounted for so that only the operator name is annotated.

Literal operator declarations, as with declarations for other functions, are annotated with the function annotation.

text
template <char... _Digits>
constexpr std::chrono::milliseconds operator""[[function,ms]]() {
// ...
}
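The two token shapes handled by the visitor can be checked with a small helper. `literal_name_offset` is a hypothetical function written for this example and requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Returns the offset of the literal operator's name within `token` if the
// token spells the name either on its own (operator"" ms) or fused with the
// quotes (operator""ms). Returns std::nullopt for any other token.
std::optional<std::size_t> literal_name_offset(const std::string& token, const std::string& name) {
    if (token == name) {
        return std::size_t{0};
    }
    if (token == "\"\"" + name) {
        return std::size_t{2}; // Skip the two quote characters
    }
    return std::nullopt;
}
```

Adding the returned offset to the token's column gives the column of the name itself, which matches the `token.column + 2` adjustment made for the fused form.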

Functional-style variable declarations

In most other syntax highlighters, variable declarations using functional-style initialization are incorrectly highlighted as function calls. This likely occurs because functions are identified based on the presence of parentheses.

We can fix this by implementing a visitor for VarDecl nodes, which represent variable declarations and definitions.

cpp
bool Visitor::VisitVarDecl(clang::VarDecl* node) {
    // Check to ensure this node originates from the file we are annotating
    // ...

    const std::string& name = node->getNameAsString();

    clang::SourceLocation location = node->getLocation();
    unsigned line = source_manager.getSpellingLineNumber(location);
    unsigned column = source_manager.getSpellingColumnNumber(location);

    if (node->isDirectInit()) {
        m_annotator->insert_annotation("plain", line, column, name.length());
    }

    return true;
}

The key is the isDirectInit() check, which helps identify variables using functional-style initialization. We annotate these as plain tokens to prevent them from being highlighted as function calls.

With this visitor implemented, functional-style variable declarations are properly handled:

text
int main() {
math::Point [[plain,p1]](1.2f, 3.4f);
math::Point [[plain,p2]](5.6f, 7.8f);
// ...
}
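For reference, the example below lists the initialization forms in play. The comments reflect my reading of Clang's documentation for VarDecl::isDirectInit(), which describes it as covering call-style and list-style initializers; treat the classification in the comments as an assumption rather than verified behavior. The `Point` type here is a simplified stand-in with an explicit constructor.

```cpp
#include <cassert>

// Simplified stand-in for math::Point with an explicit two-argument constructor.
struct Point {
    Point(float x, float y) : x(x), y(y) {}
    float x;
    float y;
};

// Each function declares one variable using a different initialization syntax.
float call_style() {
    Point p(1.2f, 3.4f); // Direct-initialization with parentheses: resembles a function call
    return p.x;
}

float list_style() {
    Point p{1.2f, 3.4f}; // Direct-list-initialization: assumed to count as direct as well
    return p.x;
}

float copy_style() {
    Point p = Point(1.2f, 3.4f); // Copy-initialization: assumed not to count as direct
    return p.x;
}
```

Only the first form is visually ambiguous with a function call, but annotating the brace form as plain too should be harmless, since the variable name would be plain in either case.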

Styling

The final step is to add CSS styles for the different kinds of function annotations. The plain CSS style is language-agnostic, and provides the default style to use for tokens in code blocks.

css
.language-cpp .function {
color: rgb(255, 198, 109);
}
.language-cpp .unary-operator,
.language-cpp .binary-operator,
.language-cpp .function-operator {
color: rgb(95, 140, 138);
}
.language-cpp .char,
.language-cpp .string {
color: rgb(106, 171, 115);
}
.language-cpp .number {
color: rgb(42, 172, 184);
}

cpp
#include <cstring> // std::strcmp
#include <string> // std::string, std::string_literals
#include <chrono> // std::chrono::duration, std::chrono_literals
template <typename T>
bool equal(T a, T b) {
return a == b;
}
// Template specialization
template <>
bool equal(const char* a, const char* b) {
if (!a || !b) {
return false;
}
return std::strcmp(a, b) == 0;
}
namespace math {
struct Point {
static float distance(const Point& a, const Point& b);
float x;
float y;
};
// Operator overloads
bool operator==(const Point& a, const Point& b);
bool operator!=(const Point& a, const Point& b);
}
int main() {
math::Point p1(1.2f, 3.4f);
math::Point p2(5.6f, 7.8f);
if (p1 != p2 && math::Point::distance(p1, p2) < 5.0f) {
// ...
}
int value = 42;
int* ptr = &value;
(*ptr)++;
*ptr += -4 * (1 + 3) / (9 - 5) % 2;
bool eq = equal("apple", "banana");
// Literal operators
using namespace std::string_literals;
std::string str = "Hello, world!"s;
using namespace std::chrono_literals;
std::chrono::duration timeout = 200ms;
// ...
}

We’ve added support for annotating function declarations, definitions, calls, and several kinds of operators. We also improved the consistency of our syntax highlighting by overriding annotations on functional-style variable initializations. In the next post, Better C++ Syntax Highlighting - Part 5: Classes, we’ll take a deeper look at annotating the different components of classes: declarations, static and class member variables, constructor initializer lists, and type aliases. Thanks for reading!

Series: Better C++ Syntax Highlighting - Part 4 of 10