tucuxi.org

C++ Declarations and Definitions

In C, C++, Java and many other programming languages, the function (also known as a method or subroutine) is the basic unit for grouping code together. This article talks a little about how C and C++ compilers analyse your code and work out what to actually call.

C and C++ compilers read your code from top to bottom, and only deal with a single compilation unit (the language standards call it a translation unit) at any one time. That's just a fancy way of saying they deal with a single .c or .cc file at once, together with any headers it includes, and don't look at the contents of any other .c/.cc files. This is an artifact of how the languages were designed (C in the early 1970s, and C++ from 1979 onwards), and the capabilities of the computers back then. Splitting code into individual compilation units also forces programmers to think about not only how code is divided between different files, but also how the code is exposed between different files.

Let's start with a basic example, where all the code for a program is in a single C++ source file; let's call it app.cc.

#include <math.h>
#include <iomanip>
#include <iostream>


float distance(float x1, float y1, float x2, float y2) {
  return sqrt(pow(x2 - x1, 2.0) + pow(y2 - y1, 2.0));
}

int main(int argc, char **argv) {
  std::cout << "Distance between points is " << std::setprecision(1)
      << std::fixed << distance(0.0, 0.0, 300.0, 218.0);
}

We can see here that our program is contained entirely within a single file, apart from the standard library functions that we call, such as sqrt() and pow(). We define two functions, main() and distance(). But how does the C++ compiler know to call the function distance() when we refer to it inside main()? The answer is that we have already defined it, above the reference in main(). When it comes to parsing our C++ code, the compiler simply reads the code from top to bottom, building up a symbol table as it goes. When it reads the definition of distance(), that is, float distance(float x1, float y1, float x2, float y2), it records an entry under the name 'distance' that refers back to that part of the parsed code. When we subsequently refer to distance(0.0, 0.0, 300.0, 218.0), the compiler knows we already have a function named distance that takes four parameters, and uses the entry it saved earlier.

But, what if we didn't want to put main() at the end of our file? You might re-write your code to look something like this:

#include <math.h>
#include <iomanip>
#include <iostream>


int main(int argc, char **argv) {
  std::cout << "Distance between points is " << std::setprecision(1)
      << std::fixed << distance(0.0, 0.0, 300.0, 218.0);
}

float distance(float x1, float y1, float x2, float y2) {
  return sqrt(pow(x2 - x1, 2.0) + pow(y2 - y1, 2.0));
}

However, unlike the earlier snippet of code, this code will not compile. You'll get an error something like this:

app.cc: In function ‘int main(int, char**)’:
app.cc:7: error: ‘distance’ was not declared in this scope

You might recall that I mentioned the C++ compiler reads the code from top-to-bottom, so when you refer to distance(...) before defining it, the compiler has no reference available to any code that it has parsed, and will simply give up. But, there is a way to give the compiler a hint that a function does exist, even if you haven't defined it yet in the file.

Declarations and Definitions

There are two ways to tell a C/C++ compiler about a function. You can provide a definition, which includes the function's type signature and the code that actually makes up the function, or you can provide a declaration, which only includes the type signature as a hint to the compiler to say "I've defined this somewhere else, but it exists, honest". Let's make the code above able to compile again, and show the difference between a definition and a declaration.

#include <math.h>
#include <iomanip>
#include <iostream>


// This is a declaration of distance()
// It doesn't contain the code, just the type signature.
// Notice how it ends with a semicolon, and not curly braces.

float distance(float x1, float y1, float x2, float y2);

// This is a definition of main()
int main(int argc, char **argv) {
  std::cout << "Distance between points is " << std::setprecision(1)
      << std::fixed << distance(0.0, 0.0, 300.0, 218.0);
}

// This is a definition of distance()
float distance(float x1, float y1, float x2, float y2) {
  return sqrt(pow(x2 - x1, 2.0) + pow(y2 - y1, 2.0));
}

This code will compile successfully because we've already given the compiler a hint about distance() by including the declaration above main().

Multiple Files

Let's say that we found the distance() routine fairly handy, and that we wanted to use it in a number of places. One way to re-use code is to move it into a separate file, and refer to the function in just the same way as we did above. Header files, like the ones we're already using above (math.h, iomanip and iostream), provide a set of declarations, exposing functions that we can call without copying those functions into our own .cc file.

So, what do header files contain? They usually contain three things: a set of declarations, like the one we provided for distance(); an include guard to stop the same declarations being processed multiple times; and, optionally, #include lines for other headers. Let's create our own header file for distance(), and call it distance.h.

#ifndef __DISTANCE_H__
#define __DISTANCE_H__ (1)


// This is a declaration of distance().
// distance() is a function that returns the Euclidean distance between
// two points, using Pythagoras' theorem.

extern float distance(float x1, float y1, float x2, float y2);

#endif  // __DISTANCE_H__

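The declaration of distance() is almost the same as the declaration that we had in app.cc, but you might notice the extern that was added at the start of the line. We'll discuss that later. Also, you might notice the #ifndef, #define and #endif lines surrounding the declaration. This is referred to as an include guard, and stops the contents of the header being processed more than once within a single compilation unit. It's good practice to have an include guard in your header files, typically named after the filename. (Strictly speaking, names containing a double underscore, such as __DISTANCE_H__, are reserved for the compiler and standard library in C++, so a name like DISTANCE_H_ is the safer choice in real code.)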

In addition to creating the header file, we need to split out the definition of distance() into another file so it can be compiled and linked into your program. We'll create another file, distance.cc that looks like this:

#include <math.h>
#include "distance.h"


// This is a definition of distance()
float distance(float x1, float y1, float x2, float y2) {
  return sqrt(pow(x2 - x1, 2.0) + pow(y2 - y1, 2.0));
}

There are three important parts of this C++ file. Firstly, we #include the system header file math.h, which declares sqrt() and pow(). This is needed because we call these functions from distance(). Secondly, we include the header file with the declaration of distance(), the file we just created, distance.h. This isn't strictly necessary, but it is good practice, because it lets the compiler see the declaration and the definition side by side. In C, any mismatch between the two is reported as an error. In C++, a mismatched return type is reported as an error, but mismatched parameter types simply declare a second, overloaded function, so that kind of drift only shows up later as an "undefined reference" at link time. Finally, we've also put the definition of distance() into the C++ file. This file will be compiled by itself, as a single compilation unit, and the compiled code will be made available to the linker when it creates your application.

So, now that we've taken distance() into its own file, we need to remove both its definition and its declaration from app.cc, and replace them with a #include of the header file distance.h, which contains the declaration of distance(). The updated app.cc will look something like this:

#include <iomanip>
#include <iostream>
#include "distance.h"


int main(int argc, char **argv) {
  std::cout << "Distance between points is " << std::setprecision(1)
      << std::fixed << distance(0.0, 0.0, 300.0, 218.0);
}

app.cc no longer contains any declaration or definition of distance(), but we can call it all the same. How does it work under the hood? When you use the #include preprocessor directive, the preprocessor (which runs before the compiler proper) literally pastes the contents of that file in place before the C++ code is compiled. So, to the compiler, app.cc looks something like this:

#include <iomanip>
#include <iostream>

#ifndef __DISTANCE_H__
#define __DISTANCE_H__ (1)


// This is a declaration of distance().
// distance() is a function that returns the Euclidean distance between
// two points, using Pythagoras' theorem.

extern float distance(float x1, float y1, float x2, float y2);

#endif  // __DISTANCE_H__


int main(int argc, char **argv) {
  std::cout << "Distance between points is " << std::setprecision(1)
      << std::fixed << distance(0.0, 0.0, 300.0, 218.0);
}

The section between the #ifndef and #endif lines is the code from distance.h that has been included verbatim, and the system headers iomanip and iostream will be similarly expanded.

But what about the definition?

Earlier, we saw that we needed a definition of distance() in the same file for the compiler to know what to run when we call distance(). But, now that the code has been split into three files, app.cc, distance.cc and distance.h, app.cc does not contain any definition of distance(), even after expanding the included header file. So, how does it all work?

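When we moved the definition to a separate file, we also added the prefix extern to the declaration of distance(). Using extern tells the compiler that the definition might be in an external compilation unit (or, to put it another way, a different .cc file), and that the compiler shouldn't worry about trying to resolve it at compile time. For functions this is actually the default: a declaration without a body behaves the same way whether or not extern is written, which is why the plain declaration we used earlier in app.cc worked too. The keyword makes a real difference for variables, where it is what separates a declaration from a definition.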

Compilers run over your C and C++ code, translating one .cpp (or .cc) file at a time from human-readable code into machine code which can run on your processor natively. However, your code most likely depends on code that other people have written, in other .cpp or .cc files which are also processed by themselves. How do we pull it all together into one coherent application? With a stage called linking, which is performed by the linker. The linker takes all the loose ends from the compiler (the external symbols that the compiler couldn't resolve) and stitches together all the different compiled blobs of machine code, patching the loose ends together.

If we tried to compile and link app.cc by itself, we would get an error at the linker step, saying that it could not resolve the function distance. For example:

/tmp/ccFp1Ju6.o: In function `main':
app.cc:(.text+0x27): undefined reference to `distance(float, float, float, float)'
collect2: ld returned 1 exit status

To successfully build the application, we need to tell the compiler and linker to combine both app.cc and distance.cc into a single output file (with GCC, for example, g++ app.cc distance.cc -o app). The application can then be run successfully:

Distance between points is 370.8