
Building the Clang + LLVM compilers

Nelson H. F. Beebe
University of Utah
Department of Mathematics, 110 LCB
155 S 1400 E RM 233
Salt Lake City, UT 84112-0090
Email: , ,
Telephone: +1 801 581 5254
FAX: +1 801 581 4148
Last updates: Mon Jul 21 19:14:14 2014     ...     Fri Jul 25 10:31:01 2014 ...     Tue Aug 26 10:24:27 2014


This document describes simplifications in the build process for an important C/C++ compiler family, Clang + LLVM, and offers downloadable shell scripts for automating almost all of the human work required to fetch, build, validate, and install a daily snapshot of the compiler software.

This document also offers pointers to pre-built binary distributions of summer-2014 snapshots of the compilers for numerous systems. Those distributions can be downloaded, unpacked, and used anywhere; it is not necessary to duplicate the installation directory structure used at the build site. Nor are any special privileges required: ordinary users can install and use the compilers from their personal login directory trees, or from any other directory to which they have write access. The platforms covered implicitly include related ones: for example, Red Hat builds can likely be used on Fedora, CentOS, and Scientific Linux, provided that the kernel and C library versions are close enough.

If all you want to do is use our pre-built Clang + LLVM compilers on your system, jump directly to the downloads section of this document.


This author began build attempts for LLVM in March 2009, and for Clang, in February 2010. Almost all of his many build attempts for snapshots and official releases of Clang + LLVM were unsuccessful, until the discovery in July 2014 of the need for additional library search-order control. That significantly improved the success rate, and spurred the creation of this document, in the hope that others can make more rapid progress with installation of this important compiler family.

Why Clang + LLVM compilers are desirable

On many Unix systems today, there is only one compiler family, GNU gcc, which is sometimes available with support for Ada, C, C++, Fortran, Java, Objective C, Pascal, and a few other programming languages. The gcc compilers are relatively permissive, and contain many nonstandard language extensions, some of which have been adopted by other compiler families. Thus, programmers who develop software only with GNU compilers may not be aware that their code portability likely suffers from lack of exposure to other compilers. In the author's long experience, software cannot be considered portable until it has been successfully compiled by scores of compilers on dozens of different operating systems running on at least a half-dozen CPU architectures. With a little care, C code can be written to be acceptable to C++ compilers, which increases the number of available compilers, and subjects the code to the stricter scrutiny required by the C++ language.

The Clang C/C++ compiler front end for the LLVM compiler back-end system provides a completely separate implementation of the entire compiler suite, sharing no code at all with the GNU compilers. In addition, the C++ part of Clang claims conformance to the latest 2011 ISO C++ Standard, which few other compilers fully support in 2014.

An increasing number of software packages are programmed with features of the latest ISO Standards for C and C++, so it is essential to have the latest compiler versions available to permit building those packages from source code.

The problem

The standard GNU-style configuration and build procedure looks simple, and often is, for simple packages:

./configure && make all check install

However, when the package build requires nondefault compilers, or needs nondefault header files, or produces load libraries, or uses libraries that may have been installed locally, or is to be installed in a nonstandard location, or needs special control of the search path, it is necessary to augment the recipe with additional settings that define the locations and values of those resources:


export PATH

env       CC=gcc-4.9-20140716                                 \
         CXX=g++-4.9-20140716                                 \
      CFLAGS="-O2 -I$myprefix/include"                        \
    CXXFLAGS="-O2 -I$myprefix/include"                        \
     LDFLAGS="-Wl,-rpath,$myprefix/lib64 -L$myprefix/lib64"   \
        LIBS="-lmpc -lmpfr -lgmp"                             \
    ./configure --prefix=$myprefix --libdir=$myprefix/lib64 &&
    make all check install

For Clang + LLVM, even that complex recipe rarely suffices, because both ends of the compiler family are written in bleeding-edge C++ syntax that is near, or at, the 2011 ISO C++ Standard. To build the compiler family, you need either a recent version of clang itself (probably version 3.4 or later), or you need the GNU compiler family, version 4.7 or later. No other compiler family available to the author of this document suffices, and he has many to choose from.

A solution

A Clang + LLVM installation contains more than 1600 files, including about 40 in the bin directory, most of which would never be directly invoked by users, so it is best to give the software its own installation tree. At the author's site, we use something like one of these:

/usr/local/ashare/llvm/llvm-20140716
/usr/uumath/ashare/llvm/llvm-20140716



The first part, /usr/local/ or /usr/uumath/, is the filesystem location where locally-installed software resides. That second choice is a recent local change, the reasons for which are described here.

The second part, ashare/, is our convention for architecture-dependent sharable files, and is distinct from the common share/ tree used for architecture-independent sharable files. By architecture, we mean a particular combination of an operating system major release and a computer CPU family; there is rarely any need at the user level to be concerned about particular vendor model names or numbers of computers, CPUs, and storage devices.

The third part, llvm/, is a container directory for one or more versions of the compiler family.

The fourth part, llvm-20140716, is the name of the snapshot of the compiler source code, encoding the year, month, and day of its downloading. The Clang + LLVM development is a continuous stream of changes, with major releases announced only about once a year, so we cannot give it a particular compiler version number. When version 3.5 is released, it would be called llvm-3.5 in this part of the pathname.
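
The four-part naming convention just described can be sketched as a small shell function. The function name and its arguments are illustrative assumptions, and are not taken from the author's scripts:

```shell
#!/bin/sh
# Hypothetical helper: compose an installation prefix from the parts
# described above.  The ashare/llvm components follow the site convention;
# the function name and argument order are illustrative only.
llvm_prefix () {
    local_root=$1        # e.g., /usr/local or /usr/uumath
    snapshot=$2          # e.g., llvm-20140716, or llvm-3.5 for a release
    printf '%s/ashare/llvm/%s\n' "$local_root" "$snapshot"
}

llvm_prefix /usr/local llvm-20140716
```

Keeping the snapshot name as the final component means that several versions can coexist in the same container directory.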

Most of the 40+ flavors of Unix in our test lab do not have sufficiently-new C and C++ compilers to build Clang + LLVM. However, we have many locally-installed versions of the GNU compilers that are identified by both a version number and a date stamp, such as gcc-4.9-20140702. That compiler family's development snapshots are released weekly at , and there are currently ten minor version tracks: 4.1, 4.2, ..., 4.10. Higher minor versions provide closer support to recent ISO language standards.

However, if we use one of those recent GNU compilers to build Clang + LLVM, compilation fails unless we tell the build process that it must use load libraries specific to that compiler release. We can do that with suitable settings of environment variables at configure time:




We are not quite done yet, because the default build of Clang + LLVM is a compiler built with debugging turned on, making it slow, and several times bigger than an optimized compiler. We therefore need one additional configure-time option:
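
As a minimal sketch, and on the assumption that the snapshot uses the autoconf-based LLVM build system of that period, whose configure script accepts --enable-optimized, a configure command for an optimized compiler might be assembled like this. The source directory and prefix are placeholders, and the function only prints the command:

```shell
#!/bin/sh
# Sketch only: print a configure command for an optimized build.
# --enable-optimized is assumed to be the relevant option of the
# autoconf-based LLVM configure script; srcdir and prefix are placeholders.
show_configure () {
    srcdir=$1
    prefix=$2
    printf '%s/configure --prefix=%s --enable-optimized\n' "$srcdir" "$prefix"
}

show_configure ../llvm-20140716 /usr/local/ashare/llvm/llvm-20140716
```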


Getting a Clang + LLVM snapshot

Because the Clang front-end is a separate development project from the LLVM backend, the developers, regrettably, do not provide date-stamped combined snapshots like the GNU compiler team does. Instead, people are expected to go through a painful and error-prone process of using svn to pull down various pieces of the projects into exactly the right locations in the download tree. That miserable process is documented at the development site in Getting Started: Building and Running Clang.

It is much better to put those steps into a local shell script, , that can be run without arguments to create a single distribution file, llvm-YYYYMMDD.tar.gz, in the current directory, where YYYYMMDD represents today's date. The downloaded files are preserved in the directory llvm-YYYYMMDD, but you can optionally remove that once the script completes. However, if you preserve it, you could use it to pre-populate a future snapshot directory, speeding the download process.
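
The general shape of such a script can be sketched as follows. This is not the author's script: the repository URLs and the placement of each piece in the tree are assumptions based on the Subversion layout documented by the LLVM project in 2014, and the sketch only prints the commands that it would run:

```shell
#!/bin/sh
# Sketch of a snapshot fetcher.  It only PRINTS the commands it would run.
# The repository URLs and directory layout are assumptions based on the
# 2014-era LLVM documentation, not the author's actual script.
snapshot_commands () {
    stamp=$1                                   # date stamp, YYYYMMDD
    top=llvm-$stamp
    base=http://llvm.org/svn/llvm-project
    echo "svn checkout $base/llvm/trunk $top"
    echo "svn checkout $base/cfe/trunk $top/tools/clang"
    echo "svn checkout $base/clang-tools-extra/trunk $top/tools/clang/tools/extra"
    echo "svn checkout $base/compiler-rt/trunk $top/projects/compiler-rt"
    echo "svn checkout $base/test-suite/trunk $top/projects/test-suite"
    echo "tar cf - $top | gzip -9 > $top.tar.gz"
}

snapshot_commands "$(date +%Y%m%d)"
```

Printing the commands first makes it easy to review the directory layout before any network traffic occurs; piping the output to sh would then carry out the steps.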

Tests show that the script takes about three to five minutes to run on a modern Unix system at the University of Utah, which has relatively fast connections to the Internet backbone. The master site from which code is downloaded is located in San Jose, CA, USA. Another test on a home computer with a slow cable-modem connection fetched and created a snapshot in six minutes.

Resource requirements

The Clang + LLVM source code is large. In mid-2014, it consists of about 2.1 million lines of code distributed across 80,000 files in 30,000 (yes, that many!) directories. C++ compilers have more work to do than C compilers, and the large number of source files means that huge numbers of header files have to be searched for during a complete build. Indeed, based on system-call traces, a build of Clang + LLVM could fork 1.2 million processes, and make 265 million attempted file openings. The configure script takes about one minute to run, but the complete build and validation process takes three to eight hours on most of our servers. That time shortens to about two hours on servers that have solid-state storage, instead of rotating magnetic disks. It can be substantially reduced when the build host is powerful enough to permit the compilations to be done in parallel, as described later in this document.

Disk space requirements for the build are about 1GB for the compiler source tree, and 5GB in a separate build directory with optimization, or 18GB with the default build of a debug-level compiler.

Disk space requirements for the final installation are about 0.65GB for an optimized-compiler build, and 8GB for a debug-level-compiler build.

Attempts to build Clang + LLVM on an x86 server with only 1GB of DRAM failed with the compiler message Memory exhausted. Most servers at which this author has obtained successful builds have at least 8GB of DRAM, although builds have also succeeded on x86-64 virtual machines with only 1GB of DRAM.

Clang + LLVM requires GNU make, and its configure script checks that the bootstrap compiler passes this test:

#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 7)
#error This version of GCC is too old to build LLVM
#endif

That test essentially says that the bootstrap compilers must be compatible with gcc version 4.7 or later. Both clang and Intel icc define those symbols, but their values depend on the versions of those compilers. On Fedora 20 x86-64, where we have version 14.0.3 of the Intel compiler built by the vendor on 25 April 2014, those symbols are equivalent to version 4.8; unfortunately, a build with that compiler fails with compilation errors. On Red Hat 6.5, the same version of that compiler only claims compatibility with version 4.4, and thus, cannot be used. That behavior suggests that the Intel compiler reports the version number of the installed /usr/bin/gcc.
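
To see in advance what a candidate bootstrap compiler claims, the same two symbols can be examined from the shell. The sketch below assumes that the compiler accepts the GCC-style options -dM -E for dumping predefined macros; the function names are illustrative:

```shell
#!/bin/sh
# Show the GCC compatibility level that a compiler claims.  Assumes the
# compiler accepts the GCC-style options -dM -E to dump predefined macros.
claimed_gnuc () {
    # $1 is a compiler command, such as gcc, clang, or icc
    "$1" -dM -E - </dev/null |
        awk '$2 == "__GNUC__"       { major = $3 }
             $2 == "__GNUC_MINOR__" { minor = $3 }
             END { print major "." minor }'
}

# Mirror the configure-time test: succeed only for 4.7 or later.
new_enough () {
    major=$1
    minor=$2
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 7 ]; }
}

new_enough 4 8 && echo "4.8 passes the test"
new_enough 4 4 || echo "4.4 is rejected"
```

A call such as claimed_gnuc /usr/bin/gcc prints the claimed major and minor version numbers separated by a period; passing that test is necessary, but as the Intel compiler experience shows, not sufficient for a successful build.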

So far, at the author's site, all successful builds have been done with various snapshots of versions 4.7, 4.8, or 4.9 of the GNU compilers, or with Clang versions 3.4.2 or 3.5 (20140716 snapshot).

Simplifying the build process

The need to set several environment variables, and specify important configure-time options, strongly suggests that the build process should be controlled by a shell script that handles almost all of the drudge work. The only thing that the user should have to do is name the build compilers, and optionally provide the name of a Clang + LLVM snapshot, if the built-in default is not wanted.

The script handles that job, and can be used in any of three ways:

### Build with sufficiently-new vendor-provided C and C++ compilers
env BOOTSTRAPLIBDIR=/usr/lib/gcc/x86_64-redhat-linux/4.8.2         \
    CC=/usr/bin/gcc                                                \
    CXX=/usr/bin/g++                                               \
    /path/to/somewhere/ [optional-snapshot-name]

### Build with a recent version of clang
env CC=/usr/bin/clang                                              \
    CXX=/usr/bin/clang++                                           \
    /path/to/somewhere/ [optional-snapshot-name]

### Build with a locally-installed recent version of gcc (must be
### 4.8 or later)
env GCCVER=4.9-20140702                                            \
    /path/to/somewhere/ [optional-snapshot-name]

In the first case, it is necessary to find the location of the native compiler's installation tree. For most Unix flavors, that tree resides under /usr/lib/gcc, and ends with the same version number reported by gcc --version.
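
One way to locate that tree without searching by hand is to ask the compiler where its own support library lives. The -print-libgcc-file-name option is a standard gcc option; treating the directory that it reports as a suitable BOOTSTRAPLIBDIR value is an assumption that should be checked on each system:

```shell
#!/bin/sh
# Ask the compiler itself where its installation tree lives.  Using the
# directory of libgcc as BOOTSTRAPLIBDIR is an assumption to verify locally.
bootstrap_libdir () {
    # $1 is the compiler, e.g., /usr/bin/gcc
    dirname "$("$1" -print-libgcc-file-name)"
}

# Example use (not run here, because it requires gcc to be installed):
#   env BOOTSTRAPLIBDIR=$(bootstrap_libdir /usr/bin/gcc) CC=/usr/bin/gcc ...
```

On most GNU/Linux systems, the reported directory has the form /usr/lib/gcc/TARGET/VERSION, matching the pattern of the BOOTSTRAPLIBDIR setting shown above.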

Once the builds are in progress, there is typically only a single line output for each source file compiled:

llvm[0]: Constructing LLVMBuild project information.
make[1]: Entering directory '/tmp/build/bare/llvm-20140716-objdir/lib/Support'
llvm[1]: Compiling APFloat.cpp for Release+Asserts build
llvm[1]: Compiling APInt.cpp for Release+Asserts build
llvm[1]: Compiling APSInt.cpp for Release+Asserts build
llvm[1]: Compiling ARMBuildAttrs.cpp for Release+Asserts build
llvm[1]: Compiling ARMWinEH.cpp for Release+Asserts build
llvm[1]: Compiling Allocator.cpp for Release+Asserts build
llvm[1]: Compiling Atomic.cpp for Release+Asserts build

On the author's systems, the build log remains clean, and largely free of compiler warnings.

If you prefer more verbose output showing the compiler commands actually used, add VERBOSE=1 to the make command in the shell script. That is most easily done by an environment variable setting in the command that invokes the build script:


If you have a sufficiently powerful multicore or multiprocessor server on which to build the compilers, then you might request parallelization of the thousands of compilation steps:


A test on a 64-core server with the --jobs=64 option got a successful build in under 10 minutes, instead of the usual time of several hours.
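
A job count suited to the build host can be chosen automatically. The sketch below assumes that getconf supports the widely available, but not universal, _NPROCESSORS_ONLN variable, and falls back to a single job otherwise. It only prints a make command; how the count is passed through the build script is not shown here:

```shell
#!/bin/sh
# Pick a job count from the number of online processors, falling back to 1.
# _NPROCESSORS_ONLN is assumed to be known to getconf on the build host.
job_count () {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case $n in
        ''|*[!0-9]*) n=1 ;;      # empty or nonnumeric answer
    esac
    echo "$n"
}

echo "make --jobs=$(job_count) all check"
```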

The build script intentionally does not run the make install command, but it reports how to do so. There are just too many ways that things can go wrong, and it is advisable to examine the end of the build log before proceeding with the installation. In a good case, that log should report near the end something like this:

  Expected Passes    : 18610
  Expected Failures  : 110
  Unsupported Tests  : 35
  Unexpected Failures: 13

1 warning(s) in tests.

On that particular system, the 13 failures were in tests of OCaml bindings, which are unlikely to matter for use of the compilers on C and C++ code.

Where there are no OCaml failures, a perfect final validation report looks like this:

  Expected Passes    : 18610
  Expected Failures  : 110
  Unsupported Tests  : 48

1 warning(s) in tests.
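
The check of the end of the build log can be partly automated. This sketch assumes only the report format shown above, reads the log on standard input, and prints the number of unexpected failures, or 0 when that line is absent. The log file name in the usage note is hypothetical:

```shell
#!/bin/sh
# Extract the count of unexpected test failures from a build log read on
# standard input.  Prints 0 when no "Unexpected Failures" line is present.
unexpected_failures () {
    awk -F: '/Unexpected Failures/ { n = $2 + 0 } END { print n + 0 }'
}

# Example use, with a hypothetical log file name:
#   tail -n 20 build.log | unexpected_failures
```

A nonzero result is not necessarily fatal, as the OCaml case shows, but it is a signal to read the log before running make install.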

Successful builds

Builds, validations, and installations of the llvm-20140716 snapshot have been successfully completed by the author on at least these systems:

Operating system      CPU         Environment variables
Arch Linux (LSB 1.4)  x86-64      BOOTSTRAPLIBDIR=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2 CC=/usr/bin/gcc CXX=/usr/bin/g++
Debian 6.0.9          x86-64      CXXFLAGS=-std=c++11 GCCVER=4.8-20130509
Debian 7.5            x86-64      GCCVER=4.9-20140716
Fedora 15             x86-64      [see next section]
Fedora 20             x86-64      BOOTSTRAPLIBDIR=/usr/lib/gcc/x86_64-redhat-linux/4.8.2 CC=/usr/bin/gcc CXX=/usr/bin/g++
Fedora 20             x86         GCCVER=4.9.0
FreeBSD 9.1           x86         CC=/usr/local/bin/clang CXX=/usr/local/bin/clang++ PYTHON=/usr/local/bin/python
FreeBSD 10.0          x86-64      CC=/usr/local/bin/clang CXX=/usr/local/bin/clang++ PYTHON=/usr/local/bin/python
Gentoo 2.2            PowerPC-64  BOOTSTRAPLIBDIR=/usr/local/ashare/gcc-4.9-20140716/lib64 GCCVER=4.9-20140716
OpenSUSE 11.4         x86-64      GCCVER=4.9-2013061
OpenSUSE 12.0         x86-64      GCCVER=4.9.0
OpenSUSE 13.0         x86-64      CC=/usr/local/bin/clang CXX=/usr/local/bin/clang++ (Clang 3.4.2)
Red Hat 6.5           x86-64      GCCVER=4.8.3
Red Hat 6.5           x86-64      GCCVER=4.9-20140716
Red Hat 7.0           x86-64      BOOTSTRAPLIBDIR=/usr/lib/gcc/x86_64-redhat-linux/4.8.2 CC=/usr/bin/gcc CXX=/usr/bin/g++
Red Hat 7.0           x86-64      GCCVER=4.9-20140702
Scientific Linux 6.5  x86-64      BOOTSTRAPLIBDIR=/usr/local/ashare/gcc-4.9-20140716/lib64 GCCVER=4.9-20140716
Slackware 14          x86-64      CC=/usr/local/bin/clang CXX=/usr/local/bin/clang++ (Clang 3.4.2)

Attempts to build that snapshot on Mac OS X (PowerPC and x86-64), Red Hat 5.10 (x86 and x86-64), and Solaris 10 (SPARC) have all failed for various reasons, including fatal internal compiler errors, lack of sufficiently-modern bootstrap compilers, failure to compile code in Clang + LLVM source files or in vendor-supplied system header files, and failure to assemble machine-language code produced by the bootstrap compiler. On Solaris 10 (x86 and x86-64), despite numerous attempts, we have been unable to build any versions of the GNU compilers newer than 4.7, and they are too old to use for Clang + LLVM. As noted earlier, on Fedora 20 x86-64, compilation with version 14.0.3 of the Intel compiler fails with syntax errors, and the same version on Red Hat 6.5 is determined to be too old to use.

Further investigation shows that the failure to build on Red Hat 5 is due, in part, to code in source file projects/compiler-rt/lib/sanitizer_common/ that contains this statement:

#include <linux/perf_event.h>

That header file, which belongs to the Linux kernel-headers package, does not exist on that system, which has Linux kernel 2.6.18. That file is present on Fedora 15, with kernel version 2.6.43. The file defines data types and kernel data structures that Clang code examines; it is thus not simple to provide a workaround for Red Hat 5 and other GNU/Linux systems that are not new enough to provide it.

Build attempts on other systems continue, and this document will be updated to report any further progress.

A patchwork build on Fedora 15

Initial build attempts on Fedora 15 (x86-64) failed. In a final attempt, several restarts with compiler and option changes were needed to get a build that completed and passed the validation tests:

% env GCCVER=4.8-20130411 ./
... build begins, but later fails with compilation errors...

% cd llvm-20140716-objdir

### Try multiple compiler versions until one is found that handles the
### file that the initial compiler could not, and restart the make:

% make CC=/usr/local/bin/gcc-4.8-20120401 CXX=/usr/local/bin/g++-4.8-20120401
... build continues, but again fails ...

% make CC=/usr/local/bin/gcc-4.8.0 CXX=/usr/local/bin/g++-4.8.0
... build continues, but again fails ...

% make CC=/usr/local/bin/gcc-4.7.1 CXX=/usr/local/bin/g++-4.7.1
... build continues, but again fails ...

% make CC=/usr/local/bin/gcc-4.7.1 CXX=/usr/local/bin/g++-4.7.1
... build continues, but fails with fatal internal compiler error ...

% make CC=/usr/local/bin/gcc-4.7.1 CXX=/usr/local/bin/g++-4.7.1 \
... build fails immediately with fatal internal compiler error ...

% make CC=/usr/local/bin/gcc-4.7.1 CXX=/usr/local/bin/g++-4.7.1 \
... build fails immediately with fatal internal compiler error ...

% make CC=/usr/local/bin/gcc-4.7.1 CXX=/usr/local/bin/g++-4.7.1 \
... build continues, but again fails with missing shared library ...

% setenv LD_LIBRARY_PATH /usr/local/lib64

% make CC=/usr/local/bin/gcc-4.7.1 CXX=/usr/local/bin/g++-4.7.1
... build continues, but again fails for lack of ocaml header files ...

% make  CC='/usr/local/bin/gcc-4.7.1 -I/usr/lib64/ocaml' \
       CXX='/usr/local/bin/g++-4.7.1 -I/usr/lib64/ocaml' all check
... build and validation completes ...

  Expected Passes    : 11086
  Expected Failures  : 90
  Unsupported Tests  : 22

% make install

A later retry with a build using only the 4.7.1 compilers failed, even when optimization was reduced to -g, so for this system, it appears that no single-command build is feasible.

An attempt to use the Fedora 15 build on Red Hat 5 required a copy of one Fedora library, but failed with missing library symbols:

% ls /tmp/lib

% env LD_LIBRARY_PATH=/tmp/lib \
    /usr/local/ashare/llvm/llvm-20140716/bin/clang hello.c && ./a.out
/usr/local/ashare/llvm/llvm-20140716/bin/clang: /lib64/ \
    version `GLIBC_2.6' not found (required by \
/usr/local/ashare/llvm/llvm-20140716/bin/clang: /lib64/ \
    version `GLIBC_2.14' not found (required by \

It is worth asking whether the newly-built version 3.5 Clang + LLVM on Fedora 15 can be used to build itself without the multistep grief with gcc versions, because that would then make builds of future compiler releases easier. The answer is yes, with this recipe:

% env BOOTSTRAPLIBDIR=/usr/local/ashare/gcc-4.7.1/lib64 \
      CC=/usr/local/bin/clang-3.5 \
      CXX=/usr/local/bin/clang++-3.5 \
      CFLAGS=-I/usr/lib64/ocaml \
      CXXFLAGS='-I/usr/lib64/ocaml -I/usr/local/ashare/gcc-4.7.1/include/c++/4.7.1' \
      LD_LIBRARY_PATH=/usr/local/lib64 \
  Expected Passes    : 11086
  Expected Failures  : 90
  Unsupported Tests  : 22

The references to gcc-4.7.1 are essential, because they provide header files and a C++ library that are new enough for Clang + LLVM.

Completing the job

Once the make install step has been done manually after a successful build, there are two ways to hide the long paths to the compilers and tools, in order to make them convenient for users. If you have only a single version of them on your system, then just adding a pathname like /usr/local/ashare/llvm/llvm-20140716/bin to your default PATH setting is a reasonable approach.

However, as soon as you have multiple versions of software on your systems (and there are many good reasons for doing so), it is better to identify the programs with version numbers, with a versionless name being the default. For example, on one machine at the author's site, we have symbolic links in the /usr/local/bin directory that is in everyone's search path:

% cd /usr/local/bin

% ls -l clang clang-[0-9]* | cut -c51-
clang -> clang-3.5
clang-3.5 -> /usr/local/ashare/llvm/llvm-20140716/bin/clang

The default Clang compiler family version is 3.5, but 3.2 and 3.4 are available as local executables, or shell scripts, and 3.5 is a symbolic link into the installation tree.

Similar symbolic links are installed for clang++, clang-check, clang-format, clang-tblgen, and clang-tidy, which are the only programs from the installation directory that are likely to be of interest to local programmers.

The clang++* names are always symbolic links to corresponding clang* names: a single compiler executable chooses the input programming language according to its own name.

An alternate approach used on a few of our systems where only a single Clang + LLVM compiler version is expected to be needed is to create symbolic links like these:

% file /usr/local/bin/clang
/usr/local/bin/clang: symbolic link to `/usr/local/ashare/llvm/current/bin/clang'

% file /usr/local/ashare/llvm/current
/usr/local/ashare/llvm/current: symbolic link to `llvm-20140716'

That way, only the single link for current needs to be adjusted in the parent of the compiler directory when a new version is later installed.
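
Repointing that link can be sketched as a short shell function. The function name is illustrative; it uses rm followed by ln -s, rather than the less portable ln -sfn, and it refuses to create a link to a version directory that does not exist:

```shell
#!/bin/sh
# Repoint the "current" link at a newly installed version.
# $1 is the container directory (e.g., /usr/local/ashare/llvm), and
# $2 is the version directory name inside it (e.g., llvm-20140716).
set_current () {
    container=$1
    version=$2
    [ -d "$container/$version" ] || return 1   # refuse a dangling link
    rm -f "$container/current"
    ln -s "$version" "$container/current"
}

# Example use (requires write access to the container directory):
#   set_current /usr/local/ashare/llvm llvm-20140716
```

Because the link target is a relative name, the whole container directory can be moved or copied without breaking the link.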


To make it easier for others to install the Clang + LLVM compilers, we make the shell scripts, a few source snapshots, and binary installation trees for numerous platforms, freely available, but subject to the open-source licenses of the source code.

The shell scripts are written by the author of this document, and are public-domain files that you can revise as needed for your own site conventions.

The source and binary distributions are all digitally signed by the author of this document, who asserts that the source code is unmodified from that downloaded on the given date from the Clang + LLVM Web sites, and that the binary code was produced by him from the identical source code, using a build script similar to the one below, with the build flags documented in the earlier table of builds. The digital signature files have extension .sig, and their corresponding signed files can be verified with whichever of these works on your system (with YYYYMMDD replaced by the date stamp of the file that you downloaded):

% pgp  llvm-YYYYMMDD.tar.gz.sig

% gpg  llvm-YYYYMMDD.tar.gz.sig

% gpg2 llvm-YYYYMMDD.tar.gz.sig

Please do verify the digital signatures of downloaded files. They are your assurance that the download did not corrupt the file, and that, even if the download site is compromised and distributions are replaced with maliciously-modified ones, the digital signatures will not match, unless they too have been replaced, in which case, their owner is not the author of this document. Attackers are unlikely to be willing to identify themselves that way!
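
Choosing whichever verification program is installed can itself be scripted. This sketch relies only on the POSIX command -v facility; the function name is illustrative, and the order of preference in the usage note is an arbitrary choice:

```shell
#!/bin/sh
# Print the first of the named programs that is found in the search path,
# and fail if none of them is available.
first_available () {
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
            return 0
        fi
    done
    return 1
}

# Example use, with YYYYMMDD replaced by the date stamp of the download:
#   tool=$(first_available gpg2 gpg pgp) && "$tool" llvm-YYYYMMDD.tar.gz.sig
```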

All build logs have been preserved, and any particular log file can be made available on request to the author. However, for security reasons, the particular name of the build host may be obliterated.

Shell scripts

Earlier in this document, we show the use of our scripts for building the compilers and downloading new snapshots. Here are links to the public-domain scripts:

Source distributions

These files contain the merged Clang + LLVM source trees, including the LLVM test suite, fetched by on the year, month, and day indicated in their name suffixes. Each file is about 0.2GB, and unpacks with tar xf llvm-YYYYMMDD.tar.gz, or with gunzip < llvm-YYYYMMDD.tar.gz | tar xf -, into a source-code directory tree named llvm-YYYYMMDD that requires about 1GB of storage space.

WARNING: These files have the same names as the corresponding binary distributions, so pick only one of them, or else download and unpack them in separate directories.

Binary distributions

Each binary distribution can be unpacked at any convenient point in the filesystem where you have write access, using the same recipes as given earlier. At the build site, they are stored in either of the directories /usr/local/ashare/llvm/ or /usr/uumath/ashare/llvm/.

WARNING: These files have the same names as the corresponding source distributions, so pick only one of them, or else download and unpack them in separate directories. Also, because the executables are linked against several shared libraries, it is possible that your system lacks some of them. For example, for the Red Hat 6 x86-64 build, a scan of all of the executables shows that they depend on these libraries for which equivalents are required:

% cd /usr/local/ashare/llvm/llvm-20140716/bin

% ldd * | awk '/\/local\// {print $3}' | sort -u

Replacements for those libraries might be providable by suitable settings of the environment variable LD_LIBRARY_PATH.
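
Before experimenting with LD_LIBRARY_PATH, it helps to know which libraries are actually missing on the target system. This sketch assumes the GNU/Linux ldd output format, in which an unresolved library is reported as "name => not found"; it reads ldd output on standard input:

```shell
#!/bin/sh
# List the shared libraries that ldd reports as missing, one per line,
# without duplicates.  Reads ldd output on standard input.
missing_libs () {
    awk '/=> not found/ { print $1 }' | sort -u
}

# Example use, in the unpacked bin directory:
#   ldd * 2>/dev/null | missing_libs
```

An empty result means that the dynamic loader can find every library, although version incompatibilities remain possible.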

Even when the correct libraries are available, there may still be version incompatibilities, such as shown in this experiment with a Red Hat 6 build installed on a Fedora 15 system:

% cd /usr/local/ashare/llvm/llvm-20140716/bin

% ./clang --version
./clang: /usr/local/lib64/ \
         version `GLIBCXX_3.4.18' not found (required by ./clang)
./clang: /usr/local/lib64/ \
         version `GLIBCXX_3.4.20' not found (required by ./clang)

For convenience, and history, each directory listed here contains one or more binary distribution files, but their filenames do not encode the target platform name; they only indicate the name of the directory into which they unpack. Just follow the link to the directory that appears closest to your own system.
